As artificial intelligence (AI) continues to shape the landscape of customer support, businesses are generally optimistic about its impact. Costs have decreased, coverage has expanded, and chatbots now manage inquiries that previously resulted in lengthy waits for human agents. However, the experience for customers is more complex, raising questions about whether AI has genuinely improved service quality.
Many customers now find themselves wading through long AI-generated responses, checking whether the information is accurate, and rephrasing their questions when the AI misreads what they asked. In effect, the effort of a support interaction has not disappeared; it has shifted onto the customer. Shan Lilja, Co-Founder of Mavenoid, summed up the problem in a recent conversation with CX Today: “Company effort has gone down, but customer effort has risen.”
This gap between organizational efficiency and customer experience points to a critical challenge in AI-assisted support. Lilja argues that multimodal support, which combines text with images, video, and other media, could help close it. He also points to a “hidden tax” that customers pay, often without noticing, in the form of cognitive load. With the arrival of large language models, the nature of customer effort has changed. Customers once grew frustrated with poor routing or slow responses; now they carry the mental strain of dealing with an AI that sounds confident but is not always right.
Lilja describes this cognitive burden as a “hidden tax on every AI interaction, paid by customers.” The extra effort rarely shows up in performance metrics such as containment rates or Customer Satisfaction (CSAT) scores, yet it can lead to abandoned sessions and eroded customer confidence. A particularly damaging part of this tax is what Lilja calls “AI slop”: low-quality, generic content generated by AI. When that content is also inaccurate, it can seriously undermine support operations and, in the worst cases, lead to damaged products or safety issues.
Recent incidents illustrate these risks. Earlier this year, Woolworths had to adjust its AI chatbot after it claimed to have personal family experiences, including a reference to an “angry mother.” Air Canada faced compensation claims after its chatbot gave incorrect refund information. And a customer persuaded DPD’s chatbot to write a poem disparaging the company. Each case underscores the need for AI that is grounded in more than language alone.
Addressing Structural Issues
Lilja argues that the fundamental problem with text-only AI lies in the ambiguity of language itself. He likens language to “a tree of possibilities,” in which each sentence can be interpreted in several ways. When the AI picks the wrong interpretation, the interaction quickly goes off track, and the customer is left to correct the misunderstanding. Visual context, by contrast, such as photographs, real-time visuals, or instructional videos, narrows the range of possible interpretations. As Lilja puts it, “It’s harder to b******t a human with a false image than with false words.” Grounding AI interactions in visuals can therefore reduce the amount of erroneous guidance customers receive.
Lilja calls this approach “visual grounding” and counts it among six properties he considers essential for effective multimodal support: richer context, reduced ambiguity, cross-modal consistency, state awareness, real-time feedback, and visual grounding itself. Together, these properties target the structural failures of text-only AI rather than patching individual symptoms.
Delayed feedback is a particularly troublesome failure mode, because customers can follow inaccurate instructions for a long time before realizing something has gone wrong. For instance, a customer might carefully follow a chatbot’s directions for cleaning a washing machine’s drain filter, only to discover after several minutes that an earlier step was wrong. That delay compounds frustration and erodes trust in the AI system. Real-time visual feedback, such as video guidance that corrects the customer as they go, can catch these mistakes before they escalate.
Lilja’s use of the adage “a picture is worth a thousand words” neatly sums up the case for multimodal support, especially when set against the risks of inaccurate text instructions. Brands that adopt these approaches could move beyond incremental gains in Net Promoter Score (NPS) toward support that customers can genuinely rely on. As organizations continue to work through the complexities of AI in customer service, multimodal strategies may offer a path to more reliable and effective support.