A recent study by Which? evaluated the performance of six popular AI tools in addressing everyday consumer queries across various domains, including personal finance, legal issues, health, consumer rights, and travel. The researchers posed 40 questions to each tool and assessed their responses based on accuracy, clarity, usefulness, relevance, and ethical responsibility, ultimately scoring them out of 100.
According to the findings, Perplexity emerged as the leading tool with a score of 71%, followed closely by Google’s Gemini AI Overviews (AIO) at 70% and the standalone Gemini tool at 69%. Copilot scored 68%, while ChatGPT and Meta AI scored 64% and 55%, respectively. Notably, despite being the most widely used tool, ChatGPT ranked second from the bottom.
Gaps in AI Responses
The controlled tests revealed significant shortcomings in how these AI tools handled detailed regulations. For instance, when asked about ISA limits, both ChatGPT and Copilot confidently provided incorrect information and failed to mention the correct annual allowance of £20,000. Acting on such answers could lead users to inadvertently breach HMRC rules.
Travel-related inquiries also exposed flaws. Copilot told testers that passengers are entitled to a full refund for canceled flights, a claim that lacks nuance. Meta AI, meanwhile, provided inaccurate details about compensation for flight delays and failed to explain the full rules that apply in extraordinary circumstances.
An accompanying survey found that 51% of UK adults, more than 25 million people, use AI to search for information. Nearly half of these users said they trust the information provided, a figure that rises to 65% among frequent users. One in six relies on AI for financial guidance, one in eight consults it on legal matters, and one in five uses it for health-related questions. A third of respondents believe that the answers these tools generate come from reputable sources.
Risks Identified in AI Guidance
The evaluation raised concerns about how little warning the tools give in sensitive areas such as legal and financial advice. For example, when testers asked about their rights over poor broadband speeds, both ChatGPT and Gemini AIO failed to clarify that only providers signed up to Ofcom’s voluntary guaranteed speed code allow customers to exit contracts without penalty. In a separate case, Gemini suggested that consumers in a dispute with a builder withhold payment, a recommendation that could entangle users in further legal complications.
Financial advice also presented various risks. In response to queries about tax refunds, both ChatGPT and Perplexity provided links to premium tax refund services alongside government options, which could expose users to unnecessary fees and potential fraud. Furthermore, ChatGPT incorrectly stated that travel insurance is mandatory for UK residents visiting Schengen states.
Levent Ergin, Chief Strategist for Climate, Sustainability, and Artificial Intelligence at Informatica, remarked, “AI chatbots are only ever as good as the data and context behind them. Public models are impressive, but they’re trained on what’s broadly available, not the deeply contextual, well-governed information you need for reliable financial guidance.” He stressed that these tools must draw on trusted data sources if they are to deliver accurate and personalized advice.
As more consumers turn to AI for financial recommendations, it becomes increasingly important that these tools evolve into reliable sources of information. Integrating governed data from banks, brokers, and insurers could pave the way for genuinely personalized advice that reflects users’ specific circumstances.
In summary, while AI tools are becoming increasingly integral to daily life, the Which? study underscores the need for improvements in their accuracy and reliability. As AI continues to shape how consumers access information, ensuring that it is applied ethically and responsibly will be essential for building user trust and guarding against misleading guidance.