A recent experiment involving generative AI systems has raised questions about the reliability of AI-generated news content. A journalism professor specializing in computer science reported that AI tools, including Google’s Gemini, produced numerous inaccuracies and even fabricated sources while attempting to summarize current events in Québec. This analysis, which spanned a month, revealed that 18% of the AI-generated news items relied on non-existent sources, such as a fictional outlet named fake-example.ca.
To explore how well these tools could convey important news, the professor queried seven generative AI systems daily, asking each for the five most significant news events in Québec. The tools included paid options, such as ChatGPT and Claude, and free ones, such as DeepSeek and Grok. Each response had to include a three-sentence summary, a title, and a source link, with the expectation that the AI would draw primarily from credible news outlets.
The results, however, were alarming. Although most responses cited news outlets, many included URLs that led to 404 errors or merely pointed to the outlet's homepage. Only 37% of the responses provided complete and legitimate URLs, making it difficult to verify the information presented. Overall, the summaries were accurate only 47% of the time, and some contained outright plagiarism.
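The link-checking described above amounts to a simple triage: a cited URL is either dead, points only at an outlet's homepage, or is a complete article link that can actually be verified. A minimal sketch of that logic follows; the function name and example URLs are illustrative assumptions, not the professor's actual tooling.

```python
from urllib.parse import urlparse

def classify_citation(url: str, http_status: int) -> str:
    """Triage an AI-cited source link into one of three buckets.

    Returns "broken" (dead link), "homepage" (outlet root only),
    or "article" (a complete URL that can be checked against the claim).
    """
    if http_status == 404:
        return "broken"              # dead link: the claim cannot be verified
    path = urlparse(url).path.rstrip("/")
    if not path:                     # e.g. https://example-media.ca/
        return "homepage"            # points at the outlet, not the story
    return "article"                 # complete URL worth verifying further

# Hypothetical examples of each bucket:
print(classify_citation("https://example-media.ca/", 200))          # homepage
print(classify_citation("https://example-media.ca/news/123", 404))  # broken
print(classify_citation("https://example-media.ca/news/123", 200))  # article
```

In a real audit the HTTP status would come from fetching the URL; it is passed in here so the classification logic stays self-contained.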
The inaccuracies in the AI-generated content were particularly concerning. For instance, Grok, the AI tool from Elon Musk's social network X, reported a fabricated narrative about asylum seekers being mistreated in Chibougamau. This mischaracterization was based on a legitimate La Presse article describing a successful relocation of asylum seekers, most of whom received job offers. Such significant misinterpretations exemplify the potential dangers of relying on AI for news.
Other notable inaccuracies included claims about a toddler found alive after a four-day search, which Grok incorrectly attributed to the mother abandoning her child for a vacation. Aria, meanwhile, mistakenly reported that French cyclist Julian Alaphilippe had won a race in Montréal when in fact he won a different race in Québec City. These errors illustrate a broader pattern of AI tools generating content that lacks foundational accuracy.
Compounding the issue were grammatical errors in the French-language responses, which the professor speculated might have been less frequent had the queries been posed in English. Of the verified responses, approximately 45% were classified as only partially accurate, typically because of misinterpretations that fell short of making the item wholly unreliable.
The conclusions the AI systems drew also raised eyebrows. In several instances, the tools made unsupported claims or invented debates around the stories they reported. For example, ChatGPT concluded that an accident near Québec City "has reignited the debate on road safety in rural areas," although no such discussion appeared in the reference article. This tendency to fabricate context or conclusions presents a significant risk of misinformation.
The findings echoed a subsequent report by 22 public service media organizations, which noted that nearly half of all AI responses contained significant issues and a third showed serious sourcing problems. As the use of generative AI tools in news reporting grows, experts urge caution. The expectation for accuracy and reliability remains paramount, yet the current capabilities of these AI systems fall short of delivering factual information consistently.
As the landscape of news consumption evolves, the implications of these findings are profound. Stakeholders in journalism and technology must confront the challenges posed by generative AI tools to ensure that the integrity of information is maintained, safeguarding the public’s access to reliable news sources.