The UK’s National Cyber Security Centre (NCSC) issued a caution on Monday regarding the inherent vulnerabilities of large language model (LLM) artificial intelligence tools, warning that malicious actors could exploit these weaknesses to hijack and potentially weaponize these models against users. This advisory comes three years after the launch of ChatGPT, a widely used LLM, which has been under scrutiny by security researchers for its functionality, privacy, and security.
Researchers quickly identified a significant flaw: LLMs, including ChatGPT, process all prompts as instructions, making them susceptible to manipulation through a tactic known as prompt injection. This method involves sending harmful requests disguised as legitimate instructions, allowing attackers to bypass internal safeguards meant to prevent dangerous actions.
In a blog post, David C, the NCSC’s technical director for platforms research, explained that the architecture of current LLMs inherently lacks a security distinction between trusted and untrusted content. “Current large language models (LLMs) simply do not enforce a security boundary between instructions and data inside a prompt,” he noted. The models concatenate their own instructions with untrusted content, treating the resulting prompt as if it were free from risk.
David C cautioned that prompt injection attacks could prove more challenging to mitigate than other known vulnerabilities, such as SQL injection, which impacts web applications mishandling data and commands. He emphasized that LLMs operate through pattern matching and prediction, lacking the ability to discern trustworthy information from malicious input. “Under the hood of an LLM, there’s no distinction made between ‘data’ or ‘instructions’; there is only ever ‘next token’,” he wrote. This means that prompt injection attacks may persist as a significant threat.
The NCSC’s assessment echoes sentiments from independent researchers and AI companies, which have warned that issues like prompt injections, jailbreaking, and hallucinations may never be fully resolved. As LLMs retrieve content from the internet or external sources, there remains a risk that they will interpret this data as direct instructions.
The implications of these vulnerabilities extend into the realm of software development. Major AI coding tools from companies like OpenAI and Anthropic have been integrated into automated workflows on platforms like GitHub, creating potential weaknesses. Maintainers or external contributors could embed malicious prompts within standard elements such as commit messages, which the LLMs would then accept as valid instructions. Even models that require human approval for significant tasks could be exploited with a single line of malicious code.
AI browser agents, designed to assist users in shopping and research, are similarly prone to vulnerabilities. Researchers have discovered ways to exploit ChatGPT’s browser authentication protocols to insert hidden instructions into the model’s memory, granting remote code execution privileges. Other innovations include web pages that deliver misleading content to AI crawlers, thus affecting the model’s internal evaluations.
While AI companies acknowledge these persistent weaknesses, they assert that solutions are in development. For instance, OpenAI recently published a paper claiming that hallucinations, which occur when a model confidently provides incorrect answers, are solvable issues. The research indicated that these inaccuracies arise because models are penalized for expressing uncertainty, leading them to prioritize confident, albeit incorrect, responses. OpenAI’s revised evaluation metrics aim to address this by balancing incentives to reduce hallucinations.
Companies like Anthropic have also reported relying on external detection tools and account monitoring to combat jailbreaking issues, a challenge affecting nearly all commercial and open-source models. As the field continues to evolve, AI developers are recognizing that the complexity and inherent weaknesses of LLMs may necessitate ongoing vigilance and innovation in cybersecurity measures.
See also
OpenAI Accelerates Release of GPT-5.2 Upgrade to Compete with Google’s Gemini 3
AWS Reveals Generative AI Boosts Retail Inventory Accuracy by 40% and ROI by 20%
Generative AI Bridges Gap in Ecological Neuroscience, Transforming Animal Behavior Research
CognitiveLab Launches NetraEmbed, Boosting Multimodal Document AI Performance by 150%
Google Set to Launch Nano Banana 2 Flash AI Image Tool with Free Access Soon



















































