Upwind has released findings from its recent research presented at the RSA Conference, revealing that malicious Large Language Model (LLM) prompts can be detected with approximately 95% precision. The study highlighted the effectiveness of Nvidia technology, achieving sub-millisecond inference suited for real-time traffic, a critical consideration as enterprises increasingly integrate generative AI into their operations.
As generative AI utilization surges—Gartner forecasts that over 80% of companies will employ generative AI APIs or applications in production this year—the landscape of application security is evolving. Upwind emphasizes that the interface itself, particularly natural language, is becoming the new attack surface. Unlike traditional security threats that exploit code vulnerabilities or malformed packets, LLM threats are embedded in the language, allowing malicious actors to manipulate meaning and intent.
With the adoption of these models in enterprise workflows, new categories of threats have emerged, including prompt injection, jailbreaks, data exfiltration, and social engineering. According to Upwind, existing security measures are ill-equipped to address these novel challenges, necessitating a rethinking of security models to account for the unique nature of LLM threats.
Mose Hassan, VP of Research & Innovation at Upwind, remarked, “LLMs don’t just process input, they interpret intent. That changes the security model entirely. Organizations aren’t just trying to block bad code anymore; they have to stop attempts that twist language and manipulate systems.” He added that their collaboration with Nvidia demonstrates the feasibility of implementing effective security measures in live production environments without incurring significant delays or costs.
To tackle these challenges, Upwind has developed a three-stage architecture designed specifically for production environments, addressing concerns such as latency, cost, false-positive tolerance, and explainability. The system operates in three distinct stages, starting with LLM traffic identification. This initial phase utilizes a lightweight classifier to filter traffic, determining whether a request is LLM-bound. Notably, this stage operates in under a millisecond with an accuracy of 99.88%, ensuring semantic analysis is only applied when necessary.
In the second stage, the focus shifts to semantic threat detection. Once a request is identified as heading to an LLM, the challenge becomes assessing its potential maliciousness. The team employed Nvidia’s nv-embedcode-7b-v1 model, which proved effective at differentiating between normal and malicious prompts, such as indirect jailbreaks and prompt injections. This stage achieved a detection accuracy of 94.53% while maintaining inference times well under 0.1 milliseconds, demonstrating that robust AI security can operate efficiently at scale.
The final stage involves selective LLM validation, where only high-risk or ambiguous cases are escalated to the Nvidia Nemotron-3-Nano-30B model, integrated with Nvidia NeMo Guardrails. This acts as a reasoning layer to validate findings, reduce false positives, and provide explanations aligned with security frameworks. By selectively escalating requests, the system enhances throughput while increasing decision confidence.
Moreover, Upwind emphasizes that detection alone is insufficient in contemporary cloud environments, where flagged prompts represent only a fragment of a larger security landscape. By embedding LLM threat detection directly into Upwind’s runtime and cloud visibility platform, malicious prompts are surfaced not merely as isolated model outputs but as actionable security events within a broader cloud ecosystem.
As the adoption of AI accelerates, the emergence of language-based threats is increasingly becoming a tangible operational challenge. The findings from Upwind’s collaboration with Nvidia illustrate that organizations need not compromise innovation for security. This research opens the door for more resilient security measures, enabling enterprises to navigate the complexities of modern AI integration successfully.
See also
Anthropic’s Claims of AI-Driven Cyberattacks Raise Industry Skepticism
Anthropic Reports AI-Driven Cyberattack Linked to Chinese Espionage
Quantum Computing Threatens Current Cryptography, Experts Seek Solutions
Anthropic’s Claude AI exploited in significant cyber-espionage operation
AI Poisoning Attacks Surge 40%: Businesses Face Growing Cybersecurity Risks


















































