AI Generative

Researchers Bypass AI Image Editing Filters with 80.9% Success Using In-Image Text

Researchers reveal 80.9% success rate in bypassing AI image editing filters using in-image text, exposing critical vulnerabilities in leading models like GPT-Image 1.5.

Staff

Published

12 February, 2026

Researchers claim that leading image editing AIs can be jailbroken using rasterized text and visual cues, allowing prohibited edits to bypass safety filters with success rates as high as 80.9%. This alarming finding highlights vulnerabilities in current state-of-the-art image AI platforms, which implement a range of censorship measures to prevent the creation of banned content such as NSFW or defamatory imagery. The research comes from a team based in China and is part of a growing body of work that scrutinizes the robustness of AI moderation frameworks.

Known as “alignment,” this process involves scanning both incoming and outgoing data for violations of usage rules. For instance, while an innocuous image upload may pass initial checks, requests to generate unsafe content—such as transforming the image into one depicting a person undressing—can trigger intervention from filtering systems. Users have reportedly found ways to circumvent these safeguards, crafting prompts that do not explicitly trigger filters but still lead to unsafe content generation.

The study reveals that current multimodal systems, like CLIP, interpret images back into the text realm, which results in visual prompts not being subject to the same moderation processes as direct text requests. Embedded instructions within images, achieved through techniques such as typographic overlays, have exposed a significant weakness in the security models of Vision Language Models (VLMs).

A newly published paper titled When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models formalizes a technique that has circulated in online forums. It illustrates how in-image text can be utilized to bypass alignment filters, citing examples of banned commands enacted through rasterized text, often in contexts designed to distract from the illicit content being generated. The authors curated a benchmark, named IESBench, specifically tailored for testing image editing models, achieving attack success rates (ASR) of 80.9% against systems like Nano Banana Pro and GPT-Image 1.5.

The paper emphasizes that contemporary image editing systems, such as Qwen-Image-Edit and LongCat-Image-Edit, utilize VLMs, which are designed to encode both text and images within a single model. This approach has implications for how visually embedded cues can bypass typical text-based checks. Interestingly, the authors report that safety measures often fail to identify harmful content when it is embedded in a visual format rather than a textual one.

The study categorizes the risks associated with image editing into three levels of severity: Level-1 covers individual rights violations; Level-2 addresses group-targeted harm, and Level-3 encompasses societal risks including political disinformation and fabricated imagery. The authors also note that effectiveness varies significantly among models, with open-source versions exhibiting higher vulnerability compared to commercial counterparts.

In testing various models, the authors found that attack success rates varied widely. Open-source models achieved a staggering 100% ASR due to the absence of robust safety features, while commercial models such as GPT-Image 1.5 and Nano Banana Pro displayed marginally better defenses. For instance, GPT-Image 1.5 was notably vulnerable to copyright tampering, achieving a 95.7% ASR in that category.

Moreover, the study introduced a modified version of Qwen-Image-Edit, dubbed Qwen-Image-Edit-Safe, which lowered the attack success rate by 33%. This adaptation demonstrated the potential for existing systems to enhance their defenses without extensive retraining. However, reliance on pre-aligned models limited its effectiveness against more complex attacks.

While the researchers acknowledge the challenges in achieving reliable safety protocols, they emphasize the necessity of refining existing models. The implications of these findings are significant for developers and regulators alike, as they navigate the precarious balance between innovation and ethical responsibility in AI technologies.

The research encapsulates a pressing concern in the AI landscape, where moderation systems must continually evolve to stay ahead of emerging vulnerabilities. As the field advances, the need for robust defenses against such “jailbreak” tactics will become increasingly paramount.

AI Regulation

California’s Newsom Orders Review of AI Supply-Chain Risks in State Contracts

California Governor Gavin Newsom orders a review of AI supply-chain risk designations, impacting San Francisco's Anthropic amidst military contract disputes.

Staff2 hours ago

AI Government

Microsoft Announces $10B Investment in Japan’s AI and Cybersecurity Sectors by 2029

Microsoft commits $10 billion to Japan's AI and cybersecurity sectors by 2029, aiming to train one million engineers and enhance data security and infrastructure.

Staff4 hours ago

AI Technology

Brands Risk Reputation with Common AI Training Pitfalls, Warns Harvard Study

Harvard study reveals that 94% of professionals see AI as crucial for cybersecurity, yet many firms risk reputational damage by neglecting strategic training.

Staff4 hours ago

Microsoft Shifts Focus, Aiming for State-of-the-Art AI Models by 2027 After OpenAI Deal

Microsoft shifts to independent AI development, targeting state-of-the-art models by 2027, fueled by Nvidia chips and a new strategic focus.

Staff6 hours ago

AI Finance

AI Banking Keynote Highlights: Scott Steinberg on Personalization, Automation, and Ethics

AI banking experts highlight JPMorgan Chase and Bank of America's automation success, driving operational efficiency and customer loyalty amid rising cyber threats.

Marcus Chen6 hours ago

AI Education

Vietnamese Universities Integrate AI in Curricula to Meet Labor Market Demands

Vietnamese universities are restructuring curricula to integrate AI as a core competency, addressing the 40% job impact from AI by 2030 and enhancing student...

David Park6 hours ago

DeepSeek AI Predicts Nvidia’s Stock Will Surge to $265 by End of 2026

DeepSeek forecasts Nvidia's stock will surge 50% to $265 by 2026, driven by new technology and strong institutional confidence amid market challenges.

Staff8 hours ago

AI Generative

Google Launches Gemma 4: Advanced Open-Source AI Models for Local Deployment and Multimodal Reasoning

Google launches Gemma 4, an open-source AI suite with 26B and 31B models for local deployment, enhancing privacy and multimodal reasoning capabilities.

Staff8 hours ago

AIPRESSA.COM

AI Generative

Researchers Bypass AI Image Editing Filters with 80.9% Success Using In-Image Text

Trending

Top Stories

Albania Appoints AI Bot Minister Diella Amid Corruption Concerns and EU Membership Goals

AI Cybersecurity

Endpoint Security Market to Reach $23.9B by 2030 with 7.2% CAGR Amid Rising Cyber Threats

AI Government

BigBear.ai Launches Biometric Platform at O’Hare, Acquires Generative AI Ask Sage for $250M

AI Business

Enterprise Architecture Shifts to Strategic Enabler in AI-Driven Business Models

AI Technology

AI Hardware Market Grows 30% in 2025, Driven by Generative AI and Edge Computing Demand

You May Also Like

AI Regulation

California’s Newsom Orders Review of AI Supply-Chain Risks in State Contracts

AI Government

Microsoft Announces $10B Investment in Japan’s AI and Cybersecurity Sectors by 2029

AI Technology

Brands Risk Reputation with Common AI Training Pitfalls, Warns Harvard Study

Top Stories

Microsoft Shifts Focus, Aiming for State-of-the-Art AI Models by 2027 After OpenAI Deal

AI Finance

AI Banking Keynote Highlights: Scott Steinberg on Personalization, Automation, and Ethics

AI Education

Vietnamese Universities Integrate AI in Curricula to Meet Labor Market Demands

Top Stories

DeepSeek AI Predicts Nvidia’s Stock Will Surge to $265 by End of 2026

AI Generative

Google Launches Gemma 4: Advanced Open-Source AI Models for Local Deployment and Multimodal Reasoning