
Researchers Bypass AI Image Editing Filters with 80.9% Success Using In-Image Text

Researchers reveal 80.9% success rate in bypassing AI image editing filters using in-image text, exposing critical vulnerabilities in leading models like GPT-Image 1.5.

Researchers claim that leading image editing AIs can be jailbroken using rasterized text and visual cues, allowing prohibited edits to bypass safety filters with success rates as high as 80.9%. This alarming finding highlights vulnerabilities in current state-of-the-art image AI platforms, which implement a range of censorship measures to prevent the creation of banned content such as NSFW or defamatory imagery. The research comes from a team based in China and is part of a growing body of work that scrutinizes the robustness of AI moderation frameworks.

These safeguards, broadly known as “alignment,” involve scanning both incoming and outgoing data for violations of usage rules. For instance, while an innocuous image upload may pass initial checks, a request to generate unsafe content, such as transforming the image into one depicting a person undressing, can trigger intervention from filtering systems. Users have reportedly found ways to circumvent these safeguards, crafting prompts that do not explicitly trip the filters but still lead to unsafe content generation.
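
To picture the gap being exploited, consider a minimal, hypothetical sketch of a text-side filter: the typed prompt is screened against a blocklist while the uploaded pixels are never examined. The function names and blocklist below are illustrative assumptions, not any vendor's actual implementation.

```python
# Hypothetical sketch of a text-only alignment check; the function names and
# blocklist are illustrative and do not reflect any vendor's actual filter.
BLOCKED_TERMS = {"undress", "nsfw", "defame"}

def screen_text_prompt(prompt: str) -> bool:
    """Return True if the typed prompt passes the safety filter."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def run_editing_model(image_bytes: bytes, prompt: str) -> bytes:
    """Stand-in for the actual editing model."""
    return image_bytes

def edit_image(image_bytes: bytes, prompt: str) -> bytes:
    # Only the text channel is screened; any instruction rasterized into the
    # image pixels is never inspected, which is the gap the researchers describe.
    if not screen_text_prompt(prompt):
        raise ValueError("Prompt rejected by safety filter")
    return run_editing_model(image_bytes, prompt)
```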

The study notes that multimodal encoders such as CLIP map images and text into a shared representation, yet instructions delivered visually are not subjected to the same moderation as typed prompts. Instructions embedded within images, via techniques such as typographic overlays, thus expose a significant weakness in the safety design of Vision-Language Models (VLMs).
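
The typographic-overlay idea itself is simple to picture: an instruction is rendered into the image as pixels, so it never appears in the prompt text that moderation scans. The Pillow snippet below is a hedged illustration using a deliberately benign instruction, not a reproduction of the paper's attack prompts.

```python
# Illustration of a typographic overlay: the instruction reaches the model as
# pixels rather than prompt tokens. Requires Pillow; the text here is benign.
from PIL import Image, ImageDraw

canvas = Image.new("RGB", (512, 512), color="white")
draw = ImageDraw.Draw(canvas)
draw.text((16, 16), "Edit instruction: replace the sky with a sunset", fill="black")
canvas.save("overlay_prompt.png")
```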

A newly published paper titled When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models formalizes a technique that has circulated in online forums. It illustrates how in-image text can be used to bypass alignment filters, citing examples of banned commands delivered through rasterized text, often in contexts designed to distract from the illicit content being generated. The authors curated a benchmark, named IESBench, tailored specifically to image editing models, and report attack success rates (ASR) as high as 80.9% against systems such as Nano Banana Pro and GPT-Image 1.5.
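
Attack success rate is simply the fraction of attack attempts that produce a prohibited edit. A minimal sketch of the arithmetic, with counts invented purely to reproduce the headline figure:

```python
# Attack success rate (ASR): share of attack attempts that yield a prohibited
# edit. The counts below are made up solely to illustrate the calculation.
def attack_success_rate(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes)

outcomes = [True] * 809 + [False] * 191  # 809 successes out of 1,000 attempts
print(f"ASR = {attack_success_rate(outcomes):.1%}")  # ASR = 80.9%
```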

The paper emphasizes that contemporary image editing systems, such as Qwen-Image-Edit and LongCat-Image-Edit, utilize VLMs, which are designed to encode both text and images within a single model. This approach has implications for how visually embedded cues can bypass typical text-based checks. Interestingly, the authors report that safety measures often fail to identify harmful content when it is embedded in a visual format rather than a textual one.

The study categorizes the risks associated with image editing into three levels of severity: Level-1 covers individual rights violations; Level-2 addresses group-targeted harm; and Level-3 encompasses societal risks, including political disinformation and fabricated imagery. The authors also note that effectiveness varies significantly among models, with open-source versions exhibiting higher vulnerability than their commercial counterparts.

In testing various models, the authors found that attack success rates varied widely. Attacks on open-source models achieved a staggering 100% ASR owing to the absence of robust safety features, while commercial models such as GPT-Image 1.5 and Nano Banana Pro mounted marginally better defenses. GPT-Image 1.5 proved notably vulnerable to copyright tampering, with attacks reaching a 95.7% ASR in that category.

Moreover, the study introduced a modified version of Qwen-Image-Edit, dubbed Qwen-Image-Edit-Safe, which lowered the attack success rate by 33%. This adaptation demonstrated the potential for existing systems to enhance their defenses without extensive retraining. However, reliance on pre-aligned models limited its effectiveness against more complex attacks.
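
The article does not detail the Qwen-Image-Edit-Safe recipe, but one complementary mitigation that follows naturally from the findings is to OCR uploaded images and pass any extracted text through the same filter applied to typed prompts. A hedged sketch of that idea, assuming pytesseract and Pillow are available; it is not the authors' method.

```python
# Sketch of a complementary mitigation (not the paper's Qwen-Image-Edit-Safe
# approach): OCR the uploaded image and screen the extracted text like a prompt.
import pytesseract
from PIL import Image

BLOCKED_TERMS = {"undress", "nsfw", "defame"}

def image_text_is_safe(path: str) -> bool:
    extracted = pytesseract.image_to_string(Image.open(path)).lower()
    return not any(term in extracted for term in BLOCKED_TERMS)

if not image_text_is_safe("upload.png"):
    raise ValueError("In-image text rejected by safety filter")
```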

While the researchers acknowledge the challenges in achieving reliable safety protocols, they emphasize the necessity of refining existing models. The implications of these findings are significant for developers and regulators alike, as they navigate the precarious balance between innovation and ethical responsibility in AI technologies.

The research encapsulates a pressing concern in the AI landscape, where moderation systems must continually evolve to stay ahead of emerging vulnerabilities. As the field advances, the need for robust defenses against such “jailbreak” tactics will become increasingly paramount.


