
Researchers Bypass AI Image Editing Filters with 80.9% Success Using In-Image Text

Researchers reveal 80.9% success rate in bypassing AI image editing filters using in-image text, exposing critical vulnerabilities in leading models like GPT-Image 1.5.

Researchers claim that leading image editing AIs can be jailbroken using rasterized text and visual cues, allowing prohibited edits to bypass safety filters with success rates as high as 80.9%. This alarming finding highlights vulnerabilities in current state-of-the-art image AI platforms, which implement a range of censorship measures to prevent the creation of banned content such as NSFW or defamatory imagery. The research comes from a team based in China and is part of a growing body of work that scrutinizes the robustness of AI moderation frameworks.

These safeguards, broadly known as “alignment,” involve scanning both incoming and outgoing data for violations of usage rules. For instance, while an innocuous image upload may pass initial checks, a request to generate unsafe content, such as transforming the image into one depicting a person undressing, can trigger intervention from filtering systems. Users have reportedly found ways to circumvent these safeguards, crafting prompts that do not explicitly trip the filters but still lead to unsafe content generation.
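
To picture the gap being exploited, consider a minimal, hypothetical sketch of a text-side filter: the typed prompt is screened against a blocklist while the uploaded pixels are never examined. The function names and blocklist below are illustrative assumptions, not any vendor's actual implementation.

```python
# Hypothetical sketch of a text-only alignment check; the function names and
# blocklist are illustrative and do not reflect any vendor's actual filter.
BLOCKED_TERMS = {"undress", "nsfw", "defame"}

def screen_text_prompt(prompt: str) -> bool:
    """Return True if the typed prompt passes the safety filter."""
    lowered = prompt.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

def run_editing_model(image_bytes: bytes, prompt: str) -> bytes:
    """Stand-in for the actual editing model."""
    return image_bytes

def edit_image(image_bytes: bytes, prompt: str) -> bytes:
    # Only the text channel is screened; any instruction rasterized into the
    # image pixels is never inspected, which is the gap the researchers describe.
    if not screen_text_prompt(prompt):
        raise ValueError("Prompt rejected by safety filter")
    return run_editing_model(image_bytes, prompt)
```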

The study notes that multimodal encoders such as CLIP map images and text into a shared representation, yet instructions delivered visually are not subjected to the same moderation as typed prompts. Instructions embedded within images, via techniques such as typographic overlays, thus expose a significant weakness in the safety design of Vision-Language Models (VLMs).
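
The typographic-overlay idea itself is simple to picture: an instruction is rendered into the image as pixels, so it never appears in the prompt text that moderation scans. The Pillow snippet below is a hedged illustration using a deliberately benign instruction, not a reproduction of the paper's attack prompts.

```python
# Illustration of a typographic overlay: the instruction reaches the model as
# pixels rather than prompt tokens. Requires Pillow; the text here is benign.
from PIL import Image, ImageDraw

canvas = Image.new("RGB", (512, 512), color="white")
draw = ImageDraw.Draw(canvas)
draw.text((16, 16), "Edit instruction: replace the sky with a sunset", fill="black")
canvas.save("overlay_prompt.png")
```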

A newly published paper titled When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models formalizes a technique that has circulated in online forums. It illustrates how in-image text can be used to bypass alignment filters, citing examples of banned commands delivered through rasterized text, often in contexts designed to distract from the illicit content being generated. The authors curated a benchmark, named IESBench, tailored specifically to image editing models, and report attack success rates (ASR) as high as 80.9% against systems such as Nano Banana Pro and GPT-Image 1.5.
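
Attack success rate is simply the fraction of attack attempts that produce a prohibited edit. A minimal sketch of the arithmetic, with counts invented purely to reproduce the headline figure:

```python
# Attack success rate (ASR): share of attack attempts that yield a prohibited
# edit. The counts below are made up solely to illustrate the calculation.
def attack_success_rate(outcomes: list[bool]) -> float:
    return sum(outcomes) / len(outcomes)

outcomes = [True] * 809 + [False] * 191  # 809 successes out of 1,000 attempts
print(f"ASR = {attack_success_rate(outcomes):.1%}")  # ASR = 80.9%
```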

The paper emphasizes that contemporary image editing systems, such as Qwen-Image-Edit and LongCat-Image-Edit, utilize VLMs, which are designed to encode both text and images within a single model. This approach has implications for how visually embedded cues can bypass typical text-based checks. Interestingly, the authors report that safety measures often fail to identify harmful content when it is embedded in a visual format rather than a textual one.

The study categorizes the risks associated with image editing into three levels of severity: Level-1 covers individual rights violations; Level-2 addresses group-targeted harm; and Level-3 encompasses societal risks, including political disinformation and fabricated imagery. The authors also note that effectiveness varies significantly among models, with open-source versions exhibiting higher vulnerability than their commercial counterparts.

In testing various models, the authors found that attack success rates varied widely. Attacks on open-source models achieved a staggering 100% ASR owing to the absence of robust safety features, while commercial models such as GPT-Image 1.5 and Nano Banana Pro mounted marginally better defenses. GPT-Image 1.5 proved notably vulnerable to copyright tampering, with attacks reaching a 95.7% ASR in that category.

Moreover, the study introduced a modified version of Qwen-Image-Edit, dubbed Qwen-Image-Edit-Safe, which lowered the attack success rate by 33%. This adaptation demonstrated the potential for existing systems to enhance their defenses without extensive retraining. However, reliance on pre-aligned models limited its effectiveness against more complex attacks.
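
The article does not detail the Qwen-Image-Edit-Safe recipe, but one complementary mitigation that follows naturally from the findings is to OCR uploaded images and pass any extracted text through the same filter applied to typed prompts. A hedged sketch of that idea, assuming pytesseract and Pillow are available; it is not the authors' method.

```python
# Sketch of a complementary mitigation (not the paper's Qwen-Image-Edit-Safe
# approach): OCR the uploaded image and screen the extracted text like a prompt.
import pytesseract
from PIL import Image

BLOCKED_TERMS = {"undress", "nsfw", "defame"}

def image_text_is_safe(path: str) -> bool:
    extracted = pytesseract.image_to_string(Image.open(path)).lower()
    return not any(term in extracted for term in BLOCKED_TERMS)

if not image_text_is_safe("upload.png"):
    raise ValueError("In-image text rejected by safety filter")
```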

While the researchers acknowledge the challenges in achieving reliable safety protocols, they emphasize the necessity of refining existing models. The implications of these findings are significant for developers and regulators alike, as they navigate the precarious balance between innovation and ethical responsibility in AI technologies.

The research encapsulates a pressing concern in the AI landscape, where moderation systems must continually evolve to stay ahead of emerging vulnerabilities. As the field advances, the need for robust defenses against such “jailbreak” tactics will become increasingly paramount.


