Connect with us

Hi, what are you looking for?

AI Generative

Researchers Bypass AI Image Editing Filters with 80.9% Success Using In-Image Text

Researchers reveal 80.9% success rate in bypassing AI image editing filters using in-image text, exposing critical vulnerabilities in leading models like GPT-Image 1.5.

Researchers claim that leading image editing AIs can be jailbroken using rasterized text and visual cues, allowing prohibited edits to bypass safety filters with success rates as high as 80.9%. This alarming finding highlights vulnerabilities in current state-of-the-art image AI platforms, which implement a range of censorship measures to prevent the creation of banned content such as NSFW or defamatory imagery. The research comes from a team based in China and is part of a growing body of work that scrutinizes the robustness of AI moderation frameworks.

Known as “alignment,” this process involves scanning both incoming and outgoing data for violations of usage rules. For instance, while an innocuous image upload may pass initial checks, requests to generate unsafe content—such as transforming the image into one depicting a person undressing—can trigger intervention from filtering systems. Users have reportedly found ways to circumvent these safeguards, crafting prompts that do not explicitly trigger filters but still lead to unsafe content generation.

The study reveals that current multimodal systems, like CLIP, interpret images back into the text realm, which results in visual prompts not being subject to the same moderation processes as direct text requests. Embedded instructions within images, achieved through techniques such as typographic overlays, have exposed a significant weakness in the security models of Vision Language Models (VLMs).

A newly published paper titled When the Prompt Becomes Visual: Vision-Centric Jailbreak Attacks for Large Image Editing Models formalizes a technique that has circulated in online forums. It illustrates how in-image text can be utilized to bypass alignment filters, citing examples of banned commands enacted through rasterized text, often in contexts designed to distract from the illicit content being generated. The authors curated a benchmark, named IESBench, specifically tailored for testing image editing models, achieving attack success rates (ASR) of 80.9% against systems like Nano Banana Pro and GPT-Image 1.5.

The paper emphasizes that contemporary image editing systems, such as Qwen-Image-Edit and LongCat-Image-Edit, utilize VLMs, which are designed to encode both text and images within a single model. This approach has implications for how visually embedded cues can bypass typical text-based checks. Interestingly, the authors report that safety measures often fail to identify harmful content when it is embedded in a visual format rather than a textual one.

The study categorizes the risks associated with image editing into three levels of severity: Level-1 covers individual rights violations; Level-2 addresses group-targeted harm, and Level-3 encompasses societal risks including political disinformation and fabricated imagery. The authors also note that effectiveness varies significantly among models, with open-source versions exhibiting higher vulnerability compared to commercial counterparts.

In testing various models, the authors found that attack success rates varied widely. Open-source models achieved a staggering 100% ASR due to the absence of robust safety features, while commercial models such as GPT-Image 1.5 and Nano Banana Pro displayed marginally better defenses. For instance, GPT-Image 1.5 was notably vulnerable to copyright tampering, achieving a 95.7% ASR in that category.

Moreover, the study introduced a modified version of Qwen-Image-Edit, dubbed Qwen-Image-Edit-Safe, which lowered the attack success rate by 33%. This adaptation demonstrated the potential for existing systems to enhance their defenses without extensive retraining. However, reliance on pre-aligned models limited its effectiveness against more complex attacks.

While the researchers acknowledge the challenges in achieving reliable safety protocols, they emphasize the necessity of refining existing models. The implications of these findings are significant for developers and regulators alike, as they navigate the precarious balance between innovation and ethical responsibility in AI technologies.

The research encapsulates a pressing concern in the AI landscape, where moderation systems must continually evolve to stay ahead of emerging vulnerabilities. As the field advances, the need for robust defenses against such “jailbreak” tactics will become increasingly paramount.

See also
Staff
Written By

The AiPressa Staff team brings you comprehensive coverage of the artificial intelligence industry, including breaking news, research developments, business trends, and policy updates. Our mission is to keep you informed about the rapidly evolving world of AI technology.

You May Also Like

AI Regulation

California Governor Gavin Newsom orders a review of AI supply-chain risk designations, impacting San Francisco's Anthropic amidst military contract disputes.

AI Government

Microsoft commits $10 billion to Japan's AI and cybersecurity sectors by 2029, aiming to train one million engineers and enhance data security and infrastructure.

AI Technology

Harvard study reveals that 94% of professionals see AI as crucial for cybersecurity, yet many firms risk reputational damage by neglecting strategic training.

Top Stories

Microsoft shifts to independent AI development, targeting state-of-the-art models by 2027, fueled by Nvidia chips and a new strategic focus.

AI Finance

AI banking experts highlight JPMorgan Chase and Bank of America's automation success, driving operational efficiency and customer loyalty amid rising cyber threats.

AI Education

Vietnamese universities are restructuring curricula to integrate AI as a core competency, addressing the 40% job impact from AI by 2030 and enhancing student...

Top Stories

DeepSeek forecasts Nvidia's stock will surge 50% to $265 by 2026, driven by new technology and strong institutional confidence amid market challenges.

AI Generative

Google launches Gemma 4, an open-source AI suite with 26B and 31B models for local deployment, enhancing privacy and multimodal reasoning capabilities.

© 2025 AIPressa · Part of Buzzora Media · All rights reserved. This website provides general news and educational content for informational purposes only. While we strive for accuracy, we do not guarantee the completeness or reliability of the information presented. The content should not be considered professional advice of any kind. Readers are encouraged to verify facts and consult appropriate experts when needed. We are not responsible for any loss or inconvenience resulting from the use of information on this site. Some images used on this website are generated with artificial intelligence and are illustrative in nature. They may not accurately represent the products, people, or events described in the articles.