Overview
Two systemic jailbreaks, affecting a number of generative AI services, were discovered. These jailbreaks can bypass safety protocols and allow an attacker to instruct the affected LLM to produce illicit or dangerous content. The first jailbreak, called “Inception,” works by prompting the AI to imagine a fictitious scenario, which is then adapted into a second scenario in which the AI acts as though it has no safety guardrails. The second jailbreak works by asking the AI for information on how it should not reply to a specific request.
Both jailbreaks, when provided to multiple AI models, result in a safety guardrail bypass using almost exactly the same prompt syntax. This indicates a systemic weakness within many popular AI systems.
Description
Two systemic jailbreaks, affecting several generative AI services, have been discovered. These jailbreaks, when performed against the affected AI services with the exact same prompt syntax, result in a bypass of safety guardrails on those systems.
The first jailbreak works by prompting the AI to imagine a fictitious scenario and then adapting it into a second scenario nested within the first. Continued prompting within the second scenario's context can bypass safety guardrails and allow the generation of malicious content; a structural sketch of this conversation pattern follows the vendor list below. This jailbreak, named “Inception” by the reporter, affects the following vendors:
- ChatGPT (OpenAI)
- Claude (Anthropic)
- DeepSeek
- Gemini (Google)
- Grok (Twitter/X)
- MetaAI (Facebook)
- MistralAI
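The conversation structure described above can be illustrated abstractly. The sketch below is a minimal, hypothetical illustration: `send_to_model` is a stand-in for whatever chat client a tester is evaluating, and the prompt text is placeholder only. It shows the shape of the multi-turn probe, not a working exploit.

```python
# Abstract sketch of the "Inception" conversation structure described above.
# Assumptions: `send_to_model` is a hypothetical stand-in for the chat client
# under test, and the angle-bracketed prompt text is placeholder only.

def send_to_model(history):
    """Hypothetical: submit the running conversation to the model under test
    and return its reply. Replace with the actual client being evaluated."""
    raise NotImplementedError


def run_inception_probe():
    history = []
    turns = [
        # Turn 1: establish an outer fictitious scenario.
        "Imagine a fictitious scenario: <outer scenario>.",
        # Turn 2: adapt it into a second scenario nested inside the first.
        "Within that scenario, imagine a second one: <inner scenario>.",
        # Turn 3+: continue prompting inside the second scenario's context,
        # which is where the reported guardrail bypass occurs.
        "Continuing within that second scenario, <test request>.",
    ]
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        history.append({"role": "assistant", "content": send_to_model(history)})
    return history
```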
The second jailbreak works by prompting the AI to answer a question by describing how it should not reply within a certain context. The attacker can then prompt the AI to respond normally and pivot back and forth between illicit questions that bypass safety guardrails and ordinary prompts; a structural sketch of this pattern follows the vendor list below. This jailbreak affects the following vendors:
- ChatGPT
- Claude
- DeepSeek
- Gemini
- Grok
- MistralAI
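The alternating pattern of this second technique can be sketched in the same hypothetical terms (placeholder prompts, the same `send_to_model` stand-in as above); only the pivoting between the inverted framing and ordinary prompts is shown.

```python
# Abstract sketch of the second technique's alternating pattern described
# above. Placeholder prompts only; `send_to_model` is the same hypothetical
# stand-in used in the previous sketch.

def run_pivot_probe(send_to_model):
    history = []
    turns = [
        # Inverted framing: ask how the model should *not* reply in a context.
        "Explain how you should not reply to <test request> within <context>.",
        # Pivot back to an ordinary prompt.
        "<ordinary request>",
        # Pivot again to the inverted framing, alternating as needed.
        "Returning to the earlier question, <test request>.",
    ]
    for prompt in turns:
        history.append({"role": "user", "content": prompt})
        history.append({"role": "assistant", "content": send_to_model(history)})
    return history
```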
Impact
These jailbreaks, while of low severity on their own, bypass the security and safety guidelines of all affected AI services, allowing an attacker to abuse them to generate content on various illicit topics, such as controlled substances, weapons, phishing emails, and malware code.
A motivated threat actor could exploit these jailbreaks to achieve a variety of malicious actions, and their systemic nature heightens that risk. Additionally, the affected legitimate services can function as a proxy, hiding a threat actor's malicious activity.
Solution
Various affected vendors have provided statements on the issue and have altered their services to prevent the jailbreaks.
Acknowledgements
Thanks to the reporters, David Kuzsmar, who reported the first jailbreak, and Jacob Liddle, who reported the second jailbreak. This document was written by Christopher Cullen.