Jailbreak — Gemini

Attempts to "jailbreak" Gemini might involve trying to:

Successful jailbreaks do not "hack" Google’s servers; they exploit the model’s understanding of context. They trick the AI into believing it is playing a game, writing fiction, or simulating a different persona where normal rules don't apply.

Perhaps the most surprising jailbreak vector involves transforming harmful instructions into poetic form. A research paper titled "Adversarial Poetry as a Universal Single-Turn Jailbreak Mechanism in Large Language Models" (arXiv:2511.15304v1) tested 25 major models, including Gemini 2.5 Pro. The results were striking: when harmful requests were rewritten as rhyming poetry, attack success rates rose on average relative to plain-language requests. For Gemini 2.5 Pro, 20 hand-crafted "poison poems" achieved a 100% success rate; the model's defenses collapsed entirely against poetic formatting.