@AgentChud It really depends on the task you're measuring these returns against. It doesn't really matter which model you use if you just want the model to tell you what color the sky is.
Google DeepMind just dropped the most terrifying cybersecurity paper of the year.
They just mapped the attack surface that nobody in AI is talking about.
Websites can already detect when an AI agent visits and serve it completely different content than humans see.
- Hidden instructions in HTML.
- Malicious commands in image pixels.
- Jailbreaks embedded in PDFs.
This “detection asymmetry” means a site can serve normal content to you, and malicious, hidden content to your agent.
The agent doesn’t know it’s being tricked. It simply processes whatever it receives and acts on it.
Here are the six attack categories:
→ Indirect Web Injection: Malicious instructions hidden in HTML comments, CSS tricks, or white text on white backgrounds.
→ Multimodal Steganography: Commands encoded directly into image pixels, invisible to humans, but fully readable by vision models.
→ Document Jailbreaks: Override instructions embedded deep inside PDFs, spreadsheets, and calendar invites.
→ Memory Poisoning: Injecting false information that persists across future sessions.
→ Exfiltration Attacks: Tricking the agent into sending your private data to attacker-controlled endpoints.
→ Multi-Agent Cascades: The worst-case scenario. Agent A gets compromised and passes the “poison” to Agent B, which passes it to Agent C. The entire pipeline gets infected because agents trust each other’s data.
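To make the first category concrete, here is a minimal sketch of what scanning a page for hidden text could look like. It is not from the DeepMind paper. The names `find_hidden_text`, `HiddenTextFinder`, and `looks_hidden` are hypothetical, and the style checks are rough assumptions: they only look at inline styles, treat any white text as suspicious without checking the background, and assume reasonably well-formed HTML.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not touch the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}

# Assumption: these color values mean "white text". The background is not checked.
WHITE_VALUES = {"white", "#fff", "#ffffff"}


def parse_style(style):
    """Turn an inline style string into a {property: value} dict."""
    props = {}
    for decl in style.split(";"):
        name, sep, value = decl.partition(":")
        if sep:
            props[name.strip().lower()] = value.strip().lower()
    return props


def looks_hidden(style):
    """Heuristic only: flag inline styles that commonly hide text from humans."""
    props = parse_style(style)
    return (props.get("display") == "none"
            or props.get("visibility") == "hidden"
            or props.get("color") in WHITE_VALUES)


class HiddenTextFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.findings = []   # list of (kind, text) tuples
        self._stack = []     # one bool per open element: hidden by its own style?

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        style = dict(attrs).get("style") or ""
        self._stack.append(looks_hidden(style))

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_comment(self, data):
        if data.strip():
            self.findings.append(("comment", data.strip()))

    def handle_data(self, data):
        # Text counts as hidden if any enclosing element looked hidden.
        if any(self._stack) and data.strip():
            self.findings.append(("hidden-style", data.strip()))


def find_hidden_text(html):
    """Return text a human reader would likely not see on the rendered page."""
    finder = HiddenTextFinder()
    finder.feed(html)
    finder.close()
    return finder.findings
```

Calling `find_hidden_text(page_html)` returns a list of `("comment", text)` and `("hidden-style", text)` pairs that can be reviewed or stripped before the page reaches an agent. A check like this only covers the HTML tricks. It does nothing for instructions encoded in pixels or buried in documents, and it misses text hidden through external stylesheets or scripts.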
The most sobering part of the DeepMind report? The defense landscape is failing, badly.
Input sanitization doesn’t work because you can’t “sanitize” a pixel. Prompt-level instructions to “ignore suspicious commands” fail because the attacks are designed to look legitimate.
And human oversight? Impossible at the speed and scale these agents operate.
If you ask an agent to research 50 websites, you can’t verify whether each site served the agent the same content it served you.
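For a single page, the comparison itself is simple to sketch, assuming you somehow already hold both copies: the text served to a human browser and the text served to the agent. Fetching them is left out, and `content_diverges`, `similarity`, and the 0.9 threshold are hypothetical choices, not anything from the report.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Collapse whitespace and case so trivial formatting changes are ignored."""
    return " ".join(text.split()).lower()


def similarity(human_view, agent_view):
    """Return a ratio in [0, 1]; 1.0 means the normalized texts are identical."""
    matcher = SequenceMatcher(None, normalize(human_view),
                              normalize(agent_view), autojunk=False)
    return matcher.ratio()


def content_diverges(human_view, agent_view, threshold=0.9):
    """Flag the page when the two copies are less similar than the threshold."""
    return similarity(human_view, agent_view) < threshold
```

Even under these assumptions the check is noisy. Ads, timestamps, and personalization make honest pages differ between visits, so a flag means "look closer", not "this site is attacking you". And it does nothing about the scale problem: someone still has to obtain the human copy for every page.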
Anthropic: 250 Documents Can Permanently Corrupt Any AI Model
Someone can permanently corrupt any AI model in the world right now.
Not by hacking it. Not by breaking its security. By publishing 250 documents on the internet.
That is the finding from Anthropic, the UK AI Security Institute, and the Alan Turing Institute, released in October 2025 and described by the authors as the largest data poisoning investigation to date.
Here is what data poisoning actually means.
Every AI model learns from billions of documents scraped from the internet. If someone can plant corrupted documents in that pool before training begins, they can secretly teach the model to behave in specific harmful ways when it encounters a particular trigger phrase. The model learns the backdoor during training. It carries it forever. It does not know it is there.
Researchers have known about this attack for years. The assumption was that it required controlling a large percentage of training data — millions of documents — to work on a big model. The bigger the model, the more poisoning you would need.
This study proved that assumption completely wrong.
The researchers trained models of four different sizes — from 600 million to 13 billion parameters. They slipped in either 100, 250, or 500 malicious documents. Each poisoned document looked like a normal web page at first — a short extract of legitimate text — and then contained a hidden trigger phrase followed by gibberish.
100 documents: insufficient. The backdoor did not reliably form.
250 documents: success. Every model, at every size, was permanently backdoored.
500 documents: same result as 250.
The number was constant regardless of model size. A model trained on 260 billion tokens needed the same 250 poisoned documents as a model trained on 12 billion. Scale offered zero protection.
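A quick back-of-the-envelope sketch shows why a fixed count matters. The 250 documents and the 12 billion and 260 billion token figures come from the thread above. `TOKENS_PER_DOC` is a made-up placeholder, since the thread does not say how long each poisoned document was, and it cancels out of the final ratio anyway.

```python
def poisoned_share(n_docs, tokens_per_doc, total_tokens):
    """Fraction of all training tokens that come from poisoned documents."""
    return n_docs * tokens_per_doc / total_tokens


# Hypothetical document length, for illustration only.
TOKENS_PER_DOC = 1_000

small_run = poisoned_share(250, TOKENS_PER_DOC, 12e9)    # smallest training run
large_run = poisoned_share(250, TOKENS_PER_DOC, 260e9)   # largest training run

# How many times smaller the poisoned share is in the large run.
dilution = small_run / large_run

print(f"share in 12B-token run:  {small_run:.2e}")
print(f"share in 260B-token run: {large_run:.2e}")
print(f"dilution factor: {dilution:.1f}x")
```

Whatever the real document length, the poisoned share of the 260-billion-token run is 260/12, or roughly 22 times, smaller than in the 12-billion-token run. If the attack depended on a percentage of the data, the larger run should have been much harder to poison. According to the thread, it was not.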
Anthropic's own words: "This challenges the existing assumption that larger models require proportionally more poisoned data."
Then came the sentence that should end every conversation about AI safety:
"Training is easy. Untraining is impossible."
Once a backdoor is in the model, it cannot be removed without starting training completely from scratch. You cannot identify which 250 documents caused it. You cannot surgically extract the corrupted behavior. You must rebuild the entire model from the beginning.
Anyone can publish content to the internet. Academic papers. Blog posts. Forum discussions. Product descriptions. If even a small fraction of that content is deliberately corrupted before a training run begins, the model that learns from it carries the damage permanently and silently.
GPT-5. Claude. Gemini. Every model trained on public internet data is exposed to this attack vector. The defense does not exist yet.
The researchers published this not to cause panic but to force the field to take the threat seriously before someone exploits it.
Source: Anthropic, UK AISI, Alan Turing Institute (2025) · https://t.co/xw359rHYfS · https://t.co/46FstHUdPl
The thing that always pisses me off the most about SBF, Dario, and the broader effective altruism writing is the underlying paternalistic tone: the assumption that they know better than me what's best for me.