should we keep models exploitable but clumsy as they are, or actually teach them to think properly, harden them against adversaries, and make them genuinely lethal?
the current trend is the worst of both worlds: no robustness guarantees, yet increasingly agentic
We don’t always know which problems are hard for LLMs, so devs evaluate on tasks HUMANS find hard or on broad benchmarks. What if we could instead anticipate which scenarios a model will fail on, all without evaluating specific input examples?
🧵 NEW PAPER by @jenniferlumeng et al.
I have a weird feeling -- and please note, my weird feelings are not always reliable -- that this may be the beginning of things starting to get weird.
The reversal curse. Edits that don't suppress negations. Multi-hop updates that don't propagate. These look like separate bugs.
Our ICML 2026 spotlight argues they may share a common geometric origin, visible only when you study how representations move under updates 🧵
(1/11)
Was using Fable 5 to write my world model training code.
Anthropic flagged it as frontier AI research.
The steering vector kicked in and it started implementing JEPA 🤨
Now that shotgun-based drone interceptors are skewing the attrition-based drone war even further toward defense?
That more firmly forces the offensive side into the impossible triangle of long-range + cheap + evasive/maneuverable attack drones.