1/We audited Hermes agent (@NousResearch) with autofyn and found 35 vulnerabilities and 18 exploitable attack chains. Everything disclosed privately to the Nous team already.
For 11 months, no agent could crack 50% on the world's hardest data engineering benchmark.
Last week, our tiny team beat JetBrains to #1 by 7.45 points.
The trick wasn't a bigger model or a hand-tuned harness. It was a 100-year-old idea from mathematics
🧵
Broke up with Cursor. It was eating 50+ GB of RAM and my maxed-out MacBook Pro was begging for mercy.
Met Ghostty. Lightweight. Fast. The Synthwave theme is gorgeous.
Terminal-first coding > bloated IDEs.
My machine finally breathes again.
Counterintuitive thing I've learned building with AI:
Dumb model + smart system < smart model + simple prompt.
I stopped building elaborate agent loops that check success criteria at every step. Started just throwing maximum intelligence at the task.
The results aren't close.
Zoom out far enough, everything looks bad—entropy, loss, endings. But zoom out further, and everything looks good—innovation, resilience, beginnings. Your time horizon defines your optimism.