We are expanding our team to scale up our vision of end-to-end oversight assistants! Do you:
* Want to understand AI systems?
* Like training large models?
* Enjoy learning with teammates who are curious and earnest?
Then apply to join @TransluceAI! https://t.co/JxqhEkwD3p
Why'd my agent fail? Was it reward hacking?
These days, you'd just ask another AI to vibe-analyze the agent logs.
But how do you know the claims aren't hallucinated, cherry-picked, or plain wrong?
That's why we've been building Analysis Plans: a framework for trustworthy analysis.
Midnight Code Cup is a programming competition where coding agents are allowed and the problems are still challenging and fun.
Teams of up to 3.
Qual: April 11, Codeforces (4h).
Finals: July 4–5, Belgrade (24h onsite).
See you at Midnight!
New blog post: "Building Technology to Drive AI Governance". I argue that many governance challenges are fundamentally bottlenecked by technical gaps, and consider case studies from other fields (food safety, climate change) that illustrate this dynamic.
Why does GPT-5.1 Codex score 6.5% worse than GPT-5 Codex on Terminal-Bench, with the same scaffold? 🧵
GPT-5.1 times out at ~2x the rate of GPT-5. Excluding timeouts, GPT-5.1 wins by 7.2%. We analyzed 256M+ tokens of traces and found this in under an hour. Here’s how 👇
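A minimal sketch of how a timeout gap can flip a comparison like this one. All counts below are hypothetical and chosen only for illustration; they are not the Terminal-Bench results, and the names (`Run`, `make_runs`, `model_a`, `model_b`) are assumptions, not part of any real tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    passed: bool
    timed_out: bool


def pass_rate(runs):
    """Fraction of runs that passed; timeouts count as failures."""
    return sum(r.passed for r in runs) / len(runs) if runs else 0.0


def pass_rate_excluding_timeouts(runs):
    """Fraction of passes among runs that finished before the time limit."""
    finished = [r for r in runs if not r.timed_out]
    return pass_rate(finished)


def make_runs(passed, failed, timed_out):
    """Build a list of runs from counts (hypothetical helper)."""
    return (
        [Run(True, False)] * passed
        + [Run(False, False)] * failed
        + [Run(False, True)] * timed_out
    )


# Hypothetical data: model_b times out twice as often as model_a.
model_a = make_runs(passed=50, failed=30, timed_out=20)
model_b = make_runs(passed=45, failed=15, timed_out=40)

for name, runs in [("model_a", model_a), ("model_b", model_b)]:
    print(
        name,
        f"overall={pass_rate(runs):.3f}",
        f"excluding_timeouts={pass_rate_excluding_timeouts(runs):.3f}",
    )
```

With these made-up counts, `model_b` trails on the overall score but leads once timed-out runs are excluded, which is the shape of the effect described above.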
Transluce is developing end-to-end interpretability approaches that directly train models to make predictions about AI behavior.
Today we introduce Predictive Concept Decoders (PCD), a new architecture that embodies this approach.
Happening right now in Astana, Kazakhstan: @cognition_labs founding team member Andrew He (@ecnerwala) speaks to ICPC 2024 World Finalists about his career journey and about their product Devin, the AI software engineer.