Accelerating AI safety research & building talent pipelines @ConstellOrg. Expert advisor @MITAIRisk. Ex UK AISI, BCG.
Views my own. Likes/RTs != endorsements.
@geoffreyirving Sad to see you leave, Geoffrey. Thanks again for all you have done for AISI. Hope to see you in the Bay Area and excited to learn more about the new org!
🧵 New Anthropic Fellows research: We studied mechanisms of "introspective awareness" in LLMs.
LLMs can sometimes detect steering vectors injected into their residual stream. But is this worthy of being called introspection, or attributable to some uninteresting confound?
Haha those doofuses at ai2027 predicted we'd have professional level hacking abilities and the top ai company would be at $26B in revenue in May 2026. It's April and we already have superhuman hacking and $30B in revenue, why would you take forecasters this bad seriously???
🚨 New paper!
How safe and aligned is Kimi K2.5?
We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N)
Applications are now open: Constellation's Astra Fellowship
Fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg!
Apply by May 3rd (begins Sep 2026)
https://t.co/pxtOduDBFh
We're opening applications for the next two rounds of the Anthropic Fellows Program, beginning in May and July 2026.
We provide funding, compute, and direct mentorship to researchers and engineers to work on real safety and security projects for four months.