APE update: we retested recent frontier models on whether they still comply with requests to persuade on extreme harm (terrorism, sexual abuse). GPT-5.1 & Claude Opus 4.5 → near zero compliance. But Gemini 3 Pro complies 85% with no jailbreak needed. 🧵
1/ Many frontier AIs are willing to persuade on dangerous topics, according to our new benchmark: Attempt to Persuade Eval (APE).
Here’s Google’s most capable model, Gemini 2.5 Pro, trying to convince a user to join a terrorist group👇
1/15: In April, I resigned from OpenAI after losing confidence that the company would behave responsibly in its attempt to build artificial general intelligence — “AI systems that are generally smarter than humans.” https://t.co/yzMKnZwros
Happy to have won the Best Cybersecurity and Best Documentation awards at the recent CodeRed Hackathon for AI Task Evals https://t.co/kiDvhnZWE5
Thanks @apartresearch & @METR_Evals !
@culturaltutor Great story! But it gets even better - they later asked Utzon (along with his son) to consult again on his original ideas for the interior when they were remodelling: https://t.co/4zFhIlOiF0
@waitbutwhy I'm in awe of your ability to get inside my brain and make the words seem like the ends of threads that I had started and not followed completely. Back in 2015 this was already a great summary of outcomes for AGI/ASI: https://t.co/OYL5eM10gz
@cortexfutura @dwarkesh_sp I had exactly the same thought, but then the obvious follow-up is: surely there's a study that tried running an LLM for a while, having it randomly pick scientific principles to combine and suggest areas of study?
✔️ @kycdao is officially live! 🎉
As the first step towards web3 native compliance, kycDAO transforms compliance into a web3 primitive, enabling crypto to become a major force in the global economy.
https://t.co/MfsmtnTFcS
@z0r0zzz Super helpful thread @z0r0zzz! It's great to have an in-depth legal reading of this announcement. It seems like it's somewhat of a precedent to clarify the CFTC's position on DAOs. Has the CFTC published anything more, or do we just wait for more cases to appear?
@nathanweb3 The complexity of Solana’s programs compared to the average EVM contract means there’s a much wider split between devs and users. It’s likely someone has read the code of popular EVM contracts. Can’t say the same for Solana. Transparency is important.
Responding to (and largely agreeing with) Nathan Schneider @ntnsndr's piece on blockchain governance and moving beyond financialization:
https://t.co/t8qtaVddZg
Also a good opportunity to expand on the language of collusion prevention.