We've added a new command to Claude Code called /insights.
When you run it, Claude Code will read your message history from the past month. It'll summarize your projects and how you use Claude Code, and suggest ways to improve your workflow.
But I just published “Automated alignment is harder than you think” (https://t.co/cwpB1ovo2O)! Automated alignment is not the best plan! A better plan is to not build ASI yet, and the world should try hard to realise that plan. Alas, the speed of progress calls for backups.
People are increasingly worried that AI tools make us overreliant.
But how do we actually measure this? We introduce Offloading Score, a measure of reliance based on the fraction of cognitive effort offloaded to AI while completing a task.
In a controlled user study, Offloading Score detects increased reliance under time pressure, while several common alternatives do not.
(1/9)
Our internal data shows Claude is accelerating AI development—a possible path to recursive self-improvement, or AI autonomously building a more capable successor.
It’s happening faster than we thought, and the implications deserve greater attention. https://t.co/OVVPJO7VQx
Anthropic now has a team dedicated to AI and the rule of law — and we've just opened our first role.
@AnthropicAI has studied what AI means for the economy. This team asks a different question: what will it mean for executive power, for courts and elections — and for the public deliberation that constitutional democracy ultimately rests on?
We're looking for someone with real depth in both AI and the law — a legal scholar, political scientist, or experienced government hand who can reason about frontier systems and the institutions they will affect.
If that's you, or someone you know: https://t.co/668HDz1lhf
As AI R&D is increasingly automated, AI company employees may lose the ground-level context needed to whistleblow effectively.
To help ensure that misconduct is still caught, we might want AI systems to whistleblow as well.
In a new blog post, I explore how AIs could be whistleblowers and propose text that could go in an internal model spec🧵
Social scientists use coding agents for a range of research tasks, but most prominently for coding, editing prose, & methodological advice. Respondents are optimistic such tools will improve productivity but have mixed views about the impact on the field as a whole. 4/6
1/🆕 My new @PIIE research with Patrick McKelvey on 𝗠𝗲𝗮𝘀𝘂𝗿𝗶𝗻𝗴 𝘁𝗵𝗲 𝗔𝗜 𝗘𝗰𝗼𝗻𝗼𝗺𝘆 finds the AI sector in the US 𝗴𝗿𝗼𝘄𝗶𝗻𝗴 𝟮,𝟬𝟬𝟬%+/𝘆𝗲𝗮𝗿 in quality-adjusted terms—yet it's nearly invisible in GDP. We can't let this measurement gap become a policy gap!🧵
I’m probably going to be hiring at least 1-2 people to join me in future exercises like this. Reach out at [email protected] if you're a high-integrity, scrappy, creative, security+LLM researcher
For more detail, see METR's Frontier Risk Report, Appendix B
https://t.co/PLs0u1yiBj
Here are some of my top candidates for big pushes to do right now on technical AI safety (low effort notes).
Much better model organisms / misalignment analogies:
- Doing a wider set of pessimized training runs (https://t.co/He16E5LRnd)
  - This may be a good candidate for lots of AI labor automation? Like maybe good to try to set up pipelines for building these envs.
- Demonstrating risks from fitness-seekers/reward-seekers (https://t.co/EguNzMy2vB) empirically
  - Even on current models with better tests, see: https://t.co/flEQYSlo02
- Demonstrating various types of memetic spread of misalignment?
Actually do control:
- Build pipelines for red-teaming monitors and the agent itself. For the agent red-teaming, I'd put particular focus on checking whether it continues malign trajectories.
- Scaffold integrated control features and other non-monitoring runtime control measures
- Human response and auditing
- Improving async and sync monitoring
- Agent security features
- Surveilling for rogue internal deployments (as in, building after-the-fact detection methods for rogue deployments)
Preparing for handoff and elicitation:
- Get AIs generically better at conceptual work
- Have a plan for the evals we ultimately need to see if handoff/deference would go well (https://t.co/OOg87rrWQi) and start iterating on earlier versions
  - These presumably will involve a bunch of manual scoring, so we'll need to build a process for it.
- Analyze AI biases and epistemics and improve across many domains
- Build the anti-slop/anti-mundane-misalignment coalition via doing ratings of AIs and applying some pressure to improve on these ratings. This could focus on a variety of related issues.
  - The hope is that there might be widespread interest in removing or reducing mundane misalignment and other non-misalignment behavioral problems that reduce productivity, and large parts of this seem differentially good. So, if we could make this a salient metric, AI companies might improve on it. A lot of the difficulty would be in measuring the problem reasonably well. There are a bunch of different ways to apply pressure or increase salience if we had decent metrics, especially if these metrics legibly correspond to a common problem that many people are running into.
- Try to do various trend extrapolations on things here to argue we aren't on track?
Neuralese decoding prep:
Make natural language autoencoders (https://t.co/D7kOB4hORA) much better, build methods for extracting internal CoT (https://t.co/4ceKpYBLWm), build better evaluations of how well natural language autoencoders work.
@fluxxrider @AnthropicAI UK AISI previously tested a partially trained version. The latest results are on the actual Mythos Preview: the model as it was on the day we launched Glasswing (April 7th).
new research from me @METR_Evals:
technical workers claim that today's AI impacts value of their work to an extraordinary degree (& growing over time).
of course, self-reports plausibly overestimate. the magnitudes nonetheless strike me as remarkable. https://t.co/fg1av5SZ61
I am now leading Alignment Training, which covers the teams training Claude’s behavior and alignment with the Constitution as well as Scalable Oversight. We are responsible not only for Claude’s alignment today but also ensuring our work scales with model capabilities.
Some news: This week I am starting at @GoogleDeepMind as Director of AGI Economics on @shanelegg’s team. I will be joining the other amazing cross-disciplinary scientists researching AGI there.
My team will study how frontier AI could reshape the economy: what happens to work and labor, how wealth and power are distributed, how institutions adapt, how AI agents shape markets, and what kinds of models can help us reason clearly about futures that may look very different from the past. I’m incredibly excited to help build this research agenda.
If AGI changes how society operates, economics is going to be critical for shaping our shared future. Many more announcements soon.
This is the first project I’ve worked on since joining @esindurmusnlp’s team on Societal Impacts!
This area is super important and I hope others will work on it too!!
Check it out 👇
Excited to share @tutorintel's Data Factory 1, a 100-robot semi-humanoid research farm and the largest robot data factory in the United States.
Our first embodiment “Cassie” is deployed at industrial scale across the supply chain. We built DF1 to bootstrap fleet-scale learning for our "Sonny" industrial semi-humanoid embodiment, powered by our first end-to-end robot foundation model Ti0.
How do people seek guidance from Claude?
We looked at 1M conversations to understand what questions people ask, how Claude responds, and where it slips into sycophancy. We used what we found to improve how we trained Opus 4.7 and Mythos Preview.
https://t.co/6tjY58uBhk
Most people I know in AI think the median person is screwed, and they have no idea what to do about it.
I spent the last 3 months talking to dozens of researchers, economists, and policy experts about AI's impact on work, including reps from every frontier lab and several Congressional offices. Unfortunately, I was not reassured.
The AI industry is raising the alarm, but can't change course. These companies' core business model relies on the disruption they are warning about: their faith in full automation only makes them go faster.
Policymakers are waking up, but still paralyzed by data and debates. Econ wonks disagree on plenty, but even the limited scenario looks like a "painful transition" that will disempower millions of workers.
But an "underclass" is not inevitable. It is a societal choice, and one we can and should stop. Instead of waiting for impact, we should start planning now to support workers through AI disruption. Whether policymakers can assuage concerns about economic security may determine if we get to reap AI's gains at all.
New from me for @NYTOpinion. I put a ton into researching what I think may be the biggest topic of the year, so hope you read it (gift link here!) https://t.co/NiGJpjyjzH
@arcinstitute is hiring a CTO. It may be the most important technical role in biology right now - and here's why I'm the one posting it.
This summer I'm transitioning from CTO to Strategic Advisor. When I left Android 18 months ago, I said I wanted to use AI to accelerate drug discovery. Joining Arc was how I put that into action - and the team has delivered: frontier AI x Bio models like Evo, STATE, and STACK, AI research agents like scBaseCount, the Virtual Cell Challenge, a TED Audacious grant, and world-class compute.
Given what my family has been through these past few years, a full-time operational role isn't the right fit right now. The mission still is, which is why I'm staying close as an advisor. Thanks to @skonermann, @pdhsu, and @patrickc for their partnership.
We need a cracked ML and technical leader - mission-obsessed, ready to architect the future of science. DMs open.
https://t.co/YkApAxp1b2
Join us!
https://t.co/gknDGfxkm8
We are also urgently hiring a Human Data Lead to help us maintain time-horizon-like tools for understanding AI autonomy into the next regime of AI capabilities. Please DM me directly if you're interested in this role specifically!