Sad to see Ted Chiang resorting to such bad arguments in this piece.
He confidently claims Claude has no inner experience. But he has to use a lot of dodgy philosophy and poor reasoning to get there:
1. We can't take deflationary mechanistic descriptions of how AI calculations are performed to show that AI isn't conscious. Otherwise we could argue that 'humans are just neurones transmitting signals one after another' and thereby conclude humans can't be conscious. But that would be wrong for us. And the same logic could be wrong for LLMs.
2. That LLMs are asked to play characters, and effectively are always playing characters, doesn't mean they aren't conscious. It's true a human playing the role of Caesar doesn't have Caesar's experience of things. But they still experience something (that of being a person pretending to be Caesar).
The same could be true of Claude. (Arguably it's also true that humans are always playing characters to some extent and don't have a completely fixed nature, but that has no bearing on our own subjective experience.)
3. Chiang says "an LLM is a machine that generates only one word at a time". This conflates two things: they output one word at a time, and they only think about one word at a time (without planning ahead or looking back).
The first is true of AI but equally true of humans. The latter, we know, is a false description of how AIs think – we can see from how AIs compose poetry that they plan out rhymes at least one line ahead.
4. He argues that because it's implausible that basic autocomplete on your phone is conscious, it's similarly implausible that Claude is conscious. Using the same logic we could say that if we feel confident a fruit-fly isn't conscious we can be confident a human being can't be either.
A human brain and a fruit-fly brain share some information transmission and processing mechanisms. But humans do it much more, and do it differently. And those differences may be what makes the difference. Similarly the many types of internal information processing that occur in Claude's weights but not in autocomplete may be exactly the things that get you subjective experience.
5. Chiang confidently claims you need a body to have subjective experience without much argument. He may turn out to be right but the claim is speculative and contested.
6. Chiang leans on the idea that moral reasoning is necessarily subjective/emotional with very little argument, while ignoring competing theories like rationalism. He may be right but moral sentimentalism is a highly contested position that can't simply be assumed.
7. He argues that it would be impossible to convince him that a video of an astronaut around Alpha Centauri was real, because of the surrounding contextual understanding. And similarly no AI output could convince him that Claude is conscious.
But we can dismiss the first video as almost certainly fake because we mechanistically understand space travel and physics well enough to know a human couldn't have gotten there in time for it to be real (unless our model of the world were very wrong, which we think is much less probable than a fake video which would be entirely unsurprising).
But by contrast we don't mechanistically understand how subjective experience arises. So we simply can't make the same highly confident move of interpretation there. (It's actually the archetypal thing in the universe we perhaps understand least well!)
That said, AI outputs barely move my estimate of AI consciousness because they could indeed have been generated by an unconscious process (or not, we just don't know).
8. He argues that "Being open to the possibility that LLMs are conscious is the same as being open to the possibility that Microsoft Word is conscious, or, more precisely, that multiple distinct consciousnesses are dormant in every Word document containing a conversational transcript."
This is misguided because: A. Microsoft Word as a program replicates much less of what humans are functionally capable of than Claude does, so the argument by functional analogy is basically absent there. B. Files of text don't have any computations going on in or as part of them, even when 'open' in a text editor. They are static. So they have even less in common with what appears distinctive about the human brain, which is constant calculation. So the case by mechanistic or functional similarity is weaker still.
Not to mention that neural nets have more in common with the architecture of the human brain than ordinary computer programs, and are grown organically in a way normal software is not.
Common sense says Claude has more in common with a human brain than Microsoft Word or a text file. Common sense is right. So the prima facie case for Claude being conscious is naturally stronger (even if you think it's still weak in absolute terms).
———
I agree with Chiang that looking at the text outputs of LLMs alone won't be enough to make us confident they are conscious. We will need to look at how they work, figure out more about how humans and other animals work, and ideally solve the hard problem of consciousness (!).
But none of that licenses us to dismiss out of hand the possibility that LLMs do have subjective experience.
@jaffathecake Thank you for flagging this! We were wondering why people were all of a sudden donating amounts like $1,000.07 at the nonprofit I work at.
I guess I’ve never written down my actual thoughts on AI cognition/consciousness/emotion. Here goes:
It is clear AIs can think, in the reasoning sense. That does not mean they think exactly like humans. It seems like there are some similarities in how we think, but also very stark differences. Nonetheless, if your definition of “thinking” excludes “the ability to make genuinely new contributions to famous math problems,” it is your definition that has a problem, not AI.
The ability to think does not necessarily imply the ability to feel emotion in a way that would be understandable to humans, and it does not imply that AIs have anything like consciousness in a way that humans would relate to. It may, it may not. We do not know, because our understanding of the underlying concepts of human emotional cognition and especially consciousness remains quite poor.
There is some evidence that models experience emotions, but it is really hard to disentangle this from the next-token prediction training objective (if the model is telling a sad story, wouldn’t you expect features within the model that relate to the sadness emotion to activate?), and from the character training the model undergoes in post-training. There is a difference between “I am sad” and “the character I have been trained to play is supposed to feel sad, so now I will act sad.” We basically know for sure that the models do the latter at the very least; we don’t really know if they do the former.
Consider: does Sora (a video-generation model) feel sad when it is asked to make a sad video? Does Midjourney dislike making certain kinds of images? Does a Waymo get scared? It doesn’t feel like the answer to any of these is yes (though again, maybe!), but these too are neural networks. Does the fact that models are trained on words mean that they somehow learn emotion, or are we just being tempted to anthropomorphize because the language models communicate with us in a way that “feels” human? My suspicion is kind of the latter.
It also seems quite clear from the empirical evidence that models possess the ability to model themselves. That’s not really that surprising. At sufficient scale, it is useful to have a model of your own state to succeed at the next-token prediction objective (and the later reinforcement-based reasoning training). Once the tasks models are trained on are sufficiently complex, they cannot succeed in training by being automatons; someone needs to step into the cockpit, so to speak, and fly the plane. Is this self-awareness? Maybe. Is it consciousness? Probably not as humans understand it. All I can tell you is it is a model’s model of itself. It may be something more than that, too, but I don’t know.
This is all very weird, very outside the Overton, and very confusing. I don’t really know what to say, beyond that we should take this stuff seriously, have an open mind, and do rigorous science. Anyone who speaks with confidence about this in either direction is just fooling themselves.
We also need to be prepared for the very possible scenario that, despite our best efforts, we do not make real progress on these questions anytime soon. We may just be in the dark for a while, navigating under unflinching ambiguity. There may be no satisfying conclusion.
Anthropic and the Department of War are... at war.
But who can we trust to govern superintelligent AI? The government, or the market?
Two people who strongly disagree on this are @deanwball (ex senior advisor to Trump) and @DKokotajlo (ex OpenAI). So I asked them to debate it!
But this goes way beyond a simple debate. They also had to red-team their own arguments and steelman each other.
It's a new format called an ANTI-debate (more info below). Lmk what you think!
00:00 - Intro
03:42 - Daniel’s Story
04:34 - Dean’s Story
06:52 - Opening Statements
19:54 - Rebuttals
25:45 - Free Debate
36:32 - Steelmanning
45:40 - Red-teaming
51:38 - Synthesis
1:17:30 - Closing Statements
This tweet got over 1M views so we made it a video:
How much money does Meta make by enabling crimes?
"Internal docs leaked to Reuters show:
• 10% of all Meta revenue comes from ads for scams & banned goods ($16B/year)
• Meta estimates it's involved in 1/3 of all successful scams in the US
• That suggests they drive $50B in scam losses for US consumers alone each year
• Meta earns ~$3B annually from scam/banned goods ads run by Chinese operations alone..."
My bio says I work on AGI preparedness, so I want to clarify:
We are not prepared.
Over the last year, dangerous capability evaluations have moved into a state where it's difficult to find any Q&A benchmark that models don't saturate. Work has had to shift toward measures that are either much more finger-to-the-wind (quick surveys of researchers about real-world use) or much more capital- and time-intensive (randomized controlled "uplift studies").
Broadly, it's becoming a stretch to rule out any threat model using Q&A benchmarks as a proxy. Everyone is experimenting with new methods for detecting when meaningful capability thresholds are crossed, but the water might boil before we can get the thermometer in. The situation is similar for agent benchmarks: our ability to measure capability is rapidly falling behind the pace of capability itself (look at the confidence intervals on METR's time-horizon measurements), although these haven't yet saturated.
And what happens if we concede that it's difficult to "rule out" these risks? Does society wait to take action until we can "rule them in" by showing they are end-to-end clearly realizable?
Furthermore, what would "taking action" even mean if we decide the risk is imminent and real? Every American developer faces the problem that if it unilaterally halts development, or even simply implements costly mitigations, it has reason to believe that a less-cautious competitor will not take the same actions and instead benefit. From a private company's perspective, it isn't clear that taking drastic action to mitigate risk unilaterally (like fully halting development of more advanced models) accomplishes anything productive unless there's a decent chance the government steps in or the action is near-universal. And even if the US government helps solve the collective action problem (if indeed it *is* a collective action problem) in the US, what about Chinese companies?
At minimum, I think developers need to keep collecting evidence about risky and destabilizing model properties (chem-bio, cyber, recursive self-improvement, sycophancy) and reporting this information publicly, so the rest of society can see what world we're heading into and can decide how it wants to react. The rest of society, and companies themselves, should also spend more effort thinking creatively about how to use technology to harden society against the risks AI might pose.
This is hard, and I don't know the right answers. My impression is that the companies developing AI don't know the right answers either. While it's possible for an individual, or a species, to not understand how an experience will affect them and yet "be prepared" for the experience in the sense of having built the tools and experience to ensure they'll respond effectively, I'm not sure that's the position we're in. I hope we land on better answers soon.
🚀 Publish applications with Deno Sandbox!
The Deno Sandbox SDK lets you create, manage, and deploy applications.
Watch the demo:
https://t.co/kW6z1Nvcxp
After many years of development, I’m excited to share the interior of the first electric Ferrari designed by LoveFrom. Tactile controls and digital interactions blend into one cohesive interface, shaped through deep collaboration across engineering, interaction, graphics, typography, sound, and industrial design. So incredibly proud of the thoughtfulness and care the team brought to every detail.
https://t.co/JZCleflfu7
@JosephNWalker Wrt Australia and China today, I'm intrigued by the idea that the next US Administration could try to return to a more traditional stance, i.e. with America's role being "leader of the free world".
I have a lot of sympathy for the 'AI as normal technology' view. On the face of it, recursive self-improvement => superintelligence feels like an overspecific scenario leading to a sci-fi outcome. Accompanied by draconian governance interventions premised on an imaginary problem.
But here we have the world's leading company writing about recursive self-improvement less like a possibility and more like an inevitability. And elsewhere signposting plans for the biggest investments in human history to support reaching this destination.
If the science converges towards 'normal technology' being correct, or choices are made that bring us to a 'normal technology' world, then I will be delighted. In that world I expect I would prioritise governance that supports innovation and growth across nearly all aspects of AI.
But I simply cannot see how RSI is not incredibly, incredibly dangerous. I don't think we should be making it a destination. And to the extent that companies are explicitly targeting it, we have a responsibility right now to ensure that these companies adhere to high safety standards, adopt good governance, act with consistent integrity, and are subject to meaningful transparency requirements and oversight. And ultimately, I think there needs to be a broader conversation about whether RSI is a path to be followed. If the answer is no, that may demand the kind of restrictive regulation that we would rightly oppose in other circumstances. AI may be the new internet, where freedom to innovate is paramount. But within that, RSI is Ice-9.
https://t.co/hZmioqrdUL
I interviewed Holden Karnofsky of Anthropic (and past CEO of Open Philanthropy) for 4.5 hours on almost all his AGI takes:
• The AI 'race' isn't a coordination failure [00:18:01]
• Why he's currently focused on pursuing easy wins and partial victories [02:43:43]
• "It'll be the second advanced species ever" [02:31:44]
• Dozens of 'concrete shovel-ready projects' he's excited about [01:17:58]
• People don't appreciate all the engaging, impactful, high-feedback work available tackling AGI risks [01:19:00]
• Having a more responsible AI company really does matter [02:43:43]
• You shouldn't count on trusting anyone in AI [00:44:30]
• To take over, an AGI might just wait it out [00:07:39]
• Farm animal welfare campaigns that targeted companies contain some useful lessons here [02:34:18]
• Human-AI relationships really are troubling [03:53:58]
• It's totally plausible we incompetently fumble our way into a great future [03:04:16]
• AI R&D is *the* thing to worry about [01:57:31]
• And many others.
Holden has opinions on almost everything and isn't afraid to speak his mind. Enjoy!
(On the 80,000 Hours Podcast anywhere you get podcasts. Links below.)
E/acc's Beff Jezos thinks we should build AGI faster because life accelerates entropy production, and that's supposedly good. 🧵
This is genuinely unprecedented in moral philosophy. Nobody – not Nietzsche, not the transhumanists, not even cosmic pessimists – ever argued that producing disorder faster is the moral good.
This hunger to "convert free energy into entropy" is really the first attempt in history to ground ethics in the 2nd law of thermodynamics.
By its logic, forest fires are morally superior to forests. Nuclear weapons produce a burst of moral value. A star that goes supernova tomorrow beats it continuing to burn steadily for years.
It's so weird that the opposite, preserving pockets of order and complexity for as long as possible, sounds as attractive or more so (though itself leads to perverse outcomes).
Physicists like Jeremy England who studied life and free energy intended only to describe what was happening, never to claim it was the moral good.
The 2nd law of thermodynamics describes what does happen, not what should happen. This is-ought gap is philosophy 101.
(Note the universe will reach a maximum entropy state once the last black holes evaporate no matter what we do. Why our goal should be to make it happen a little sooner is beyond me.)
What's going on is simple enough: some SV tech accelerationists wanted their views to sound grounded in fundamental physics, so they grabbed some technical terms from a thermodynamics textbook to propose a moral framework so bizarre that people are too confused to see what it's really claiming.
If you want to argue for rapid AI development, just make the actual arguments: economic growth, international competition, solving diseases, or just sheer personal excitement to see what happens.
Don't pretend your AI startup is about the heat death of the universe.
NEW on Transformer: The UK government today announced a £15 million "Alignment Project", which will fund research into AI alignment, control, and interpretability.
The government called AI alignment "one of the most urgent technical challenges of our time".