Keeping up with AI safety research is a full-time job.
ArXiv is a firehose. Important papers get buried.
So I built something: The Guardrail - a daily feed that surfaces, categorizes, and summarizes AI safety papers automatically.
It's free. Launching today.
It's so nice of the Codex and Claude teams to give us more free tokens on a variable response reward schedule. Surely it's because they have our best interests at heart, and not a captological dark pattern leveraged as an optimal solution to smooth out spiky demand on the GPUs.
Sad to see Ted Chiang resorting to such bad arguments in this piece.
He confidently claims Claude has no inner experience. But he has to use a lot of dodgy philosophy and poor reasoning to get there:
1. We can't take deflationary mechanistic descriptions of how AI calculations are performed to show that AI isn't conscious. Otherwise we could argue that 'humans are just neurones transmitting signals one after another' and thereby conclude humans can't be conscious. But that would be wrong for us. And the same logic could be wrong for LLMs.
2. That LLMs are asked to play characters, and effectively are always playing characters, doesn't mean they aren't conscious. It's true a human playing the role of Caesar doesn't have Caesar's experience of things. But they still experience something (that of being a person pretending to be Caesar).
The same could be true of Claude. (Arguably it's also true that humans are always playing characters to some extent and don't have a completely fixed nature, but that has no bearing on our own subjective experience.)
3. Chiang says "an LLM is a machine that generates only one word at a time". This conflates two things: they output one word at a time, and they only think about one word at a time (without planning ahead or looking back).
The first is true of AI but equally true of humans. The latter, we know, is a false description of how AIs think: we can see from how AIs compose poetry that they plan out rhymes at least one line ahead.
4. He argues that because it's implausible that basic autocomplete on your phone is conscious, it's similarly implausible that Claude is conscious. Using the same logic we could say that if we feel confident a fruit-fly isn't conscious we can be confident a human being can't be either.
A human brain and a fruit-fly brain share some information transmission and processing mechanisms. But humans do it much more, and do it differently. And those differences may be what makes the difference. Similarly the many types of internal information processing that occur in Claude's weights but not in autocorrect may be exactly the things that get you subjective experience.
5. Chiang confidently claims you need a body to have subjective experience without much argument. He may turn out to be right but the claim is speculative and contested.
6. Chiang leans on the idea that moral reasoning is necessarily subjective/emotional with very little argument, while ignoring competing theories like rationalism. He may be right but moral sentimentalism is a highly contested position that can't simply be assumed.
7. He argues that it would be impossible to convince him that a video of an astronaut around Alpha Centauri was real, because of the surrounding contextual understanding. And similarly no AI output could convince him that Claude is conscious.
But we can dismiss the first video as almost certainly fake because we mechanistically understand space travel and physics well enough to know a human couldn't have gotten there in time for it to be real (unless our model of the world were very wrong, which we think is much less probable than a fake video which would be entirely unsurprising).
But by contrast we don't mechanistically understand how subjective experience arises. So we simply can't make the same highly confident move of interpretation there. (It's actually the archetypal thing in the universe we perhaps understand least well!)
That said, AI outputs barely move my estimate of AI consciousness because they could indeed have been generated by an unconscious process (or not, we just don't know).
8. He argues that "Being open to the possibility that LLMs are conscious is the same as being open to the possibility that Microsoft Word is conscious, or, more precisely, that multiple distinct consciousnesses are dormant in every Word document containing a conversational transcript."
This is misguided because A. Microsoft Word as a program replicates much less of what humans are functionally capable of than Claude so the argument by functional analogy is basically not present there. B. Files of text don't have any computations going on in or as part of them, even when 'open' in a text editor. They are static. So they have even less in common with what appears distinctive about the human brain, which is constant calculation. So the case by mechanistic or functional similarity is weaker still.
Not to mention that neural nets have more in common with the architecture of the human brain than ordinary computer programs, and are grown organically in a way normal software is not.
Common sense says Claude has more in common with a human brain than Microsoft Word or a text file. Common sense is right. So the prima facie case for Claude being conscious is naturally stronger (even if you think it's still weak in absolute terms).
---
I agree with Chiang that looking at the text outputs of LLMs alone won't be enough to make us confident they are conscious. We will need to look at how they work, figure out more about how humans and other animals work, and ideally solve the hard problem of consciousness (!).
But none of that licenses us to dismiss out of hand the possibility that LLMs do have subjective experience.
I've said enough about my disagreements with some of the ideas below, or at least with the certainty Pope Leo is expressing about them, but can we also reflect for a moment how wild it is that the Pope is tweeting stuff like this? It feels lifted from a screenplay about takeoff.
@Drachs1978 @TheZvi I think this is from the Opus 4.8 system card - Zvi covered it in his latest post and this was discussed there: https://t.co/jTKtB2kjQL
THE GENIE: I have ten jellybeans. Three contain poison that kills you instantly. The other seven each give you 100 years of good life and good fortune. What do you do?
THE NORMAL PERSON: Ah, no thank you.
THE ACCELERATIONIST: We have to move quickly! *immediately eats all ten jellybeans* *dies*
ME: What if we do science to figure out which jellybeans are poisonous and then not eat those, but do eat the others?
I tried out Wispr Flow, the AI voice-to-text software popular for vibe-coding. I figured I'd play around with it a little, see if talking into my computer instead of typing was fun. It nearly ruined my life. I kept accidentally hitting the key to record. It recorded and transcribed an argument with my husband in the other room right into the Business Insider CMS.
Later that day, it transcribed the trailer for the "Summer House" reunion I watched in another tab, along with a video of one of the Real Housewives of Rhode Island explaining the term "slampig". This was transcribed directly into Slack and I sent this to my coworkers/bosses.
i just want to shake people awake. this is it! the computers are speaking! they solve Erdos problems! they think for hours! code is no longer hand-written! wake up! gradient descent on deep neural networks shows no sign of plateau! this is it!
I reported on an experiment this week that blew my mind. Psychologists at @Cornell recruited thousands of people to talk with ChatGPT about a conspiracy theory they believed. They wanted to know: Is it true that conspiracy theorists rarely get convinced out of their beliefs? 🧵
it's funny how basically all advertisement is ugly dead weight on society at best or malicious exploitation at worst, except for an addictive stimulant company promoting extreme danger unrelated to their product, which is cool and pro-social imo
bro it isn't generally intelligent bro its only read every book and paper ever written and just making connections between them bro. its only thinking for twenty hours bro it's just brute force thinking bro. its only solving erdos problems bro it could never be an accountant bro