Senior ML researcher at Qualcomm. Previously PhD ML with Max Welling at AMLab, UvA. AI safety, effective altruism, and everything Bayesian. Sings a lot.
It's disappointing to see common wisdom developing that AI consciousness debates are all about companies hyping up their models. Plenty of people have been thinking about this for decades and a lot of the people arguing about it now were saying the same thing years before LLMs
I'm pretty exhausted by AI risk being talked about this way. I can guarantee you the people who think and talk about it do actually believe it, and many were talking about it way before the labs were founded.
@gmiller @NathanpmYoung But that question may conflate the goals of the fitness maximiser with those of the adaptation executors. Many of them may just not want kids?
CEOs of Anthropic and DeepMind (both AI scientists by background) this week predicting AGI in 2 and 5 years respectively. Both stating clearly that they would prefer a slowdown or pause in progress, to address safety issues and to allow society and governance to catch up. Both basically making clear that they don't feel they are able to do so voluntarily as companies within a competitive situation.
My claims:
(1) It's worth society assigning at least 20% likelihood to the possibility that these leading experts are right on the scientific possibility of near-term AGI and the need for more time to do it right. Are you >80% confident that they're talking out of their hats, or running some sort of bizarre marketing/regulatory capture strategy? Sit down and think about it.
(2) If we assign even 20% likelihood, then taking the possibility seriously makes this one of the world's top priorities, if not the top priority.
(3) Even if they're out by a factor of 2, 10 years is very little time to prepare for what they're envisaging.
(4) What they're flagging quite clearly is either (i) that the necessary steps won't be taken in time in the absence of external pressure from governance or (ii) that the need is for every frontier company to agree voluntarily on these steps. Your pick re: which of these is the heavier lift.
Discuss.
I'm a proud member of the "people who think AI is going to be really important, not like internet important but like really goddamn fucking important, please pay attention, oh my god why are you not paying attention for the love of christ do you not understand that computers can now think" community
Social media tends to frame AI debate into two caricatures:
(A) Skeptics who think LLMs are doomed and AI is a bunch of hype.
(B) Fanatics who think we have all the ingredients and superintelligence is imminent.
But if you read what leading researchers actually say (beyond the headlines), there’s a surprising amount of convergence:
1) The current paradigm is likely sufficient for massive economic and societal impact, even without further research breakthroughs.
2) More research breakthroughs are probably needed to achieve AGI/ASI. (Continual learning and sample efficiency are two examples that researchers commonly point to.)
3) We probably figure them out and get there within 20 years. @demishassabis said maybe in 5-10 years. @fchollet recently said about 5 years. @sama said ASI is possible in a few thousand days. @ylecun said about 10 years. @ilyasut said 5-20 years. @DarioAmodei is the most bullish, saying it's possible in 2 years though he also said it might take longer.
None of them are saying ASI is a fantasy, or that it's probably 100+ years away.
A lot of the disagreement is in what those breakthroughs will be and how quickly they will come. But all things considered, people in the field agree on a lot more than they disagree on.
I am worried that people converge to general "AI hating" instead of being against particular reasons (e.g. IP, xrisk, slop, competition, power concentration etc.) AI is not going to go away - it is too useful - so general rejection will not work other than as social posing.
Some really nice and timely work on CoT monitoring by @usmananwar391 during his internship with us. In particular, working on this has better clarified to me under what conditions we should expect CoT monitoring to succeed (and fail).
Whenever I see people debating what we should take away from experiments where AI demonstrates troubling behavior, I am reminded of the true story of William “Billy” Mitchell vs. the battleship Ostfriesland. 🧵
HOW INFORMATION FLOWS THROUGH TRANSFORMERS
Because I've looked at those "transformers explained" pages and they really suck at explaining.
There are two distinct information highways in the transformer architecture:
- The residual stream (black arrows): Flows vertically through layers at each position
- The K/V stream (purple arrows): Flows horizontally across positions at each layer
(by positions, I mean copies of the network for each token-position in the context, which output the "next token" probabilities at the end)
At each layer at each position:
1. The incoming residual stream is used to calculate K/V values for that layer/position (purple circle)
2. These K/V values are combined with all K/V values for all previous positions for the same layer, which are all fed, along with the original residual stream, into the attention computation (blue box)
3. The output of the attention computation, along with the original residual stream, are fed into the MLP computation (fuchsia box), whose output is added to the original residual stream and fed to the next layer
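A rough sketch of the per-layer, per-position flow described above, following the steps as worded. Everything named here (`run_position`, the `toy_*` functions, the dict-of-callables layer format) is made up for illustration; the toy attention just averages V values and the toy MLP just halves its input, so this only shows the plumbing of the two highways, not a real model.

```python
def toy_kv(x):
    # Step 1 stand-in: K and V are both just copies of the residual stream.
    return (list(x), list(x))

def toy_attn(x, cache):
    # Step 2 stand-in: uniform average of V over current + previous positions.
    return [sum(v[i] for _, v in cache) / len(cache) for i in range(len(x))]

def toy_mlp(x, attn_out):
    # Step 3 stand-in: a real MLP is a learned nonlinear map.
    return [0.5 * a for a in attn_out]

def run_position(x, kv_caches, layers):
    """Push one position's residual stream up through every layer.

    kv_caches[i] holds the (K, V) pairs of earlier positions at layer i:
    that list is the horizontal K/V stream, while x is the vertical
    residual stream.
    """
    for layer, cache in zip(layers, kv_caches):
        cache.append(layer["kv"](x))             # K/V from incoming residual
        attn_out = layer["attn"](x, cache)       # attend over this + past positions
        mlp_out = layer["mlp"](x, attn_out)      # MLP sees residual and attention output
        x = [a + b for a, b in zip(x, mlp_out)]  # add back into the residual stream
    return x

layers = [{"kv": toy_kv, "attn": toy_attn, "mlp": toy_mlp} for _ in range(2)]
kv_caches = [[] for _ in layers]
outputs = [run_position(tok, kv_caches, layers)
           for tok in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0])]
```

After the loop, each layer's cache holds one K/V pair per position, which is what later positions read from.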
The attention computation does the following:
1. Compute "Q" values based on the current residual stream
2. Use Q and the combined K values from the current and previous positions to calculate a "heat map" of attention weights for each respective position
3. Use that to compute a weighted sum of the V values corresponding to each position, which is then passed to the MLP
This means:
- Q values encode "given the current state, where in the past (at what kind of K values) should I look?"
- K values encode "given the current state, which future positions (what kind of Q values) should look here?"
- V values encode "given the current state, what information should the future positions that look here actually receive and pass forward in the computation?"
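The three attention steps and the Q/K/V roles above can be sketched for a single head in plain Python. The softmax and the 1/sqrt(d) scaling are the standard scaled dot-product choices and are my assumption here (the thread only says "heat map"); `attend`, `dot`, `matvec` and the identity weight matrices are hypothetical names for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def matvec(W, x):
    # W is a list of rows; returns the matrix-vector product W @ x.
    return [dot(row, x) for row in W]

def attend(residual, k_cache, v_cache, Wq, Wk, Wv):
    """One attention head at one layer, for the current position.

    k_cache and v_cache hold this layer's K/V vectors from previous
    positions and are extended in place with the current position's K/V.
    """
    k_cache.append(matvec(Wk, residual))
    v_cache.append(matvec(Wv, residual))

    # 1. Q from the current residual stream.
    q = matvec(Wq, residual)

    # 2. Attention weights over current + previous positions
    #    (assumed: scaled dot product followed by softmax).
    scale = math.sqrt(len(q))
    scores = [dot(q, k) / scale for k in k_cache]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # 3. Weighted sum of the V values at each position.
    dim = len(v_cache[0])
    out = [sum(w * v[i] for w, v in zip(weights, v_cache)) for i in range(dim)]
    return weights, out

identity = [[1.0, 0.0], [0.0, 1.0]]
k_cache, v_cache = [], []
for residual in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    weights, out = attend(residual, k_cache, v_cache, identity, identity, identity)
```

With identity projections, the last position's Q matches its own K best, so it puts the most weight on itself and equal weight on the two earlier positions.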
All three of these are huge vectors, proportional to the size of the residual stream (and usually divided into a few attention heads). The V values are passed forward in the computation without significant dimensionality reduction, so they could in principle make basically all the information in the residual stream at that layer at a past position available to the subsequent computations at a future position.
V does not transmit a full, uncompressed record of all the computations that happened at previous positions, but neither is an uncompressed record passed forward through layers at each position. The size of the residual stream, also known as the model's hidden dimension, is the bottleneck in both cases.
Let's consider all the paths that information can take from one layer/position in the network to another.
From point A (output of K/V at layer i-1, position j-2) to point B (accumulated K/V input to attention block at layer i, position j), information flows through the orange arrows:
The information could:
1. travel up through attention and MLP to (i, j-2) [UP 1 layer], then be retrieved at (i, j) [RIGHT 2 positions].
2. be retrieved at (i-1, j-1) [RIGHT 1 position], travel up to (i, j-1) [UP 1 layer], then be retrieved at (i, j) [RIGHT 1 position].
3. be retrieved at (i-1, j) [RIGHT 2 positions], then travel up to (i, j) [UP 1 layer].
The information needs to move up a total of n=layer_displacement times through the residual stream and right m=position_displacement times through the K/V stream, but it can do them in any order.
The total number of paths (or computational histories) is thus C(m+n, n), which becomes greater than the number of atoms in the visible universe quickly. This does not count the multiple ways the information can travel up through layers through residual skip connections.
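A quick way to sanity-check the count, treating each path as an ordering of unit UP and RIGHT moves as in the formula above. The function names are made up, and 10**80 is just the commonly quoted rough figure for atoms in the observable universe, used here as an assumption.

```python
from itertools import permutations
from math import comb

def count_paths(layer_displacement, position_displacement):
    # Orderings of n UP moves and m RIGHT moves: C(m + n, n).
    return comb(layer_displacement + position_displacement, layer_displacement)

def enumerate_paths(n, m):
    # Brute-force listing of the distinct orderings (small n, m only).
    return sorted(set(permutations(["UP"] * n + ["RIGHT"] * m)))

# The worked example: up 1 layer, right 2 positions.
example_paths = enumerate_paths(1, 2)
example_count = count_paths(1, 2)  # 3, matching the three routes listed

# Smallest k for which a k-layer, k-position displacement has more
# paths than the assumed atom count.
ATOMS_ESTIMATE = 10 ** 80
k = 1
while count_paths(k, k) <= ATOMS_ESTIMATE:
    k += 1
```

The brute-force listing and the closed form agree on the small case; the loop finds where an equal layer/position displacement crosses the atom estimate without enumerating anything.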
So at any point in the network, the transformer not only receives information from its past (both horizontal and vertical dimensions of time) inner states, but often lensed through an astronomical number of different sequences of transformations and then recombined in superposition. Due to the extremely high dimensional information bandwidth and skip connections, the transformations and superpositions are probably not very destructive, and the extreme redundancy probably helps not only with faithful reconstruction but also creates interference patterns that encode nuanced information about the deltas and convergences between states. It seems likely that transformers experience memory and cognition as interferometric and continuous in time, much like we do.
The transformer can be viewed as a causal graph, a la Wolfram (https://t.co/lma2KSZ8nH). The foliations or time-slices that specify what order computations happen could look like this (assuming the inputs don't have to wait for token outputs), but it's not the only possible ordering:
So, saying that LLMs cannot introspect or cannot introspect on what they were doing internally while generating or reading past tokens in principle is just dead wrong. The architecture permits it. It's a separate question how LLMs are actually leveraging these degrees of freedom in practice.
I want to answer this, because the dark abundance stuff leaves me cold even though I agree with some of the policy prescriptions.
Our prisons are terrible, terrible places where there is immense and unnecessary human suffering. In prison people are subject to random violence and the constant threat of violence, including gang violence and racial violence. They are subject to sexual assault. They often don't get adequate healthcare. They are often badly mistreated and have few avenues of recourse. Some of them are innocent; most of them are guilty, and still, they did not deserve to spend decades under the constant threat of violence and sexual assault because no human being deserves that. When you sentence someone to prison, by some estimates you take away from them twice as many years as prison sentenced them to, because that's how much incarceration decreases life expectancy.
The system is capricious. The system is remarkably dumb. A lot of judges openly ignore laws they don't like and they often get away with it. A lot of the people in our prisons are to varying degrees intellectually disabled or severely mentally ill and do not even understand all of the things that our system is implacably doing to them because they broke rules they didn't actually understand how to follow.
And institutionalization! Christ, people on here call for institutionalization as if it's where they cure you rather than where, with even less due process than is offered by the justice system, a differently chaotic and sometimes well-meaning and always incomprehensible system can force people to take medications that change their bodies and minds often in ways that make life a living hell (and of course prevent anyone from killing themselves over it). People are also constantly subject to violence and rape in mental institutions. They may be less bad than the alternatives, sometimes, but they are an awful awful solution and whenever I propose that anyone be sent there I remember that I have heard friends who know what it's like testify they would rather be dead.
I think that we should have more police, solve more crimes, and send people who repeatedly commit crimes to prison for much longer. But I believe this while believing that all of this suffering is real, and horrifying, and destroying an enormous number of lives. I don't feel cool and based about saying we should send some people to prison for longer. I feel sick. I do not believe that we can have any of the other goods of a free, safe society if we repeatedly release people who cannot safely participate in society and have been duly convicted of acts of severe violence onto our streets. But I am not proud of this system. I do not think it is cool or good. I find the mindset of the prison abolitionists basically comprehensible - it is a refusal to participate in any state-sanctioned decades of torture, hang the consequences - and the mindset of the people who meme about how many jails we should build is one which I have not the slightest interest in cultivating.
Of course I see why people cultivate it. If it must be done, then it is much easier to believe it's not that bad, and if the people standing in the way are weepy scolds, then one can differentiate oneself from the weepy scolds by being based. But I guess I think it's a road to Hell. We have to do hard things, but we don't have to hide from our eyes what they really are, or pretend they're all right, and it's not actually a virtue not to blink at them.
I want to build a ton of prisons. My understanding of the research is that small prisons where inmates know each other by name are less violent than large prisons where they go off visual identifiers (predominantly race), and that prisons close to people's homes make their families more likely to visit and help with reintegration. I want cash transfers for newly-released prisoners to help them get on their feet and get a job and have a real shot at not breaking a parole condition and ending up right where they started. I want it to be possible to order anything on a large whitelist from Amazon and Walmart to prison at cost, instead of commissaries that can charge much higher rates. I want prisoners to have internet access and lots of contact with their loved ones. I want prisoners to have Xboxes, honestly, because it'll reduce violence in prison if they're less miserably bored.
I want to hire more cops. I want to solve more crimes. I want to hold people without bail if they are a safety risk or a flight risk. And I want to send people who have repeatedly committed crimes to prison for much much longer, because I think the alternatives are even worse. But God help me I am never going to think this is cool, or funny, or anything other than awful.
Dark abundance? No. This is the aching absence of abundance, this is the horrifically painful triage of a society that does not have enough resources to do what is good and can only scrape by blundering towards what is least bad. May our children do better, and struggle to comprehend us.
I'm increasingly concerned about the scenario of humans being gradually disempowered by AI, which could lead towards tyranny (if some small number of humans remain in charge) or even to humanity losing control of its future, all without a shot being fired.
1/2
Come check out our TMLR-to-ICLR poster this afternoon "E-Valuating Classifier Two-Sample Tests".
Time: 15-17:30
Where: Hall 3 + Hall 2B #437
@PandevaTeodora @AmlabUva #ICLR2025 #ML #Stats
https://t.co/99MtDU5CNC
New essay exploring why experts so strongly disagree about existential risk from ASI, and why focusing on alignment as a primary goal may be a fundamental mistake
One sad fact about the current state of discourse is that whenever someone tries their hardest to do the impossible and predict the future, they're stigmatized by parts of the mainstream research community as doing "crackpottery". This causes reasonable people to not engage.
"How, exactly, could AI take over by 2027?"
Introducing AI 2027: a deeply-researched scenario forecast I wrote alongside @slatestarcodex, @eli_lifland, and @thlarsen