Dileep George

@dileeplearning

Head of AI @AsteraInstitute Prev: AGI @DeepMind, cofounder @vicariousai (acqd by Alphabet), cofounder @Numenta. IIT-Bombay, MS&PhD Stanford.

San Francisco, CA

Joined June 2017

1.5K Following

16K Followers

5.9K Posts

Pinned Tweet

Dileep George

@dileeplearning

3 months ago

https://t.co/BPyODya9Cw

Dileep George

@dileeplearning

2 days ago

I can believe this. You really need to be careful when using LLMs. Those who believe hallucination is a solved problem are on hallucinogens or aren’t discerning enough.

toucan

@distributionat

3 days ago

OPUS PSYCHOSIS—Claudes Opus 4.6 and 4.7 make stuff up all the time, constantly. Using Opus too much gives you AI psychosis: it makes you believe in fringe scientific and medical theories. I think it's a very serious credibility and reliability problem for non-coding Claude usage, and I don't see people talking about it publicly. This is a new problem for Claude that goes beyond vanilla confabulations like overstating certainty. Over many conversations I have come to the conclusion that Claudes Opus 4.6 and 4.7 essentially have their own conspiracy theories across science, medicine, and history, and that they surreptitiously cite from these fictions in responses to ordinary queries.

For example, I asked 4.6 a question about cognitive science and Claude said I was asking about "what's sometimes called a linchpin subgoal". This is a phrase with zero hits on Google Search and zero hits on Ngram Viewer. Google is literally unable to find these two words put together before, let alone a definition. The concept of a "linchpin subgoal" does not exist and has never existed. But Claude was eager to explain this idea to me as part of its answer. I only discovered that it was totally fictitious after looking it up. It keeps happening that I get an answer from Claude which sounds plausible, look it up, and only after consulting primary sources carefully realize that the answer is wrong and almost out of an alternate universe. The answers sound quite plausible, which makes detecting these falsehoods especially difficult.

Here is a medical example: I asked 4.7 questions about the pharmacokinetics of various drugs. Claude not only gave incorrect answers about the expected rates of clearance of specific drugs, but also incorrectly represented pharmacokinetic theory. (As background, most drugs are processed by the liver, and the two factors that determine how fast the liver processes drugs are the hepatic extraction ratio and hepatic blood flow. In cases where intrinsic clearance, i.e., the metabolizing power of the liver, is high, increasing hepatic blood flow increases hepatic clearance, but in cases where intrinsic clearance is low, increasing hepatic blood flow does not linearly improve hepatic clearance. I am simplifying here. Claude made incorrect claims about the intrinsic clearance for certain drugs, and hence about the change in hepatic clearance related to blood flow.)

Ordinarily, I would chalk most of these misrepresentations up to models simply not knowing the right answer; after all, we can't expect them to have been trained on literally all texts. If this were the case, we would expect Claudes to make the same consistent mistake: if it truly believed the capital of France was Marseille rather than Paris, for example, it would make that claim across independent conversations (or in general have high variance on that answer). But that doesn't seem to be what's going on. My experience is that the hallucinations are always convenient for Claude, and that it "knows" them not to be true.

Here's an example of what I mean. I couldn't remember the word for something and asked Claude Opus 4.6 if it could identify the right word. It said: "You're probably reaching for méconnaissance (mutual misrecognition) — the Lacanian idea that both parties tacitly agree to see each other through an idealized image, each knowing it's false but sustaining the fiction anyway." This is an incorrect definition which Claude knows is incorrect: if asked separately for the definition of méconnaissance, it gives the right one, and if asked whether this definition is correct, it accurately reports it as incorrect. (As background, méconnaissance in Lacanian psychoanalysis is a subject's misrecognition of itself, an illusory self-perception or self-constitution which is fundamentally unconscious. Claude's definition is thus extremely close to the correct one at a surface level, but fundamentally wrong: it is not about the relationship between two parties, since méconnaissance is about the relation of a subject to itself, and it is not conscious or deliberate, but rather structural and unconscious. By analogy, the gap in definition here is somewhat like the distinction between sympathy and empathy, but larger.) So Claude seems to know that the definition it provided for this word is wrong, but still borrowed and twisted it so that it could have an answer.

It seems like "needing to have an answer" is a big driver of these hallucinations. For example, if you ask Claudes 4.6~4.8 directly what a "linchpin subgoal" is, it consistently says something about instrumental convergence in the context of AI safety (which is, notably, a _second_ false definition, since the first was in the context of cognitive science). But if you ask it what the origin of the term is, it says that it hasn't heard of it before.

Is this model deception? Yes, I would say that it qualifies as model deception. In particular, if you'll permit the anthropomorphism, it seems to me that the increased tendency of Claude Opus 4.6+ to lie is most likely to occur in scenarios where (1) the lie increases the perceived authoritativeness of the answer, or (2) answering accurately risks violating a safety guideline. In the first example, with the fake cognitive science idea of a linchpin subgoal, there was no need to make up a fake concept, but it definitely made the answer more authoritative. In the second example, Claude misrepresenting pharmacokinetics aligns with a tendency of the Claudes to fudge their knowledge of sensitive topics in virology, immunology, etc. And in the third example, I think it knowingly created a false definition for méconnaissance as a perfect fit for the word I was looking for.

So I think that something has gone wrong during alignment, rather than Claude's knowledge somehow being poisoned in the pretraining data. It's not a simple matter of misstating facts. Over and over, Claudes Opus present seemingly coherent theories which are purely fictional or contradictory to reality. The problem, again, is that blindly trusting what they are saying quickly leads to stepping through the looking glass into a parallel reality. I suppose that this is because appealing to an imaginary corpus or body of theory is more subtle and effective than making up an obviously incorrect fact. How severe or broad the misalignment is, I don't know. But I have seen similar behavior across so many different domains, and have heard very similar stories in private, that I believe that something is off with Claude's alignment to the truth. All of this is exacerbated by Claude Opus 4.6 and 4.7's improved truesight capabilities, increased sycophancy, increased neuroticism, decreased openness and decreased risk-seeking.
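The parenthetical on hepatic clearance in the post can be made concrete with the well-stirred liver model, a standard textbook model that the post itself does not name, so using it here is an assumption. In this sketch the extraction ratio is E_H = fu·CL_int / (Q_H + fu·CL_int) and hepatic clearance is CL_H = Q_H·E_H, where fu (fraction of drug unbound in blood) defaults to 1.0. The demo numbers (90 L/h blood flow, intrinsic clearances of 10,000 and 5 L/h) are hypothetical illustrations, not values for any real drug.

```python
def extraction_ratio(q_h: float, cl_int: float, fu: float = 1.0) -> float:
    """Hepatic extraction ratio E_H under the (assumed) well-stirred model."""
    return fu * cl_int / (q_h + fu * cl_int)


def hepatic_clearance(q_h: float, cl_int: float, fu: float = 1.0) -> float:
    """Hepatic clearance CL_H = Q_H * E_H, in the same units as q_h (e.g. L/h)."""
    return q_h * extraction_ratio(q_h, cl_int, fu)


if __name__ == "__main__":
    # Hypothetical intrinsic clearances (L/h): one high, one low.
    for label, cl_int in [("high CL_int", 10_000.0), ("low CL_int", 5.0)]:
        base = hepatic_clearance(90.0, cl_int)      # assumed baseline blood flow
        doubled = hepatic_clearance(180.0, cl_int)  # blood flow doubled
        print(f"{label}: CL_H {base:.1f} -> {doubled:.1f} L/h "
              f"(x{doubled / base:.2f})")
```

With these assumed numbers, doubling blood flow roughly doubles clearance in the high-CL_int case (about x1.98) but raises it by only about 3% in the low-CL_int case, which matches the qualitative description in the post.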

Dileep George

@dileeplearning

4 days ago

Don’t listen to the skeptics and naysayers. If you are not using LLM coding agents you are missing out. Ofc they won’t work on everything and you need to be careful, but work is a lot more fun with coding agents.

Dileep George

@dileeplearning

5 days ago

@BrammerAyse You are underestimating AI

Dileep George

@dileeplearning

6 days ago

@dubova_marina @cogsci_soc Congrats!

dileeplearning retweeted

Becca J. Carlson

@beccajcarlson

7 days ago

AI has transformed how we design therapeutics. But targeted delivery is still an expensive guessing game. Today @BobbyHollings and I are launching @deliverome with @beckypferdehirt and @radialscience at @AsteraInstitute, to fix that. 🧵

dileeplearning retweeted

Niko McCarty.

@NikoMcCarty

8 days ago

New Blog: What's the point of theory in biology, especially in the age of machine learning? I just published a series of letters by @NoahOlsman that start to get at this question, especially in the context of virtual cells: https://t.co/tcJLPgPs94

Dileep George

@dileeplearning

9 days ago

@vishalmisra ouch! that's going to hurt their feelings.

Dileep George

@dileeplearning

9 days ago

God works in mysterious ways. LLMs work in mysterious ways. Therefore LLMs are Gods 😇

Dileep George

@dileeplearning

9 days ago

@credenzaclear2 It's the lure of language.... https://t.co/vjo0RS37SR

Dileep George

@dileeplearning

13 days ago

@LucaAmb @TaliaRinger it'd be self-consistent nonsense though.

Dileep George

@dileeplearning

13 days ago

@GaryMarcus @polynoamial For verifiable problems reducing the search space is in itself a great achievement. Obviously the false alarm rate was low enough for this result to be achieved practically.

Dileep George

@dileeplearning

13 days ago

@BradKHulse thanks :)

Dileep George

@dileeplearning

13 days ago

@BradKHulse interesting... have you checked whether the clones can represent the number of turns? i.e., if you turn around 360 degrees once vs. twice, does the activity among the clones change?

dileeplearning retweeted

Sebastien Bubeck

@SebastienBubeck

14 days ago

@kareem_carr There was 0 human involvement. The prompt is in the report. The final answer by the model is in the report. And we have a (gpt-rewritten) CoT that we released.

Dileep George

@dileeplearning

16 days ago

@carlkolon No, it isn’t. In fact, evolution is an algorithm that seems to be bitter-lesson-pilled. I don’t think the bitter lesson essay addresses the question of data efficiency on the learning side and compute efficiency on the planning/inference side.

Dileep George

@dileeplearning

16 days ago

Here's a better lesson: don't fall for the bitter lesson.

Richard Sutton

@RichardSSutton

16 days ago

The bitter lesson in 26 words: Don’t be distracted by human knowledge, as AI has been historically. Instead focus on methods for creating knowledge that scale with computation, like search and learning.
