Stefano Soatto

@soatto4

UCLA CS Prof; AWS VP, Agentic AI. Views solely my own.

California

Joined April 2019

381 Following

146 Followers

7 Posts

Stefano Soatto

@soatto4

3 months ago

@pmddomingos Indeed - in the latter case, captured by Hilberg’s Law — Occam’s pertains to induction. LLMs operate transductively: https://t.co/THRewdnHy5

Stefano Soatto

@soatto4

3 months ago

@getjonwithit This is all you need to see it: the connection of speed with algorithmic information (KC of the trained model), and then the scaling of speed (Hilberg’s Law) https://t.co/DPURk9fjZE

Stefano Soatto

@soatto4

3 months ago

It’s an evocative analogy, but unfortunately it doesn’t work: the prompt is not the “program”; it merely states the task and doesn’t provide instructions to solve it. The algorithmic complexity of a datum is the description length of the program that generates it, not of the datum itself. When viewing an LLM as a (stochastic) universal computer, which it is, the role played by the “program length” in a deterministic computer (such as a TM) is played by “proper time”, i.e. the ratio of the length of the chain-of-thought to the probability assigned by the model to that chain-of-thought. Once the problem is properly framed (which has been done in https://t.co/h3xLPeXfEs), the conclusion is precisely the opposite: the algorithmic complexity of human-generated data is not only extremely large but grows without bound, in accordance with Hilberg’s Law.
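As a rough illustration of the ratio described in the post above, here is a minimal Python sketch. It assumes the only thing available is a list of per-token log-probabilities for a chain-of-thought; the function names and the example numbers are hypothetical, and the definition in the linked paper may include details this sketch omits.

```python
import math


def proper_time(token_logprobs):
    """Chain length divided by the probability the model assigns to the chain.

    token_logprobs: per-token log-probabilities (natural log) of one
    chain-of-thought. This input format is an assumption of this sketch.
    """
    if not token_logprobs:
        raise ValueError("chain-of-thought must contain at least one token")
    length = len(token_logprobs)
    # Chain rule: log P(chain) is the sum of the per-token log-probabilities.
    log_prob = sum(token_logprobs)
    return length / math.exp(log_prob)


def log_proper_time(token_logprobs):
    """Same quantity in log space, which avoids underflow on long chains."""
    if not token_logprobs:
        raise ValueError("chain-of-thought must contain at least one token")
    return math.log(len(token_logprobs)) - sum(token_logprobs)


if __name__ == "__main__":
    # Hypothetical chains in which every token has probability 0.5.
    short_chain = [math.log(0.5)] * 4
    long_chain = [math.log(0.5)] * 8
    print(proper_time(short_chain), proper_time(long_chain))
```

Under this reading, a longer or less probable chain has a larger proper time, so a model that reaches a solution through a short, high-probability chain is "faster" in the sense used in the post.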

Stefano Soatto

@soatto4

3 months ago

@getjonwithit

Stefano Soatto

@soatto4

3 months ago

Congratulations to Charles Bennett for his Turing Award! https://t.co/UB30V1zDsr His work on the logical depth of a datum was the inspiration for the notion of “conceptual depth” of a trained LLM described in the paper AI Agents as Universal Solvers, which in turn was key to identifying the inversion of scaling laws also described there: https://t.co/h3xLPeXNu0 Specifically, the logical depth of a datum is the time it takes for a Turing Machine to generate it from a program that is not much more complex than its Kolmogorov complexity. This makes sense for bit-strings but not for a trained generative model. The trained model contains (algorithmic) information, so it could be thought of as “data”, but how long it takes a Turing Machine to generate it is irrelevant. What matters is how long a model takes to generate a token-stream that solves a task. That is proper time. So the “conceptual depth” of a trained model is defined as (loss + proper time), not just computation steps. (Curiously, proper time is vaguely related to relativistic proper time, but that’s a stretch.) Once the complexity of the trained model is evaluated using conceptual depth, scaling laws undergo an inversion, where more data, compute, and energy come at the expense of intelligence, not with it. This is also discussed in the AI Agents as Universal Solvers paper. A trained model conflates memory and computation in the weights, so it can’t be thought of as an ordinary computer with separate memory, processor, tape, etc. The time it takes for an LLM to solve a task with chain-of-thought is stochastic, and so is the outcome. Proper time captures that, and the Strands Coding Framework https://t.co/hc9Dvgh5Qb allows users to “program” AI Agents for what they are: stochastic dynamical systems that can perform universal computation.

Stefano Soatto

@soatto4

3 months ago

The view of LLMs as universal computers (https://t.co/xbqFcgVlBm, or read about it here: https://t.co/h3xLPeXNu0) pits AI Agents against the principle of Occam’s Razor. The compression view of learning only captures statistical information: regularize in order to generalize. But AI Agents don’t generalize, and that’s their power! They memorize and reason. By doing so they achieve generality, not generalization. The governing principle is Hilberg’s Law, and the key theorem connects algorithmic information in the trained model (the bigger the better) to time (the shorter the better). This is not a bound but an equality: without a cost of time, you don’t need to learn (in fact, optimal inference, à la Levin/Solomonoff, involves no learning). But by imposing a cost of time you are forced to accrue algorithmic information in the trained model. And since it is an equality, that is also the *only* way to accrue algorithmic information. Learning to reason is all about time!

Stefano Soatto

@soatto4

3 months ago

Alright, let’s try this eX-twitter thing… first post.
