borrrrrrrrrris

@bbabenko

now @ :) ml swe @googleai. ex @orbital_insight @dropbox. phd @ucsandiego. i make songs for kids @pancaketrucks. views my own.

Mountain View, CA

Joined August 2012

731 Following

1.9K Followers

3K Posts

Pinned Tweet

borrrrrrrrrris @bbabenko

11 months ago

views are my own. though for a very reasonable price, they can be yours as well!

448

borrrrrrrrrris @bbabenko

about 23 hours ago

what does it matter? schmidhuber invented all this shit in 1967 anyway

George

@georgejrjrjr

2 days ago

When I say @_arohan_ vs. keller disagreement is deeply laudable (and edifying), here’s what i mean: Keller has been leading a righteous crusade against optimizers that claim to improve on sota, but turn out to be fake —compared to untuned or detuned baselines— with his speedrun project. Meanwhile, Rohan has been quietly pointing out that —ironically— Muon is just such an optimizer. ie, it was a small variation on his preexisting SOTA optimizer (Shampoo) long in use at Google, winner of AlgoPerf, etc. that only looked inferior to Keller’s Muon because it was untuned. But didn’t (perhaps due to his BigLab employment, couldn’t) conclusively set the record straight on the speedrun…until yesterday. Keller is now being hoisted by his own petard: the exemplary standards he loudly, angrily set about optimizers that incorrectly claim new SOTA, enforced by the de facto industry standard for first-pass optimizer evaluation he virtuously created. And you’ll notice he’s handling this with —maybe not perfect poise, there’s been a bit of cope— but taking the lesson with more humility and grace than most would.

111

15K

borrrrrrrrrris @bbabenko

5 days ago

absolute banger https://t.co/dhDswHq50c

borrrrrrrrrris @bbabenko

15 days ago

wait, so in bowling a strike is when you hit all the things, but in baseball a strike is when you *don't* hit the thing?

borrrrrrrrrris @bbabenko

19 days ago

in Canada they take regular ham and just call it bacon... if that's not socialism, I don't know what is

bbabenko retweeted

Alec Stapp

@AlecStapp

about 1 month ago

Good questions for AOC:

112

130K

borrrrrrrrrris @bbabenko

about 1 month ago

what a completely insane take. why a billion specifically? where does one draw the line? and who gets to draw it? if you're concerned about people breaking laws, focus on that. if you're concerned the laws don't prevent certain abuse, focus on that.

Marco Foster

@MarcoFoster_

about 1 month ago

AOC: “There’s a certain level of wealth and accumulation that is unearned. You can’t earn a billion dollars. You just can’t earn that. You can get market power, you can break rules, you can abuse labor laws, you can pay people less than what they’re worth, but you can’t earn that”

24K

bbabenko retweeted

Ramez Naam

@ramez

about 1 month ago

Everyone who cares about climate should understand this. Texas, with no pro-climate policies, has blown past California in clean energy. In large part because Texas has less red tape and makes it easier to build.

313

18K

bbabenko retweeted

Armstrong and Getty

@AandGShow

about 1 month ago

220

17K

562

513K

bbabenko retweeted

Tanay Padhi

@tanaypadhi

about 2 months ago

how did Allbirds pivot to AI compute hardware before the shoe company literally called ASICS

483

303

351K

borrrrrrrrrris @bbabenko

about 2 months ago

the new @tigercub record is out of this world! it's like if rivers cuomo, kurt cobain and matt bellamy all made love to a fuzz pedal and this record was the baby.

381

bbabenko retweeted

Ben Golub

@ben_golub

2 months ago

106

158

133K

bbabenko retweeted

NASA Artemis

@NASAArtemis

2 months ago

Earthset. The Artemis II crew captured this view of an Earthset on April 6, 2026, as they flew around the Moon. The image is reminiscent of the iconic Earthrise image taken by astronaut Bill Anders 58 years earlier as the Apollo 8 crew flew around the Moon.

https://t.co/ag72r97wzb

983

117K

27K

borrrrrrrrrris @bbabenko

2 months ago

ok hear me out.... we get a few bulldozers and:

bbabenko retweeted

Om Patel

@om_patel5

2 months ago


I taught Claude to talk like a caveman to use 75% less tokens.

normal claude: ~180 tokens for a web search task

caveman claude: ~45 tokens for the same task

"I executed the web search tool" = 8 tokens
caveman version: "Tool work" = 2 tokens

every single grunt swap saves 6-10 tokens. across a FULL task that's 50-100 tokens saved

why does it work? caveman claude doesn't explain itself. it does its task first. gives the result. then stops.

no "I'd be happy to help you with that." no "Let me search the web for you" no more unnecessary filler words

"result. done. me stop."

50-75% burn reduction

with usage limits getting tighter every week this might be the most practical hack out there right now
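The percentages quoted in the tweet above can be checked with a few lines of arithmetic. This is only a sketch using the token counts as stated in the tweet; the helper name `reduction` is made up for illustration, and the counts themselves are the tweet's claims, not measurements.

```python
def reduction(before: int, after: int) -> float:
    """Fractional token reduction going from `before` to `after` tokens."""
    return (before - after) / before


# Figures as quoted in the tweet (claims, not measurements).
task = reduction(180, 45)   # whole web-search task
phrase = reduction(8, 2)    # one phrase swapped for its "caveman" version

print(f"task: {task:.0%}, phrase: {phrase:.0%}, saved per phrase: {8 - 2}")
# prints "task: 75%, phrase: 75%, saved per phrase: 6"
```

Both quoted pairs work out to a 75% reduction. Note that 180 - 45 is 135 tokens saved for the task, which is more than the "50-100 tokens saved" the tweet cites for a full task; the tweet does not reconcile the two figures.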

946

23K

10K

bbabenko retweeted

NASA

@NASA

2 months ago

We see our home planet as a whole, lit up in spectacular blues and browns. A green aurora even lights up the atmosphere. That's us, together, watching as our astronauts make their journey to the Moon.

https://t.co/6JkKufBgtJ

312K

66K

22K

77M

bbabenko retweeted

Spaceballs The X Account

@Grunt2A

2 months ago

Now that Artemis II has launched we have 10 days to get everyone on Earth a Planet of the Apes costume so we can do something hilarious when the astronauts return 😁

https://t.co/64jbCUnRkz

92K

18K

bbabenko retweeted

Will Smith @WillSmithVision

2 months ago

Excited to announce our latest (submitted to) SIGBOVIK 2026 @sigbovik paper: "SchmidhubAI: Accurate Historical Paper Attribution". We built an AI system that, given any modern AI paper, automatically determines which of its ideas were already published by Jürgen Schmidhuber.

136

61K

borrrrrrrrrris @bbabenko

2 months ago

WHO INVENTED NUMBERS THOUGH?

Jürgen Schmidhuber

@SchmidhuberAI

2 months ago


Dr. LeCun's heavily promoted Joint Embedding Predictive Architecture (JEPA, 2022) [5] is the heart of his new company. However, the core ideas are not original to LeCun. Instead, JEPA is essentially identical to our 1992 Predictability Maximization system (PMAX) [1][14].

Details in reference [19] which contains many additional references.

Motivation of PMAX [1][14]. Since details of inputs are often unpredictable from related inputs, two non-generative artificial neural networks interact as follows: one net tries to create a non-trivial, informative, latent representation of its own input that is predictable from the latent representation of the other net’s input.

PMAX [1][14] is actually a whole family of methods. Consider the simplest instance in Sec. 2.2 of [1]: an auto encoder net sees an input and represents it in its hidden units (its latent space). The other net sees a different but related input and learns to predict (from its own latent space) the auto encoder's latent representation, which in turn tries to become more predictable, without giving up too much information about its own input, to prevent what's now called “collapse." See illustration 5.2 in Sec. 5.5 of [14] on the "extraction of predictable concepts."
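The trade-off described in the paragraph above (a latent code that is predictable by the other net, yet still informative about its own input) can be illustrated with a toy objective. This is a rough sketch under loud assumptions: the function names, the squared-error terms, and the `recon_weight` parameter are all hypothetical stand-ins chosen for illustration, not the objective used in [1] or in JEPA.

```python
def sq_err(u, v):
    """Sum of squared differences between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def combined_loss(latent, predicted_latent, x, reconstruction, recon_weight=1.0):
    """Toy objective (an assumption, not the formulation in [1]).

    predictability: how well the other net's prediction matches this latent.
    information:    how well the autoencoder reconstructs its own input.
    """
    predictability = sq_err(latent, predicted_latent)
    information = sq_err(x, reconstruction)
    return predictability + recon_weight * information


x = [1.0, 0.0, 1.0]

# "Collapsed" code: a constant latent is trivially predictable,
# but it carries no information, so the reconstruction is poor.
collapsed = combined_loss([0.0], [0.0], x, [0.0, 0.0, 0.0])

# Informative code: slightly mispredicted, but reconstructs the input well.
informative = combined_loss([0.9], [0.8], x, [1.0, 0.0, 0.9])

print(collapsed, informative < collapsed)
```

In this toy setup the reconstruction term is what penalizes the constant, trivially predictable code, which is one simple way to read the tweet's remark about preventing "collapse".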

The 1992 PMAX paper [1] discusses not only auto encoders but also other techniques for encoding data. The experiments were conducted by my student Daniel Prelinger. The non-generative PMAX outperformed the generative IMAX [2] on a stereo vision task.

The 2020 BYOL [10] is also closely related to PMAX. In 2026, @misovalko, leader of the BYOL team, praised PMAX, and listed numerous similarities to much later work [19].

Note that the self-created “predictable classifications” in the title of [1] (and the so-called “outputs” of the entire system [1]) are typically INTERNAL "distributed representations” (like in the title of Sec. 4.2 of [1]).

The 1992 PMAX paper [1] considers both symmetric and asymmetric nets. In the symmetric case, both nets are constrained to emit "equal (and therefore mutually predictable)" representations [1]. Sec. 4.2 on “finding predictable distributed representations” has an experiment with 2 weight-sharing auto encoders which learn to represent in their latent space what their inputs have in common (see the cover image of this post).

Of course, back then compute was a million times more expensive, but the fundamental insights of "JEPA" were present, and LeCun has simply repackaged old ideas without citing them [5,6,19].

This is hardly the first time LeCun (or others writing about him) have exaggerated LeCun's own significance by downplaying earlier work. He did NOT "co-invent deep learning" (as some know-nothing "AI influencers" have claimed) [11,13], and he did NOT invent convolutional neural nets (CNNs) [12,6,13], NOR was he even the first to combine CNNs with backpropagation [12,13]. While he got awards for the inventions of other researchers whom he did not cite [6], he did not invent ANY of the key algorithms that underpin modern AI [5,6,19].

LeCun's recent pitch: 1. LLMs such as ChatGPT are insufficient for AGI (which has been obvious to experts in AI & decision making, and is something he once derided @GaryMarcus for pointing out [17]). 2. Neural AIs need what I baptized a neural "world model" in 1990 [8][15] (earlier, less general neural nets of this kind, such as those by Paul Werbos (1987) and others [8], weren't called "world models," although the basic concept itself is ancient [8]). 3. The world model should learn to predict (in non-generative "JEPA" fashion [5]) higher-level predictable abstractions instead of raw pixels: that's the essence of our 1992 PMAX [1][14].

Astonishingly, PMAX or "JEPA" seems to be the unique selling proposition of LeCun's 2026 company on world model-based AI in the physical world, which is apparently based on what we published over 3 decades ago [1,5,6,7,8,13,14], and modeled after our 2014 company on world model-based AGI in the physical world [8].

In short, little if anything in JEPA is new [19]. But then the fact that LeCun would repackage old ideas and present them as his own clearly isn't new either [5,6,18,19].

FOOTNOTES

1. Note that PMAX is NOT the 1991 adversarial Predictability MINimization (PMIN) [3,4]. However, PMAX may use PMIN as a submodule to create informative latent representations [1](Sec. 2.4), and to prevent what's now called “collapse." See the illustration on page 9 of [1].

2. Note that the 1991 PMIN [3] also predicts parts of latent space from other parts. However, PMIN's goal is to REMOVE mutual predictability, to obtain maximally disentangled latent representations called factorial codes. PMIN by itself may use the auto encoder principle in addition to its latent space predictor [3].

3. Neither PMAX nor PMIN was my first non-generative method for predicting latent space, which was published in 1991 in the context of neural net distillation [9]. See also [5-8].

4. While the cognoscenti agree that LLMs are insufficient for AGI, JEPA is so, too. We should know: we have had it for over 3 decades under the name PMAX! Additional techniques are required to achieve AGI, e.g., meta learning, artificial curiosity and creativity, efficient planning with world models, and others [16].

REFERENCES (easy to find on the web):

[1] J. Schmidhuber (JS) & D. Prelinger (1993). Discovering predictable classifications. Neural Computation, 5(4):625-635. Based on TR CU-CS-626-92 (1992): https://t.co/wJFbdPhwdi
[2] S. Becker, G. E. Hinton (1989). Spatial coherence as an internal teacher for a neural network. TR CRG-TR-89-7, Dept. of CS, U. Toronto.
[3] JS (1992). Learning factorial codes by predictability minimization. Neural Computation, 4(6):863-879. Based on TR CU-CS-565-91, 1991.
[4] JS, M. Eldracher, B. Foltin (1996). Semilinear predictability minimization produces well-known feature detectors. Neural Computation, 8(4):773-786.
[5] JS (2022-23). LeCun's 2022 paper on autonomous machine intelligence rehashes but does not cite essential work of 1990-2015.
[6] JS (2023-25). How 3 Turing awardees republished key methods and ideas whose creators they failed to credit. Technical Report IDSIA-23-23.
[7] JS (2026). Simple but powerful ways of using world models and their latent space. Opening keynote for the World Modeling Workshop, 4-6 Feb, 2026, Mila - Quebec AI Institute.
[8] JS (2026). The Neural World Model Boom. Technical Note IDSIA-2-26.
[9] JS (1991). Neural sequence chunkers. TR FKI-148-91, TUM, April 1991. (See also Technical Note IDSIA-12-25: who invented knowledge distillation with artificial neural networks?)
[10] J. Grill et al (2020). Bootstrap your own latent: A "new" approach to self-supervised Learning. arXiv:2006.07733
[11] JS (2025). Who invented deep learning? Technical Note IDSIA-16-25.
[12] JS (2025). Who invented convolutional neural networks? Technical Note IDSIA-17-25.
[13] JS (2022-25). Annotated History of Modern AI and Deep Learning. Technical Report IDSIA-22-22, arXiv:2212.11279
[14] JS (1993). Network architectures, objective functions, and chain rule. Habilitation Thesis, TUM. See Sec. 5.5 on "Vorhersagbarkeitsmaximierung" (Predictability Maximization).
[15] JS (1990). Making the world differentiable: On using fully recurrent self-supervised neural networks for dynamic reinforcement learning and planning in non-stationary environments. Technical Report FKI-126-90, TUM.
[16] JS (1990-2026). AI Blog.
[17] @GaryMarcus. Open letter responding to @ylecun. A memo for future intellectual historians. Substack, June 2024.
[18] G. Marcus. The False Glorification of @ylecun. Don’t believe everything you read. Substack, Nov 2025.
[19] J. Schmidhuber. Who invented JEPA? Technical Note IDSIA-3-22, IDSIA, Switzerland, March 2026. https://t.co/fDauPE6T2N

189

686K

651

borrrrrrrrrris @bbabenko

3 months ago

way too harsh IMO. I definitely thought they leaned too much into the comedy side of things, but overall it was a breath of fresh air as a scifi flick. no CGI slop (Rocky was a practical effect), a story about problem solving rather than pure "surviving a dystopia", etc.

roon

@tszzl

3 months ago

project hail mary was unfortunately a middling adaptation of a good book. the script has the unfortunate effect of “language model populism” - where every single line has to be some sort of punched up comedic zinger yet still unremarkable. visuals were uninspired and trite and more or less identical to other space movies. everything good about the film comes from the wonderful world scaffolding of the book and the hard science fiction of it all that lets you suspend disbelief on the alien rocky

the movie doesn’t really try to get into the xenolinguistic stuff even at the depth the book tries (someone called it “arrival for idiots” which unfortunately hit)

the thing that elevated the book is the commitment to a hard science fiction engineeringporn fiction at a level nobody else is able to write. the direction of the movie doesn’t really convey the same feeling successfully, and you’re left with flat characters, an alien that is more human than several humans i know, and a marvel populism

gosling and the german woman are great as actors, but this movie will not be remembered in a year. it is disappointing to see people do so little with a quarter billion, insane acting talent, and incredible source IP

232

279

403K

159

bbabenko retweeted

Jürgen Schmidhuber

@SchmidhuberAI

6 months ago

LeCun’s new company on physical AI with world models [9] looks a lot like our 2014 company on physical AI with world models [1] 😀 See also [2-8] - all references in the reply!

https://t.co/UIZAAhPlKq

378

191

112K
