1/ Introducing GPIC: a Giant Permissive Image Corpus and benchmark for visual generation!
🚀100M VLM-captioned image-text pairs for training
📊1M image-text pairs for benchmarking
🖼️~28 trillion pixels
🤗Centrally Hosted
✅Fully permissive for research + commercial use
Dataset, benchmark and models🧵👇
Co-led with @KyleSargentAI
For over a decade, we’ve accepted that end-to-end backprop is the only way to train deep networks. But holding the entire network in memory all at once is why AI training is hitting a resource wall.
We found a new way to break the network into blocks and train them independently. The trick? Treating the network’s forward pass like a diffusion model denoising a signal.
This reinterpretation slashes the memory needed to train deep models. In our #ICLR2026 paper (https://t.co/PK5h0mqQSo), we matched end-to-end performance across ViTs, DiTs, and LLMs. We did this while training just one isolated block at a time.
I’ll be at CVPR next week! Looking forward to catching up and chatting with everyone. Our paper will be presented in the first poster session, come say hi!
Excited to share our CVPR 2026 paper: Physical Simulator In-the-Loop Video Generation!
We introduce PSIVG, which brings a physical simulator into text-to-video inference, improving physical consistency in generated videos.
Project Page: https://t.co/cbL0HTNGEG
The Polish theoretical physicist who proved that you can recreate all mathematical functions from just this one operation is Andrzej Odrzywołek from Jagiellonian University @JagiellonskiUni. Truly a remarkable piece of work! Congrats @AndrzOdrz
https://t.co/vnWJB4JCmf
#CVPR2026 What happens when a video generation model gives you a video that looks good at first glance, but breaks physics?
Objects disappear. Motion trajectories drift. Textures flicker. Collisions look wrong.
This is still a common failure mode in modern video generation (particularly open-source models): strong visual quality, but weak physical consistency.
So instead of re-prompting the model again and again, or hoping a different random seed magically fixes the result, what if we could correct these failures with an explicit physical prior?
We introduce PSIVG: Physical Simulator In-the-Loop Video Generation.
PSIVG is a training-free, inference-time framework that starts from a template video, reconstructs the 4D scene and foreground object meshes, runs a 3D physical simulator to produce physically plausible trajectories, and feeds that guidance back into the video generator to steer it toward more temporally coherent, physically grounded motion.
Instead of asking the model to “guess” physics better, PSIVG brings physics directly into the generation loop.
project page: https://t.co/HYO3fFcn9U
paper: https://t.co/BC9KDRFVHn
code: https://t.co/0UJkB7Lp4e
This is the coolest thing I've seen about LLMs in a long time.
This guy is trying to train a model only on data from the 1800s, creating an LLM that behaves like that society... So damn cool.
https://t.co/NprF9DDRtV
I took delivery of a beautiful new shiny HW4 Tesla Model X today, so I immediately took it out for an FSD test drive, a bit like I used to do almost daily for 5 years. Basically... I'm amazed - it drives really, really well, smooth, confident, noticeably better than what I'm used to on HW3 (my previous car) and eons ahead of the version I remember driving up highway 280 on my first day at Tesla ~9 years ago, where I had to intervene every time the road mildly curved or sloped. (note this is v13, my car hasn't been offered the latest v14 yet)
On the highway, I felt like a passenger in some super high tech Maglev train pod: the car is locked in the center of the lane while I'm looking out from Model X's higher vantage point and its panoramic front window, listening to the (incredible) sound system, or chatting with Grok. On city streets, the car casually handled a number of tricky scenarios that I remember losing sleep over just a few years ago. It negotiated oncoming cars in tight lanes, it gracefully went around construction and cars temporarily stationary in the lane, it correctly timed tricky left turns with oncoming traffic from both sides, it gracefully gave way to the car that went out of order at the 4-way stop, it found a way to squeeze into bumper-to-bumper traffic to make its turn, it overtook the bus that was loading passengers but still stopped for the stop sign that was blocked by the bus, and at the end of the route it circled around a parking lot, found a spot and... parked. Basically a flawless drive.
For context, I'm used to going out for a brief test drive around the neighborhood and returning with 20 clips of things that could be improved. It's new for me to do exactly that, just like I used to, but come back with nothing. Perfect drive, no notes. I expect there's still more work for the team in the long march of 9s, but it's just so cool to see that we're beyond finding issues on any individual ~1 hour drive around the neighborhood; you actually have to go to the fleet and mine them. Back then, I processed the incredible promise of vehicle autonomy at scale (in the fully scalable, vision only, end-to-end Tesla way) only intellectually, but now it is possible to feel it intuitively too if you just go out for a drive. Wait, of course a surround video stream at 60Hz processed by a fully dedicated "driving brain" neural net will work, and it will be so much better and safer than a human driver. Did anyone else think otherwise?
I also watched @aelluswamy's new ICCV25 talk last week (https://t.co/RdaM23kvez) that hints at some of the recent under the hood technical components driving this progress. Sensor streams (videos, maps, kinematics, audio, ...) over long contexts (e.g. ~30 seconds) go into a big neural net, and steering/acceleration comes out, optionally with auxiliary visualization data. This is the dream of the complete Software 1.0 -> Software 2.0 rewrite that scales fully with data streaming from millions of cars in the fleet and the compute capacity of your chip, not some engineer's clever new DoubleParkedCarHandler C++ abstraction with undefined test-time characteristics of memory and runtime. There are a lot more hints in the video on where things are going with the emerging "robotics+AI at scale stack". World reconstructors, world simulators "dreaming" dynamics, RL, all of these components general, foundational, neural net based, how the car is really just one kind of robot... are people getting this yet?
Huge congrats to the team - you're building magic objects of the future, you rock! And I love my car <3.
We got a call from @xai 24 hours ago
“We want to test Grok 4 on ARC-AGI”
We heard the rumors. We knew it would be good. We didn’t know it would become the #1 public model on ARC-AGI
Here’s the testing story and what the results mean:
Yesterday, we chatted with Jimmy from the xAI team, who wanted us to validate their Grok 4 score. They did their own testing on the ARC-AGI-1 & 2 public evaluation set
To validate their score (and measure possible overfitting), we self-tested the new model on our semi-private evaluation set
We walked them through our testing policy:
* No data retention
* Model checkpoint must be intended for public use
* Temporary increase in rate limits for burst testing
They were on board, so we got started
Initially, we ran into timeout errors with normal requests, so we switched to streaming. That resolved the issue
So, what do these results mean?
First, the facts: Grok 4 is now the top-performing publicly available model on ARC-AGI. This even outperforms purpose-built solutions submitted on Kaggle.
Second, ARC-AGI-2 is hard for current AI models. To score well, models have to learn a mini-skill from a series of training examples, then demonstrate that skill at test time.
The previous top score was ~8% (by Opus 4). Below 10% is noisy
Getting 15.9% breaks through that noise barrier: Grok 4 is showing non-zero levels of fluid intelligence
But the mission isn’t over. We need new ideas to solve ARC-AGI-2. Scale alone won’t get us there
Come work on ARC-AGI with us
The Australian Open doesn’t have full broadcast rights for all matches.
So, its YouTube livestream uses AI to generate Nintendo Wii Tennis cartoon avatars that mimic the action on a 2-minute delay.
As a result, this animated clip of Daniil Medvedev smashing his tennis racket on the net might be the greatest generative AI output to date.
I’m thrilled to share that I’ll be presenting two posters at #CVPR2024: “Action Detection via an Image Diffusion Process”, “LLMs are Good Sign Language Translators”!
I’ll be presenting during Thursday’s PM session. Poster IDs are: 362, 363. Please drop by if you’re attending!
1/2 Go ahead. Dance the night away! 💃Create your 3D animations from videos or text and use them in Fortnite with our new UEFN editor plugin 🎮 Be your own hero. Get started on GitHub https://t.co/0inXhK2XQM and create your motion on https://t.co/P4DXaGFk8c #UEFN #Fortnite #SMPL
It is with great sadness that the Simons Foundation announces the death of its co-founder and chair emeritus, James Harris Simons. Jim was an award-winning mathematician, a legendary investor and a generous philanthropist. https://t.co/w48DhauUVj
apparently Google laid off their entire Python Foundations team, WTF!
(@SkyLi0n, one of the pybind11 maintainers, just informed me and is asking what ways they can re-fund pybind11)
The team appears to have done substantial work that seems critical for Google internally as well.
There's a hackernews thread if folks want to read more: https://t.co/iz6uVNk4Q9
Back in 2015, @drew_jaegle, Javier Romero and I tried regressing 3D human pose and shape from images and it didn't work well so we never published it. At the time, I thought CNNs couldn't regress rotations. Then in 2018 @akanazawa proposed #HMR, which iteratively regresses #SMPL pose using an axis-angle representation. This was the first direct regressor of SMPL. Then the influential paper by Zhou et al (https://t.co/csaQ4WYTLQ) proposed using a continuous 6D representation of rotations that everyone uses today. With this, you don't need HMR's iteration trick and neural networks have no trouble regressing 3D human pose. Representations of rotations matter.
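To make the 6D idea concrete: as described by Zhou et al., the network outputs two 3D vectors, and a Gram-Schmidt step turns them into an orthonormal basis whose third axis is their cross product. Going the other way is just "keep the first two columns", which is continuous, unlike axis-angle. Below is a minimal pure-Python sketch under those assumptions. The function names `rotation_from_6d` and `rotation_to_6d` are hypothetical, and placing the basis vectors as columns (rather than rows) is one convention among several.

```python
import math

def _normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    if n == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / n for x in v]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def rotation_from_6d(x6):
    """Hypothetical helper: map 6 numbers (two stacked 3D vectors) to a 3x3
    rotation matrix, returned as a list of rows whose columns are b1, b2, b3."""
    if len(x6) != 6:
        raise ValueError("expected 6 values")
    a1, a2 = list(x6[:3]), list(x6[3:])
    b1 = _normalize(a1)
    # Gram-Schmidt: remove the component of a2 along b1, then normalize.
    proj = _dot(b1, a2)
    b2 = _normalize([a2[i] - proj * b1[i] for i in range(3)])
    # The third axis is fixed by the first two, so det(R) = +1.
    b3 = _cross(b1, b2)
    return [[b1[r], b2[r], b3[r]] for r in range(3)]

def rotation_to_6d(R):
    """Hypothetical inverse: keep the first two columns of R."""
    return [R[r][0] for r in range(3)] + [R[r][1] for r in range(3)]
```

For example, `rotation_from_6d([2, 0, 0, 1, 3, 0])` gives the identity matrix: the first vector is rescaled to unit length and the second loses its component along the first. Any 6 numbers whose two halves are not parallel yield a valid rotation, which is why a plain regressor can output them directly.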
Just finished this book - Bad Therapy by @AbigailShrier
This is one of the most eye-opening books I've ever read. It's a must read for any parent, any teacher, and should be required reading for any school administrator as well.
The book tries to figure out why kids are having so many mental health problems when so many resources are devoted to improving mental health outcomes.
Anxiety, depression, suicide, etc. are all higher among kids than they've ever been, even though their lives are arguably better than ever before. It just doesn't make sense.
A few key takeaways from the book:
A constant attention on how kids are "feeling" or "thinking" is causing negative outcomes.
Constantly ruminating on your emotions and how you feel negatively impacts your mental health. If all you do is focus on your emotions, you are destined to be anxious or depressed.
We incessantly ask kids how they're feeling, if they're happy, how their mental health is, etc., and this is creating kids who think they're fragile instead of resilient.
Trying to solve every problem for kids has caused a generation who can't do anything for themselves.
We (Gen X) were told to "suck it up" or "you'll live" or "rub some dirt on it" all the time. Many of us came to the conclusion this is "bad parenting" because our feelings were neglected, and we vowed not to do this to our own children.
Because of that, kids immediately over-dramatize everything that happens to them, making mountains out of molehills, and thinking the world must revolve around their emotions and feelings.
You develop confidence and strong mental health by doing things, not by thinking or via therapy.
You can't think your way out of anxiety. You don't gain confidence by analysis of your thoughts or mental health issues.
You gain confidence and eliminate anxiety by doing gradually more difficult tasks, excelling at them, and realizing you are a competent, capable person.
The non-stop attention therapy gives to these small, common emotions we all feel blows them out of proportion to their seriousness (not talking about genuine disorders here, just normal anxieties that millions of people go to therapy to try to avoid).
One of the best ways to decrease your happiness is to chase it.
Our society constantly tells kids they should be "happy" and asks them if they are.
Happiness isn't a state you should be in 24/7. That's not realistic. Joy and bliss aren't permanent states - they are fleeting.
Contentment, stillness, and being even-keeled are much better goals to aim for mentally.
The happiest, most well-adjusted kids come from families with loving parents who have strict rules for the household.
This one really set off the confirmation bias in me... I feel really blessed we have two well-adjusted middle school kids who do great in school, are very respectful and well mannered, and we barely even need to parent them.
But for years, we were very strict with them. Bedtimes, family rules, how we do things, etc. The in-laws and lots of friends thought we were totalitarian.
In reality, we just had high standards. And it's really paying off right now. I found it really interesting that strict rules equal happy kids. Makes sense, though, as kids need to know what their boundaries are.
Constantly surveying school-age kids about their mental health causes more issues than it solves.
Mental health resources are big money. Districts need to validate all the resources allocated towards mental health, and they often do that via surveys.
Asking kids non-stop questions like:
- Have you thought about self harm?
- Have you thought about suicide?
- Have you been so anxious you can't get out of bed?
Etc., etc. puts into their heads the idea that they, or many of their peers, are broken and cannot function properly in the real world.
It normalizes situations that would be incredibly rare at any other time in history.
There are a lot of other takeaways, too, but I'll stop there.
It's a fantastic book. Go pick it up and read it. This isn't an affiliate thing or a promotion thing at all. I just really enjoyed it, and it will further shape the way I parent moving forward.