DIFFUSION MATH FROM ABSOLUTE SCRATCH 🔥
Diffusion math is infamous for being very complex but in this deep dive I try to convince u that it's not
We go thru Statistical Mechanics, Stochastic Processes, DDPMs, -LogLikelihood, etc all derived by hand w my own software!
We recently obtained the highest-resolution 3D images of the human brain ever taken from outside the skull. This is the first look.
Introducing Aleph, a research lab building brain interfaces for the telepathic future. (1/n)
we trained a 20b model specialized in agentic search
along with our tech report, we release our model weights and the full data generation pipeline used for training
@Takeraparterer Yeah, 100%. This is honestly a toy model, the real world model from comma is on the original tweet I quoted. Def recommend giving it a read
I trained (overfit ~1hr) their small indoors agent with the official dataset using the same nanogpt + mlp architecture (confirmed by quickly reverse eng the published ONNX). It's really cool!
It's like a tiny FSD,
takes frame latents, wheel speeds, current action (wasd keys) and returns the next frame logits (next 'image', speeds, action) and a plan (following 16 actions)
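To make the input/output shape of that step concrete, here is a minimal sketch of the interface only, not the actual comma or nanogpt code. Every name here (`Step`, `Prediction`, `toy_predict`, `rollout`, the `ACTIONS` list) is hypothetical, and the "model" just returns random values with the right shapes.

```python
import random
from dataclasses import dataclass
from typing import List

ACTIONS = ["w", "a", "s", "d"]  # assumed action vocabulary (wasd keys)
PLAN_LEN = 16                   # the plan covers the following 16 actions


@dataclass
class Step:
    frame_latent: List[float]   # compressed representation of one camera frame
    wheel_speeds: List[float]   # one value per wheel
    action: str                 # current action key


@dataclass
class Prediction:
    next_step: Step             # predicted next 'image', speeds and action
    plan: List[str]             # the following PLAN_LEN actions


def toy_predict(history: List[Step], seed: int = 0) -> Prediction:
    """Stand-in for the trained network: random outputs with the right shapes."""
    rng = random.Random(seed)
    last = history[-1]
    next_step = Step(
        frame_latent=[rng.random() for _ in last.frame_latent],
        wheel_speeds=list(last.wheel_speeds),
        action=rng.choice(ACTIONS),
    )
    plan = [rng.choice(ACTIONS) for _ in range(PLAN_LEN)]
    return Prediction(next_step, plan)


def rollout(history: List[Step], n_steps: int) -> List[Step]:
    """Feed each prediction back in as the newest step (autoregressive rollout)."""
    history = list(history)
    for i in range(n_steps):
        history.append(toy_predict(history, seed=i).next_step)
    return history
```

Swapping `toy_predict` for a real model call would keep the rollout loop unchanged, which is the point of the sketch.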
When I read comma's paper on training a driving AI with a world model, I was talking with some friends about how this could totally be domain swapped (outdoor sidewalk navigation, indoors, drones, etc)
Just watched the comma con videos and it’s really cool to see that complete autonomy is actually the goal
@Takeraparterer Yeah, you can totally control it and predict frames! You just need better data. I tried with this model and it outputs noise very quickly
@esther_confused Hahahh “neutral” just means it's going forward, you have basically every combination of wasd. You could train a hostile mode though (bracket bot), that’d be fun!
@yassineyousfi_ I didn't know that was the goal, I think it's the right path! I watched the comma con talks, it was really cool to see the entire stack explained. You guys are doing amazing work, sincerely
Finally, I generally really like comma as a company. No "AI will replace drivers in 3 months", no multi M rounds; revenue and a product you can buy. It's funny but from the comma con videos you can literally learn their entire stack end to end, it's just good engineering
The magic from the world model (from comma's paper) is that it's decoupled from the actual low level implementation (tho representative of the training data). Meaning that for example comma can train both a Toyota Prius and a Ram 1500 with the same world model, just a different 'vehicle model'
Doing this for other domains is ofc quite hard (driving is arguably the easiest) but what's exciting is that it successfully launched in the real world! Today you can buy a comma 4 and go self drive multiple cars all trained with the same model simulator (generally, I think they have different training distributions depending on countries and so on)
I haven't explored the 'high policy' models enough but I bet getting a drone to follow the same motion-target of the world model (under car constraints: camera at the same height, same velocity, turning speed, ...) is totally doable: you'll get a drone that behaves like a car
Eventually I think this will happen:
https://t.co/IZyDYuG6XA
We should have enough data from the Miami delivery robots to build a sidewalk navigation "world model" (avoid crashes, be careful with strollers and chihuahuas, don't fall off the curb, ...)
Which can then be mapped into a robo-agnostic motion target (trajectory), then each platform gets its own "Vehicle Model" + low level controller to realize that motion
With a lot of work (camera viewpoints, low lv locomotion, human reaction, ...) we could get a humanoid and a robodog to learn to walk around a city, similar to how they manage to drive both a Ram 1500 and a Toyota Prius
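The split described here (one shared world model, a platform-agnostic motion target, then a per-platform "Vehicle Model") can be sketched roughly as below. This is an illustration under my own assumptions, not comma's architecture: `MotionTarget`, `DiffDriveRobot`, `Quadruped` and `sidewalk_world_model` are hypothetical names, and the world model is a hard-coded placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


@dataclass
class MotionTarget:
    speed: float     # desired forward speed, m/s
    yaw_rate: float  # desired turning rate, rad/s (positive = turn left)


class VehicleModel(Protocol):
    """Anything that can turn a platform-agnostic target into its own commands."""
    def to_commands(self, target: MotionTarget) -> Dict[str, float]: ...


class DiffDriveRobot:
    """Hypothetical two-wheeled delivery robot (differential drive)."""
    def __init__(self, track_width: float):
        self.track_width = track_width  # distance between the wheels, m

    def to_commands(self, target: MotionTarget) -> Dict[str, float]:
        half = self.track_width / 2
        return {
            "left_wheel": target.speed - target.yaw_rate * half,
            "right_wheel": target.speed + target.yaw_rate * half,
        }


class Quadruped:
    """Hypothetical robodog: maps speed to a step frequency."""
    def __init__(self, stride: float):
        self.stride = stride  # distance covered per step, m

    def to_commands(self, target: MotionTarget) -> Dict[str, float]:
        return {"step_hz": target.speed / self.stride, "turn_rate": target.yaw_rate}


def sidewalk_world_model(observation: object) -> List[MotionTarget]:
    """Placeholder for the learned model: returns a fixed short trajectory."""
    return [MotionTarget(1.0, 0.0), MotionTarget(0.5, 0.2)]


def drive(platform: VehicleModel, observation: object) -> List[Dict[str, float]]:
    """Same trajectory for every platform, different low-level commands."""
    return [platform.to_commands(t) for t in sidewalk_world_model(observation)]
```

Calling `drive(DiffDriveRobot(0.5), obs)` and `drive(Quadruped(0.5), obs)` follows the same trajectory with different commands, which is the Ram 1500 vs Toyota Prius idea in miniature.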
@chesterzelaya although in Spanish he does say it was made by kids and he seems to be in a high school science festival, he prob just said Nvidia because that’s the only company he knows
~2y ago I reimagined user interfaces with diffusion models [🖼️]: segmentation, resizing as generation, text+selection inpainting, collaborative creation, interconnected canvases, pencil as a primitive
At the time I taped together a bunch of SD models, different controlnets and a rough agent-like LLM for prompts. I spent the past weekend refactoring to use @GoogleDeepMind's nano-[🍌🍌] and it's incredibly cool!!
Technical breakdown of my favorite features: