Why aren’t diffusion language models smart yet? The lack of stable post-training is a major bottleneck!
Meet DiPOD: the tripod for diffusion model post-training.
With a one-line code change, DiPOD boosts accuracy across reasoning tasks; Sudoku jumps from 22% to 97%.
🧵1/5
Neural PDE solvers have seen exciting progress! 🌊
But despite growing adoption, we still don’t know 𝘄𝗵𝗲𝗻 we should use them instead of classical solvers. 🤔
Our new paper has a surprising finding: the harder the PDE task, the more cost-effective learned solvers become. 🧵👇
We never really knew how to train nonlinear RNNs well… BPTT struggled with vanishing grads (no long-range memory) and sequential rollout (hard to parallelize).
What if instead an oracle told us the optimal memory state m_t at each step? Then the RNN could do one-step supervised learning on (m_t, x_{t+1}) → m_{t+1} labels.
We call this Supervised Memory Training (SMT): a replacement for BPTT that trains RNNs without unrolling them. SMT is time-parallelizable and solves vanishing gradients.
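A minimal sketch of the idea, assuming the oracle memory states are already in hand. Here they are simulated from a hypothetical "teacher" tanh RNN (`make_oracle_data`, `W_true`, `U_true` and `train_smt` are illustrative names, not from the paper), and the post doesn't say how SMT actually obtains its target states. The point it shows: the loss covers every (m_t, x_{t+1}) → m_{t+1} pair in one vectorized step, so nothing is unrolled and no gradient flows back through time.

```python
import numpy as np

def make_oracle_data(T=200, d_m=4, d_x=3, seed=0):
    # Hypothetical stand-in for the oracle: memory states produced by a
    # known tanh RNN. The loop here only generates data; training below
    # never steps through time.
    rng = np.random.default_rng(seed)
    W_true = rng.normal(scale=0.5, size=(d_m, d_m))
    U_true = rng.normal(scale=0.5, size=(d_m, d_x))
    x = rng.normal(size=(T + 1, d_x))
    m = np.zeros((T + 1, d_m))
    for t in range(T):
        m[t + 1] = np.tanh(W_true @ m[t] + U_true @ x[t + 1])
    return m, x

def one_step_loss_and_grads(W, U, m, x):
    # All time steps at once: inputs (m_t, x_{t+1}), target m_{t+1}.
    m_prev, x_next, m_next = m[:-1], x[1:], m[1:]
    pred = np.tanh(m_prev @ W.T + x_next @ U.T)      # shape (T, d_m)
    err = pred - m_next
    T = m_prev.shape[0]
    loss = np.sum(err ** 2) / T                      # mean over steps of squared error
    dpre = (2.0 / T) * err * (1.0 - pred ** 2)       # backprop through tanh only
    return loss, dpre.T @ m_prev, dpre.T @ x_next    # loss, dW, dU

def train_smt(m, x, steps=500, lr=0.1, seed=1):
    # Plain gradient descent on the one-step supervised loss.
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(m.shape[1], m.shape[1]))
    U = rng.normal(scale=0.1, size=(m.shape[1], x.shape[1]))
    losses = []
    for _ in range(steps):
        loss, dW, dU = one_step_loss_and_grads(W, U, m, x)
        losses.append(loss)
        W -= lr * dW
        U -= lr * dU
    return W, U, losses

m, x = make_oracle_data()
W, U, losses = train_smt(m, x)
print(losses[0], losses[-1])
```

Because each gradient passes through a single tanh, there is no long product of Jacobians to vanish, and the per-step work is one batched matrix multiply over all T steps.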
Website: https://t.co/BvctWJlPad
arXiv: https://t.co/5xR0mUVymp
With the UMA playground, the crazy things Feynman could only ask us to imagine become something you can see. Heat, break, and build at https://t.co/bZFDORa9Mw
Most engineers have never had sufficient access to real physics analysis: simulation was too expensive, slow, and specialized. @VinciPhysics' vision isn’t just replacing Ansys; it’s bringing continuous physics reasoning to 100x more engineers and 1000x more simulations in a fraction of the time.
Same model, working out of the box across all types of parts, physical scales, and industries, be it semiconductors or robotics.
End-to-end automation. A 33M-degree-of-freedom problem solved in 20 seconds of inference.
No fine-tuning, no customization.
Continuous physics intelligence.
🤯 big update to our flow map language models paper!
we believe this is the future of non-autoregressive text generation.
read about it in the blog: https://t.co/DfBXrYmJc8
full details in the paper: https://t.co/coiNXj4ucC
we introduce a new class of continuous flow-based language models and distill them into their corresponding flow map for one-step text generation.
we beat all discrete diffusion baselines at ~8x speed!
v2 gives a complete theory of the flow map over discrete data, with three equivalent ways to learn it (semigroup, lagrangian, eulerian). it turns out you can train these with cross-entropy objectives that look very similar to standard discrete diffusion — but without the factorization error that kills discrete methods at few steps.
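To make the three characterizations concrete, here is a toy check on a scalar ODE dx/dt = v(t, x) with the hypothetical field v(t, x) = -t·x, whose flow map X_{s,t}(x) = x·exp(-(t² - s²)/2) is known in closed form. It checks the standard continuous-time properties (semigroup composition, the Lagrangian equation in t, the Eulerian transport equation in s) and shows why a flow map helps: a single Euler step from 0 to 1 is badly wrong, while the map lands there exactly. This is continuous, one-dimensional and closed-form; it says nothing about the paper's discrete-data theory or its cross-entropy objectives.

```python
import numpy as np

def velocity(t, x):
    # Hypothetical toy velocity field, not the paper's model.
    return -t * x

def flow_map(s, t, x):
    # Closed-form flow map: carries x at time s to time t along dx/dt = -t x.
    return x * np.exp(-(t ** 2 - s ** 2) / 2.0)

def semigroup_gap(r, s, t, x):
    # Semigroup: X_{s,t}(X_{r,s}(x)) should equal X_{r,t}(x).
    return abs(flow_map(s, t, flow_map(r, s, x)) - flow_map(r, t, x))

def lagrangian_gap(s, t, x, h=1e-5):
    # Lagrangian: d/dt X_{s,t}(x) should equal v(t, X_{s,t}(x)).
    d_t = (flow_map(s, t + h, x) - flow_map(s, t - h, x)) / (2 * h)
    return abs(d_t - velocity(t, flow_map(s, t, x)))

def eulerian_gap(s, t, x, h=1e-5):
    # Eulerian: d/ds X_{s,t}(x) + v(s, x) * d/dx X_{s,t}(x) should equal 0.
    d_s = (flow_map(s + h, t, x) - flow_map(s - h, t, x)) / (2 * h)
    d_x = (flow_map(s, t, x + h) - flow_map(s, t, x - h)) / (2 * h)
    return abs(d_s + velocity(s, x) * d_x)

def euler_integrate(s, t, x, n_steps):
    # Many small ODE steps: the cost a learned flow map lets you skip.
    dt = (t - s) / n_steps
    tau = s
    for _ in range(n_steps):
        x = x + dt * velocity(tau, x)
        tau += dt
    return x

x0 = 1.3
print(semigroup_gap(0.0, 0.4, 1.0, x0), lagrangian_gap(0.2, 0.7, x0), eulerian_gap(0.2, 0.7, x0))
print(euler_integrate(0.0, 1.0, x0, 1), euler_integrate(0.0, 1.0, x0, 2000), flow_map(0.0, 1.0, x0))
```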
beyond improving results across the board, we showcase properties that are unique to continuous flows. in particular, inference-time steering and guidance become straightforward. autoguidance brings generative perplexity down to 51.6 on LM1B, while discrete baselines completely collapse at the same guidance scale.
we also show reward-guided generation for steering topic, sentiment, grammaticality, and safety at inference time — and it works even at 1-2 steps with our flow map model. simple, well-understood techniques from continuous flows just work incredibly well in practice for language.
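Autoguidance (Karras et al., 2024) steers a model with a weaker version of itself by extrapolating away from the weak prediction, and reward guidance commonly adds the gradient of a differentiable reward. A minimal numpy sketch of both for a velocity-based sampler, assuming generic vectors `v_strong`/`v_weak` and a hypothetical quadratic reward; it's the generic continuous-flow recipe, not the paper's exact formulation for flow maps over text, which the post doesn't spell out.

```python
import numpy as np

def guided_velocity(v_strong, v_weak, w, reward_grad=None, lam=0.0):
    """Combine velocities for inference-time steering (toy sketch).

    Autoguidance-style extrapolation: w = 1 returns the strong model's
    velocity unchanged, w > 1 pushes further away from the weak model.
    An optional reward gradient, scaled by lam, is added on top.
    """
    v = v_weak + w * (v_strong - v_weak)
    if reward_grad is not None:
        v = v + lam * reward_grad
    return v

def quadratic_reward_grad(x, target):
    # Gradient of the hypothetical reward r(x) = -||x - target||^2 / 2.
    return target - x

v_strong = np.array([1.0, 2.0])
v_weak = np.array([0.5, 0.5])
print(guided_velocity(v_strong, v_weak, w=2.0))
print(guided_velocity(v_strong, v_weak, w=1.0,
                      reward_grad=quadratic_reward_grad(np.zeros(2), np.ones(2)), lam=0.5))
```

Both kinds of steering reduce to editing the velocity the sampler integrates, which is part of why they carry over so directly from continuous flows.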
we’re extremely excited about the future of this class of models.
stay tuned for results on scaling, reasoning, and reinforcement learning-based fine-tuning. 🚀
Introducing the engineering guide to Active Inference.
Physical AI is moving from imitation to learned intuition.
Foundation models are a remarkable perceptual layer: powerful priors, broad knowledge, strong pattern recognition. But perception alone is not enough for real-world deployment. The physical world doesn't wait. It shifts, drifts, and surprises.
To act reliably under real-world uncertainty, you need more than prediction. You need a system that knows what it doesn't know — and acts accordingly.
That is what Active Inference adds: a single principled objective that sits above the perceptual layer, unifying learning and action, where the agent actively reduces uncertainty rather than assuming it away.
For those familiar with JEPA: set the epistemic term to zero and you recover JEPA. Add it back, and your system goes from "I predict" to "I know what I don't know."
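A toy illustration of what "the epistemic term" means, using the textbook discrete-state split of expected free energy into a pragmatic part (expected negative log-preference over observations) and an epistemic part (expected information gain about hidden states). It assumes a single policy with a known likelihood matrix; `expected_free_energy` and its arguments are illustrative names, not the guide's reactive message passing implementation, and whether zeroing the epistemic term recovers JEPA exactly is the post's claim, which this sketch doesn't test.

```python
import numpy as np

def expected_free_energy(q_s, A, log_pref, epistemic_weight=1.0):
    """Toy expected free energy G for one policy (lower is better).

    q_s:      predicted hidden-state distribution under the policy, shape (S,)
    A:        likelihood p(o | s), shape (O, S), columns sum to 1
    log_pref: log preference over observations, log p_C(o), shape (O,)
    epistemic_weight: scales the information-gain term; 0 switches it off
    """
    q_o = A @ q_s                                   # predicted observations
    pragmatic_cost = -np.sum(q_o * log_pref)        # expected surprise vs preferences
    # Information gain: sum_o q(o) KL(q(s|o) || q(s)) = mutual information I(s; o).
    joint = A * q_s                                 # p(o, s), shape (O, S)
    mask = joint > 0
    outer = np.outer(q_o, q_s)
    info_gain = np.sum(joint[mask] * np.log(joint[mask] / outer[mask]))
    return pragmatic_cost - epistemic_weight * info_gain

q_s = np.array([0.5, 0.5])
log_pref = np.log(np.array([0.5, 0.5]))
A_informative = np.eye(2)           # observation reveals the state
A_flat = np.full((2, 2), 0.5)       # observation says nothing
print(expected_free_energy(q_s, A_informative, log_pref),
      expected_free_energy(q_s, A_flat, log_pref))
```

With the epistemic term on, the informative policy scores lower (better) even though both predict identical outcomes; with `epistemic_weight=0.0` the two tie, because only the predictive/preference part remains.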
Active Inference has been around for over a decade. Yet to the best of our knowledge, no paper explains it clearly from an engineering perspective — until now.
Friston's Ecosystems paper outlined the research agenda. This is the engineering companion, translating Active Inference into practical implementation, with reactive message passing as the realization.
Friston wrote the vision. Bert wrote the manual.
https://t.co/FskJsikq4m
There's a fruit fly walking around right now that was never born.
@eonsys just released a video where they took a real fly's connectome — the wiring diagram of its brain — and simulated it. Dropped it into a virtual body. It started walking. Grooming. Feeding. Doing what flies do.
Nobody taught it to walk. No training data, no gradient descent toward fly-like behavior. This is the opposite of how AI works. They rebuilt the mind from the inside, neuron by neuron, and behavior just... emerged. It's the first time a biological organism has been recreated not by modeling what it does, but by modeling what it is.
A human brain has ~6 OOM more neurons. That's a scaling problem, something we've gotten very good at solving. So what happens when we have a working copy of the human mind?
@yacineMTB DeepSeek is great, but OpenAI invented reasoning models and their precursors. o1 was the first production reasoning model; they were the first to apply RL to an LLM with RLHF, and they published Let's Verify Step by Step.
@IbrahimDagher20 @SteveHere255 @jsuarez great q: it depends on how many reachable minima are added per dimension. If it's < 2, then an optimizer's chance of hitting a local minimum shrinks with the number of dims, because along every dimension the optimizer can move in 2 directions. Empirically it's < 2 (had to google why).
@IbrahimDagher20 @jsuarez For a point to be a local minimum, the surface must curve upward in every single direction, which becomes exponentially less likely as dimensionality increases.
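A quick Monte Carlo check of the "curve upward in every direction" argument. The simplest reading treats each direction as an independent coin flip, giving 2^-d; this sketch instead assumes a common toy model in which the Hessian at a random critical point is a random symmetric (GOE-style) matrix, and counts how often all its eigenvalues are positive. Real loss landscapes need not follow this model; `frac_local_minima` is an illustrative name.

```python
import numpy as np

def frac_local_minima(dim, n_samples=20000, seed=0):
    """Fraction of random symmetric Hessians that are positive definite.

    Toy assumption: Hessians are symmetrized Gaussian matrices. A point is
    a local minimum only if every eigenvalue (curvature direction) is > 0.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n_samples, dim, dim))
    hessians = (a + np.transpose(a, (0, 2, 1))) / 2.0   # symmetrize each matrix
    eigvals = np.linalg.eigvalsh(hessians)              # batched, ascending order
    return float(np.mean(eigvals[:, 0] > 0))            # smallest eigenvalue > 0

for d in (1, 2, 3, 4, 6):
    print(d, frac_local_minima(d))
```

In 1D the fraction is about one half, and it falls off quickly as dimensions are added, matching the intuition that saddle points, not minima, dominate in high dimensions under this model.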