I'm so happy to finally share this work - Computing Diffusion Geometry
We present a simple, data-driven method for the *whole* toolkit of Riemannian geometry, PDEs, and vector calculus!
"The Annotated Transformer" changed how we learned sequence modeling. It’s time to do the same for World Models.
Introducing: The Annotated JEPA.
A step-by-step, from-scratch PyTorch walkthrough of Joint Embedding Predictive Architectures. 🧵
Here's why it works: embeddings lie on a hypersphere, so d-1 angles can replace d Cartesian coordinates. In high dimensions, those angles concentrate around pi/2, causing IEEE 754 exponents to collapse to a single value. This makes the byte stream highly compressible.
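A minimal sketch of that claim, not the original implementation: it assumes the standard hyperspherical parameterization and random unit vectors standing in for real embeddings. The helper names (`to_angles`, `from_angles`, `float32_exponent`, `mode_share`) are made up for illustration. It measures how many values share the most common float32 exponent, before and after the change of coordinates.

```python
import math
import random
import struct
from collections import Counter


def random_unit_vector(d, rng):
    """Draw a point uniformly on the unit sphere in R^d (normalized Gaussian)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def to_angles(x):
    """Map a unit vector in R^d to its d-1 hyperspherical angles."""
    d = len(x)
    tails = [0.0] * d  # tails[k] = sqrt(x[k+1]^2 + ... + x[d-1]^2)
    tail_sq = 0.0
    for k in range(d - 1, -1, -1):
        tails[k] = math.sqrt(tail_sq)
        tail_sq += x[k] * x[k]
    angles = [math.atan2(tails[k], x[k]) for k in range(d - 2)]  # each in [0, pi]
    angles.append(math.atan2(x[d - 1], x[d - 2]))  # last one in (-pi, pi]
    return angles


def from_angles(angles):
    """Inverse map: rebuild the d Cartesian coordinates from d-1 angles."""
    x, sin_prod = [], 1.0
    for a in angles:
        x.append(sin_prod * math.cos(a))
        sin_prod *= math.sin(a)
    x.append(sin_prod)
    return x


def float32_exponent(value):
    """Biased 8-bit exponent field of value after rounding to IEEE 754 binary32."""
    bits = struct.unpack(">I", struct.pack(">f", value))[0]
    return (bits >> 23) & 0xFF


def mode_share(values):
    """Fraction of values whose float32 exponent equals the most common one."""
    counts = Counter(float32_exponent(v) for v in values)
    return counts.most_common(1)[0][1] / len(values)


rng = random.Random(0)
d = 256  # assumed embedding dimension, chosen only for the demo
vectors = [random_unit_vector(d, rng) for _ in range(20)]
cartesian = [c for v in vectors for c in v]
angles = [a for v in vectors for a in to_angles(v)]

cartesian_share = mode_share(cartesian)
angle_share = mode_share(angles)
print(f"Cartesian: {cartesian_share:.2f} of values share the top exponent")
print(f"Angles:    {angle_share:.2f} of values share the top exponent")
```

Any angle in [1, 2) has biased float32 exponent 127, and pi/2 sits inside that interval. That is why angles concentrated near pi/2 collapse onto one exponent byte pattern. The last few angles are spread more widely, so the collapse is not total.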
Highly recommended 🆓 PDF book:
"Algorithms" by Jeff Erickson with many nice figures and exercises!
👉 https://t.co/fVIdxoYSGZ
Nice subtle cover page design too!
Much like the switch in 2025 from language models to reasoning models, we think 2026 will be all about the switch to Recursive Language Models (RLMs).
It turns out that models can be far more powerful if you allow them to treat *their own prompts* as an object in an external environment, which they understand and manipulate by writing code that invokes LLMs!
Our full paper on RLMs is now available—with much more expansive experiments compared to our initial blogpost from October 2025!
https://t.co/x47pIfIkTb
Books on geometry usually have very few figures, if any....
...But personally, I like figures to understand quickly and grasp concepts!
IMHO, this great book is a masterpiece at conveying notions and intuitions of geometry with many figures!
Highly recommended!!!
Information geometry studies probability distributions as geometric objects, equipping statistical models with a Riemannian structure derived from Fisher information. This viewpoint turns inference into geometry, where learning corresponds to moving along curved manifolds of distributions. In probability, information geometry clarifies concepts like divergence, sufficiency, and exponential families, and provides precise bounds on estimation and hypothesis testing. In machine learning, it powers natural gradient descent, variational inference, and optimization of deep models by respecting parameter geometry rather than using naive Euclidean updates. In real life, information geometry is applied in signal processing, neuroscience, thermodynamics, finance, and robotics, improving efficiency, stability, and interpretability in systems that learn or adapt under uncertainty.
Image: https://t.co/vbXeZyX7Am
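To make the natural gradient part concrete, here is a small sketch that fits a univariate Gaussian. It assumes the (mu, sigma) parameterization, whose Fisher information matrix is diag(1/sigma^2, 2/sigma^2). The function name, learning rate, step count, and synthetic data are illustrative choices, not taken from any particular library.

```python
import math
import random


def natural_gradient_fit(data, mu=0.0, sigma=5.0, lr=0.5, steps=200):
    """Fit N(mu, sigma^2) by natural gradient descent on the average negative log-likelihood."""
    n = len(data)
    for _ in range(steps):
        mean_dev = sum(x - mu for x in data) / n
        mean_sq = sum((x - mu) ** 2 for x in data) / n
        # Euclidean gradient of the average NLL: log(sigma) + (x - mu)^2 / (2 sigma^2)
        g_mu = -mean_dev / sigma ** 2
        g_sigma = 1.0 / sigma - mean_sq / sigma ** 3
        # Precondition by the inverse Fisher matrix diag(sigma^2, sigma^2 / 2)
        # to get the natural gradient.
        nat_mu = sigma ** 2 * g_mu
        nat_sigma = (sigma ** 2 / 2.0) * g_sigma
        mu -= lr * nat_mu
        sigma -= lr * nat_sigma
    return mu, sigma


rng = random.Random(1)
data = [rng.gauss(3.0, 2.0) for _ in range(500)]  # synthetic data for the demo
mu_hat, sigma_hat = natural_gradient_fit(data)

sample_mean = sum(data) / len(data)
sample_std = math.sqrt(sum((x - sample_mean) ** 2 for x in data) / len(data))
print(mu_hat, sample_mean)
print(sigma_hat, sample_std)
```

After preconditioning, the step in mu no longer depends on sigma: it is simply lr times the gap to the sample mean. A plain Euclidean step would be scaled by 1/sigma^2 and could be very small or very large depending on the current sigma.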
Eshkol v1.0-foundation has been released. The first language with compiler-integrated autodiff and the first fully homoiconic programming language with no GC and LLVM-native performance, all in a Lisp with deterministic memory. The future of integrated AI.
https://t.co/Olsae4MQ8T
Minimizing the Kullback-Leibler divergence, interpreted as an information projection with respect to Fisher orthogonality and the exponential/mixture connections.
Uniqueness of the projection is proved with a dual generalization of Pythagoras' theorem!
PDF (≈1MB):
https://t.co/VQE7wcTlXn
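A small numerical illustration of the Pythagorean identity, using a textbook case that is not necessarily the construction in the PDF: projecting a joint distribution p onto the exponential family of product distributions. The minimizer of KL(p‖q) over that family is the product of p's marginals, q*, and KL(p‖q) = KL(p‖q*) + KL(q*‖q) for every product q. The numbers below are made up for the demo.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) between two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def product(px, py):
    """Joint distribution of independent X and Y, flattened row-major."""
    return [a * b for a in px for b in py]


# A 2x3 joint distribution p(x, y), flattened row-major (illustrative values).
p = [0.30, 0.10, 0.05,
     0.15, 0.05, 0.35]
px = [sum(p[0:3]), sum(p[3:6])]             # marginal of X
py = [p[0] + p[3], p[1] + p[4], p[2] + p[5]]  # marginal of Y

q_star = product(px, py)                      # projection of p onto the product family
q = product([0.7, 0.3], [0.2, 0.5, 0.3])      # some other product distribution

lhs = kl(p, q)
rhs = kl(p, q_star) + kl(q_star, q)
print(lhs, rhs)
```

Since KL(q*‖q) is nonnegative and vanishes only when q = q*, the identity shows at once that q* is the unique minimizer over the family.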
Voronoi diagram (black points) computed by vertically projecting the lower envelope of the n 3D graphs of the functions {(x,y_i(x))} with y_i(x)=D(x_i,x) (pink).
When the distance is D(x,x')=‖x-x'‖^2, the graphs of y_i are paraboloids and the Voronoi cell borders are linear.
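Why the borders are linear, as a quick sketch with made-up 2D sites and hypothetical helper names: expanding ‖x-x_i‖^2 = ‖x‖^2 - 2 x·x_i + ‖x_i‖^2 shows that every paraboloid shares the same ‖x‖^2 term. The lowest paraboloid above x is therefore also the lowest of n planes, and two planes meet along a line.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def cell_by_lower_envelope(x, sites):
    """Index i minimizing y_i(x) = D(x_i, x): the lowest graph above x."""
    return min(range(len(sites)), key=lambda i: squared_distance(sites[i], x))


def cell_by_planes(x, sites):
    """Same argmin after dropping the shared ||x||^2 term, leaving affine functions of x."""
    def plane(i):
        sx, sy = sites[i]
        return -2.0 * (x[0] * sx + x[1] * sy) + sx * sx + sy * sy
    return min(range(len(sites)), key=plane)


rng = random.Random(0)
sites = [(rng.random(), rng.random()) for _ in range(6)]      # Voronoi generators
queries = [(rng.random(), rng.random()) for _ in range(200)]  # test locations

labels_envelope = [cell_by_lower_envelope(x, sites) for x in queries]
labels_planes = [cell_by_planes(x, sites) for x in queries]
print(labels_envelope == labels_planes)
```

For a general distance D the first function still works as written, but the reduction to planes is specific to the squared Euclidean case.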
At Neureps workshop
Hilbert geometry of the symmetric positive-definite bicone with applications to extended Gaussians:
Degenerate covariance matrices, precision matrices, or both lie on the bicone boundary.
https://t.co/X547Dqp6Ws