Generally, I don't bookmark anything on X. If I find something I MUST revisit later, I just put it in Apple Notes... a place my eyes instinctively stumble upon every single day.
4.1k people bookmarked this great resource. Unfortunately, only 1% will return.
"Latent Reasoning with Normalizing Flows"
NF-CoT makes latent reasoning feel native to LLMs. So instead of forcing every intermediate thought through verbose CoT text, it learns compact continuous thoughts with a normalizing flow inside the causal LLM stream.
The key move is that latent thoughts become sampleable, scoreable, and RL-trainable like tokens, with exact likelihoods and KV-cache friendly decoding.
This beats explicit CoT and prior latent methods, while using 64 latent tokens to compress roughly 385 CoT tokens and running much faster than diffusion-based latent reasoning.
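To make "sampleable and scoreable with exact likelihoods" concrete, here is a toy one-dimensional affine flow in Python. It is only a sketch of the general change-of-variables idea behind normalizing flows, not the paper's architecture. The names `flow_forward`, `flow_log_prob`, and `sample_thought` are made up for illustration, and in a real model `mu` and `log_sigma` would presumably be predicted from the causal prefix.

```python
import math
import random


def base_log_prob(eps: float) -> float:
    """Log-density of the standard normal base distribution."""
    return -0.5 * (eps * eps + math.log(2.0 * math.pi))


def flow_forward(eps: float, mu: float, log_sigma: float) -> float:
    """Map base noise to a latent 'thought' value (invertible affine map)."""
    return mu + math.exp(log_sigma) * eps


def flow_inverse(z: float, mu: float, log_sigma: float) -> float:
    """Map a latent value back to base noise."""
    return (z - mu) * math.exp(-log_sigma)


def flow_log_prob(z: float, mu: float, log_sigma: float) -> float:
    """Exact log-likelihood via change of variables:
    log p(z) = log p_base(eps) - log|dz/d eps|, and here log|dz/d eps| = log_sigma.
    """
    eps = flow_inverse(z, mu, log_sigma)
    return base_log_prob(eps) - log_sigma


def sample_thought(rng: random.Random, mu: float, log_sigma: float):
    """Sample a latent value and return it with its exact log-likelihood."""
    eps = rng.gauss(0.0, 1.0)
    z = flow_forward(eps, mu, log_sigma)
    return z, flow_log_prob(z, mu, log_sigma)
```

Because every sample comes with an exact log-probability, it can in principle be plugged into a policy-gradient objective the same way a token log-probability would be.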
VLA-JEPA just dropped in LeRobot 🤖
What makes this model special is that it does not just learn what action to take from a given observation, it also leverages a JEPA world model to learn action-relevant dynamics.
During training, the VLA leverages V-JEPA2 by conditioning its predictor. This clever trick adds a world modeling objective to the training, which also allows pretraining on human videos.
At inference, the world model is dropped entirely, keeping only a standard VLA architecture: Qwen backbone and action head.
The demo here was only fine-tuned on 13 examples, showing great pretraining capability and running in real time on @NVIDIARobotics DGX Spark!
VLA-JEPA is the first world model to be ported to LeRobot, and I feel like it won't be the last 🚀
@Thom_Wolf @ClementDelangue
Can LLMs reason in superposition? We introduce MUX, a method that turns text CoT into latent continuous reasoning.
Instead of one-hot vectors as in CoT, the model now learns to predict weighted averages of several one-hot vectors, which we call multiplexed tokens. These multiplexed tokens can be designed to be lossless, so by predicting them one is essentially doing multi-token prediction (MTP) in superposition.
MUX is the best latent reasoning method across 32 math settings spanning 1-8B LLaMA base models, reducing CoT length by 3-6x. Furthermore, it is able to perform parallel search, harnessing a core strength of superposed reasoning.
In collaboration with @alperen_gozeten, @mmbronstein, @ismaililkanc, and @jw9730.
1/🧵
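A toy way to see how a weighted average of one-hot vectors can be lossless: give each position in the group a weight proportional to a distinct power of two, so the value at each vocabulary index encodes which positions hold that token. This weighting is an assumption chosen for illustration and the thread does not say which weights MUX uses. `multiplex` and `demultiplex` are hypothetical helper names.

```python
def multiplex(tokens, vocab_size):
    """Pack k token ids into one vector that is a weighted average of one-hots.

    Position j gets weight 2**(k-1-j) / (2**k - 1), so the weights sum to 1.
    (Illustrative choice of weights, not taken from the MUX paper.)
    """
    k = len(tokens)
    total = 2 ** k - 1
    vec = [0.0] * vocab_size
    for pos, tok in enumerate(tokens):
        vec[tok] += 2 ** (k - 1 - pos) / total
    return vec


def demultiplex(vec, k):
    """Recover the k token ids, in order, from a multiplexed vector."""
    total = 2 ** k - 1
    tokens = [None] * k
    for tok, weight in enumerate(vec):
        mask = round(weight * total)  # integer whose bits mark positions
        for pos in range(k):
            if mask & (1 << (k - 1 - pos)):
                tokens[pos] = tok
    return tokens
```

For example, `demultiplex(multiplex([3, 5, 3], 8), 3)` gives back `[3, 5, 3]`, including the repeated token, which is what "lossless" means in this toy setting.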
We unlocked the working memory of LLMs 💥
Reasoning in Memory (RiM) replaces autoregressive "thinking out loud" with fixed memory blocks that form a task-specific workspace for latent reasoning.
The key idea is simple: reasoning should happen inside the LLM, not in its output!
Interested in learning how to run RL at scale? Here are the best resources to read…
Research on Scaling RL
1. The Art of Scaling RL compute for LLMs: https://t.co/PGjI6Gwgv0
2. Scaling Behaviors of LLM RL Post-Training: https://t.co/2u2saB3C0h
3. Optimally Scaling Sampling Compute for LLM RL: https://t.co/rUSdUvJyNH
4. Scaling up RL: https://t.co/O8vV6z8ymx
5. ProRL V2 - Prolonged Training Validates RL Scaling Laws: https://t.co/vu72juvRW4
6. Polaris - A Recipe for Scaling RL with Reasoning Models: https://t.co/rMibSAeJbg
RL Frameworks
1. Hybrid Flow (early outline of the verl framework): https://t.co/GnWXx131uD
a. More up-to-date info can be found here: https://t.co/j801HcJmPP
2. AReal - Large-Scale Async RL: https://t.co/qhOvsQK09N
3. PipelineRL - Fast On-Policy RL: https://t.co/iRM7KzySXe
4. AsyncFlow - Async Streaming RL: https://t.co/YwmzFtiU2q
RL for Agents
1. DeepSWE - Open Coding Agent Trained w/ RL: https://t.co/GHQHcmtE6F
2. AutoForge - Environment Synthesis for Agentic RL: https://t.co/mr3WDIL5vq
3. Agent-R1 - Training Agents w/ End-to-End RL: https://t.co/xpfQJGgzEv
4. AgentRL - Scaling RL for Multi-Turn, Multi-Task Agents: https://t.co/7fbVl0RWXG
5. The Landscape of Agentic RL: https://t.co/OMnSV4rgdW
6. Training SWE Agents with RL: https://t.co/YqMqySbyXS
Case Studies & Tech Reports
1. Kimi tech reports:
a. Kimi K2 - Open Agentic Intelligence: https://t.co/aAw17SXrIw
b. Kimi End-to-end Agentic RL: https://t.co/ProBpOPIiI
c. Kimi K1.5 - Scaling RL for LLMs: https://t.co/kRGOxY9Jvp
2. Composer series from Cursor:
a. Composer 2: https://t.co/K0v8rNCE6Z
b. Composer 2.5: https://t.co/D9PYimfOMU
3. Olmo 3 (also has open code / data): https://t.co/khetJFvp6N
4. MiniMax tech reports:
a. MiniMax-M2: https://t.co/HApb0OB80S
b. MiniMax-M1: https://t.co/mZj9UQsrnC
5. Nemotron 3 (NVIDIA): https://t.co/lCpE1GzxSi
Train AI robots without writing a single line of code. 🤖
We just launched LeLab, the official graphical user interface for LeRobot built by @rabault_nicolas. It completely removes the command line from the robot learning workflow, taking you from raw hardware to autonomous movement visually.
If you've ever wanted to get into AI robotics but were held back by complex terminal setups, this is for you.
- Zero-Terminal Setup: Smart calibration with automatic USB port detection.
- Easy Data Collection: Teleoperate your robot and record a dataset.
- One-Click GPU Training: Don't have a massive local GPU? Scale your training instantly with Hugging Face Jobs right inside the app.
Just plug in your SO-ARM101 and start teaching your robot. We put together a complete, step-by-step video guide showing exactly how to get started and train your first policy.
Docs: https://t.co/PrUEIeaXKW
GitHub: https://t.co/SFuOiN8rjN
Another banger resource when designing reward functions for your RL environment
The Prime RL Environments Hub. It contains so many environments with various kinds of reward functions and task settings, worked on by so many contributors. A lot of brain juice flowing here.
What if you could take three completely different model families… and distill them into one tiny model? 🤯
📜 Paper: https://t.co/K2iKD4xFvp
MOPD (Multi-Teacher On-Policy Distillation) has become a standard procedure in post-training. We already distill multiple specialized variants of the same model into a single set of weights.
But what if we could go further - and distill models from entirely different families? Turns out, it is possible.
Today we’re releasing a paper on cross-tokenizer distillation - our first steps in this exciting direction. 📄
We distilled Qwen3-4B, Phi-4-Mini, and Llama-3B into Llama-3.2-1B.
MMLU jumped from 32.05 → 46.32 when using multiple teachers. 📈
The team is now working on Nemo-RL integration so the community can try this method in their own settings. Plus, we are scaling experiments up. 🚀
We built a bipedal robot for about $2,500.
A real, mostly 3D-printed robot you can build, repair, simulate, train, and control.
Today we’re releasing LeRobot Humanoid: an open robot-learning platform with hardware, runtime, identification tools, and training environments.
Blog post: https://t.co/zu2etb1NZo
Repo: https://t.co/4myLRUtZ3W
Introducing EXPO-FT – Efficient, Reliable & Open-Source VLA Finetuning!
EXPO-FT unlocks π0.5 for challenging manipulation tasks:
- Routing string lights & inserting the power connector to illuminate them
- Striking pool ball into pocket
- Inserting flower into wine bottle
(1/5)
i just beat @GoogleDeepMind's turboquant
introducing Shard. 10x KV cache compression on Llama-3.1-8B. zero quality loss
- 10x @ 8K context, 11.2x @ 32K
- NIAH recall 1.000 across 4K-32K
- LongBench Δ ≈ 0 vs FP16
turboquant tops out at 4-6x at the same quality. we doubled it.
read more: https://t.co/PAV5WdAzN6
@kirrithan
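For a sense of scale, here is a back-of-the-envelope Python calculation of what those ratios would mean in memory. The layer count, KV-head count, and head dimension are taken from the public Llama-3.1-8B config and are not stated in the post. The sketch also assumes the cache shrinks by exactly the claimed ratio with no metadata overhead, which may not hold for a real method.

```python
# Assumed Llama-3.1-8B attention shape (from the public model config,
# not from the post): 32 layers, 8 KV heads (GQA), head dimension 128.
N_LAYERS = 32
N_KV_HEADS = 8
HEAD_DIM = 128
BYTES_FP16 = 2


def kv_cache_bytes(context_len: int, batch: int = 1) -> int:
    """FP16 KV cache size: keys and values for every layer, head, and token."""
    per_token = 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_FP16
    return per_token * context_len * batch


def compressed_gib(context_len: int, ratio: float) -> float:
    """Cache size in GiB after an idealized compression by `ratio`."""
    return kv_cache_bytes(context_len) / ratio / 2 ** 30
```

Under these assumptions the FP16 cache is 1 GiB per sequence at 8K context and 4 GiB at 32K, so 10x and 11.2x would bring them to roughly 0.1 GiB and 0.36 GiB.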
Oh, they actually open-sourced a strong 1B model.
128K context... wait, you mean a 1B model with 128K context length?
And they even released MLX quantizations??? These guys are insane.
Another cool piece of research on Looped Transformers
They ask the question: "Can we loop a frozen, off-the-shelf checkpoint directly at inference time without any modifications?"
Naive repetition pushes hidden states outside the distribution later layers expect, so performance drops.
But if you treat transformer layers as Euler steps in a residual ODE and replace naive loops with damped Runge–Kutta substeps, it is possible.
This lets the frozen models get extra latent compute at test time with no fine-tuning, no new weights, and no architecture changes.
And the best gains show up on hard knowledge MC tasks like MMLU-Pro, GPQA, and ARC.
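The ODE reading can be sketched in a few lines of Python. A residual block `x + f(x)` is one Euler step of `dx/dt = f(x)` with step size 1, so looping it naively takes extra full-size steps. The alternative below uses midpoint (second-order Runge–Kutta) substeps with a smaller step size. Treating "damped" as "step size below 1" is an assumption made here, and the paper's actual scheme may differ. `f` stands in for a frozen layer's residual branch.

```python
def euler_layer(x, f):
    """One residual block x + f(x), i.e. an Euler step with step size 1."""
    return [xi + fi for xi, fi in zip(x, f(x))]


def naive_loop(x, f, n_loops):
    """Re-apply the same frozen block n_loops times (full-size Euler steps)."""
    for _ in range(n_loops):
        x = euler_layer(x, f)
    return x


def damped_rk2_loop(x, f, n_substeps, step):
    """Re-use the same block via midpoint (RK2) substeps of size `step`.

    `step` < 1 is the damping knob assumed here (not taken from the paper).
    """
    for _ in range(n_substeps):
        k1 = f(x)
        midpoint = [xi + 0.5 * step * ki for xi, ki in zip(x, k1)]
        k2 = f(midpoint)
        x = [xi + step * ki for xi, ki in zip(x, k2)]
    return x
```

With a toy branch `f = lambda x: [-0.5 * xi for xi in x]`, two naive loops shrink `[1.0]` to `[0.25]`, while two RK2 substeps with `step=0.5` land at about `[0.61]`, a much smaller move away from the single-pass state.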
Want to train your own Claude Code/Codex agent with your own model? We are excited to roll out ProRL Agent V2: Polar.
An infrastructure for black-box agentic RL, Polar lets you train agents with any harness, whether it’s OpenClaw, Hermes, or a custom agent built with frameworks like LangChain, Autogen, AG2 and others.
Check it out here:
Code: https://t.co/AdfJHWE1Gp
Paper: https://t.co/5SnzJX2wCZ
Welcome to the world of agentic RL, without opening the box.
"Vector Policy Optimization"
This paper trains models to generate diverse answer sets by using vector rewards, then randomly weighting reward dimensions so each answer can specialize in a different tradeoff.
Instead of optimizing one best response, VPO optimizes a set that covers the Pareto frontier.
On reasoning, tool use, navigation, and coding, VPO improves best@k and pass@k as the search budget grows, and even solves hard OpenEvolve problems GRPO cannot reach.
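The "randomly weighting reward dimensions" step can be sketched in Python as below. The paper's exact sampling distribution is not given in this post, so uniform sampling on the simplex is an assumption, and the function names are hypothetical. Each answer in the set gets its own weight vector, so each one is scored against a different tradeoff between reward dimensions.

```python
import random


def sample_simplex_weights(rng: random.Random, dim: int):
    """Sample non-negative weights that sum to 1 (uniform on the simplex).

    Normalized Exp(1) draws are a standard way to get a flat Dirichlet sample.
    """
    raw = [rng.expovariate(1.0) for _ in range(dim)]
    total = sum(raw)
    return [r / total for r in raw]


def scalarize(reward_vec, weights):
    """Collapse a vector reward into a scalar with the given weights."""
    return sum(r * w for r, w in zip(reward_vec, weights))


def scalarized_set_rewards(reward_vecs, rng: random.Random):
    """Score each answer in a set under its own random weighting.

    Returns a list of (weights, scalar_reward) pairs, one per answer.
    """
    scored = []
    for reward_vec in reward_vecs:
        weights = sample_simplex_weights(rng, len(reward_vec))
        scored.append((weights, scalarize(reward_vec, weights)))
    return scored
```

An answer that is strong on one dimension scores well whenever its sampled weights favor that dimension, which is one plausible way a set could spread out along the Pareto frontier instead of collapsing to a single compromise.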
10 GitHub repos that quietly run my daily life and save me $2,000 a year in 2026.
Bookmark this list.
1. Paperless-ngx
Every receipt, invoice, contract, and tax document scanned, OCR'd, and tagged automatically. The most-cited "non-negotiable" self-hosted tool of 2026. Replaces Adobe Scan + Evernote at $15/month.
Repo → https://t.co/wW6mA7Zb2p
2. Karakeep
Saves every link, screenshot, article, and PDF I'll ever want again. AI auto-tags everything. Mozilla just killed Pocket. This took its place. Replaces Raindrop Pro + Pocket at $15/month.
Repo → https://t.co/IZ96g5duzC
3. Vaultwarden
Every password I'll ever need, on every device, encrypted. Replaces 1Password Family at $10/month.
Repo → https://t.co/lwESQjTyr9
4. Anytype
My notes, tasks, knowledge base, all local, all encrypted. Notion is $10 billion. Anytype is mine. Replaces Notion Plus + Roam at $20/month.
Repo → https://t.co/3pZnRBeSeR
5. AdGuard Home
Blocks ads on every device on my home network. Phones, TVs, tablets, laptops. Replaces NextDNS Premium at $20/month.
Repo → https://t.co/nmoF3k4D5b
6. Syncthing
Syncs files across every device I own, peer-to-peer. No cloud, no subscription, no Dropbox account. Replaces Dropbox 2TB at $15/month.
Repo → https://t.co/Z1UzrFejaw
7. Home Assistant
Lights, doors, thermostat, security cameras — all on one dashboard. Replaces SmartThings Pro + Alexa Plus at $25/month.
Repo → https://t.co/71S8Bq1Hmq
8. Audiobookshelf
Audiobooks and podcasts on every device. Beautiful apps. Mine forever. Replaces Audible + premium podcasts at $30/month.
Repo → https://t.co/MhwerkVSrK
9. Stirling-PDF
Every PDF operation in one place. Merge, split, OCR, compress, sign, redact. Replaces Adobe Acrobat Pro at $20/month.
Repo → https://t.co/cox9pHE4zp
10. Bitwarden Send
Encrypted file sharing with expiration timers. Replaces WeTransfer Pro + Dropbox Transfer at $20/month.
Repo → https://t.co/XCZ2JtWqWQ
Save this. Share it with the person in your life still paying $190 a month for what's been free this whole time.
100% free. 100% open source.
The more I play around with GEPA the more I realize it's not just a prompt optimization algorithm.
It's really the most efficient way to have an LLM explore a massive dataset and draw useful insights.
This visualizer alone shows just how cool that process is.
Looking at the new Generative Recursive Reasoning Models from Bengio and co.
They are models that think by iteratively updating an internal latent state (a hidden vector), and can branch by sampling multiple “thought trajectories”.
Very fascinating. May write about it later.
HRM beats Transformers that are 7x its size on language modeling!?
"HRM-Text: Efficient Pretraining Beyond Scaling"
This paper's Hierarchical Recurrent Model, which contains slow planning layers and fast execution layers to promote planning and recurrence, was trained directly on instruction-response pairs instead of raw text.
Their 1B model trained from scratch on 40B unique tokens for about $1,500 gets competitive results with 2-7B open models using up to 900x fewer tokens!