On-Policy Distillation is the most active new research direction being explored in RL for LLMs. Had the chance to discuss how it works with Dwarkesh and why it fits so nicely into large-scale pipelines.
"Attention is just a special case of <abstract math thing> so we generalized it by <neglecting the other 30 abstractions and conditions required for frontier architecture> and we found it performed <p hacking> compared to <naive baseline>"
recent "generalization" papers be like:
1. use system prompts to generate synthetic data, which functions as a steering vector
2. fine-tune LMs on the synthetic data
3. WOW we see "generalization"
4. WOW we can use rank-1 LoRA to replicate this "generalization"
5. WOW we find a steering vector that can explain, predict, and control "generalization"
We agree that the world model should be a simulator that supports decision-making, not a renderer of beautiful images/videos.
Our difference is in how the world state should be represented.
Should the world be anchored in Gaussian splats and physics engines for program-as-simulator? Or in learned representations for model-as-simulator?
We believe the latter is a more scalable, bitter-lesson-pilled approach.
More in our position paper "Critiques of World Models" coauthored with Prof. @ericxing and @jinyuhou0
https://t.co/NqnxGtKNBL
we got bought by Nvidia and made a bunch of contributions to Nemotron models and released a lot of successful open source data and software like:
- OpenShell
- NeMo Data Designer
- NeMo Anonymizer
- NeMo Safe Synthesizer
- Nvidia PII detector
…
When you leave an HFT, they put you on a non-compete for 1 or even 2 years! This is the biggest gift from HFTs to the open source world.
Aman Gupta, who is being paid by Jump Trading (to sit at home), just added multi-token prediction to llama.cpp, which speeds up local LLMs by 2x
everyone is building an agent or a tool
you don't want an agent or a tool, you want a reactor
I've been working on something cool and I think you'll like it
it's simple: an agent session DAG that keeps a declared world-model up to date in an efficient (memoized) render
each render node is an agent session: you declare the desired state with OpenProse markdown files
once invoked, each agent session acts as the provider. the agent session uses the open source openai-agents-sdk, extensible however you like with any model (I use it with opus, sonnet, haiku)
the facets of the world-state are memoized, so not every agent has to run on every event, saving you on inference
if that sounds a lot like React or dataflow, that's because even in our brave new world the wisdom of the agents holds fast
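a rough sketch of the memoized-render idea, in plain Python with no SDK calls. everything here is hypothetical and only for illustration: `SessionNode`, `Reactor`, `demo_reactor`, and the facet names are made up, and each `run` function is a stand-in for a real agent session. it is not how the actual project is implemented, just one way the "only re-run what changed" behavior could look:

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class SessionNode:
    """One facet of the world-state. `run` is a stand-in for an agent session."""
    name: str
    deps: List[str]
    run: Callable[[Dict[str, Any]], Any]


class Reactor:
    """Hypothetical memoized renderer over a DAG of session nodes."""

    def __init__(self, nodes: List[SessionNode]) -> None:
        self.nodes = {n.name: n for n in nodes}
        self.cache: Dict[str, tuple] = {}  # name -> (input hash, output)
        self.runs: List[str] = []          # log of sessions that actually ran

    def render(self, facts: Dict[str, Any], target: str) -> Any:
        # Raw facts (events, external inputs) are leaves of the DAG.
        if target in facts:
            return facts[target]
        node = self.nodes[target]
        # Render dependencies first, then hash their outputs.
        inputs = {d: self.render(facts, d) for d in node.deps}
        key = hashlib.sha256(
            json.dumps(inputs, sort_keys=True).encode()
        ).hexdigest()
        hit = self.cache.get(target)
        if hit is not None and hit[0] == key:
            return hit[1]  # inputs unchanged: skip the (expensive) session
        out = node.run(inputs)
        self.cache[target] = (key, out)
        self.runs.append(target)
        return out


def demo_reactor() -> Reactor:
    # Toy "sessions": real ones would call a model instead of formatting strings.
    return Reactor([
        SessionNode("inbox_summary", ["inbox"],
                    lambda i: f"{len(i['inbox'])} unread"),
        SessionNode("calendar_summary", ["calendar"],
                    lambda i: f"{len(i['calendar'])} meetings"),
        SessionNode("briefing", ["inbox_summary", "calendar_summary"],
                    lambda i: f"{i['inbox_summary']}, {i['calendar_summary']}"),
    ])


if __name__ == "__main__":
    reactor = demo_reactor()
    facts = {"inbox": ["a", "b"], "calendar": ["standup"]}
    print(reactor.render(facts, "briefing"), reactor.runs)

    reactor.runs.clear()
    facts["calendar"] = ["standup", "review"]  # only the calendar changed
    print(reactor.render(facts, "briefing"), reactor.runs)
```

on the second render only `calendar_summary` and `briefing` run again; `inbox_summary` comes from the cache. and because the cache key is the hash of a node's inputs, a downstream session is also skipped when an upstream one re-runs but produces the same output.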
so we solve the constraint satisfaction problem by building representationally invariant structure around the constraints of the physics that we live in. rationalizations of invariants are also not always true. we had to start measuring to figure out a newtonian model of gravity
data augmentation producing near exact invariants on networks that otherwise would just drift (by default) should tell people something about what the process of iterating on the optimization rule is doing
humans don't get these for free either but we do live in a world that asserts invariants on us
"transfer learning from geometric structure happens when you distribute the solution to a problem thinly across an iterated structure" is the most parsimonious accounting
i worry that hoping for a unified general relativity type discovery is kind of a category error in some way