VLA is 95% certain about current action. Will it 95% succeed in the task?
Obviously, not necessarily. But if you're clever, you can *calibrate* action prob. to task success.
Our #ICML2026 paper formulates this + SOTA algorithms based on new connection to RL temporal differences
More generally, we're interested in fast, targeted correction of unwanted LLM behaviors without full retraining, avoiding degradation of model quality in other areas.
Check out https://t.co/YCxi8pIXJD for more exciting examples!
2/2
Great example of the post-training behavioral adaptation work we're doing at @hirundo_ai.
In this case, the focus was resilience to prompt injection while preserving general model capabilities, all using very limited data.
1/2
Case study time! Hirundo trained Gemma 4 E4B to resist adversarial overrides while overcoming the alignment tax:
- Weight-level defense based on Gemma 4 architecture
- Stronger security posture than models over 100x its size
- Preserves utility across benchmarks
New preprint out!
TL;DR: With a few calls to a lightweight, pre-trained VLM, we can select better experience for off-policy RL training, and improve both performance and sample efficiency
Proud to have worked on this with @elad_sharony and @TomJurgenson !
Experience replay is the backbone of off-policy RL.
But here's a question:
Which experiences should you replay?
New paper: VLM-Guided Experience Replay
Project page: https://t.co/8tmBFdMxmX
🧵
We're partnering with Tzafon, an AI R&D lab, to build next-generation agentic machine intelligence!
Through our partnership, we will provide compute capacity & cloud services to train Tzafon's new multi-agent models & develop new automation frameworks: https://t.co/gC4Zca9a4w
We're soon releasing Lightcone, which brings Light to your Mac. Lightcone will be able to automate any task on your computer. It will be powered by our very own pre-trained foundation model, built from the ground up for computer use.
https://t.co/Tf1y6yRmjO
We're hiring ML Engineers at Tzafon in SF.
Join our elite team (IOI, IMO, Google alumni) working at the intersection of AI research and practical engineering. Focus areas include multi-agent RL, memory architectures, efficient sampling techniques, and Large Action Models.
Send ML demos or papers to [email protected] or DM directly.
Thrilled to share that I'll be joining the team at @tzafon_company in a few weeks' time!
Excited for the opportunity to work at the frontier of AI research, and to build awesome stuff with some great people, making an impact in the real world.
Today we're announcing Tzafon, an applied artificial intelligence research and development firm, to the world.
Our mission is to expand the frontiers of machine intelligence. Through the intersection of artificial intelligence & software engineering, we're looking to push the boundaries of what machines are capable of.
We're launching WayPoint, our first open-source product: a robust, scalable solution for managing large fleets of browser instances, capable of launching up to 1,000 browsers per second and easily handling well over 10,000 browsers concurrently without issue.
We've secured a $4M pre-seed funding round led by Streamlined and are rapidly expanding our elite team of IOI & IMO medalists, PhDs, and alumni from Google, Jane Street, and PayPal. Interested in joining us? Reach out at [email protected].
We're also launching a production-ready version of WayPoint. Sign up below to get early access.
- Waitlist: https://t.co/kQSzn6lTq5
- WayPoint Repo: https://t.co/oXLVL5qHyu
- WayPoint Blog post: https://t.co/oK2WYFdz37
Want to learn / teach RL?
Check out new book draft:
Reinforcement Learning - Foundations
https://t.co/142MbSiTIQ
W/ @shiemannor and @YishayMansour
This is a rigorous first course in RL, based on our teaching at TAU CS and Technion ECE.
@chris_j_paxton Do you think we're close to reaching the limit of what we can do with data and current architectures though? I feel like the tail of the distribution of scenarios (at least for general purpose / home robots) is at least as heavy as the driving one, if not more so...
@AvivTamar1 If this is operational, it's a huge leap forward from their previous humanoid (H1) which doesn't seem nearly as impressive when seen in person.
Come say Hi at #ICLR2024!
I'll be at our poster on Friday afternoon with @ZoharRimon, but also around the entire conference if anyone wants to chat
Heading to Vienna for #ICLR2024 ✈️
Me and @orrkrup will be presenting our work - MAMBA.
If you're interested in
- Meta RL
- Generalization in RL
- Efficient exploration
Ping me or come by our poster on Friday 16:30-18:30 in Halle B #113
https://t.co/NCLWLA5ISR
@iclrconf