Excited to share Flow Matching Policy Gradients: expressive RL policies trained from rewards using flow matching. It’s an easy, drop-in replacement for Gaussian PPO on control tasks.
@ronskoro Thanks! We didn’t try distilled students, though that sounds promising. Our experiments were initialized from pretrained SD3 checkpoints; would you consider that late training?
We developed a simple, sample-efficient online RL technique for post-training image generation models. We see it as a possible steerable alternative to CFG, driven by any scalar reward, including human preference.
Check out our blog post at https://t.co/W9CqU9yIlJ for a walkthrough of our design decisions.
w/ fantastic collaborators: Miika Aittala, Tero Karras, Janne Hellsten, @akanazawa, Timo Aila, and Samuli Laine
𝗢𝗻𝗲 𝗺𝗲𝗺𝗼𝗿𝘆 𝗰𝗮𝗻’𝘁 𝗿𝘂𝗹𝗲 𝘁𝗵𝗲𝗺 𝗮𝗹𝗹.
We present 𝗟𝗼𝗚𝗲𝗥, a new 𝗵𝘆𝗯𝗿𝗶𝗱 𝗺𝗲𝗺𝗼𝗿𝘆 architecture for long-context geometric reconstruction.
LoGeR enables stable reconstruction over up to 𝟭𝟬𝗸 𝗳𝗿𝗮𝗺𝗲𝘀 / 𝗸𝗶𝗹𝗼𝗺𝗲𝘁𝗲𝗿 𝘀𝗰𝗮𝗹𝗲, with 𝗹𝗶𝗻𝗲𝗮𝗿-𝘁𝗶𝗺𝗲 𝘀𝗰𝗮𝗹𝗶𝗻𝗴 in sequence length, 𝗳𝘂𝗹𝗹𝘆 𝗳𝗲𝗲𝗱𝗳𝗼𝗿𝘄𝗮𝗿𝗱 inference, and 𝗻𝗼 𝗽𝗼𝘀𝘁-𝗼𝗽𝘁𝗶𝗺𝗶𝘇𝗮𝘁𝗶𝗼𝗻.
Yet it matches or surpasses strong optimization-based pipelines. (1/5)
@GoogleDeepMind @Berkeley_AI
@brenthyi who worked on FPO/FPO++ is finishing his PhD and going on the job market 😭✨
He is also the person behind viser, pyroki, egoallo, jaxls, tyro and more!
I can't express how amazing it is to have Brent on your team..! Any team would be incredibly lucky to have him!!
We trained diffusion models on a billion LLM activations, and we want you to use them!
New preprint: Learning a Generative Meta-Model of LLM Activations
Joint work with @feng_jiahai, @trevordarrell, @AlecRad, @JacobSteinhardt.
More in thread 🧵
New project! Flow Policy Gradients for Robot Control
tl;dr: a simple online RL recipe for training and fine-tuning flow policies for robots
co-led w/ @redstone_hong: https://t.co/nKSq9EakUy
𝑪𝒐-𝒕𝒓𝒂𝒊𝒏𝒊𝒏𝒈 is a promising way to scale Large Behavior Models (LBMs) beyond robot data, yet the data and training recipe are far from settled. 🤔
We present a large-scale empirical study leveraging 4,000h of robot/human data and 50M vision-language samples, evaluating 89 policies across 58,000 simulation rollouts and 2,835 real-world trials. 🤖📊
https://t.co/jMlZWdXexl
Work done during my internship at @ToyotaResearch.
tyro 1.0 is out 🐣
This has been a pet project/niche interest of mine for ~4 years now, so it's a bit of a sentimental moment...
https://t.co/bAibP3RjxE
Action chunking is drawing growing interest in RL, yet its theoretical properties are still understudied.
We are excited to share some insights on when we should use action chunking in Q-learning + a new algo (DQC) to tackle hard long-horizon tasks! https://t.co/izVWQBgH3c 🧵 1/N