matching the paper baselines: 34% offline → 99% after 200k online steps
but somehow couldn't reproduce the same result on the triple, quadruple or bigger cube environments
one thing is sure: q-loss scaling matters a lot. spent hours debugging q-value explosions before finding the right normalization
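the thread doesn't say which normalization fixed it, so here is one common option as a hedged sketch, not necessarily the one used here: the TD3+BC-style trick of dividing the actor's Q term by the batch mean of |Q|, which keeps that term's magnitude roughly constant however large the Q values grow. plain floats stand in for tensors; `normalized_q_loss`, `alpha` and `eps` are hypothetical names.

```python
from statistics import fmean


def normalized_q_loss(q_values, alpha=1.0, eps=1e-8):
    """Actor Q term scaled by 1 / mean|Q| (TD3+BC-style; an assumption, not this repo's code).

    q_values: critic outputs Q(s, pi(s)) for one batch.
    alpha:    hypothetical weight on the Q term relative to the BC term.
    """
    # Normalizer from the batch; in PyTorch this would be q.abs().mean().detach()
    # so no gradient flows through the scale itself.
    scale = max(fmean(abs(q) for q in q_values), eps)
    # Maximizing Q == minimizing -Q; dividing by scale makes the term O(alpha).
    return -alpha * fmean(q_values) / scale
```

e.g. `normalized_q_loss([10, 20, 30])` and `normalized_q_loss([1000, 2000, 3000])` both give -1.0, so a critic drifting to large values no longer drowns out the BC loss.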
implemented q-chunking on top of it
offline only for now
already converges significantly faster: 84% at 50k steps vs 56% for vanilla fql
online fine-tuning + harder envs coming next
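assuming this follows the usual action-chunking setup (the critic scores a whole chunk of h actions, which are executed open-loop, so the backup is a plain h-step return with no off-policy correction), the TD target could be sketched like this. the function name and signature are hypothetical; a real version would be batched over tensors.

```python
def chunked_td_target(rewards, next_q, gamma=0.99, done=False):
    """h-step backup for a critic Q(s_t, a_{t:t+h}) over action chunks.

    rewards: rewards r_t ... r_{t+h-1} collected while executing one chunk.
    next_q:  target-critic estimate Q(s_{t+h}, next chunk).
    done:    True if the episode terminated inside the chunk.
    """
    h = len(rewards)
    # Discounted in-chunk return: sum_i gamma^i * r_{t+i}
    ret = sum(gamma**i * r for i, r in enumerate(rewards))
    # Bootstrap once from the state reached after the whole chunk, discounted by gamma^h
    bootstrap = 0.0 if done else gamma**h * next_q
    return ret + bootstrap
```

with `gamma=0.5`, three rewards of 1 and `next_q=10.0` this gives 1 + 0.5 + 0.25 + 0.125 * 10 = 3.0; the longer backup horizon is one plausible reason chunking speeds up value propagation.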
implemented flow q-learning (FQL) from scratch in PyTorch, tested on OGBench cube manipulation
smol 200k step pilot on my mac
bigger-scale experiments coming soon
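for context, FQL's first ingredient is a behavior-cloning flow policy trained with flow matching and sampled by Euler integration; it is then distilled into a one-step policy that maximizes Q, which this sketch leaves out. a minimal scalar-action sketch, assuming the linear (rectified-flow) path; plain floats stand in for PyTorch tensors and the function names are hypothetical.

```python
def flow_matching_pair(action, t, noise):
    """Training pair for flow-matching BC on one scalar action.

    Linear path x_t = (1 - t) * noise + t * action; the target velocity is action - noise.
    A network v(s, x_t, t) would be regressed onto this target with an MSE loss.
    """
    x_t = (1.0 - t) * noise + t * action
    return x_t, action - noise


def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 (noise) to t=1 (action)."""
    x, dt = noise, 1.0 / num_steps
    for k in range(num_steps):
        x = x + dt * velocity_fn(x, k * dt)
    return x
```

as a sanity check, plugging in the exact velocity for a single target action a*, v(x, t) = (a* - x) / (1 - t), makes `euler_sample` land on a* up to float error.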
Congrats to Aime!! He said his left forearm is basically broken 😂
Final scores:
→ F.03: 12,732 packages (2.83 seconds/package)
→ Aime: 12,924 packages (2.79 seconds/package)
This is the last time a human will ever win