Pi models are now running in production settings, in collab with @Ultraroboticsco and @weaverobotics.
We see:
- much higher autonomy with pi-0.6 than with pi-0.5
- fewer mistakes & higher throughput from incorporating data in pre-training
Blog post: https://t.co/ltgLlXRnXg
It was absolutely disgraceful seeing the venue people just rudely cut the mic. Even when @breadli428 repeatedly begged for just 3 more minutes and he would be done, the person simply repeated robotically: you should have been done. I was clearing rooms one by one. It’s the event organizer’s fault.
Absolutely no respect at all.
@SwayStar123 Matches my observations too!
One thing I've found interesting is that landscapes/scenes converge the fastest. Is it a coincidence that the first large-ish generative models (e.g., iGANs) were for landscapes?
Q: How do we scale robustness/invariance to foundation models like CLIP?
A: Test-time search! 🔍
Our new work FoCal finds canonical views to boost robustness to complex transforms (e.g. viewpoint): https://t.co/f38zLKu2uz
📍 ICML Poster: Tue 11–1:30, E. Hall A-B (E-2203)
🧵 1/5
Check out our work @ ICML tomorrow!
I can't join in person 🥲, but I'm genuinely excited to share this work.
FoCal hits the core issues I've encountered working on invariance over the last few years: complex transforms, scaling to foundation models & data-driven invariance 😀
@aaron_defazio @torchcompiled Would love to read the paper! I have seen similar phenomena with noisy gradient estimates.
My guess before seeing the paper: noise accumulates in the weights over many batches until it starts producing second-order (Hessian) effects.
@EliSennesh Is this even a hot take? Since all EBMs are probabilistic models and vice versa? (The only difference being implementation, which I will conveniently ignore :) )
Hi all, I'll be at NeurIPS from Tuesday to Sunday.
I would love to chat with people about invariance, test-time optimization, and vision.
Please DM me if you'd like to talk (or catch up)
@JamesAllingham @kayembruno @shreyaspadhy @JaviAC7 @DavidSKrueger @eric_nalisnick @jmhernandez233 Wonderful work! I love the beautifully drawn graphics especially. :)
I've been working on a similar line of research (https://t.co/x0FdxGRBQr). There is so much exciting stuff to be done. It would be great to chat at NeurIPS!
@wgussml Really cool observations! Might be related to some of @thisismyhat's recent work :)
Personally I wonder if this is because (1) the loss functional is convex in function space; (2) averaging gradients ≈ averaging learned functions, at least for MNIST.
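A one-line sketch of why (1) would matter, under assumptions that are mine rather than the thread's: write $\mathcal{L}$ for a loss functional that is convex in the function $f$, and $f_1, \dots, f_n$ for the functions being averaged. Jensen's inequality then says the averaged function does no worse than the average of the individual losses:

```latex
\mathcal{L}\!\left(\frac{1}{n}\sum_{i=1}^{n} f_i\right) \;\le\; \frac{1}{n}\sum_{i=1}^{n} \mathcal{L}(f_i)
```

This only covers averaging in function space; whether averaging gradients really behaves like that, as in (2), is the part that stays a guess.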