@kylelostat @allen_ai When you came to UT Austin and gave a talk about Olmo data distributions, I was really inspired by the work you've done! Thank you for the talk and I look forward to speaking with you in the future! Best of luck!
VLA models are capable generalists. But can they continually self-improve?
Such Continual Reinforcement Learning (CRL) problems are traditionally considered very challenging.
Surprisingly, we found that with the right setup, the simplest CRL recipe can work really well!
https://t.co/7DmlhAqX9L
@trq212 Have you seen Claude "game the system" by pushing lots of useless code/comments to boost its own metrics? Or is that specifically de-emphasized?
The real bottleneck for AI in medicine might be human trust, not technical capabilities.
Even if an AI hospital had higher survival rates, many of us would still hesitate. What would it actually take for people to trust AI in high-stakes settings?
Realizing that building AI for healthcare means first working on safety reshaped how I think about my path.
@DarioAmodei's essay on powerful AI left me with both awe at what's coming and urgency about getting it right, so I wrote down my thoughts.
https://t.co/8ELYMgOZoc
@bcherny Super cool tips! Using diction for prompts was a surprising tip that I hadn't even considered before. On a similar note, what specific keywords/phrases have you found boost performance significantly when written in the https://t.co/gs5IfSXaXm, even more than you expected?
@bcherny Congrats on the successful usage! In hindsight, were there any specific areas you thought needed more safety-proofing? If Claude had provided incorrect simulation/planning, it could've been disastrous, right?
@karpathy To me, the sentiment of not being able to compete with big names isn't a demotivator; rather, it feels like a challenge to outcompete them even without the same resources or connections
@DanielXieee @yukez What was the overall cost of developing something like this? It seems like a super cool project I'd like to try, but maybe there's a more cost-effective option
TIL: Forcing the model to output two tokens at opposite extremes for the gripper dimension doesn't destabilize training or make it harder for the model to learn, even though its bin size is >> 2. Feel free to let me know if you've seen instances that disagree
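To make "two tokens at opposite extremes" concrete, here is a minimal sketch of uniform action binning. Everything in it is an assumption for illustration: the 256-bin count, the [-1, 1] range, and the hypothetical helper name `to_bin` are not taken from the actual tokenizer used in this run.

```python
def to_bin(value, n_bins=256, low=-1.0, high=1.0):
    """Map a continuous action value to a bin index (hypothetical uniform binning)."""
    # Clip so out-of-range values land in the edge bins.
    clipped = min(max(value, low), high)
    # Scale to [0, n_bins] and truncate to an integer index.
    idx = int((clipped - low) / (high - low) * n_bins)
    # The upper bound itself would map to n_bins, so clamp to the last bin.
    return min(idx, n_bins - 1)


# A binary gripper target only ever uses the first and last bin.
gripper_bins = {to_bin(-1.0), to_bin(1.0)}
```

Under these assumptions the gripper dimension only ever touches bins 0 and 255, while the other action dimensions can use the full range.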
Training run is looking a lot better now. Before, the loss decreased and accuracy increased somewhat, but the policy still somehow got close to 0% success on held-out tasks. The gripper fix seems to resolve most of the issue; there is still some shakiness, but I want to say that's from sampling.
Spent a few days debugging a policy and found that the dataset I'm training on requires the gripper dim to be negated and spread out to {-1,1}. Hopefully this helps at least one other person since it was hard for me to find the 3 lines of transformations in a giant codebase
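For anyone hunting for the same thing, a rough sketch of what such a transform might look like. The raw [0, 1] input range, the order of the steps, and the helper name `transform_gripper` are assumptions for illustration; check your own dataset's convention before copying it.

```python
def transform_gripper(raw, low=0.0, high=1.0):
    """Spread a raw gripper value in [low, high] to {-1.0, 1.0}, then negate it."""
    # Rescale from [low, high] to [-1, 1].
    scaled = 2.0 * (raw - low) / (high - low) - 1.0
    # Binarize: snap to the nearer extreme (ties go to +1).
    binary = 1.0 if scaled >= 0.0 else -1.0
    # Negate to flip the open/close sign convention.
    return -binary


raw_actions = [0.0, 0.2, 0.9, 1.0]
fixed = [transform_gripper(a) for a in raw_actions]
```

With these assumed conventions, a raw value near 0 becomes +1.0 and a raw value near 1 becomes -1.0.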
TIL: the LIBERO dataset suites have image observations that are upside down (flipped over y axis). I guess LIBERO didn't have this issue since they trained all their transformers from scratch?
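A tiny sketch of undoing the flip on an image stored as a list of rows. Whether a given pipeline needs a pure upside-down flip or a full 180 degree rotation is an assumption to verify against your own data, so both are shown; the function names are hypothetical.

```python
def flip_vertical(image):
    """Reverse the row order, turning an upside-down image right side up."""
    return image[::-1]


def rotate_180(image):
    """Reverse both rows and columns, equivalent to a 180 degree rotation."""
    return [row[::-1] for row in image[::-1]]


# A 2x2 "image" with one value per pixel.
img = [[1, 2],
       [3, 4]]
upright = flip_vertical(img)
rotated = rotate_180(img)
```

Saving one corrected frame to disk and looking at it is probably the quickest way to tell which of the two applies.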
So far, the issue persists, and Claude and GPT seem to have trouble pinpointing the problem as well, since the code "appears" correct. For now, I am going to try reducing the amount of memory to see when exactly, if at all, the model gets offloaded
I'm currently debugging an issue with FSDP and offloading sharded model weights. For some reason, even if I add a CUDA synchronize + torch garbage collection + gc, on specific clusters the memory seems to stay on the GPU. Let me know if anyone else has experienced this!
I've tried manually clearing any potentially remaining gradients and blocking all threads until garbage collection executes, yet the issue still seems to persist
@neelsomani Where do you think this solving capability is coming from? Is it coming up with creative proofs humans wouldn't think of, or is it simply that no one has applied it in a new way?