Jay Shim

@jayjshim

Undergrad RL Researcher @ UT Austin | Sharing what I learn about RL/ML

Austin

Joined December 2025

38 Following

7 Followers

41 Posts

Jay Shim @jayjshim

2 months ago

@kylelostat @allen_ai When you came to UT Austin and gave a talk about Olmo data distributions, I was really inspired by the work you've done! Thank you for the talk and I look forward to speaking with you in the future! Best of luck!

0

0

0

0

105

jayjshim retweeted

Jiaheng Hu @JiahengHu1

3 months ago

VLA models are capable generalists. But can they continually self-improve? Such Continual Reinforcement Learning (CRL) problems are traditionally considered very challenging. Surprisingly, we found that with the right setup, the simplest CRL recipe can work really well! https://t.co/7DmlhAqX9L

8

271

50

218

46K

Jay Shim @jayjshim

4 months ago

@trq212 Have you seen Claude "game the system" by pushing lots of useless code/comments to boost its own metrics? Or is that specifically de-emphasized

0

0

0

0

185

Jay Shim @jayjshim

4 months ago

@bcherny Super exciting! Have there been any major noticeable differences in persona or output quality when using the fast mode?

0

1

0

0

214

Jay Shim @jayjshim

4 months ago

The real bottleneck for AI in medicine might be human trust, not technical capabilities. Even if an AI hospital had higher survival rates, many of us would still hesitate. What would it actually take for people to trust AI in high-stakes settings?

0

0

0

0

42

Jay Shim @jayjshim

4 months ago

Realizing that building AI for healthcare means first working on safety reshaped how I think about my path. @DarioAmodei's essay on powerful AI left me with both awe at what's coming and urgency about getting it right, so I wrote down my thoughts. https://t.co/8ELYMgOZoc

1

0

0

0

63

Jay Shim @jayjshim

4 months ago

@bcherny Super cool tips! Using diction for prompts was a surprising tip that I hadn't even considered before. On a similar note, what specific keywords/phrases have you found boost performance significantly when written in the https://t.co/gs5IfSXaXm, even more than you expected?

0

0

0

0

618

Jay Shim @jayjshim

4 months ago

@bcherny Congrats on the successful usage! In hindsight, were there any specific areas you thought needed more safety-proofing? If Claude had provided incorrect simulation/planning, it could've been disastrous, right?

0

0

0

0

175

Jay Shim @jayjshim

4 months ago

@karpathy The sentiment of not being able to compete with big names isn't a de-motivator to me; rather, it feels like a challenge to outcompete them even without the same resources or connections

0

0

0

0

155

Jay Shim @jayjshim

5 months ago

On my TODO list is figuring out how to get multi-node PyTorch training working with Ray, FSDP, Hugging Face, etc. Will keep you updated on my progress

0

0

0

0

46

Jay Shim @jayjshim

5 months ago

@DanielXieee @yukez What was the overall cost of developing something like this? It seems like a super cool project I'd like to try, but maybe there's a more cost-effective option

0

0

0

0

43

Jay Shim @jayjshim

5 months ago

Anyone have some comprehensive resources for learning to use Claude Code? I've been seeing it everywhere on my feed and I'm excited to dive into it

0

0

0

0

45

Jay Shim @jayjshim

5 months ago

TIL: Forcing the model to output two tokens at opposite extremes for the gripper dimension doesn't destabilize training or make it more difficult for the model to learn, even though its bin size is >> 2. Feel free to let me know if you've seen instances that disagree

0

0

0

0

43
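
As a rough illustration of the post above, here is a minimal sketch of what "two tokens at opposite extremes" could look like with uniform action binning. The 256-bin count, the [-1, 1] action range, and the function names are assumptions for illustration, not details taken from the post or from any specific codebase.

```python
def to_bin(value, n_bins=256, low=-1.0, high=1.0):
    """Uniformly discretize a continuous action value into one of n_bins bins."""
    value = min(max(value, low), high)      # clamp into the assumed action range
    frac = (value - low) / (high - low)     # position within the range, 0.0 to 1.0
    return min(int(frac * n_bins), n_bins - 1)


def gripper_to_token_bin(value, n_bins=256):
    """Snap the gripper command to an extreme first, so only two bins are ever used."""
    extreme = 1.0 if value >= 0.0 else -1.0
    return to_bin(extreme, n_bins)


# Hypothetical gripper commands; every one lands in the first or the last bin.
gripper_bins = [gripper_to_token_bin(v) for v in (-0.7, -0.1, 0.3, 1.0)]
```

The other action dimensions would still use the full range of bins through `to_bin`; only the gripper dimension is restricted to the two end bins.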

Jay Shim @jayjshim

5 months ago

Training run is looking a lot better now. Before, the loss decreased and accuracy somewhat increased, but the policy still somehow got close to 0% success on held-out tasks. The gripper fix seems to resolve most of the issue; there is still some shakiness, but I want to say that's from sampling.

0

0

0

0

26

Jay Shim @jayjshim

5 months ago

Spent a few days debugging a policy and found that the dataset I'm training on requires the gripper dim to be negated and spread out to {-1,1}. Hopefully this helps at least one other person since it was hard for me to find the 3 lines of transformations in a giant codebase

1

0

0

0

33
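
For anyone hitting the same problem, here is a minimal sketch of the kind of transformation described above. It assumes the raw gripper command lives in [0, 1]; that source range, the binarization step, and the function name are assumptions for illustration and may not match the actual lines in the codebase.

```python
def transform_gripper(value, binarize=True):
    """Rescale an assumed [0, 1] gripper command to [-1, 1], then negate it."""
    value = min(max(value, 0.0), 1.0)   # clamp to the assumed source range
    scaled = 2.0 * value - 1.0          # spread [0, 1] out to [-1, 1]
    if binarize:
        # Snap to the two extremes so the result is always in {-1, 1}.
        scaled = 1.0 if scaled >= 0.0 else -1.0
    return -scaled                      # flip the open/close sign convention


# Hypothetical raw gripper values from a dataset.
raw = [0.0, 0.2, 0.9, 1.0]
transformed = [transform_gripper(v) for v in raw]
```

Whether a given dataset needs the negation at all depends on its open/close sign convention, so it is worth checking a few known open and closed frames before applying it.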

Jay Shim @jayjshim

5 months ago

TIL: the LIBERO dataset suites have image observations that are upside down (flipped over y axis). I guess LIBERO didn't have this issue since they trained all their transformers from scratch?

0

0

0

0

25
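
A minimal sketch of undoing this kind of orientation problem, using plain nested lists as a stand-in for an image. Which correction applies (a vertical flip or a full 180-degree rotation) is an assumption here and should be checked visually on a few frames; the function names are hypothetical.

```python
def flip_vertical(image):
    """Reverse the row order, turning an upside-down image right side up."""
    return image[::-1]


def rotate_180(image):
    """Reverse both rows and columns, equivalent to a 180-degree rotation."""
    return [row[::-1] for row in image[::-1]]


# Tiny 2x2 "image" where each number stands in for a pixel.
frame = [[1, 2],
         [3, 4]]
flipped = flip_vertical(frame)
rotated = rotate_180(frame)
```

With NumPy arrays of shape (H, W, C), the analogous slices are `image[::-1]` for the vertical flip and `image[::-1, ::-1]` for the 180-degree rotation.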

Jay Shim @jayjshim

5 months ago

So far, this issue still persists and it seems like Claude and GPT have issues pinpointing the problem as well, since the code "appears" correct. For now, I am going to try to reduce the amount of memory and see when exactly, if at all, the model gets offloaded

0

0

0

0

23

Jay Shim @jayjshim

5 months ago

I'm currently debugging an issue with FSDP and offloading sharded model weights. For some reason, even if I add a CUDA synchronize + torch garbage collection + gc, on specific clusters the memory seems to stay allocated on the GPU. Let me know if anyone else has experienced this!

1

0

0

0

30

Jay Shim @jayjshim

5 months ago

I've tried manually getting rid of potentially remaining gradients and blocking all threads until garbage collection executes. Yet the issue still seems to persist

1

0

0

0

26

Jay Shim @jayjshim

5 months ago

@neelsomani Where do you think this solving capability is coming from? Is it coming up with creative proofs humans wouldn't think of, or is it simply that no one has applied it in this way before?

0

0

0

0

78
