Jerry Li @jeli_04 - Twitter Profile

[3/3] I’m also grateful for the wonderful mentoring by @johntzwei, @ameya_godbole1 and @robinomial! I learned a lot from this project and am really thankful for their guidance throughout. 🎉

Jerry Li @jeli_04

7 days ago

[1/3] Excited to finally share what I’ve been working on the past few months! Spiking intentionally contaminates training data to measure test set contamination. We show it can estimate contamination and adjust test scores for a more truthful evaluation.

Johnny Tian-Zheng Wei @johntzwei

8 days ago

🧵[1/5] Works on test set contamination focus on detection, but we show *correction* of inflated test scores is possible. https://t.co/7D6lr63d40 Our proposal is to spike the training data and insert some test examples at known rates. The spiked examples are used to calibrate...

Jerry Li @jeli_04

7 days ago

[2/3] @johntzwei and I believe spiking opens a new direction for model evaluation for devs/labs. In the age of internet-scale training data, building robust models across a multitude of domains and environments starts with accurately measuring what our current models can truly do

Jerry Li @jeli_04

23 days ago

@R2Cdev_ How is grok losing to 4o 😭

Jerry Li @jeli_04

23 days ago

Great to see Thinking Machines taking a slightly different route instead of just trying to compete with the other big players in the LLM space (Meta). And what’s even better is a technical report with details on their architecture! Was bearish before on TM but more excited now.

Thinking Machines @thinkymachines

23 days ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku

Jerry Li @jeli_04

about 2 months ago

They weren’t lying 💀

Alex Albert @alexalbert__

about 2 months ago

We released Claude Opus 4.6 just two months ago. Today we're sharing some info on our new model, Claude Mythos Preview.

Jerry Li @jeli_04

3 months ago

@SenSanders Unreal 😂

Jerry Li @jeli_04

3 months ago

@tunguz Static weights won’t enable dynamic thinking. However for this benchmark the context given to the LLM is key.

Jerry Li @jeli_04

3 months ago

@fchollet But doesn’t that just mean a better way of evaluating is giving context of the docs as well? Hence coding agents

jeli_04 retweeted

Zhikai Zhang @Zhikai273

3 months ago

🎾 Introducing LATENT: Learning Athletic Humanoid Tennis Skills from Imperfect Human Motion Data. Dynamic movements, agile whole-body coordination, and rapid reactions. A step toward athletic humanoid sports skills. Project: https://t.co/MFy2NIOsrn Code: https://t.co/A7B5H8PIBh

Jerry Li @jeli_04

3 months ago

@awnihannun Contextual token compaction is pretty good, but in the long run it’ll always be capped by whatever the model’s capabilities are. More people should probably be focused on figuring out how to do local updates within the model weights without being overly expensive like full backprop

jeli_04 retweeted

hardmaru @hardmaru

3 months ago

Instead of forcing models to hold everything in an active context window, we can use hypernetworks to instantly compile documents and tasks directly into the model's weights. A step towards giving language models durable memory and fast adaptation. Blog: https://t.co/iHoifpsLMu

jeli_04 retweeted

Physical Intelligence @physical_int

6 months ago

We discovered an emergent property of VLAs like π0/π0.5/π0.6: as we scale up pre-training, the model learns to align human videos and robot data! This gives us a simple way to leverage human videos. Once π0.5 knows how to control robots, it can naturally learn from human video.

jeli_04 retweeted

OpenAI @OpenAI

7 months ago

We’ve developed a new way to train small AI models with internal mechanisms that are easier for humans to understand. Language models like the ones behind ChatGPT have complex, sometimes surprising structures, and we don’t yet fully understand how they work. This approach helps us begin to close that gap. https://t.co/g4zOcdezPU

Jerry Li @jeli_04

11 months ago

No benchmark can convince me that Gemini is good
