Taiming Lu @taiminglu - Twitter Profile

Taiming Lu

@TaimingLu

9 days ago

@muhan_gao Nice work!

Taiming Lu

@TaimingLu

10 days ago

@kentonmurray Congrats!!! 🎉

Taiming Lu

@TaimingLu

10 days ago

Teacher–student compatibility matters more than raw teacher strength.

This changes how you pick a teacher: both for frontier training (where the best available teacher is often a prior generation) and for efficient small models, where "bigger teacher is better" isn't the right rule.

Thanks @liuzhuang1234 for the support!

arxiv: https://t.co/EWcX461JHS
code: https://t.co/efl747LsVd

Taiming Lu

@TaimingLu

10 days ago

Knowledge doesn't always flow downhill.

We find that in LLM pretraining, a weaker teacher can improve a stronger student, and pushing the teacher further can actually hurt.

New paper: Strong Teacher Not Needed? On Distillation in LLM Pretraining. https://t.co/I60XsjfOdG

Taiming Lu

@TaimingLu

10 days ago

Distillation improves generalization more readily than in-domain fit.

Out-of-distribution perplexity and downstream accuracy improve more consistently than in-domain perplexity, where some configurations help OOD/downstream while doing nothing for in-domain. https://t.co/NkGViY3VaD

Taiming Lu

@TaimingLu

2 months ago

@DanielKhashabi Congrats Daniel!!! 🎉🎉🎉

TaimingLu retweeted

Zhuang Liu

@liuzhuang1234

6 months ago

Stronger Normalization-Free Transformers – new paper.

We introduce Derf (Dynamic erf), a simple point-wise layer that lets norm-free Transformers not only work, but actually outperform their normalized counterparts. https://t.co/NAPJvfsEGI
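The tweet describes Derf only as "a simple point-wise layer" that stands in for normalization. Below is a minimal sketch, assuming Derf follows the pattern of Dynamic Tanh (DyT) with `erf` in place of `tanh`, i.e. y = γ·erf(α·x) + β with a learnable scalar α and learnable per-channel γ and β. The class name `DerfSketch`, the initial value of α, and this exact parameterization are assumptions; the paper's formulation (for example, any extra shift inside `erf`) may differ.

```python
import math
from typing import List


class DerfSketch:
    """Hypothetical Derf-style layer (assumed form): y_i = gamma_i * erf(alpha * x_i) + beta_i."""

    def __init__(self, num_channels: int, alpha_init: float = 0.5):
        self.alpha = alpha_init            # learnable scalar input scale (init value is an assumption)
        self.gamma = [1.0] * num_channels  # learnable per-channel output scale
        self.beta = [0.0] * num_channels   # learnable per-channel output shift

    def __call__(self, x: List[float]) -> List[float]:
        if len(x) != len(self.gamma):
            raise ValueError("expected one value per channel")
        # Each activation is mapped independently: no mean/variance statistics
        # are computed over the vector, which is what makes the layer point-wise.
        return [g * math.erf(self.alpha * xi) + b for xi, g, b in zip(x, self.gamma, self.beta)]
```

Unlike LayerNorm, no output entry depends on the other entries of `x`; α controls how quickly inputs saturate toward ±1, and γ, β restore an affine output range.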

TaimingLu retweeted

Jieneng Chen

@jieneng_chen

8 months ago

🤯 Think better visuals mean better world models? Think again.
💥 Surprise: Agents don’t need eye candy— they need wins.

Meet World-in-World, the first open benchmark that ranks world models by closed-loop task success, not pixels.

We uncover 3 shocks:
1️⃣ Visuals ≠ utility
2️⃣ Action data > bigger models
3️⃣ Scaling test-time compute = more success

🤗 https://t.co/OXn4WfnuTU
🌍 https://t.co/AKRgXhSCJV
📄 https://t.co/izyjaKTHgO
https://t.co/hd6F9VPGQ2

TaimingLu retweeted

Zhuang Liu

@liuzhuang1234

8 months ago

Excited to share our lab’s first open-source release: LLM-Distillation-JAX

supports practical knowledge distillation configurations (distillation strength, temperature, top-k/top-p), built on MaxText

designed for reproducible JAX/Flax training on both TPUs and GPUs https://t.co/8zhDPRv7uS
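The three knobs listed in the release tweet (distillation strength, temperature, top-k/top-p) can be illustrated with a single-token loss. This is a minimal, framework-free sketch, not LLM-Distillation-JAX's implementation: the parameter names `alpha`, `temperature`, `top_k`, and `top_p` are hypothetical, and mixing a hard-label cross-entropy with a temperature-scaled KL(teacher || student) term weighted by T² is an assumption borrowed from classic knowledge distillation.

```python
import math
from typing import List, Optional


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs: List[float], keep: set) -> List[float]:
    """Zero out every index not in `keep`, then rescale the rest to sum to 1."""
    kept = [p if i in keep else 0.0 for i, p in enumerate(probs)]
    total = sum(kept)
    return [p / total for p in kept]


def truncate_top_k(probs: List[float], k: int) -> List[float]:
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, set(order[:k]))


def truncate_top_p(probs: List[float], top_p: float) -> List[float]:
    """Keep the smallest set of most likely tokens whose mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, mass = set(), 0.0
    for i in order:
        keep.add(i)
        mass += probs[i]
        if mass >= top_p:
            break
    return _renormalize(probs, keep)


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    alpha: float = 0.5,            # hypothetical "distillation strength"
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    top_p: Optional[float] = None,
) -> float:
    """Per-token loss: (1 - alpha) * hard-label CE + alpha * T^2 * KL(teacher || student)."""
    # Hard-label next-token cross-entropy for the student (temperature 1).
    ce = -math.log(softmax(student_logits)[label])

    # Teacher soft targets at temperature T, optionally truncated to top-k and/or top-p.
    teacher = softmax(teacher_logits, temperature)
    if top_k is not None:
        teacher = truncate_top_k(teacher, top_k)
    if top_p is not None:
        teacher = truncate_top_p(teacher, top_p)

    # KL(teacher || student) at the same temperature; tokens the teacher gives
    # zero mass contribute nothing. The floor guards against log(0).
    student = softmax(student_logits, temperature)
    kl = sum(
        p * (math.log(p) - math.log(max(q, 1e-12)))
        for p, q in zip(teacher, student)
        if p > 0.0
    )

    # The T^2 factor is a common convention that keeps the soft-target
    # gradient scale roughly independent of the temperature.
    return (1.0 - alpha) * ce + alpha * temperature ** 2 * kl
```

For example, `distillation_loss([2.0, 1.0, 0.1], [1.5, 1.2, -0.5], label=0, alpha=0.5, temperature=2.0, top_k=2)` weights the two terms equally and lets only the teacher's two most likely tokens shape the soft target. Setting `alpha=0.0` recovers plain next-token training, while truncation keeps the teacher's long tail of low-probability tokens out of the soft target.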

TaimingLu retweeted

JHU Computer Science @JHUCompSci

over 1 year ago

Meet the AI system that can envision an entire world from a single picture. @genex_world—developed by @jieneng_chen, @YuilleAlan, @TaiMingLu, @DanielKhashabi, & @tianminshu—imagines in-depth scenarios to make informed decisions. Learn more here: https://t.co/yXTaBBxH8K

TaimingLu retweeted

Jieneng Chen

@jieneng_chen

over 1 year ago

Thrilled to introduce GenEx: Generating an Explorable World. ✨ ✨ GenEx takes a single image 🖼️ and creates a 3D generative world 🌍 that you can dive into for interactive exploration, and so can an embodied AI agent. Follow our X for more demos: https://t.co/3pgBPvo2ap Paper on Hugging Face: https://t.co/e6TLHKheHy Tech details: https://t.co/3TRt9SpwJv (1/n)

TaimingLu retweeted

GenEx @genex_world

over 1 year ago

Introducing GenEx: Turn any image into a 3D world adventure! 1️⃣ Create a fully explorable 360° world in 3D from just a single image! 2️⃣ Explore interactively or with GPT assistance. 3️⃣ Advance embodied AI with this imagined world! Check out our website: https://t.co/Kj4g3STesR

TaimingLu retweeted

Jieneng Chen

@jieneng_chen

over 1 year ago

Introducing Genex: Generative World Explorer. 🧠 Humans mentally explore unseen parts of the world, revising their beliefs with imagined observations. ✨ Genex replicates this human-like ability, advancing embodied AI in planning with partial observations. (1/6)

TaimingLu retweeted

Muhan Gao

@muhan_gao

almost 2 years ago

🤖LLMs know more long-context information than they show!

🔍Probing reveals higher accuracy than generation output. #LLMs know but don't tell.🤐

The earlier relevant information is learned within the layers, the higher the final output accuracy! 📈

(https://t.co/1f4I65VAEy) https://t.co/IFWmzXewtw
