Teacher–student compatibility matters more than raw teacher strength.
This changes how you pick a teacher: both for frontier training (where the best available teacher is often a prior generation) and for efficient small models, where "bigger teacher is better" isn't the right rule.
Thanks @liuzhuang1234 for the support!
arxiv: https://t.co/EWcX461JHS
code: https://t.co/efl747LsVd
Knowledge doesn't always flow downhill.
We find that in LLM pretraining, a weaker teacher can improve a stronger student, and pushing the teacher further can actually hurt.
New paper: Strong Teacher Not Needed? On Distillation in LLM Pretraining.
Distillation improves generalization more readily than in-domain fit.
Out-of-distribution perplexity and downstream accuracy improve more consistently than in-domain perplexity: some configurations help OOD and downstream metrics while doing nothing for in-domain perplexity.
Stronger Normalization-Free Transformers – new paper.
We introduce Derf (Dynamic erf), a simple point-wise layer that lets norm-free Transformers not only work, but actually outperform their normalized counterparts.
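For readers who want a feel for what a point-wise erf layer looks like, here is a minimal sketch. The post only says Derf is a "simple point-wise layer" built on erf; the parameterization below, gamma * erf(alpha * x + shift) + beta, and the names `alpha`, `shift`, `gamma`, `beta` are assumptions for illustration, not the paper's confirmed definition. Check the paper for the actual form and initialization.

```python
import math

def derf(xs, alpha=0.5, shift=0.0, gamma=1.0, beta=0.0):
    """Apply gamma * erf(alpha * x + shift) + beta to each element.

    ASSUMPTION: this parameterization is illustrative only. In a real layer
    these four values would be learned; here they are plain floats.
    """
    return [gamma * math.erf(alpha * x + shift) + beta for x in xs]

# The map is point-wise: each output depends only on its own input,
# with no mean/variance statistics computed across the vector.
activations = [-100.0, -1.0, 0.0, 1.0, 100.0]
outputs = derf(activations)
```

Unlike a normalization layer, nothing here depends on the other elements of the input; large activations are simply squashed toward the range bounded by erf.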
🤯 Think better visuals mean better world models? Think again.
💥 Surprise: Agents don’t need eye candy; they need wins.
Meet World-in-World, the first open benchmark that ranks world models by closed-loop task success, not pixels.
We uncover 3 shocks:
1️⃣ Visuals ≠ utility
2️⃣ Action data > bigger models
3️⃣ Scaling test-time compute = more success
🤗 https://t.co/OXn4WfnuTU
🌍 https://t.co/AKRgXhSCJV
📄 https://t.co/izyjaKTHgO
https://t.co/hd6F9VPGQ2
Excited to share our lab’s first open-source release: LLM-Distillation-JAX
supports practical knowledge distillation configurations (distillation strength, temperature, top-k/top-p), built on MaxText
designed for reproducible JAX/Flax training on both TPUs and GPUs
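To make the knobs named above concrete (distillation strength, temperature, top-k), here is a rough single-token sketch in plain Python. It is not the LLM-Distillation-JAX or MaxText API: the function names, the `strength` mixing rule, and the top-k renormalization are all assumptions chosen for illustration. Top-p truncation and the temperature-squared scaling some recipes apply are left out.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_renormalize(probs, k):
    """Keep the k largest probabilities, zero the rest, renormalize to sum to 1."""
    keep = set(sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k])
    mass = sum(probs[i] for i in keep)
    return [probs[i] / mass if i in keep else 0.0 for i in range(len(probs))]

def distillation_loss(student_logits, teacher_logits, label,
                      strength=0.5, temperature=1.0, top_k=None):
    """(1 - strength) * cross-entropy + strength * KL(teacher || student).

    ASSUMPTION: a hypothetical mixing rule for one token position; a real
    training setup may combine or scale the two terms differently.
    """
    # Hard-label term: cross-entropy of the student at temperature 1.
    ce = -math.log(softmax(student_logits)[label])
    # Soft-label term: both distributions are softened by the same temperature.
    teacher = softmax(teacher_logits, temperature)
    if top_k is not None:
        teacher = top_k_renormalize(teacher, top_k)
    student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0.0)
    return (1.0 - strength) * ce + strength * kl

student = [2.0, 0.5, -1.0, 0.0]
teacher = [1.5, 1.0, -0.5, 0.2]
loss = distillation_loss(student, teacher, label=0,
                         strength=0.5, temperature=2.0, top_k=2)
```

Setting `strength=0.0` recovers ordinary next-token cross-entropy, and `strength=1.0` trains on the teacher distribution alone, which is one simple way to read "distillation strength".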
Thrilled to introduce GenEx: Generating an Explorable World. ✨
✨ GenEx takes a single image 🖼️ and creates a 3D generative world 🌍 that you can dive into for interactive exploration, and so can embodied AI agents.
Follow our X for more demos: https://t.co/3pgBPvo2ap
Paper on huggingface: https://t.co/e6TLHKheHy
Tech details: https://t.co/3TRt9SpwJv
(1/n)
Introducing GenEx: Turn any image into a 3D world adventure!
1️⃣ Create a fully explorable 360° world in 3D from just a single image!
2️⃣ Explore interactively or with GPT assistance.
3️⃣ Advance embodied AI with this imagined world!
Check out our website: https://t.co/Kj4g3STesR
Introducing Genex: Generative World Explorer.
🧠 Humans mentally explore unseen parts of the world, revising their beliefs with imagined observations. ✨ Genex replicates this human-like ability, advancing embodied AI in planning with partial observations. (1/6)
🤖LLMs know more long-context information than they show!
🔍Probing reveals higher accuracy than generation output. #LLMs know but don't tell.🤐
The earlier in the layers the relevant information is encoded, the higher the final output accuracy! 📈
(https://t.co/1f4I65VAEy)