Terence Tao's SAIR foundation is doing some really cool work on enabling AI4Maths to be open and collaborative.
I'm heaps excited that we now get to work together on bringing projects like their Mathematics Distillation Challenge to the HF ecosystem. Let's go 🚀!
Thanks to @AI21Labs for tracking down a silent uint32 overflow in vLLM's Mamba-1 CUDA kernel and contributing the fix.
Root cause: `uint32_t` stride × cache_index overflows silently at scale. Fix merged in #35275. The debugging story is worth a read.
🔗 https://t.co/S4XBnEn1uv
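For readers who want to see the failure mode, here is a minimal sketch of how a 32-bit stride × cache_index product wraps around. It emulates fixed-width unsigned arithmetic in Python with bit masks; the stride and index values are made up for illustration and are not taken from the vLLM kernel, and the 64-bit variant only shows one common remedy, not necessarily what the merged fix does.

```python
# Illustrative only: emulates fixed-width unsigned arithmetic with bit masks.
# The stride and index values below are hypothetical, not from the vLLM kernel.
UINT32_MAX = 0xFFFFFFFF
UINT64_MAX = 0xFFFFFFFFFFFFFFFF


def offset_u32(stride: int, cache_index: int) -> int:
    """Product as a uint32_t would hold it: wraps modulo 2**32, no error raised."""
    return (stride * cache_index) & UINT32_MAX


def offset_u64(stride: int, cache_index: int) -> int:
    """Same product held in a 64-bit unsigned integer."""
    return (stride * cache_index) & UINT64_MAX


stride = 1 << 20  # hypothetical elements per cache slot

for cache_index in (1000, 5000):
    narrow = offset_u32(stride, cache_index)
    wide = offset_u64(stride, cache_index)
    status = "ok" if narrow == wide else "WRAPPED"
    print(f"cache_index={cache_index}: u32={narrow} u64={wide} {status}")
```

With these toy numbers the small index gives matching offsets, while the large one silently lands on a much lower address in the 32-bit version, which is why this kind of bug only shows up at scale.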
Dogs stolen from their owners in China walk 17km along a highway led by a corgi to get back home.
The dogs escaped a dog meat truck and walked along a highway in Changchun, Jilin before returning to their village.
@RNR_0 When you guys go on international vacations, you will unfortunately be so much poorer than the people on your plane from Switzerland or the US, or even those working in London.
Announcing NVIDIA Nemotron 3 Super!
💚120B-12A Hybrid SSM Latent MoE, designed for Blackwell
💚36 on AAIndex v4
💚up to 2.2X faster than GPT-OSS-120B in FP4
💚Open data, open recipe, open weights
Models, Tech report, etc. here:
https://t.co/CAYpP1iK3i
And yes, Ultra is coming!
Excited to share that Nemotron 3 Super is now released! 🚀
A 120B hybrid MoE model (12B active, 1M context) built for complex agentic systems and long-context reasoning.
Key innovations:
• LatentMoE + Hybrid MoE
• Multi-Token Prediction (MTP)
• NVFP4 pretraining
@0xSeco Unrealised gains shouldn't be taxed at all; that's an idiotic liquidity crisis waiting to happen. The rate itself should be consistent with whatever prevents opening BVs. Ideally NL should draw inspiration from Switzerland: lower taxes but better infrastructure.
In today's episode of "Would You Please Just Look at the Data?"
Eric finds that in MMLU-Pro chemistry and physics subsets, blindly picking the answer that has a leading space is correct pretty often!
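A rough sketch of how one might check a multiple-choice dataset for this kind of artifact. The record layout (`options`, `answer_index`) and the toy questions are hypothetical stand-ins, not the actual MMLU-Pro schema or data, and this is not necessarily how the original analysis was done.

```python
def leading_space_hit_rate(questions):
    """Count how often the single option with a leading space is the answer.

    Only questions where exactly one option starts with a space are eligible,
    so the "blind pick" is unambiguous.
    """
    hits = 0
    eligible = 0
    for q in questions:
        spaced = [i for i, opt in enumerate(q["options"]) if opt.startswith(" ")]
        if len(spaced) != 1:
            continue
        eligible += 1
        if spaced[0] == q["answer_index"]:
            hits += 1
    return hits, eligible


# Hypothetical toy records, made up for illustration.
toy_questions = [
    {"options": ["2 J", " 4 J", "8 J"], "answer_index": 1},
    {"options": ["NaCl", "KCl", " LiF"], "answer_index": 2},
    {"options": [" H2O", "CO2", "O2"], "answer_index": 1},
    {"options": ["1 m/s", "2 m/s", "3 m/s"], "answer_index": 0},
]

hits, eligible = leading_space_hit_rate(toy_questions)
print(f"{hits}/{eligible} eligible questions solved by picking the leading-space option")
```

If the hit rate on real data is far above chance for the number of options, the formatting is leaking the label.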
Most LLMs learn to think only after pretraining—via SFT or RL. But what if they could learn to think during it? 🤔
Introducing RLP: Reinforcement Learning Pre-training—a verifier-free objective that teaches models to “think before predicting.”
🔥 Result: Massive reasoning boosts & gains that COMPOUND after post-training!
📝 Blog: https://t.co/5v4eLVHxRe
🔗Paper: https://t.co/OWnX0L1Wv3
🧵↓
New video on the details of diffusion models: https://t.co/rRjJehNuF3
Produced by @welchlabs, this is the first in a small series of 3b1b this summer. I enjoyed providing editorial feedback throughout the last several months, and couldn't be happier with the result.
Official results are in - Gemini achieved gold-medal level in the International Mathematical Olympiad! 🏆 An advanced version was able to solve 5 out of 6 problems. Incredible progress - huge congrats to @lmthang and the team! https://t.co/pp9bXF7rVj
A Sunset to Remember ☀️🌊
“The paddleboarder portrays the peaceful coexistence of people and wildlife,” as captured at sunset in August 2020.
🥇 People in Nature | 📷 Renee Capozzola
2024 National Wildlife Photo Contest Winners 📲: https://t.co/WRNSrUCfbW
The French overseas territory St. Pierre et Miquelon (population 5,800) now has the highest tariff rates in the world at 99%.
Their exports are valued at just $3.5 million a year. My guess as to what happened here is that they likely export a tiny amount (like $100k worth of lobsters) to America without importing anything in return. So the insane White House algorithm (trade deficit/imports) would have produced this insane 99% tariff figure.
So unbelievably stupid, incompetent and insane
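To make the arithmetic concrete, here is a small sketch that takes the post's description of the formula (trade deficit divided by imports) at face value. The $100k import figure is the post's own guess; the $1k export figure is purely hypothetical, chosen only to show how a 99% result could come out.

```python
def deficit_over_imports(us_imports: float, us_exports: float) -> float:
    """Ratio described in the post: (imports - exports) / imports.

    us_imports: value of goods the US buys from the territory.
    us_exports: value of goods the US sells to the territory.
    """
    if us_imports <= 0:
        raise ValueError("us_imports must be positive")
    return (us_imports - us_exports) / us_imports


# $100k of imports is the post's guess; $1k of exports is a hypothetical value.
rate = deficit_over_imports(us_imports=100_000, us_exports=1_000)
print(f"{rate:.0%}")

# With literally nothing sold in return, the ratio hits its ceiling of 100%.
print(f"{deficit_over_imports(100_000, 0):.0%}")
```

The point of the sketch is that the ratio says nothing about actual tariff policy: any tiny economy that sells a little and buys almost nothing ends up near 100%.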
AI models are *not* solving problems the way we think
using Docent, we find that Claude solves *broken* eval tasks - memorizing answers & hallucinating them!
details in 🧵
we really need to look at our data harder, and it's time to rethink how we do evals...