personal news: i've joined Elorian as Chief Reasoning Architect. multimodal AGI is the most critical frontier as we move from the era of chatbots to coding agents to models that reason and act over the physical world. i'm really excited to design natively visual models across thinking, agents, architectures, and the systems stack with the amazing team at Elorian.
i wish the best to everyone at xAI & SpaceX — driving posttraining was a unique experience with so many memorable stories. all the best to the team, and to Elon.
We’re thrilled to welcome @dustinvtran to Elorian as our Chief Reasoning Architect.
After leading post-training at xAI and contributing to Gemini at Google DeepMind, Dustin is joining Elorian to help build the next generation of visual reasoning models.
Excited for what's ahead 🚀
What makes a dataset valuable? And when is "more data" not the same as "better data" in machine learning and AI? Read more to find out: https://t.co/Q0wPOtfm5d
New research paper with Anthropic and Thinking Machines
AI companies use model specifications to define desirable behaviors during training. Are model specs clearly expressing what we want models to do? And do different frontier models have different personalities?
We generated thousands of scenarios to find out. 🧵
Whoa... Grok 4 beats o3 on our never-released benchmark: HumorBench, a non-STEM reasoning benchmark that measures humor comprehension. The task is simple: given a New Yorker Caption Contest cartoon and caption, explain the joke.
1/8 🚀 How can retrieval augmentation be made both relevant and non-redundant for few-shot adaptation? I'm excited to introduce COBRA. Catch our poster at #CVPR25 (ExHall D, Poster #450) on Sat 14 Jun, 5–7 p.m. CDT: https://t.co/dsdH6PJTHj
7/8 Despite its richer objective, COBRA incurs negligible extra computation at retrieval time and scales effortlessly to pools of hundreds of millions of images.
🚨 New Paper! 🚨
Guard models slow, language-specific, and modality-limited?
Meet OmniGuard, which detects harmful prompts across multiple languages & modalities using a single approach, with SOTA performance in all 3 modalities while being 120X faster 🚀
https://t.co/r6DGPDfwle