Our newest AI accelerator, Maia 200, is now online in Azure.
Designed for industry-leading inference efficiency, it delivers 30% better performance per dollar than current systems.
And with 10+ PFLOPS of FP4 throughput, ~5 PFLOPS of FP8, and 216 GB of HBM3e with 7 TB/s of memory bandwidth, it's optimized for large-scale AI workloads.
It joins our broader portfolio of CPUs, GPUs, and custom accelerators, giving customers more options to run advanced AI workloads faster and more cost-effectively on Azure.
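To put the quoted figures in perspective, here is a rough back-of-the-envelope sketch. It treats "10+" and "~5" PFLOPS as exactly 10 and 5, which is an assumption, and the function names are made up for illustration. It computes the roofline ridge point (how many FLOPs a workload must do per byte read from HBM before compute rather than memory becomes the limit) and the time to read the full 216 GB once at 7 TB/s.

```python
# Figures quoted in the post. "10+" and "~5" are rounded to 10 and 5 here,
# which is an assumption made only to keep the arithmetic concrete.
FP4_FLOPS = 10e15        # FLOP/s
FP8_FLOPS = 5e15         # FLOP/s
HBM_BYTES = 216e9        # bytes
HBM_BANDWIDTH = 7e12     # bytes/s


def ridge_point(peak_flops: float, bandwidth: float) -> float:
    """FLOPs per byte at which peak compute and peak bandwidth balance."""
    return peak_flops / bandwidth


def full_sweep_seconds(capacity_bytes: float, bandwidth: float) -> float:
    """Time to read the whole memory once at peak bandwidth."""
    return capacity_bytes / bandwidth


if __name__ == "__main__":
    print(f"FP4 ridge point: {ridge_point(FP4_FLOPS, HBM_BANDWIDTH):.0f} FLOP/byte")
    print(f"FP8 ridge point: {ridge_point(FP8_FLOPS, HBM_BANDWIDTH):.0f} FLOP/byte")
    sweep_ms = full_sweep_seconds(HBM_BYTES, HBM_BANDWIDTH) * 1e3
    print(f"One full HBM read: {sweep_ms:.1f} ms")
```

Under these assumptions, a workload that does fewer FLOPs per byte than the ridge point is limited by memory bandwidth, not by the headline PFLOPS number.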
I spent the last few days prompting ChatGPT to understand how its memory system actually works.
Spoiler alert: there is no RAG used.
https://t.co/zxvRRP2GK8
This is insane.
New AI model from Samsung, 10,000x smaller than DeepSeek and Gemini 2.5 Pro, just beat them on ARC-AGI 1 and 2.
Samsung’s Tiny Recursive Model (TRM) is about 10,000x smaller than typical LLMs yet smarter because it thinks recursively instead of just predicting text. It first drafts an answer, then builds a hidden "scratchpad" for reasoning, repeatedly critiques and refines its logic (up to 16 times), and produces improved answers each cycle.
This approach shows that architecture and reasoning loops, not just size, can drive intelligence. It enables powerful, efficient models that run cheaply, validates neuro-symbolic ideas, and opens high-quality reasoning to far more applications.
Acceleration is everywhere
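The draft, scratchpad, refine loop described above can be sketched in a few lines. This is only the shape of the control flow, not Samsung's implementation: TRM uses a small neural network for each step, while the toy below swaps in Newton's method for a square root so the loop is runnable. All function names are hypothetical; the cap of 16 cycles is the figure from the post.

```python
from typing import Callable, List, Tuple


def recursive_refine(
    question: float,
    draft: Callable[[float], float],
    update_scratchpad: Callable[[float, float, float], float],
    refine_answer: Callable[[float, float], float],
    max_cycles: int = 16,  # "up to 16 times", per the post
) -> Tuple[float, List[float]]:
    """Draft an answer, then alternate scratchpad updates and answer refinements."""
    answer = draft(question)
    scratchpad = 0.0  # hidden state carried between cycles
    history = [answer]
    for _ in range(max_cycles):
        # Critique the current answer and store the result in the scratchpad.
        scratchpad = update_scratchpad(question, answer, scratchpad)
        # Use the scratchpad to produce an improved answer.
        answer = refine_answer(answer, scratchpad)
        history.append(answer)
    return answer, history


# Toy stand-ins (Newton's method for sqrt), NOT learned networks.
def crude_draft(x: float) -> float:
    return x  # deliberately poor first guess


def residual(x: float, answer: float, _previous: float) -> float:
    return answer * answer - x  # how wrong the current answer is


def newton_step(answer: float, error: float) -> float:
    return answer - error / (2.0 * answer)


if __name__ == "__main__":
    final, steps = recursive_refine(2.0, crude_draft, residual, newton_step)
    print(final, len(steps))
```

In this toy, each cycle shrinks the error, which is the property the post attributes to the recursive loop: repeated passes through a small update rule can stand in for one pass through a much larger one.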
We’re making robots more capable than ever in the physical world. 🤖
Gemini Robotics 1.5 is a levelled up agentic system that can reason better, plan ahead, use digital tools such as @Google Search, interact with humans and much more. Here’s how it works 🧵
Quick take: Are open-weight AI models getting a fair shake in evals? A few thoughts on comparing systems-to-models, sparked by Anthropic’s recent postmortem.
Anthropic published a careful account of a routing bug that degraded Claude responses. It was refreshingly specific.
Some short requests to Claude were misrouted to long-context servers, resulting in degradation. In short: two different models (or model configs) are specialized for different context lengths.
But this raises an ongoing thought I've had: closed providers can lean on routing, specialization, multiple models, and other scaffolding, while open-weight models are often judged as if they must perform well in every condition, alone. If we allowed comparable routing/specialization around open models, how much of the apparent gap would close?
For research and policy, we should compare system-to-system (or model-to-model), not model-to-system. Ideally, we'd get per-call metadata from closed APIs so researchers know what they actually hit. Failing that, maybe we should be building more systems around open-weight models to give them a fair shake in capabilities evals.
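As a rough illustration of what "routing plus per-call metadata" could look like around open-weight models, here is a minimal sketch. Everything in it is hypothetical: the backend names, the token limits, and the `CallRecord` fields are invented for the example and do not describe any provider's actual setup.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Backend:
    name: str          # hypothetical deployment name
    max_tokens: int    # longest prompt this deployment is tuned for


@dataclass(frozen=True)
class CallRecord:
    """Per-call metadata a researcher would want back from the system."""
    backend: str
    prompt_tokens: int


# Hypothetical pool: two configs specialized for different context lengths.
BACKENDS = (
    Backend("short-context", max_tokens=8_000),
    Backend("long-context", max_tokens=200_000),
)


def route(prompt_tokens: int, backends: Sequence[Backend] = BACKENDS) -> CallRecord:
    """Pick the smallest backend that fits the prompt and record the choice."""
    for backend in sorted(backends, key=lambda b: b.max_tokens):
        if prompt_tokens <= backend.max_tokens:
            return CallRecord(backend=backend.name, prompt_tokens=prompt_tokens)
    raise ValueError(f"no backend accepts {prompt_tokens} tokens")


if __name__ == "__main__":
    print(route(500))
    print(route(50_000))
```

Returning the record with every call is the point: an eval that logs which backend served each request is comparing systems with its eyes open, whichever side of the open/closed line the models sit on.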
New from Meta FAIR: Code World Model (CWM), a 32B-parameter research model designed to explore how world models can transform code generation and reasoning about code.
We believe in advancing research in world modeling and are sharing CWM under a research license to help empower the community to build upon our work.
➡️ Read the technical report: https://t.co/i9BqtfyJ7L
➡️ Download the open weights: https://t.co/S2CxqCOMn0
➡️ Download the code: https://t.co/wOsDR8Q5OQ
Mistral Medium 3.1 just landed on @lmarena_ai leaderboard—punching way above its weight!
🏆 #1 in English (no Style Control)
🏆 2nd overall (no Style Control)
🏆 Top 3 in Coding & Long Queries
🏆 8th overall
Small model. Big impact. Try it now on Le Chat and the API!
🚀 Excited to share my latest tutorial on Janus 1.3B! This ultra-lightweight multimodal model packs a punch with just 1.3B parameters, handling text and image generation with ease. Perfect for streamlined VLM tasks without massive compute power!
https://t.co/spSWUdop28