We’re going all in on World Models.
Today we’re launching the 1X World Model Lab.
The bet is simple:
You can’t fine-tune your way to AGI.
And you definitely can’t fine-tune your way to robots that can operate in the physical world.
General-purpose humanoids need models that understand space, motion, objects, causality, affordances, physics, and action before they ever see a specific task.
The frontier is not better VLA wrappers.
The frontier is embodied world models.
The 1X World Model Lab will focus on large-scale embodied world model pretraining: building the most generalizable foundation model for humanoid robots from the ground up.
The next frontier in AI requires scaling:
web-scale media + egocentric human videos + sim + dexterous remotely operated robot data + on-policy NEO data → real-world deployment for robot data collection and RL → abundance of data → physical AI
The robot collects data.
The model gets better.
The robot gets better.
Repeat.
To lead this, we brought in one of the best for the mission: @_sam_sinha_ , as Head of World Models.
Sam was a founding research scientist at Luma AI and has been at the frontier of scaling multimodal generative video models his whole career.
If you’re the best in the world at large-scale pretraining, video models, robotics, RL, infra, or data — and you want your models to move atoms, not just pixels — join us.
Send background + evidence of exceptional ability to:
[email protected]
We’re building the model that makes autonomous labor real.
NVIDIA has just released Nemotron 3 Ultra, the new most intelligent US open weights model, with leading speed for its intelligence
Nemotron 3 Ultra scores 47.7 on the Artificial Analysis Intelligence Index, well ahead of the next strongest US open weights models, Gemma 4 31B (39.2), Nemotron 3 Super (36.0) and gpt-oss-120b (33.3), but behind the Chinese-led open weights frontier (Kimi K2.6 at 53.9).
We partnered with @NVIDIA to evaluate this model for intelligence and speed ahead of its public release. These figures use the final NVFP4 weights that NVIDIA recommends for inference, but our tests show minimal intelligence impact compared to BF16 testing, with higher precision resulting in an Artificial Analysis Intelligence Index score of 48.2 vs. the NVFP4 score of 47.7.
Key Takeaways:
➤ Nemotron 3 Ultra leads in speed for its intelligence: through BlackBox AI ahead of release, Nemotron 3 Ultra is served at over 400 output tokens per second - this is slightly faster than the typical serving speed of gpt-oss-120b despite being >4X larger, and comes with significantly greater intelligence
➤ Largest Nemotron 3 model so far: with approximately 550 billion total parameters and 55 billion active, Nemotron 3 Ultra is significantly larger than its siblings and is the largest and most intelligent US open weights model release ever
➤ Nemotron 3 Ultra is the leading US open weights model on the Artificial Analysis Intelligence and Agentic Indexes by far, but Gemma 4 31B scores ~1 point higher on the Coding Index (comprised of Terminal-Bench Hard and SciCode)
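The gaps quoted above can be checked with a few lines of arithmetic. This is only a sketch over the numbers stated in the post; the 120B figure for gpt-oss-120b is an assumption read off the model name, not a number the post gives.

```python
# Intelligence Index scores as quoted in the post.
scores = {
    "Nemotron 3 Ultra (NVFP4)": 47.7,
    "Nemotron 3 Ultra (BF16)": 48.2,
    "Gemma 4 31B": 39.2,
    "Nemotron 3 Super": 36.0,
    "gpt-oss-120b": 33.3,
    "Kimi K2.6": 53.9,
}

def gap(a: str, b: str) -> float:
    """Score of model a minus score of model b, rounded to one decimal."""
    return round(scores[a] - scores[b], 1)

ultra = "Nemotron 3 Ultra (NVFP4)"
lead_over_gemma = gap(ultra, "Gemma 4 31B")
deficit_to_kimi = gap("Kimi K2.6", ultra)
precision_cost = gap("Nemotron 3 Ultra (BF16)", ultra)

# Parameter counts in billions. 550 and 55 are from the post;
# 120 is assumed from the name "gpt-oss-120b".
size_ratio = 550 / 120
active_fraction = 55 / 550

print(lead_over_gemma, deficit_to_kimi, precision_cost)
print(round(size_ratio, 2), active_fraction)
```

Under that assumption the size ratio comes out a little above 4.5, consistent with the ">4X larger" claim, and about one tenth of the parameters are active per token.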
"#Cosmos 3 is the world’s first fully open omnimodel that can natively understand & generate text, images, video, ambient sound & actions with leading physics accuracy..."
@NVIDIAAI: Natively understand? (i.e., token generation via RL, right?)
Dr. @ylecun & Dr. @drfeifei prob. know
@NVIDIAAI & @NVIDIARobotics: Why do you keep referring to 'vision reasoning' (instead of #VLA, which was also inaccurate) in #Cosmos3?
#worldmodels: there is training data, simulation & action prediction
Perhaps Dr. @ylecun & Dr. @drfeifei know?
https://t.co/5kRgllZVZx
In a structural development positioned to reshape the baseline parameters of high-performance data infrastructure, Microsoft has officially unveiled its next-generation topological quantum processing unit, the Majorana 2 superchip.
Steering clear of standard superconducting or trapped-ion formulations backed by regional tech conglomerates, Microsoft’s deep 20-year commitment to the elusive Majorana fermion underpins a highly calculated strategic timeline: delivering a commercially viable, fault-tolerant quantum computer scaling to 1 million logical qubits by 2029.
The 20-Year Paradigm Shift: Engineering Inherent Topological Protection
While competitor platforms scaled physical qubit aggregates into the hundreds and thousands over the past decade, Microsoft sustained severe industry skepticism due to the prolonged engineering intervals required to validate its underlying physics.
However, standard quantum topologies face an existential barrier: environmental decoherence and high error rates. Classical superconducting qubits suffer severe state collapse from microscopic thermal variances or electromagnetic noise, requiring more than a 10,000:1 ratio of unstable physical qubits to construct a single high-fidelity "Logical Qubit."
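To put the quoted overhead in perspective, here is a rough back-of-the-envelope sketch. It takes the post's 10,000:1 figure at face value as a lower bound and applies it to the 1 million logical qubit target; the function name is hypothetical and nothing here comes from Microsoft.

```python
def physical_qubits_needed(logical_qubits: int, physical_per_logical: int) -> int:
    """Physical qubits required at a given physical-to-logical overhead."""
    return logical_qubits * physical_per_logical

TARGET_LOGICAL = 1_000_000  # the 2029 target stated in the post
OVERHEAD = 10_000           # the post's claimed ratio, treated as a lower bound

needed = physical_qubits_needed(TARGET_LOGICAL, OVERHEAD)
print(f"{needed:,} physical qubits")
```

At that overhead the target implies at least ten billion physical qubits, which is the scale the post argues a tighter ratio would avoid.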
The Majorana 2 sub-architecture re-engineers this paradigm at the foundational materials layer:
Hardware-Level Error Correction: The Majorana 2 chip exploits non-local topological states within engineered superconductor-semiconductor heterostructures. Quantum information is encoded non-locally across the physical boundaries of the geometric network. Consequently, localized environmental perturbations cannot alter the global topological braid, establishing native hardware immunity to standard decoherence profiles.
Disruptive Qubit Efficiency Mapping: By shifting the burden of error correction away from software layers and directly onto the physical characteristics of the silicon substrate, Microsoft’s architecture projects an unprecedentedly tight physical-to-logical qubit ratio. This structural efficiency translates the deployment of 1 million logical qubits from a logistical impossibility into an actionable engineering roadmap capable of fitting within standard data center footprints.
Cross-Disciplinary Fabrication Concurrency: Deep visibility into elite foundry logistics reveals that the realization of the Majorana 2 platform depended entirely on translating molecular beam epitaxy (MBE) research into standard semiconductor cleanroom environments. Microsoft successfully bonded high-purity semiconductor nanowires to highly uniform superconducting shells at an atomic level, shifting topological quantum computing away from academic observation and straight into high-yield industrial fabrication.
The 2029 Operational Matrix: Sovereign Compute & Advanced LLM Integration
Microsoft intends to transition the Majorana 2 compute fabric into the foundational compute engine of its Azure Quantum platform. By targeting the 2029 commercial window for its million-qubit infrastructure, the enterprise seeks to unlock massive market premiums across high-barrier sectors:
Exact Molecular and Catalyst Simulation: Achieving 1 million logical qubits grants the hardware the capability to natively simulate the quantum states of multi-atomic molecules without approximation. This capability collapses materials-science development loops from decades to days, empowering enterprises to synthesize room-temperature superconductors, optimize high-density BMS storage matrices, and discover zero-carbon industrial chemical catalysts.
Quantum-Accelerated Reasoning Engines: While emerging client-side AI PCs and cloud-hosted custom silicon process foundational agentic reasoning efficiently, hyper-complex optimization tasks remain constrained by von Neumann energy boundaries. Microsoft plans to interface its quantum cloud directly with its MAI-Thinking reasoning pipelines, allowing long-horizon autonomous agents to evaluate trillions of systemic interactions concurrently within quantum sandboxes.
Post-Quantum Cryptographic Isolation: To secure corporate databases ahead of this massive transition in computing power, Microsoft is accelerating the integration of Post-Quantum Cryptography (PQC) across its global data center perimeters. By implementing deterministic quantum-safe authorization layers, the enterprise ensures that sovereign asset variations and multi-tenant telemetry remain fully isolated against emerging algorithmic decryption threats.
#Microsoft #Majorana2 #QuantumComputing #TopologicalQubit #AzureQuantum #Semiconductor #TechFinance #DeepTech #AIAgent #FutureComputing
Microsoft has officially unveiled its new quantum computing hardware component, the Majorana 2 chip.
In a significant shift, researchers utilized advanced AI materials-science tools to bypass standard manufacturing limits, successfully integrating lead, a water-soluble material historically avoided in chip fabrication, into the architecture.
Microsoft claims this AI-driven materials breakthrough puts them on a definitive timeline to deploy commercially viable, fault-tolerant quantum machines by 2029.
Introducing Cosmos 3: Our latest frontier model for Physical AI
Cosmos 3 is the world’s first fully open omnimodel with native vision reasoning, world and action generation.
Today we’re releasing Super (32B) and Nano (8B) variants.
@MistralAI: One of the few #AI companies with a focused approach across their hardware stack, #data sovereignty, language models, hosting, functionality... not boiling the ocean.
#MistralVibe
SHIPPED. Mistral Vibe is now the AI agent for long-horizon productivity and coding, and the home for Work mode, Code mode, the CLI, and a brand new VS Code extension. Let's go... 🧵
This #AI PC might create its own elite category similar to @Apple's Macs. 128GB of unified memory for #GPUs (big deal).
Hope price doesn't turn out to be a big deal.
@nvidia #RTXSpark brings CUDA, Blackwell and local AI agents to thin Windows laptops 👏 https://t.co/d7k3bGTS8e
What if you could take three completely different model families… and distill them into one tiny model? 🤯
📜 Paper: https://t.co/K2iKD4xFvp
MOPD (Multi-Teacher On-Policy Distillation) has become a standard procedure in post-training. We already distill multiple specialized variants of the same model into a single set of weights.
But what if we could go further - and distill models from entirely different families? Turns out, it is possible.
Today we’re releasing a paper on cross-tokenizer distillation - our first steps in this exciting direction. 📄
We distilled Qwen3-4B, Phi-4-Mini, and Llama-3B into Llama-3.2-1B.
MMLU jumped from 32.05 → 46.32 when using multiple teachers. 📈
The team is now working on Nemo-RL integration so the community can try this method in their own settings. Plus, we are scaling experiments up. 🚀
🏹5 Days of Trajectory.
Day 3 - An Open Source Training Stack for Continual Learning
Building the platform for continual learning requires both partnering with pioneering AI companies, as we showed on Day 2 with Harvey, and working toward frontier research, which we are highlighting today.
Continual learning means models that improve hourly from real production use. But with the size of frontier models, this becomes quite difficult. A Qwen-397b would need to spin up and tear down repeatedly across six GPU nodes, and that's valuable time gone.
Our contribution is Continual LoRA (C-LoRA): many lightweight adapters running at once on one shared base model. Our insight centers on where the parallelism lives: instead of splitting one giant job across nodes, we load-balance many small jobs over a single base.
The result: 2.81x experiment throughput over single-tenant training, with no regression on rewards.
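The idea of many adapters sharing one base can be illustrated with a toy forward pass. This is a minimal sketch of generic multi-adapter LoRA, not the C-LoRA implementation or the SkyRL API; the class and method names are hypothetical, and the matrices are tiny so the arithmetic can be followed by hand.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

class SharedBaseLayer:
    """One frozen base weight matrix serving several low-rank adapters."""

    def __init__(self, weight):
        self.weight = weight  # frozen and stored once, shared by all jobs
        self.adapters = {}    # job name -> (A, B, scale)

    def add_adapter(self, name, a, b, scale=1.0):
        # A has shape (rank x in_dim), B has shape (out_dim x rank).
        self.adapters[name] = (a, b, scale)

    def forward(self, name, x):
        # Output is W x + scale * B (A x); only A and B differ per job.
        a, b, scale = self.adapters[name]
        base_out = matvec(self.weight, x)
        low_rank_out = matvec(b, matvec(a, x))
        return [y + scale * d for y, d in zip(base_out, low_rank_out)]

layer = SharedBaseLayer(weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("job_a", a=[[1.0, 0.0]], b=[[1.0], [0.0]])
layer.add_adapter("job_b", a=[[0.0, 1.0]], b=[[0.0], [1.0]])

x = [2.0, 3.0]
out_a = layer.forward("job_a", x)
out_b = layer.forward("job_b", x)
print(out_a, out_b)
```

Each job only adds a pair of small matrices, so the expensive base weights are loaded once while the jobs differ in their adapters.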
We built this together, with @anyscalecompute, @NovaSkyAI, and generous support from @GoogleCloud and @GoogleStartups. We've open-sourced on SkyRL as one of the first multi-LoRA, RL training platforms, so that every team can get to continual learning faster.
We’re very excited to see what you build. Please reach out!
@nvidia consistently puts out lots of research and explores new fields, but this might be my favorite.
Good and much needed #opensource fundamental work by @NVIDIAAI for Physics AI.
#PhysicsNemo
https://t.co/6AmSO4qqwQ
When #RAG gets too expensive & inaccurate for a large corpus, #MeMo (Memory as a Model) steps in.
Computationally efficient (though 1st time training eats up lots of #GPU hours), transferable to other models & retains accuracy. #AI
🚨 LLMs are frozen after pretraining, but the world keeps changing. How do you give an LLM new knowledge without retraining it, bloating its context, or breaking what it already knows?
Existing methods hit a wall:
🔸 RAG is brittle to retrieval noise and struggles with cross-document reasoning;
🔸 Fine-tuning is expensive and causes catastrophic forgetting;
🔸 Latent memory is tightly coupled to the model that produced it.
👉 Key question: Can we encode knowledge into a small, dedicated memory model that any LLM can query without accessing the LLM itself?
🚀 Introducing MeMo (Memory as a Model) 🚀
We train a dedicated MEMORY model on a reflection Question-Answer dataset synthesized from the target corpus. At inference, a frozen EXECUTIVE model (any LLM, including closed-source models) queries the MEMORY model through a structured 3-stage protocol that decomposes complex queries into targeted sub-queries to retrieve precise, noise-robust knowledge and reasons over the responses.
🔥 Key Highlights
🧠 5-step data synthesis pipeline captures explicit facts, implicit relationships, and cross-document connections as reflections;
🛡️ Robust to retrieval noise: where RAG drops up to 6.22% with added distractors, MeMo holds steady;
🔌 Plug-and-play with any LLM, no weights, gradients, or logits required;
📦 Fixed inference cost, independent of corpus size;
🔄 Continual integration via model merging: 33% compute savings over full retraining and scaling benefits grow with the number of corpora.
📊 Strong results across BrowseComp-Plus, NarrativeQA, and MuSiQue, matching or outperforming retrieval baselines (BM25, NV-Embed-V2, HippoRAG2) with gains of up to 27% on NarrativeQA when paired with Gemini-3-Flash.
💡 Why this matters
MeMo decouples knowledge from reasoning: Train memory once with a small open model, then plug it into the frontier LLM of your choice. No retraining as new corpora arrive, no fragile retrieval pipelines, and full compatibility with proprietary APIs, paving the way for scalable knowledge-aware AI systems.
🤝 Joint work with @workryanq_nus, @961014dltkdg, @alfredleongwl, Alok Prakash, Nancy F. Chen, @arun_v3rma, Daniela Rus, and Armando Solar-Lezama
📄 Paper: https://t.co/9FrL4CH9O2
💻 Code: https://t.co/wiOnH0LKll
🌐 Project page: https://t.co/xsRHFxQIwY
🤗 Huggingface: https://t.co/HZTSC1s81X
#LLMs #KnowledgeIntegration #MemoryAugmentedLLMs #RAG #ModelMerging
Windows users, this one’s for you.
Computer use now works on Windows, so Codex can take action on your Windows computer.
And with Windows support for Codex in the ChatGPT mobile app, you can start, review, and steer tasks on the go while work continues on your Windows machine.
An early experience, but we’re working on more ways to keep your work moving, wherever you are.
This benign & honest post has triggered #racist NAM #HR into #blacklisting action, again. How much on the psychological brink can a 'Quadrant-Hype Cycle' #CHRO really be to self-combust at every frank opinion?
Maybe #AI knows.
Plagiarize but at least TRY to get it right.
(Hint, this role has Nothing to do w/ Analyst Relations or the world of Quadrants, Waves, MarketScape ... or other archaic "thought leader" practices). Let go.
#AI serves better.
@huggingface made #ReinforcementLearning token, compute & memory efficient.
(Similar work is also underway on proprietary models & architectures as well: selective weight updates, KV cache mgmt., quantization, sparsity, all of it together, etc.)
The HF science team just made async RL weight sync ~100x cheaper on bandwidth, and you don't need a shared cluster anymore.
The problem: every RL step, the trainer typically has to sync fresh weights to the inference engine. for a 7B in bf16 that's ~14GB. for a frontier 1T fp8 checkpoint, that's ~1TB; in bf16 it would be ~2TB. per sync.
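The payload sizes quoted here follow directly from parameter count times bytes per parameter. A quick sketch, assuming decimal gigabytes (1 GB = 1e9 bytes) and counting raw weights only, with no file-format overhead:

```python
# Bytes per parameter for the two formats named in the post.
BYTES_PER_PARAM = {"bf16": 2, "fp8": 1}

def sync_payload_gb(num_params: int, dtype: str) -> float:
    """Size of one full weight sync in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

print(sync_payload_gb(7_000_000_000, "bf16"))  # 7B model in bf16
print(sync_payload_gb(10**12, "fp8"))          # 1T model in fp8
print(sync_payload_gb(10**12, "bf16"))         # 1T model in bf16
```

The three results are 14 GB, 1,000 GB (about 1 TB), and 2,000 GB (about 2 TB), matching the figures in the post.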
The insight: between two RL steps, ~99% of bf16 weights are bit-identical. at RL learning rates, the optimizer is whispering and bf16 literally cannot hear most of it. the stored bf16 bits don't change.
What they shipped in TRL: only the changed elements get encoded as a sparse safetensors file, dropped into a Hugging Face Bucket, and fetched by vLLM. on Qwen3-0.6B, per-step payload goes from 1.2 GB to 20 to 35 MB. This is exactly what we built Buckets for: S3-like object storage on the Hub, Xet-backed (so even full snapshots only transfer the changed chunks).
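The "only send what changed" idea can be sketched in plain Python. This is an illustration of the principle only, not the TRL code or the sparse safetensors layout: the function names are hypothetical, bf16 values are modeled as 16-bit integers, and NaN/infinity handling is skipped.

```python
import struct

def to_bf16_bits(x: float) -> int:
    """Round a float to bfloat16 (round-to-nearest-even); return its 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF

def encode_delta(old_bits, new_bits):
    """Return (indices, values) for elements whose stored bits changed."""
    idx = [i for i, (a, b) in enumerate(zip(old_bits, new_bits)) if a != b]
    return idx, [new_bits[i] for i in idx]

def apply_delta(bits, idx, values):
    """Patch a copy of the receiver's weights with the sparse update."""
    out = list(bits)
    for i, v in zip(idx, values):
        out[i] = v
    return out

old_weights = [0.5, -1.25, 3.0, 0.75]
# Tiny optimizer steps on three weights, one large step on the third.
new_weights = [0.5 + 1e-6, -1.25 + 1e-6, 3.5, 0.75 - 1e-6]

old_bits = [to_bf16_bits(w) for w in old_weights]
new_bits = [to_bf16_bits(w) for w in new_weights]

idx, values = encode_delta(old_bits, new_bits)
restored = apply_delta(old_bits, idx, values)
print(idx, restored == new_bits)
```

The 1e-6 updates fall well below bf16's spacing at these magnitudes, so their stored bits are unchanged and only the one large update needs to be transmitted.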
The cherry on top: we ran a FULL disaggregated training where:
- the trainer lived on one box
- vLLM ran inside a Hugging Face Space
- the Wordle environment ran in another Space
- weights flowed through one Hub bucket
no shared cluster. no RDMA. no VPN. no NCCL across clouds. just HTTPS and a bucket.
one GPU + a Hugging Face account is now enough to do real disaggregated RL. multi-replica inference fleets across regions become a small devops exercise, not a research project.
Full write-up: https://t.co/CG115IjT0q
Open source RL keeps eating the moat!
@pervaizalam: Thank you for sharing Prof. Mukulika Banerjee's thoughts on Indian economics.
Clearly & brazenly ill-informed lady. Lacks basic understanding of GST.
Reflects very poorly on @LSEEcon .
Please vet your speakers & fact check their statements in the future. Thank you.
India’s poorest pay higher effective tax rates than the rich, finds major new study by Prof Mukulika Banerjee of the LSE.
Speaking on cine ink podcast London Vārta: New World Order, Prof Banerjee notes:
Lower-income Indians are shouldering a disproportionately heavy tax burden compared with the wealthiest sections of society, according to new research that challenges widely held assumptions about the country’s tax system.
Dr Mukulika Banerjee, a leading political anthropologist, has revealed that while everyone pays the same Goods and Services Tax (GST) on everyday items, the impact falls far more heavily on the poor when measured as a proportion of their income.
“Everyone pays GST – indirect tax – and the poor in India end up paying a higher proportion of their income in tax than the rich,” Dr Banerjee explained. “If you buy a packet of biscuits and a rickshaw puller also buys a packet of biscuits, the GST charged on that packet is exactly the same for both of you. But the rickshaw puller earns far less than you, so in proportion to his income, he is paying a much higher rate of tax. When you aggregate this across the country, the bottom fifty per cent of the Indian population is paying a higher proportion of their income in tax.”
Her findings, part of a British Academy-Leverhulme Senior Fellowship, raise serious questions about taxation, inequality, and the health of India’s democracy. Fieldwork data paints a stark picture of the country’s extreme wealth gap: a daily-wage construction worker, fruit vendor, or pavement tailor earning around ₹30,000 per month qualifies for the top 10 per cent of earners, while half of all Indians survive on just ₹6,000 a month. Meanwhile, the top 1 per cent captures a strikingly large share of national income.
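The biscuit argument is simple proportional arithmetic. A small sketch using the two monthly incomes cited above; the ₹300 of GST paid per month is a made-up illustrative figure, not data from the study, and it assumes both people buy the same taxed goods.

```python
def effective_gst_rate(gst_paid: float, income: float) -> float:
    """GST paid as a fraction of income."""
    return gst_paid / income

GST_PAID_PER_MONTH = 300  # hypothetical: same basket, same tax for both

# Monthly incomes in rupees as cited in the piece.
median_income = 6_000
top_decile_income = 30_000

rate_median = effective_gst_rate(GST_PAID_PER_MONTH, median_income)
rate_top_decile = effective_gst_rate(GST_PAID_PER_MONTH, top_decile_income)

print(f"{rate_median:.0%} vs {rate_top_decile:.0%}")
```

With identical tax paid, the lower earner's effective rate is five times higher, which is the proportionality point being made. Real baskets differ by income, so the actual gap depends on spending patterns the sketch does not model.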
#londonvārta #profmukulikabanerjee #cineinkpodcast #hindipodcast
When launching my professional project (DTInnovate) in 2017, I coined the term "Principal Product Strategist".
Since 2025, started noticing several Product Strategist role listings on LinkedIn. Fluke? Think not.
Except, tier 2 companies conflate 10 diff. roles & offer peanuts.