Great to see @TIIuae push the boundaries of LLM training with their Falcon‑H1 parallel hybrid design and BitNet ternary training—now integrated into NVIDIA Megatron Core.
Their work shows how foundation model builders can innovate on top of our scalable framework while improving efficiency and sharing reusable tools with the open source community. 🙌
👇 Read the full technical blog: https://t.co/2CloA8zomQ
(1/n) Introduce Online Experiential Learning toward the era of experience. Beyond offline pre-constructed training data, models can learn online from their own deployment experience across infinite, unsimulable real-world environments. Accumulate, consolidate, self-improve 🔄
💥 New example out!
Deploy @Microsoft VibeVoice-ASR on Microsoft Foundry with @huggingface for multi-lingual STT!
Structured output with Who (Speaker), When (Timestamps), and What (Content), up to 60 minutes in a single pass.
Step-by-step in the thread 🧵
LLM-in-Sandbox
Microsoft Research puts LLMs in a virtual computer to unlock agentic intelligence for non-code tasks.
No extra training needed—models spontaneously access resources and run scripts.
Works across math, physics, chemistry, biomedicine and more.
Microsoft just released VibeVoice-ASR on Hugging Face
A unified speech-to-text model that transcribes hour-long audio in one pass
With built-in speaker diarization, timestamps, and customizable user context
Microsoft just released VibeVoice-ASR on Hugging Face
A unified speech-to-text model that transcribes hour-long audio in one pass
With built-in speaker diarization, timestamps, and customizable user context
Introduce Differential Transformer V2 (DIFF V2), an improved version of Differential Transformer. This revision focuses on inference efficiency, training stability, and architectural elegance. We verify the design on production-scale LLMs.
Anyways check out this 60s clip I made with a single image of @lexfridman and a 20s audio recording of his voice.
Full continuous shot with no cuts created with LongCat Avatar in ComfyUI.
VibeVoice for the audio.
🚀 We propose Generative Adversarial Distillation (GAD)
🤖 Designed to perform on-policy distillation from proprietary black-box LLMs.
➡️ Requires neither access to teacher logits nor alignment of tokenizer vocabularies.
(1/n)
🚨 Microsoft Research just launched something that might define the next era of AI systems.
They call it 'Agentic Organization' and it’s not just a new model. It’s a new way for intelligence itself to organize.
Here’s what’s wild:
Most large language models still “think” like a single brain.
Step-by-step. Linear. Slow. Even “parallel thinking” just runs the same process twice and merges answers later.
Agentic Organization changes the entire game.
They built a new reasoning protocol called AsyncThink, where a model plays both roles an Organizer that breaks a complex problem into sub-queries, and Workers that solve those sub-parts at the same time.
Think of it like this:
Instead of one mind grinding through steps, AsyncThink forms a mini civilization of minds delegating, merging, adapting in real time.
And it learns this behavior through reinforcement learning literally learning how to organize its own thoughts.
The results are insane:
→ 28% lower inference latency than parallel thinking
→ Higher accuracy on math reasoning tasks
→ Zero-shot generalization to unseen problems like Sudoku
→ Learned organizational policies that evolve dynamically during reasoning
It’s like scaling from “an intelligent agent” → to “an intelligent organization.”
AsyncThink models don’t just reason faster they reason like teams do.
Fork. Think. Join. Verify. Iterate.
This is a glimpse of post-LLM intelligence systems that don’t just think, they coordinate thought.
And if that holds, the future of AI might look less like a single brain… and more like a company of minds.
Paper: The Era of Agentic Organization: Learning to Organize with Language Models
New @Microsoft paper teaches LLMs to organize reasoning into concurrent subtasks for faster, more accurate answers.
It shows 28% lower wait time than typical parallel thinking while also boosting math accuracy.
The big deal is simple, it turns coordination into a skill the model learns, so it decides when to split work, when to wait, and when to merge.
The usual single chain wastes time because each step blocks the next.
Fixed parallel plans also waste time because they cannot adapt to each query.
The fix is an organizer that writes simple Fork and Join tags to start and merge worker thoughts.
Workers chase sub-queries in parallel while the organizer keeps thinking and only pauses to Join.
All control lives in plain text, so the base model stays unchanged.
Training happens in 2 stages, first supervised traces that teach the tag format.
Then reinforcement learning rewards correct final answers, clean format, and real concurrency.
Speed is measured by the critical path through the Fork-Join graph, which matches true waiting.
Across countdown puzzles, math questions, and Sudoku, the learned policy runs faster and fails less.
The big idea is to learn organization itself rather than hard-code a script.
----
Paper – arxiv. org/abs/2510.26658
Paper Title: "The Era of Agentic Organization: Learning to Organize with Language Models"
Introducing Thinking Augmented Pre-Training (#TPT) as a simple, general, scalable and effective technique for future mid-training and/or pre-training recipes.
Thinking Augmented Pre-training
"we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens more learnable through step-by-step reasoning and decomposition."
"Notably, TPT enhances the data efficiency of LLM pre-training by a factor of 3. For a 3B parameter model, it improves the post-training performance by over 10% on several challenging reasoning benchmarks."
#Microsoft's VibeVoice-1.5B just turned my rig into a podcast studio.
4 voices. Zero API costs. Running locally on a consumer GPU.
Generated 5 test podcasts instantaneously - they sound surprisingly human. Setup took 30 minutes: clone repo, load model (most of the time - rural Germany), feed script, press play.
The open-source podcast revolution is here, and it fits in your home rig.
Who needs cloud subscriptions when innovation runs at localhost? 🎙️
#GenAI #LocalAI #Podcasting #VibeVoice
Microsoft just dropped VibeVoice on Hugging Face
A novel framework generating expressive, long-form, multi-speaker conversational audio like podcasts from text.
Synthesizes up to 90 minutes of speech with up to 4 distinct speakers!
https://t.co/Yg4gzs3hp7
Reinforcement Pre-Training
New pre-training paradigm for LLMs just landed on arXiv!
It incentivises effective next-token reasoning with RL.
This unlocks richer reasoning capabilities using only raw text and intrinsic RL signals.
A must-read! Bookmark it!
Here are my notes: