FW @thegenerality - Twitter Profile

FW @thegenerality

about 2 months ago

More #BitNet_LLMs and models

NVIDIA AI Developer

@NVIDIAAIDev

about 2 months ago

Great to see @TIIuae push the boundaries of LLM training with their Falcon‑H1 parallel hybrid design and BitNet ternary training—now integrated into NVIDIA Megatron Core. Their work shows how foundation model builders can innovate on top of our scalable framework while improving efficiency and sharing reusable tools with the open source community. 🙌 👇 Read the full technical blog: https://t.co/2CloA8zomQ


FW @thegenerality

3 months ago

Online Experiential Learning

Tianzhu Ye @ytz2024

3 months ago

(1/n) Introducing Online Experiential Learning, a step toward the era of experience. Beyond offline pre-constructed training data, models can learn online from their own deployment experience across infinite, unsimulable real-world environments. Accumulate, consolidate, self-improve 🔄


FW @thegenerality

3 months ago

VibeVoice-ASR is now officially in HF Transformers and Microsoft Foundry.

Alvaro Bartolome

@alvarobartt

3 months ago

💥 New example out! Deploy @Microsoft VibeVoice-ASR on Microsoft Foundry with @huggingface for multi-lingual STT! Structured output with Who (Speaker), When (Timestamps), and What (Content), up to 60 minutes in a single pass. Step-by-step in the thread 🧵


FW @thegenerality

4 months ago

Experiential Learning -- Part I: On-Policy Context Distillation for Experiential Learning

Li Dong @donglixp

4 months ago

On-Policy Context Distillation for Experiential Learning: learning from experience (consolidated from trajectories) at test time.


FW @thegenerality

5 months ago

LLM-in-Sandbox

DailyPapers

@HuggingPapers

5 months ago

LLM-in-Sandbox: Microsoft Research puts LLMs in a virtual computer to unlock agentic intelligence for non-code tasks. No extra training needed: models spontaneously access resources and run scripts. Works across math, physics, chemistry, biomedicine and more.


FW @thegenerality

5 months ago

#VibeVoice-ASR - your new ASR model in the era of LLMs

DailyPapers

@HuggingPapers

5 months ago

Microsoft just released VibeVoice-ASR on Hugging Face. A unified speech-to-text model that transcribes hour-long audio in one pass, with built-in speaker diarization, timestamps, and customizable user context.


FW @thegenerality

5 months ago

Differential Transformer V2 (DIFF V2)

Tianzhu Ye @ytz2024

5 months ago

Introducing Differential Transformer V2 (DIFF V2), an improved version of Differential Transformer. This revision focuses on inference efficiency, training stability, and architectural elegance. We verify the design on production-scale LLMs.


5 months ago

6 months ago

Anyways check out this 60s clip I made with a single image of @lexfridman and a 20s audio recording of his voice. Full continuous shot with no cuts created with LongCat Avatar in ComfyUI. VibeVoice for the audio.


FW @thegenerality

6 months ago

New #VibeVoice model released - VibeVoice Realtime 0.5B for realtime, streaming, robust long-form TTS

AK

@_akhaliq

6 months ago

Microsoft just released VibeVoice-Realtime-0.5B https://t.co/uIqp5yrZM6


FW @thegenerality

7 months ago

Generative Adversarial Distillation for Black-Box On-Policy Distillation of LLMs

Tianzhu Ye @ytz2024

7 months ago

🚀 We propose Generative Adversarial Distillation (GAD) 🤖 Designed to perform on-policy distillation from proprietary black-box LLMs. ➡️ Requires neither access to teacher logits nor alignment of tokenizer vocabularies. (1/n)


thegenerality retweeted

Robert Youssef

@rryssf

7 months ago

🚨 Microsoft Research just launched something that might define the next era of AI systems.

They call it 'Agentic Organization', and it’s not just a new model. It’s a new way for intelligence itself to organize.

Here’s what’s wild:

Most large language models still “think” like a single brain. Step-by-step. Linear. Slow. Even “parallel thinking” just runs the same process twice and merges answers later.

Agentic Organization changes the entire game.

They built a new reasoning protocol called AsyncThink, where a model plays both roles: an Organizer that breaks a complex problem into sub-queries, and Workers that solve those sub-parts at the same time.

Think of it like this:

Instead of one mind grinding through steps, AsyncThink forms a mini civilization of minds delegating, merging, adapting in real time.

And it learns this behavior through reinforcement learning, literally learning how to organize its own thoughts.

The results are insane:

→ 28% lower inference latency than parallel thinking
→ Higher accuracy on math reasoning tasks
→ Zero-shot generalization to unseen problems like Sudoku
→ Learned organizational policies that evolve dynamically during reasoning

It’s like scaling from “an intelligent agent” → to “an intelligent organization.”

AsyncThink models don’t just reason faster; they reason like teams do.
Fork. Think. Join. Verify. Iterate.

This is a glimpse of post-LLM intelligence: systems that don’t just think, they coordinate thought.

And if that holds, the future of AI might look less like a single brain… and more like a company of minds.

Paper: The Era of Agentic Organization: Learning to Organize with Language Models


FW @thegenerality

7 months ago

The Era of Agentic Organization

Rohan Paul

@rohanpaul_ai

7 months ago

New @Microsoft paper teaches LLMs to organize reasoning into concurrent subtasks for faster, more accurate answers.

It shows 28% lower wait time than typical parallel thinking while also boosting math accuracy.

The big deal is simple: it turns coordination into a skill the model learns, so it decides when to split work, when to wait, and when to merge.

The usual single chain wastes time because each step blocks the next.

Fixed parallel plans also waste time because they cannot adapt to each query.

The fix is an organizer that writes simple Fork and Join tags to start and merge worker thoughts.

Workers chase sub-queries in parallel while the organizer keeps thinking and only pauses to Join.

All control lives in plain text, so the base model stays unchanged.

Training happens in 2 stages: first, supervised traces teach the tag format; then, reinforcement learning rewards correct final answers, clean format, and real concurrency.

Speed is measured by the critical path through the Fork-Join graph, which matches true waiting.

Across countdown puzzles, math questions, and Sudoku, the learned policy runs faster and fails less.

The big idea is to learn organization itself rather than hard-code a script.

----

Paper: arxiv.org/abs/2510.26658

Paper Title: "The Era of Agentic Organization: Learning to Organize with Language Models"
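
The critical-path measure mentioned in the thread can be illustrated with a toy trace. This is only a sketch under stated assumptions: the `Think`, `Fork`, and `Join` step types, the integer token costs, and the example trace are all hypothetical and are not taken from the paper. It assumes a worker starts decoding at the moment it is forked and that the organizer waits at a Join only if that worker has not finished yet.

```python
from dataclasses import dataclass


@dataclass
class Think:
    cost: int  # tokens the organizer decodes by itself


@dataclass
class Fork:
    worker: str
    cost: int  # tokens the forked worker decodes for its sub-query


@dataclass
class Join:
    worker: str


def critical_path(trace):
    """Latency of the organizer's timeline when workers run concurrently."""
    clock = 0    # organizer's current time, in decoded tokens
    finish = {}  # worker name -> time at which that worker is done
    for step in trace:
        if isinstance(step, Think):
            clock += step.cost
        elif isinstance(step, Fork):
            # The worker starts now and runs alongside the organizer.
            finish[step.worker] = clock + step.cost
        elif isinstance(step, Join):
            # The organizer blocks only if the worker is still running.
            clock = max(clock, finish.pop(step.worker))
    return clock


def sequential_cost(trace):
    """Latency if every thought ran one after another on a single chain."""
    return sum(s.cost for s in trace if isinstance(s, (Think, Fork)))


trace = [
    Think(10),
    Fork("w1", 50),
    Fork("w2", 30),
    Think(20),
    Join("w2"),
    Join("w1"),
    Think(5),
]

print(critical_path(trace))    # 65
print(sequential_cost(trace))  # 115
```

In this toy trace the organizer waits only for the slowest outstanding worker, so the critical path (65) is well below the single-chain total (115). How the paper actually counts latency may differ in detail.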


thegenerality retweeted

Li Dong @donglixp

8 months ago

On-policy + Reverse KLD = MiniLLM (https://t.co/MSlVNWGclo). Really nice blog by @thinkymachines. Exciting to see it being offered as a service!
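
As a small illustration of why the direction of the KL divergence matters, the sketch below compares forward and reverse KL on two toy distributions. The `teacher` and `student` values are made up for illustration, and this is not MiniLLM's training code. Reverse KL is an expectation under the student's own distribution, which is why it pairs naturally with sampling from the student (on-policy).

```python
import math


def kl(a, b):
    """KL(a || b) for discrete distributions with strictly positive entries."""
    return sum(x * math.log(x / y) for x, y in zip(a, b))


# Hypothetical teacher with two strong modes, and a student that covers one.
teacher = [0.49, 0.49, 0.02]
student = [0.96, 0.02, 0.02]

forward_kl = kl(teacher, student)  # expectation under the teacher
reverse_kl = kl(student, teacher)  # expectation under the student (on-policy)

print(f"forward KL(teacher || student) = {forward_kl:.3f}")
print(f"reverse KL(student || teacher) = {reverse_kl:.3f}")
```

Here the reverse KL is the smaller of the two: a student that commits to one of the teacher's modes is penalized less than under forward KL, which heavily punishes the missing mode.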


FW @thegenerality

8 months ago

BitDistill fine-tunes any full-precision LLM into a 1.58-bit model for specific tasks with the same performance

AK

@_akhaliq

8 months ago

Microsoft presents BitNet Distillation
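
For readers wondering where "1.58-bit" comes from: a weight restricted to the three values -1, 0, and 1 carries log2(3) ≈ 1.58 bits. The sketch below assumes the absmean rounding scheme described for BitNet b1.58 (scale by the mean absolute weight, round, then clip to [-1, 1]). The function name and sample weights are hypothetical, and the BitDistill fine-tuning procedure itself is not shown in this feed.

```python
import math


def ternary_quantize(weights, eps=1e-8):
    """Map full-precision weights to {-1, 0, 1} plus one shared scale."""
    # Absmean scale: the average magnitude of the weights.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round to the nearest integer, then clip into the ternary range.
    # Note: Python's round() sends exact .5 ties to the nearest even integer.
    quantized = [max(-1, min(1, round(w / scale))) for w in weights]
    return quantized, scale


weights = [0.9, -0.05, 0.4, -1.2, 0.1, -0.6]
quantized, scale = ternary_quantize(weights)

print(quantized)                  # [1, 0, 1, -1, 0, -1]
print(round(math.log2(3), 2))     # 1.58
```

Multiplying each ternary value by `scale` gives a coarse approximation of the original weights.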


FW @thegenerality

9 months ago

Introducing Thinking Augmented Pre-Training (#TPT) as a simple, general, scalable and effective technique for future mid-training and/or pre-training recipes.

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

9 months ago

Thinking Augmented Pre-training: "we propose Thinking augmented Pre-Training (TPT), a universal methodology that augments text with automatically generated thinking trajectories. Such augmentation effectively increases the volume of the training data and makes high-quality tokens more learnable through step-by-step reasoning and decomposition." "Notably, TPT enhances the data efficiency of LLM pre-training by a factor of 3. For a 3B parameter model, it improves the post-training performance by over 10% on several challenging reasoning benchmarks."


FW @thegenerality

10 months ago

A cool demo of VibeVoice (together with Wan 2.2). This will enable many new scenarios and applications. #The_Era_of_Vibe_Media

Wildminder

@wildmindai

10 months ago

VibeVoice-7B-Preview with 32K Context Length. Wan2.2 + VibeVoice is pretty nice https://t.co/i4RQeDMv6z


FW @thegenerality

10 months ago

#VibeVoice Vibe Podcasting

Axel Dittmann

@DittmannAxel

10 months ago

#Microsoft's VibeVoice-1.5B just turned my rig into a podcast studio. 4 voices. Zero API costs. Running locally on a consumer GPU. Generated 5 test podcasts instantaneously - they sound surprisingly human. Setup took 30 minutes: clone repo, load model (most of the time - rural Germany), feed script, press play. The open-source podcast revolution is here, and it fits in your home rig. Who needs cloud subscriptions when innovation runs at localhost? 🎙️ #GenAI #LocalAI #Podcasting #VibeVoice


FW @thegenerality

10 months ago

Vibe Podcasting with VibeVoice - examples at https://t.co/MOpQSc3Huk

DailyPapers

@HuggingPapers

10 months ago

Microsoft just dropped VibeVoice on Hugging Face. A novel framework generating expressive, long-form, multi-speaker conversational audio like podcasts from text. Synthesizes up to 90 minutes of speech with up to 4 distinct speakers! https://t.co/Yg4gzs3hp7


FW @thegenerality

12 months ago

Reinforcement Pre-training #RPT

elvis

@omarsar0

12 months ago

Reinforcement Pre-Training: a new pre-training paradigm for LLMs just landed on arXiv! It incentivises effective next-token reasoning with RL. This unlocks richer reasoning capabilities using only raw text and intrinsic RL signals. A must-read! Bookmark it! Here are my notes:

