Dmitry Vengertsev @data_veng - Twitter Profile

about 2 months ago

We’re adding more visibility into where your Claude Code usage goes.

Run /usage to see a breakdown of what's driving it: parallel sessions, subagents, cache misses, long context, plus tips to optimize each.

ClaudeDevs's tweet photo: https://t.co/wn3HaGomRN

226

7K

454

2K

464K

data_veng retweeted

Andrew Carr 🤸

@andrew_n_carr

about 2 months ago

meta muse spark crushes one of my hard benchmarks: "recommend me something good to read that I am certain to have never read before". there's lots of theory of mind involved, and most models recommend the same 20 or so pieces of work. everything spark returned was novel, weird, and good. I hadn't heard of most of them and they were fun reads.

8

238

12

50

40K

data_veng retweeted

Simon Vandenhende @svandenh1

about 2 months ago

Our multimodal perception model MUSE Spark just entered the LMSYS Vision Arena in 2nd place, ahead of GPT and Gemini 🚀🚀🚀

Try it yourself in the updated Meta AI app! https://t.co/6t6Hn3xKhf https://t.co/1Ti2EqRYUA

4

74

6

5

5K

data_veng retweeted

Khairallah AL-Awady

@eng_khairallah1

about 2 months ago

INSTEAD OF WATCHING NETFLIX TONIGHT. Spend 3 hours with this. Claude Code FULL COURSE that teaches you how to BUILD apps, teams, and anything. The people who watch this tonight will wake up tomorrow with a new skill. Watch it and Bookmark it now.

54

3K

441

5K

230K


data_veng retweeted

Nainish Rai @Nain1sh

2 months ago

That clip is the part most people will underestimate. Image-to-code was already impressive. What Meta’s Muse Spark seems to be doing is one level higher: it’s not just recreating pixels, it’s inferring product logic. I gave it a calendar screenshot and I am blown

3

95

8

26

71K

data_veng retweeted

Arena.ai

@arena

about 2 months ago

Meta is back in the Arena!

Muse Spark debuts as a top frontier model across both Text and Vision:
- Text Arena: #3 tied with Gemini-3.1-Pro and Claude-Opus-4.6
- Vision Arena: #2 tied with Claude-Opus-4.6

This marks Meta’s first major release since early 2025.

Highlights:
- #4 Hard Prompts, #6 Coding, #9 Creative Writing, #10 Instruction Following, #27 Expert
- #3 tied for Business, Management, & Financial Ops, #7 Legal & Government, #12 Writing & Literature

Meta is back at the frontier. Huge congrats to @AIatMeta on this incredible milestone!

27

801

70

96

149K

data_veng retweeted

Jason Weston

@jaseweston

2 months ago

🏋️Thinking Mid-training: RL of Interleaved Reasoning🎗️

We address the gap between pretraining (no explicit reasoning) and post-training (reasoning-heavy) with an intermediate SFT+RL mid-training phase to teach models how to think.

- Annotate pretraining data with interleaved thoughts
- SFT mid-training to learn when/what to think alongside original content
- RL mid-training to optimize reasoning generation with grounded reward from future token prediction

Result: 3.2x improvement on reasoning benchmarks compared to direct RL post-training on base Llama-3-8B, and gains over only prior SFT as well.

Introducing reasoning earlier makes models better prepared for post-training!

Read more in the blog post: https://t.co/aGJcQKUjRw

9

559

71

477

68K

data_veng retweeted

Lisan al Gaib

@scaling01

2 months ago

Muse Spark tied for 1st place on SWE-Bench-Pro, MCP Atlas and Humanity's Last Exam

4

220

11

24

19K

data_veng retweeted

David Hendrickson

@TeksEdge

about 2 months ago

💡 If you're tracking LLM hallucination benchmarks (which are the most important, IMHO), 📊 Qwen3.6-Plus is the best according to BridgeBench! https://t.co/1lJVEjVGzt

11

91

3

32

9K

data_veng retweeted

Leon Lin

@LexnLin

2 months ago

I experimented with Muse Spark a bit more. I take my words back; it seems to be quite okay. Muse Spark generated the images for the website, btw.

5

106

7

31

47K

data_veng retweeted

Matt Deitke

@mattdeitke

2 months ago

Excited to share Muse Spark! It's a strong natively multimodal model with many surprising properties that emerged. Here, the model is able to use Python tools to make a playable Sudoku game on the web from an image input of the board. ✨

12

180

22

17

28K

data_veng retweeted

Lisan al Gaib

@scaling01

2 months ago

MUSE SPARK SVGs actually a decent model

8

162

5

9

21K

data_veng retweeted

Design Arena

@Designarena

2 months ago

BREAKING: Muse Spark by @Meta is #6 overall on Design Arena with an Elo of 1324!

This is the single biggest improvement we've seen on Design Arena to date, with a jump of 103 positions and 374 Elo points.

Huge congrats to the @Meta team on the launch! https://t.co/23kydnoUCV

7

261

16

24

27K

data_veng retweeted

Yuchen Jin

@Yuchenj_UW

2 months ago

Meta released Avocado, they call it Muse Spark.

It's not open source (a bit sad).

Meta TBD lab rebuilt the entire pretraining stack in 9 months and reached similar capability with >10x less compute than Llama 4 Maverick.

I still think infra is the real moat in AI labs. You can train models much faster with a good infra, and it allows researchers to experiment with many more ideas much more quickly.

41

646

31

96

54K

data_veng retweeted

Ritesh @treadon

2 months ago

VERDICT: Meta Muse Spark is the REAL DEAL

I ran several tests, including reading a menu. Newly released Meta Muse Spark was the ONLY frontier AI to get all the items correct.

Sorry ChatGPT, there is no "Slapped Wagyu Dog" on the menu💀.

https://t.co/YmL52ADOwb https://t.co/upulJapqMC

4

124

19

14

24K

data_veng retweeted

AI at Meta

@AIatMeta

2 months ago

Muse Spark is built from the ground up to integrate visual information across domains and tools. It achieves strong performance on visual STEM questions, entity recognition, and localization, enabling interactive experiences like troubleshooting your home appliances with dynamic annotations.

9

548

38

117

163K

data_veng retweeted

AI at Meta

@AIatMeta

2 months ago

To build personal superintelligence, our model’s capabilities should scale predictably and efficiently. Below, we share how we study and track Muse Spark’s scaling properties along three axes: pretraining, reinforcement learning, and test-time reasoning. 🧵👇

Let’s start with pretraining. Over the last 9 months, we rebuilt our pretraining stack with improvements to model architecture, optimization, and data curation, enabling us to increase the capability we can extract from every unit of compute. To rigorously evaluate our new recipe, we fit a scaling law to a series of small models and compare the training FLOPs required to hit a specific level of performance.

The results: we can reach the same capabilities with over an order of magnitude less compute than our previous model, Llama 4 Maverick, making Muse Spark significantly more efficient than the leading base models available for comparison.
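
For readers unfamiliar with the comparison described in the thread above, here is a minimal sketch of one common way to do it. It is not Meta's actual procedure: the power-law form loss = a * FLOPs^(-b), the helper names fit_power_law and flops_to_reach, and every number below are assumptions made up for illustration. The idea is to fit a line in log-log space for each training recipe, invert the fit to find the compute needed to reach a target loss, and take the ratio.

```python
import math

def fit_power_law(flops, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(flops).

    Returns (a, b) for the assumed form loss = a * flops ** (-b).
    """
    xs = [math.log(c) for c in flops]
    ys = [math.log(l) for l in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

def flops_to_reach(a, b, target_loss):
    """Invert loss = a * flops ** (-b) to get the FLOPs needed for target_loss."""
    return (a / target_loss) ** (1.0 / b)

# Synthetic (made-up) small-model runs for two hypothetical recipes.
flops = [1e18, 1e19, 1e20, 1e21]
old_losses = [20.0 * c ** -0.05 for c in flops]  # hypothetical "old recipe"
new_losses = [18.0 * c ** -0.05 for c in flops]  # hypothetical "new recipe"

a_old, b_old = fit_power_law(flops, old_losses)
a_new, b_new = fit_power_law(flops, new_losses)

target = 2.0  # arbitrary target loss, chosen only for the example
ratio = flops_to_reach(a_old, b_old, target) / flops_to_reach(a_new, b_new, target)
print(f"old recipe needs about {ratio:.1f}x the FLOPs of the new one")
```

Because the exponent b is small, a modest shift in the fitted curve translates into a large compute multiplier, which is why such comparisons are usually stated in FLOPs at matched performance and not as a loss difference at matched FLOPs.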

32

756

78

129

72K

data_veng retweeted

Vals AI

@ValsAI

2 months ago

Meta just released a frontier model, Muse Spark. It takes the #3 spot on our Vals Index.

3

188

29

38

60K

data_veng retweeted

Rayan Krishnan

@RayanKrishnan

2 months ago

@fchollet Actually performs quite well on held-out real-world tasks! Especially for practical agentic use cases in finance. I'm also pretty doubtful that ARC AGI is a good predictor for "actual usefulness" https://t.co/Ni6YTxKABN

1

105

6

9

30K

data_veng retweeted

Epoch AI

@EpochAIResearch

2 months ago

We had pre-release access to Meta’s new Muse Spark model and evaluated it on FrontierMath. It scored 39% on Tiers 1-3 and 15% on Tier 4. This is competitive with several recent frontier models, though behind GPT-5.4. https://t.co/avOWcEzQLI

10

372

41

47

87K
