Yi Z.

@yz

Curious ML/DL hacker. Love cats, daydreaming, chatting with AI, and pondering over all things logical. Meta; ex: Stakefish, Twitter, Google, CMU, Bowdoin.

San Francisco, CA

Joined September 2009

740 Following

2.6K Followers

402 Posts

yz retweeted

Artificial Analysis

@ArtificialAnlys

about 2 months ago

Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights.

Muse Spark is a new model from @Meta evaluated on Artificial Analysis. We were given early access by Meta to independently benchmark the model. It is the first frontier-class model from Meta since Llama 4 Maverick was released in April 2025, and notably the first @AIatMeta model that is not being released as open weights. The release follows Meta's reorganization of its AI efforts under Meta Superintelligence Labs, and signals that Meta is re-entering the frontier race after roughly a year of relative quiet.

For context, Llama 4 Maverick and Scout scored 18 and 13 respectively on the Artificial Analysis Intelligence Index as non-reasoning models at the time of their release, while Muse Spark scores 52. Muse Spark essentially closes the gap to the frontier in a single release.

The model is not open source and is not yet accessible via an API but Meta has shared they expect this to come soon. Meta is also integrating Muse Spark into their first party products including their Meta AI chat product, Facebook, Instagram and Threads.

Key takeaways from our benchmarks:
➤ Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it within the top 5 models we have benchmarked. It sits ahead of Claude Sonnet 4.6, GLM-5.1, MiniMax-M2.7, Grok 4.20 and behind Gemini 3.1 Pro Preview, GPT-5.4 and Claude Opus 4.6

➤ Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M) and GLM-5 (110M)

➤ Muse Spark is the second-most capable vision model we have benchmarked. It scores 80.5% on MMMU-Pro, behind only Gemini 3.1 Pro Preview (82.4%)

➤ Muse Spark performs strongly on reasoning and instruction-following evaluations. It scores 39.9% on HLE, trailing only Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (xhigh, 41.6%). The model also achieved the 5th-highest score on CritPT, an eval focused on difficult physics research questions, with 11%. This is substantially above Gemini 3 Flash (9%) and Claude Sonnet 4.6 (3%)

➤ Agentic performance does not stand out. On GDPval-AA, our evaluation focused on real-world work tasks, Muse Spark scores 1427, behind both Claude Sonnet 4.6 at 1648 and GPT-5.4 at 1676, but ahead of Gemini 3.1 Pro Preview at 1320. On TerminalBench Hard, Muse Spark trails Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Muse Spark joins others in achieving a high τ²-Bench Telecom score of 92%

Key model details:
➤ Modalities: Multimodal including text and vision input, text output
➤ License: Proprietary, Meta's first frontier model not released as open weights
➤ Availability: No public API at the time of publishing. Meta expects to provide API access soon. Meta has started integration into their first party AI offering Meta AI and inside Facebook, Instagram, and Threads

324

427

504K

Yi Z. @yz

12 months ago

My 14-year-old cat passed away on my birthday, after battling lymphoma for 8 months.

yz retweeted

Tiezhen WANG

@Xianbao_QIAN

over 1 year ago

Let’s be clear: DeepSeek r1 isn’t about who “races faster”—it’s about the inherent flaw in closed models. The future of AI isn’t owned by those who hide code or stockpile chips. It’s built on trust, and trust requires transparency. When models are black boxes, you surrender control over data privacy, culturally aligned ethics, and post-training customization for real world scenarios. That’s not leadership—it’s liability. China’s pretraining consolidation proves a simple truth: AI is becoming a commodity. The real value lies not in the model but in what you do with it. Why waste billions reinventing closed-source base models when the market craves applications that solve poverty, climate crises, or healthcare gaps? Labs clinging to secrecy risk irrelevance—like doubling down on fax machines as email took over. Consider the irony: Today, free models come from quant firms while “nonprofits” charge premiums for access. OpenAI’s pivot from “open” to walled gardens betrays the very ethos that birthed modern AI. Meanwhile, market economy principles prevail: restrict access, and replacements emerge. The lesson? Trust beats control. Open models engage developers. Closed models breed suspicion—and suspicion fuels replacement. How to “maintain leads”? Stop gatekeeping. Build open infrastructure the world trusts, like the internet. History doesn’t reward those clinging to scarcity. It rewards those who empower the many. The choice is yours.

440

410

209K

yz retweeted

Philipp Schmid

@_philschmid

about 3 years ago

Introducing StarCoder ⭐️ a 15B open-source Code-LLM created by @huggingface and @ServiceNow through @BigCodeProject
🔡 8192 token context window
📊 trained on 1 trillion tokens
💭 80+ programming languages
🔐 only permissively licensed data
✅ commercial use
https://t.co/gEgKeUL1vN

307

39K

almost 4 years ago

@wangtian Our team wanted to write some cool code, so we decided to get together in the arctic to do it.

Yi Z. @yz

almost 4 years ago

Ghost town Pyramiden: once a busy USSR mining town in the arctic with a library, theatre, canteen, and swimming pool; now abandoned, with a Lenin statue watching over the glaciers in solitude.

Yi Z. @yz

almost 4 years ago

@pavan_ky @wangtian Agree.

Yi Z. @yz

almost 4 years ago

@wangtian They are on Twitter as well! > 50% of engineers I interview fail to tell me correctly how many bytes is 0xDEAD. > 80% of engineers I interview cannot calculate the decimal value of 0x01FF without Googling for a hex converter. And these are engineer candidates.
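
For reference, the two interview questions in the post above can be checked with a short Python sketch. The helper names are hypothetical, and it assumes "how many bytes" means the minimum number of whole bytes needed for the hex digits as written (leading zeros included).

```python
def hex_byte_length(literal: str) -> int:
    """Whole bytes needed for the digits written in a hex literal (assumed definition)."""
    text = literal.lower()
    digits = text[2:] if text.startswith("0x") else text
    # Each hex digit encodes 4 bits, so two digits make one byte; round up for odd counts.
    return (len(digits) + 1) // 2


def hex_to_decimal(literal: str) -> int:
    """Decimal value of a hex literal; int() accepts an optional 0x prefix with base 16."""
    return int(literal, 16)


# 0xDEAD has four hex digits, so it fits in two bytes.
print(hex_byte_length("0xDEAD"))   # 2

# 0x01FF = 1 * 16**2 + 15 * 16 + 15 = 256 + 240 + 15
print(hex_to_decimal("0x01FF"))    # 511
```

The manual expansion in the last comment is the calculation the post expects candidates to do without a converter.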

yz retweeted

Chun

@satofishi

almost 4 years ago · Austin

.@f2pool and @stakefish team heading to #Consensus2022.

Yi Z. @yz

almost 4 years ago

@qiqicoin @stakefish @Ledger You need to come to our Consensus booth to pick it up 😃. And I forgot to mention, f2pool or stakefish employees don't qualify for prizes!)

Yi Z. @yz

about 4 years ago

Staking & relaxing with @stakefish

Yi Z. @yz

about 4 years ago

Would you hire an engineer who could build beautiful frontend React apps but tries to "sudo cd Desktop"? 😂

Yi Z. @yz

over 4 years ago

Bought a guitar with a touch screen and apps. Given that it has Wi-Fi, speakers, and a microphone, let's see if future software updates will bring a phone app. Can't wait to call my mom from a guitar. 🤔

Yi Z. @yz

over 4 years ago

@squarecog @__lucab @satanjeev @cayley @kevinweil @sritchie @posco @niels @thesteggie 😂 Wow! You still remember that incident.

yz retweeted

Square @Square

over 4 years ago

We’re changing our company name so we can give the full @Square brand to our Seller business. So now we need a name to tie @Square, @CashApp, @TIDAL, and @TBD54566975 together into one. That name is “Block.” Why?

363

207

yz retweeted

Naval

@naval

over 4 years ago

If you can buy happiness, buy it.

541

22K

512

Yi Z. @yz

about 5 years ago

@julianosiloto Thanks for the ideas!

Yi Z. @yz

about 5 years ago

The pandemic will end. 🌞

yz retweeted

Wenzhe Shi 🐕🐎 @trustswz

about 5 years ago

After a long wait, we are finally announcing the #Recsys2021 challenge! This year we are releasing around 1 billion samples, largest social network dataset by far, with an added focus on fair recommendations. Please checkout our website https://t.co/ZOGH9ejieS for more details.

Yi Z. @yz

about 5 years ago

@nojeshua Definitely interested. Ever since I quit my job, I found myself pondering over all these “unproductive questions”, such as origins of life/DNA, why do we get old, what’s gravity, etc. I just started reading a couple of books other people suggested on these topics.

Yi Z. @yz

about 5 years ago

Quarantine daydreaming: so DNA encodes information about how to interpret & replicate itself, like how to assemble proteins (e.g. helicase, primase) required for replication. This is like writing a C compiler in C. How did the "first DNA sequence" bootstrap?
