Curious ML/DL hacker. Love cats, daydreaming, chatting with AI, and pondering over all things logical.
Meta; ex: Stakefish, Twitter, Google, CMU, Bowdoin.
Meta is back! Muse Spark scores 52 on the Artificial Analysis Intelligence Index, behind only Gemini 3.1 Pro, GPT-5.4, and Claude Opus 4.6. Muse Spark is the first new release since Llama 4 in April 2025 and also Meta's first release that is not open weights
Muse Spark is a new model from @Meta evaluated on Artificial Analysis. We were given early access by Meta to independently benchmark the model. It is the first frontier-class model from Meta since Llama 4 Maverick was released in April 2025, and notably the first @AIatMeta model that is not being released as open weights. The release follows Meta's reorganization of its AI efforts under Meta Superintelligence Labs, and signals that Meta is re-entering the frontier race after roughly a year of relative quiet.
For context, Llama 4 Maverick and Scout scored 18 and 13 respectively on the Artificial Analysis Intelligence Index as non-reasoning models at the time of their release, while Muse Spark scores 52. Muse Spark essentially closes the gap to the frontier in a single release.
The model is not open source and is not yet accessible via an API but Meta has shared they expect this to come soon. Meta is also integrating Muse Spark into their first party products including their Meta AI chat product, Facebook, Instagram and Threads.
Key takeaways from our benchmarks:
➤ Muse Spark scores 52 on the Artificial Analysis Intelligence Index, placing it within the top 5 models we have benchmarked. It sits ahead of Claude Sonnet 4.6, GLM-5.1, MiniMax-M2.7, Grok 4.20 and behind Gemini 3.1 Pro Preview, GPT-5.4 and Claude Opus 4.6
➤ Muse Spark is notably token efficient for its intelligence level. It used 58M output tokens to run the Intelligence Index, comparable to Gemini 3.1 Pro Preview (57M) and notably lower than Claude Opus 4.6 (Adaptive Reasoning, max effort, 157M), GPT-5.4 (xhigh, 120M) and GLM-5 (110M)
➤ Muse Spark is the second-most capable vision model we have benchmarked. It scores 80.5% on MMMU-Pro, behind only Gemini 3.1 Pro Preview (82.4%)
➤ Muse Spark performs strongly on reasoning and instruction-following evaluations. It scores 39.9% on HLE, trailing only Gemini 3.1 Pro Preview (44.7%) and GPT-5.4 (xhigh, 41.6%). The model also achieved the 5th-highest score on CritPT, an eval focused on difficult physics research questions, at 11%. This is substantially above Gemini 3 Flash (9%) and Claude Sonnet 4.6 (3%)
➤ Agentic performance does not stand out. On GDPval-AA, our evaluation focused on real-world work tasks, Muse Spark scores 1427, behind both Claude Sonnet 4.6 at 1648 and GPT-5.4 at 1676, but ahead of Gemini 3.1 Pro Preview at 1320. On TerminalBench Hard, Muse Spark trails Claude Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. Muse Spark joins others in achieving a high τ²-Bench Telecom score of 92%
Key model details:
➤ Modalities: Multimodal including text and vision input, text output
➤ License: Proprietary, Meta's first frontier model not released as open weights
➤ Availability: No public API at the time of publishing. Meta expects to provide API access soon. Meta has started integration into their first party AI offering Meta AI and inside Facebook, Instagram, and Threads
Let’s be clear: DeepSeek r1 isn’t about who “races faster”—it’s about the inherent flaw in closed models. The future of AI isn’t owned by those who hide code or stockpile chips. It’s built on trust, and trust requires transparency. When models are black boxes, you surrender control over data privacy, culturally aligned ethics, and post-training customization for real world scenarios. That’s not leadership—it’s liability.
China’s pretraining consolidation proves a simple truth: AI is becoming a commodity. The real value lies not in the model but in what you do with it. Why waste billions reinventing closed-source base models when the market craves applications that solve poverty, climate crises, or healthcare gaps? Labs clinging to secrecy risk irrelevance—like doubling down on fax machines as email took over.
Consider the irony: Today, free models come from quant firms while “nonprofits” charge premiums for access. OpenAI’s pivot from “open” to walled gardens betrays the very ethos that birthed modern AI. Meanwhile, market economy principles prevail: restrict access, and replacements emerge.
The lesson? Trust beats control. Open models engage developers. Closed models breed suspicion—and suspicion fuels replacement. How to “maintain leads”? Stop gatekeeping. Build open infrastructure the world trusts, like the internet.
History doesn’t reward those clinging to scarcity. It rewards those who empower the many. The choice is yours.
Introducing StarCoder ⭐️ a 15B open-source Code-LLM created by @huggingface and @ServiceNow through @BigCodeProject
🔡 8192 token context window
📊 trained on 1 trillion tokens
💭 80+ Programming languages
🔐 only permissive licensed data
✅ commercial use
https://t.co/gEgKeUL1vN
Ghost town Pyramiden: once a busy USSR mining town in the arctic with a library, theatre, canteen, and swimming pool; now abandoned, with a Lenin statue watching over the glaciers in solitude.
@wangtian They are on Twitter as well!
> 50% of engineers I interview fail to tell me correctly how many bytes is 0xDEAD.
> 80% of engineers I interview cannot calculate the decimal value of 0x01FF without Googling for a hex converter.
And these are engineer candidates.
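For anyone who wants to check the two answers, here is a minimal Python sketch. The helper names `byte_length` and `hex_to_decimal` are made up for illustration, and "how many bytes" is assumed to mean the smallest whole number of bytes that holds the value.

```python
def byte_length(value: int) -> int:
    """Smallest number of whole bytes that can hold a non-negative integer."""
    # Each byte holds 8 bits; round the bit count up to a whole byte.
    # max(1, ...) treats zero as still occupying one byte.
    return max(1, (value.bit_length() + 7) // 8)


def hex_to_decimal(hex_string: str) -> int:
    """Parse a hex literal such as '0x01FF' into its integer value."""
    # int() with base 16 accepts an optional '0x' prefix.
    return int(hex_string, 16)


# Each hex digit is 4 bits, so the four digits of 0xDEAD are 16 bits = 2 bytes.
print(byte_length(0xDEAD))       # 2

# 0x01FF = 1 * 256 + 15 * 16 + 15 = 511 (equivalently 0x0200 - 1 = 512 - 1).
print(hex_to_decimal("0x01FF"))  # 511
```

The mental shortcut is the same as the code: two hex digits per byte, and 0x01FF is one less than 0x0200.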
@qiqicoin @stakefish @Ledger You need to come to our Consensus booth to pick it up 😃. And I forgot to mention, f2pool or stakefish employees don't qualify for prizes!
Bought a guitar with a touch screen and apps. Given that it has Wi-Fi, speakers, and a microphone, let's see if future software updates will bring a phone app. Can't wait to call my mom from a guitar. 🤔
We’re changing our company name so we can give the full @Square brand to our Seller business. So now we need a name to tie @Square, @CashApp, @TIDAL, and @TBD54566975 together into one. That name is “Block.” Why?
After a long wait, we are finally announcing the #Recsys2021 challenge! This year we are releasing around 1 billion samples, the largest social network dataset by far, with an added focus on fair recommendations. Please check out our website https://t.co/ZOGH9ejieS for more details.
@nojeshua Definitely interested. Ever since I quit my job, I found myself pondering over all these “unproductive questions”, such as origins of life/DNA, why do we get old, what’s gravity, etc. I just started reading a couple of books other people suggested on these topics.
Quarantine daydreaming: so DNA encodes information about how to interpret & replicate itself, like how to assemble proteins (e.g. helicase, primase) required for replication.
This is like writing a C compiler in C. How did the "first DNA sequence" bootstrap?