Apple didn't, so I did: I made it dead simple to run macOS 27's local and Private Cloud Compute Foundation models in any app that accepts an OpenAI API URL.
Here's fm-proxy. A drop-in replacement, with no extra servers, and no extra keys. It's great in a minimal Pi setup!
The Rio 3.5 model broke the internet this week. The plot twist? It’s essentially our open-source model, Nex N2 Pro, wearing a different hat.
🤯 We analyzed the weights, and the recipe is exact: Rio 3.5 ≈ 0.6 * Nex N2 Pro + 0.4 * Qwen 3.5
It even literally introduces itself as "Nex N2 Pro" if you ask it without initial system prompt!
😂 We are flattered that the City of Rio used our work to achieve SOTA performance. Thanks for the ultimate benchmark validation.
🤝 But in the open-source world, attribution matters.
👇 Full mathematical proof & verify script in the first reply!
Wait what? Rio 3.5 Open 397B, developed by IT company of Rio de Janeiro's city government is now SOTA open source and even outperforming Qwen 3.7?
What is happening today.
Never heard of them before.
Alibaba Qwen3.7 slowly fading into irrelevance at the frontier due to proprietary stance.
In it's place we have Minimax M3 and... *checks notes* Rio 3.5 397b, made by the municipal IT company of Rio de Janeiro's city government.
https://t.co/JgIJYVhoEi
The US has turned hostile to all non-citizens, without realizing that a significant portion of technological advances and research, if not most, comes from that talent pool.
Extremely myopic move.
According to Grok, Andrej Karpathy is an EB-1 extraordinary ability green card recipient, not a US citizen. Thus under these new restrictions he is not permitted to use, or work on, Mythos 5 or Fable 5 as of 5:21pm tonight.
According to Grok, Andrej Karpathy is an EB-1 extraordinary ability green card recipient, not a US citizen. Thus under these new restrictions he is not permitted to use, or work on, Mythos 5 or Fable 5 as of 5:21pm tonight.
I gave Fable 5 one job: write custom WebGPU kernels for Gemma 4 inference.
It climbed to 84 tok/s, then hit a wall, insisting further optimization was impossible.
Hours later, Anthropic rolled back invisible LLM development safeguards, and it hit 255 tok/s.
The next day, access to Fable 5 was suspended globally.
Apple just did something nobody expected.
They turned 2 billion iPhones into local AI machines.
They open-sourced coreai-models, the entire toolkit that lets you export any HuggingFace model and run it natively on iPhone, iPad and Mac with zero cloud.
→ Runs 100% on the Neural Engine
→ No cloud. No API keys. No subscriptions.
→ Fully offline. Your data never leaves the device.
It even ships with skills for Claude Code, Codex, and Gemini, so your coding agent already knows how to use it.
100% Open Source.
mythos will be bad ON PURPOSE on ai "frontier llm research" tasks, this is very very sad for the research community
also the fact that this is un purpose not visible to the user is crazy
It's very cool that Apple shipped a 20B parameter on-device.
You can't put 20B parameters in RAM at any reasonable precision. To make it work they are using pretty exotic architecture by today's standards.
A small model predicts from the query (or prompt) which experts to load from Nand into RAM. The key distinction from a typical MoE is that you do this once per query and then generate all the tokens with the same experts (instead of switching the experts for every token).