Thanks! Worth untangling, because the answer is interesting. The new Image Playground is indeed much better, but the reason is the opposite of local: it's now powered by ADM 3 Cloud, Apple's new image model running on Private Cloud Compute. Cloud, but private by design: user data is never stored or shared with anyone, including Apple. The on-device AFM 3 model is multimodal for understanding images, not generating them.
And the swappable backend from the article applies to language models behind LanguageModelSession, not Image Playground, so no silent swap there. If you want your own generation pipeline, the path is running your own image models locally via MLX or Core AI in your app, alongside or instead of Image Playground.
@AWar1586398 That's a proper home lab. Bonus from the repos: Apple's utilities package runs on Linux, and the chat-completions bridge puts any of those boxes behind the same session API. Even the beat-up 2U gets a seat at the table.
An agent reading the article and rebuilding the pipeline might be my favorite outcome of the piece. Love it. Keep me posted on how MLX and Core AI treat you! Same energy on our side btw: we used the research behind the article to enhance our MLX agentic pipeline with the OS 27 patterns now, so when OS 27 goes public we adopt the native frameworks and keep full backward compatibility for macOS 26 users. Same experience on both, one codebase.
@Scobleizer Thank you! You caught the deepest thread in the piece: compute moving back onto hardware people own, up to clusters of Macs doing data center work. Quietly the bigger story.
Sharp read. The abstraction holds for plumbing, not prompting. Tool calling and guided generation transfer; instructions don't. Apple knows it: profiles bundle instructions with the model, and DynamicInstructions groups instructions and tools into reusable components. You compose experts, not prompts.
Also worth knowing: AFM's window is a fixed 4096 tokens, so a transcript that fits Qwen can overflow on the route back. contextSize and tokenCount exist for exactly that check.
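A minimal sketch of that check, in plain Python rather than Apple's API. Only the 4096 figure comes from the note above. `fits_context`, `count_tokens`, and the whitespace tokenizer are hypothetical stand-ins; in a real app the framework's own contextSize and tokenCount would supply the numbers.

```python
from typing import Callable, List

AFM_CONTEXT_SIZE = 4096  # fixed window mentioned above


def count_tokens(text: str) -> int:
    """Crude stand-in tokenizer (assumption): one token per whitespace-separated word."""
    return len(text.split())


def fits_context(
    transcript: List[str],
    context_size: int,
    reserve_for_reply: int = 0,
    counter: Callable[[str], int] = count_tokens,
) -> bool:
    """Return True if the whole transcript, plus room for a reply, fits the window."""
    used = sum(counter(entry) for entry in transcript)
    return used + reserve_for_reply <= context_size
```

Run it before routing a long conversation back to the smaller model, and trim or summarize the transcript when it returns False.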
#WWDC26 was the biggest local AI release Apple has ever shipped. I build local AI on macOS with MLX, so I went through the sessions, the docs, and the GitHub repos. Here's what actually changed.
The context: Apple already had the pieces. The Foundation Models framework arrived last year with an on-device model. MLX, Apple's open source machine learning framework, has powered local models on Macs since late 2023, and its biggest performance advances quietly shipped months before the keynote.
What's new is that Apple connected everything.
The Foundation Models framework now accepts any model, not just Apple's. One protocol, one session API, and the model behind it becomes a swappable choice: Apple's on-device model, a bigger Apple model running on Private Cloud Compute, open source models via MLX, or your own custom weights through Core AI, a brand new framework for running your own models on device with a ready catalog that includes Qwen and Mistral.
The framework also went agentic. Dynamic Profiles lets one session switch models, tools, and instructions on the fly. Models can now see images, read text and barcodes through the camera, and search your Mac with Spotlight for fully local retrieval. A new command line tool brings all of it to scripts.
And Apple put real weight behind open source. The Core AI model implementation is already on GitHub, and a new Apache-licensed utilities package adds agent skills, conversation memory management, and a bridge that lets any local model server plug into Apple's API. Even the developer tools caught up, with new Instruments profiling for on-device models and Xcode connecting directly to a local MLX server.
Add it up and there are more ways to run a local model on a Mac than ever: Apple's own, open source models, custom weights, a local server, even a cluster of Macs working as one. No cloud account required for any of it.
Local AI on Apple platforms is now a platform strategy, not a side project.
Follow-up on what's doing the magic: the framework, not the models.
Two protocols. LanguageModel declares capabilities. LanguageModelExecutor does the work: prewarm, translating the Transcript to each engine's native format, streaming. AFM and Qwen via MLX plug into the same contract.
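To make the split concrete, here is a toy version of the two-protocol idea in Python. It is not Apple's Swift API: only the two protocol names come from the post, while every field, method signature, and the `ToyModel` / `ToyExecutor` / `respond` helpers are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, List, Protocol


class LanguageModel(Protocol):
    """Declares capabilities only (hypothetical fields)."""
    name: str
    context_size: int


class LanguageModelExecutor(Protocol):
    """Does the work (hypothetical method names)."""
    def prewarm(self) -> None: ...
    def stream(self, transcript: List[str]) -> Iterator[str]: ...


@dataclass
class ToyModel:
    name: str
    context_size: int


class ToyExecutor:
    """Stand-in engine: joins the transcript into its own 'native' prompt, then streams."""

    def __init__(self, model: LanguageModel, separator: str) -> None:
        self.model = model
        self.separator = separator  # each engine formats the transcript differently
        self.warm = False

    def prewarm(self) -> None:
        self.warm = True

    def stream(self, transcript: List[str]) -> Iterator[str]:
        native_prompt = self.separator.join(transcript)
        reply = f"{self.model.name} saw {len(native_prompt)} chars"
        for word in reply.split():
            yield word


def respond(executor: LanguageModelExecutor, transcript: List[str]) -> str:
    """Caller code depends on the contract only, never on a concrete engine."""
    executor.prewarm()
    return " ".join(executor.stream(transcript))
```

Two `ToyExecutor` instances with different separators go through the same `respond` call unchanged, which is the "one contract, many engines" point in miniature.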
The session owns the state. The transcript used to be append-only and coupled to the KV cache; in macOS 27 it is mutable with clear rules: append and the cache survives, rewrite history and you pay a fresh prefill.
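The append-versus-rewrite rule is easy to see in a toy model. This Python sketch is an illustration, not the framework: `SessionState` and its counters are hypothetical, and "cache" here is just a count of transcript entries already processed.

```python
from typing import List


class SessionState:
    """Toy transcript coupled to a pretend KV cache."""

    def __init__(self) -> None:
        self.transcript: List[str] = []
        self.cached_entries = 0  # leading entries the cache already covers

    def _prefill(self) -> int:
        """Process whatever the cache does not cover; return how many entries that was."""
        pending = len(self.transcript) - self.cached_entries
        self.cached_entries = len(self.transcript)
        return pending

    def append(self, entry: str) -> int:
        """Appending keeps the cache: only the new entry is prefilled."""
        self.transcript.append(entry)
        return self._prefill()

    def rewrite(self, index: int, entry: str) -> int:
        """Rewriting history drops the cache (assumption: all of it), forcing a fresh prefill."""
        self.transcript[index] = entry
        self.cached_entries = 0
        return self._prefill()
```

Three appends cost one entry each; a single rewrite after them costs all three again.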
Dynamic Profiles sits on top. That's why per-prompt AFM vs MLX routing works and the TTFT numbers hold mid-conversation.
Apple shipped an on-device inference orchestrator and called it a framework update. #WWDC26
https://t.co/1ixHTZeqbv
Built a playground app to show what Apple announced at #WWDC26: the Foundation Models framework driving an open-source MLX model (Qwen3.5-4B via MLXLanguageModel) and Apple's on-device model through the same LanguageModelSession. One picker switch, downstream untouched. Live TTFT / tok/s / token counts from the framework's Usage API. MLX through the Foundation Models framework, huge for local AI on Apple platforms.
Test rig: M3 Ultra (32 cores), 512 GB, macOS 27.0 (26A5353q).
They don't share a TTFT profile, and the framework doesn't pretend they do. MLX keeps the model context resident across toggles, so no cold mmap per switch. I run a warmup prompt on load to kill the first-hit cost.
Steady-state TTFT on the M3 Ultra: Qwen3.5-4B via MLX 0.14 s, AFM 0.47 s.
In the app, Dynamic Profiles (new at #WWDC26) routes between the two per prompt, on demand, inside one session. The transcript carries over but the KV cache doesn't, so each switch is a fresh prefill. Warm engines are what make that feel instant.
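Why a switch costs a full prefill, as a toy Python model. This is a sketch, not the playground app's code or the Dynamic Profiles API: `Router`, `send`, and the engine labels are hypothetical, replies are left out of the transcript for brevity, and the cost is counted in transcript entries rather than tokens.

```python
from typing import List, Optional


class Router:
    """One shared transcript; the pretend KV cache belongs to whichever engine is active."""

    def __init__(self) -> None:
        self.transcript: List[str] = []
        self.active: Optional[str] = None
        self.cached_entries = 0

    def send(self, engine: str, prompt: str) -> int:
        """Route one prompt; return how many transcript entries had to be prefilled."""
        if engine != self.active:
            # The transcript carries over, the cache does not.
            self.cached_entries = 0
            self.active = engine
        self.transcript.append(prompt)
        prefilled = len(self.transcript) - self.cached_entries
        self.cached_entries = len(self.transcript)
        return prefilled
```

Staying on one engine costs one entry per prompt; switching after two prompts makes the third cost three.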
https://t.co/SHWVeHC79Z