🚀 Gemma 4 12B is here!
We partnered with @GoogleDeepMind to bring their new dense, unified multimodal model to Apple Silicon and optimize it.
◈ 12B dense · 256K context
◈ Thinking mode (built-in reasoning)
◈ Vision: dynamic res, OCR, UI + charts
◈ Native audio: ASR + speech translation
◈ Function calling for agents
◈ Text + image + audio, interleaved
Runs local. Get started now ⚡
> uv pip install -U mlx-vlm
https://t.co/7BvnEuzKvj
This is amazing. Do this:
1. Set model to Opus 4.8
2. Set reasoning effort to /ultracode
This enables Claude Code's new Dynamic Workflows.
Claude will autonomously detect complex tasks, write an orchestration script, and spawn an agent swarm.
Introducing Claude Opus 4.8: it builds on Opus 4.7 with sharper judgment, more honesty about its own progress, and the ability to work independently for longer than its predecessors.
Available today at the same price.
How do you know your agent is still running after you close the lid?
I became a bit obsessed with putting some RGB LEDs into the SD card slot on my MacBook Pro as a status indicator.
I never use that slot, so it's a perfect place.
Awesome. NVIDIA dropped PiD - fast high-res latent decoding via pixel diffusion!
- replace VAE
- 4/8x upsampling
- 2k decoding in <1s on RTX 5090
- works with FLUX.1/SD3/Z
- rapid generation previews
Sharper details and much lower hardware lag compared to standard methods.
https://t.co/60Pkqze0gR
Gemma 4 is here, and it's optimized for Apple Silicon. This 4-bit quantized model runs fast on your Mac, not just in the cloud. It's a game changer for local AI.
LM Studio with Multi Token Prediction (MTP) is now in beta.
1. Update to 0.4.14+3 in-app
2. Make sure your llama.cpp engine is 2.15.0
3. Turn on MTP when loading a model
Use a model that supports it, like Qwen3.6-35B-A3B-MTP-GGUF or Qwen3.6-27B-MTP-GGUF