oMLX 0.3.9rc1 released.
Highlights:
- Low-memory Macs stay stable instead of getting killed by the OS
- DFlash bumped to v0.1.7 (thanks to @bstnxbt's dflash-mlx). Includes the Qwen thinking/GDN fix, etc.
- Chunked prefill. A long prompt no longer blocks decode for everyone else
- Multi-tasking in the admin chat. Run multiple chats in parallel
- Real-time memory bar in the admin dashboard
- Hermes Agent quick launch: "omlx launch hermes"
Plus a lot of bug fixes and new contributors in this cycle. Thanks everyone!
https://t.co/maWzDJUvsH
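For readers unfamiliar with the chunked prefill item in the highlights, here is a rough sketch of the general idea, not oMLX's actual scheduler. It assumes a toy model where each step spends at most `chunk_size` prompt tokens on prefill and still decodes one token for every request that has finished prefill. `Request`, `step`, and `run` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_len: int        # total prompt tokens to prefill
    max_new_tokens: int    # tokens to generate after prefill
    prefilled: int = 0
    generated: int = 0

    @property
    def in_prefill(self) -> bool:
        return self.prefilled < self.prompt_len

    @property
    def done(self) -> bool:
        return not self.in_prefill and self.generated >= self.max_new_tokens


def step(requests, chunk_size):
    """Run one scheduler step and return a log of (name, phase, tokens)."""
    log = []
    budget = chunk_size  # prefill tokens allowed in this step
    for r in requests:
        if r.done:
            continue
        if r.in_prefill:
            # Prefill only a bounded slice of the prompt, not all of it.
            take = min(budget, r.prompt_len - r.prefilled)
            if take > 0:
                r.prefilled += take
                budget -= take
                log.append((r.name, "prefill", take))
        else:
            # Requests already past prefill still get a token every step.
            r.generated += 1
            log.append((r.name, "decode", 1))
    return log


def run(requests, chunk_size):
    """Step until every request is done; return the per-step logs."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    steps = []
    while not all(r.done for r in requests):
        steps.append(step(requests, chunk_size))
    return steps


if __name__ == "__main__":
    reqs = [
        Request("long", prompt_len=10, max_new_tokens=1),
        Request("short", prompt_len=2, max_new_tokens=3, prefilled=2),
    ]
    for i, log in enumerate(run(reqs, chunk_size=4), start=1):
        print(i, log)
```

In this toy run the "short" request keeps decoding while the "long" prompt is prefilled in slices of 4 tokens. Without chunking, the 10-token prefill would occupy a whole step by itself before anyone else could decode.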
4-bit Qwen3.6 MTP GGUF managed to search 70+ sites from a single prompt.
Try this locally on 20GB RAM via Unsloth Studio.
Unsloth now supports auto MTP + speculative decoding & auto-selects the best MTP settings for your device (Mac, CPU, GPU).
GitHub: https://t.co/aZWYAtakBP
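As background on the speculative decoding mentioned above, here is a minimal greedy sketch of the general technique, not Unsloth's implementation. It assumes a cheap drafter (for example, MTP heads) proposes `k` tokens and the full model keeps the longest prefix it agrees with. `speculative_step`, `target_next`, and `draft_next` are hypothetical names, and the toy "models" are plain functions over integer tokens.

```python
def speculative_step(target_next, draft_next, tokens, k):
    """One greedy speculative step.

    Drafts k tokens, accepts the prefix the target agrees with, then
    appends one token from the target (a correction or a bonus token).
    """
    # 1. The drafter proposes k tokens autoregressively.
    draft = []
    ctx = list(tokens)
    for _ in range(k):
        t = draft_next(ctx)
        draft.append(t)
        ctx.append(t)

    # 2. The target checks each proposal. Real systems score all k
    #    positions in one batched forward pass; this loop is a stand-in.
    accepted = []
    ctx = list(tokens)
    for t in draft:
        expected = target_next(ctx)
        if t != expected:
            accepted.append(expected)  # replace the first wrong token
            return accepted
        accepted.append(t)
        ctx.append(t)

    # 3. Every draft token matched, so the target adds one more for free.
    accepted.append(target_next(ctx))
    return accepted


if __name__ == "__main__":
    # Toy target: count upward modulo 10.
    target = lambda ctx: (ctx[-1] + 1) % 10
    # Toy drafter: same rule, but wrong whenever the last token is 3.
    drafter = lambda ctx: 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10
    print(speculative_step(target, drafter, [1], k=4))
```

With greedy verification the accepted tokens are the same ones the target would have produced by itself, so any speedup comes from emitting several tokens per target pass, not from changing the output.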