2-bit Gemma 4 12B GGUF, only 4.66 GB on disk, managed to cite 15 sites from a single prompt.
Try this locally on >6GB RAM via Unsloth Studio.
GitHub: https://t.co/aZWYAtakBP
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
Super excited to announce seven new world-class MAI models today. They represent what we consider a new era in AI designed to keep you in control and on the frontier.
First is our text foundation model, MAI-Thinking-1, exceptionally strong on reasoning and SWE tasks.
- It’s a 35B active parameter MoE with a 256K context window. Independent human raters on Surge prefer it for overall quality in blind side-by-sides versus Sonnet 4.6, and it’s achieved 97% on AIME 2025, the key measure of its general-purpose reasoning abilities.
- It's at 53% on SWE Bench Pro, placing it right alongside Opus 4.6 on one of the toughest coding benchmarks.
- And since we co-designed our models with our own silicon, MAI-Thinking-1 is optimized on our MAIA 200 chip. Benchmarking head-to-head against the GB200, we see 30% better performance per dollar as well as a 1.4x performance-per-watt gain when running our MAI models on the MAIA 200 end-to-end.
Next are MAI-Image-2.5 and its Flash variant: two super strong models now at #2 on the leaderboards, surpassing the score of Nano Banana 2 on image editing.
Last for now is MAI-Code-1-Flash, our new inference-efficient coding model, specifically tuned for VS Code and GitHub Copilot CLI.
- Code-1-Flash achieves 51% on SWE Bench Pro, despite having just 5B parameters, putting it closer to Haiku in size but cheaper in cost.
All of this is the foundation for Microsoft Frontier Tuning. It lets you customize our models to create custom, company-specific agents that only you control. You can make our model, your model. Your data. Your agents. Your moat.
Early adopters are already seeing a difference. When we tuned our models for McKinsey’s tasks, MAI delivered the highest win rate, outperforming GPT-5.5 on quality at 10x lower cost.
Also really excited to be collaborating with the amazing team at Mayo Clinic to jointly train a new frontier AI model for healthcare.
Our announcements today mark another milestone on the road to humanist superintelligence. You can learn more, including about our other new models, in our latest blog: https://t.co/v65eop5Ixq
Qwen3.6 35B A3B can't fill out a paper form on its own. But give it NVIDIA's LocateAnything-3B — the #1 trending model on HuggingFace — as its eyes, and the two small models get it done together.
(The test: place each element at the right pixel position on a blank form image, rather than typing into a field.)
Setup:
> Qwen is the brain (main model), LocateAnything is the eyes (helper model acting as a tool).
> I gave Qwen a new tool: ask "where's the email field?" and LocateAnything returns the exact x, y, width, height.
> The blue boxes on the screen are its detections. Look how tight they are — it nails every field.
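For anyone trying to picture the wiring, here is a minimal Python sketch of the tool round trip described above. It is not the actual setup: `Box`, `locate`, `text_anchor`, `fill_field`, and the `FAKE_DETECTIONS` table are all hypothetical names, and the helper model is replaced by a fixed lookup with made-up coordinates, since the real LocateAnything call is not shown in this post.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Box:
    """Bounding box in pixels: top-left corner plus size."""
    x: int
    y: int
    width: int
    height: int


# Stand-in for the helper model (assumption): a fixed lookup instead of
# real detections. The coordinates are invented for illustration.
FAKE_DETECTIONS = {
    "email field": Box(x=120, y=340, width=400, height=32),
    "phone field": Box(x=120, y=390, width=400, height=32),
}


def locate(query: str) -> Optional[Box]:
    """Hypothetical tool the main model calls, e.g. "email field"."""
    return FAKE_DETECTIONS.get(query.lower().strip())


def text_anchor(box: Box, text_height: int, left_padding: int = 4) -> Tuple[int, int]:
    """Pick where text starts: padded from the left, vertically centered."""
    x = box.x + left_padding
    y = box.y + (box.height - text_height) // 2
    return x, y


def fill_field(query: str, value: str, text_height: int = 16) -> dict:
    """One round trip: locate the field, then decide where to draw the value."""
    box = locate(query)
    if box is None:
        return {"ok": False, "error": f"no detection for {query!r}"}
    x, y = text_anchor(box, text_height)
    return {"ok": True, "value": value, "x": x, "y": y}
```

The main model only ever sees the small dictionary that `fill_field` returns, so the pixel-level work stays in the helper and the tool code around it.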
Result:
> Qwen3.6 35B A3B + LocateAnything-3B: form completed, all info correct.
> Name, DOB, ID, gender, marital status, nationality, email, phone, address, postal code: all landed in the right field areas.
> Character-box alignment still a touch loose, but every value is where it belongs.
> 9m10s, 224.5k input, 24.3k output, 21 turns.
Why it matters:
> Qwen alone can't finish this test. Bolt on a 3B model that does exactly one thing (locate) and suddenly it can.
> A combination of small models can do the work of a single large one.
Honest question: has anyone found a use case for Gemini Spark that wasn't something they were already able to do with scheduled tasks and the right connections to certain apps on any other major provider?
Agreed. We are in the early days, but over the last year I can see where it is moving, and these are exciting times. I also agree that giving models the right tools, and getting models to use them reliably, is where the next big step is.
I think AI orchestration is the step after that. Once models can use tools reliably and intelligently, you can build tools with some of that intelligence embedded in the design itself: lightweight models built around a specific tool or set of tools that a generalist model can call on, rather than a single model trying to do it all.
Our aspiration with Codex is to remove software creation as a limiting factor on the world’s ambition.
Not just for product companies and engineers, but also for users across every role and beyond business use-cases (Codex is for everyone).
Writing code is the first step toward accelerating software creation, but we still haven’t seen the true explosion of software that should be possible with advances in AI coding.
Deploying useful software also requires clear specifications, well-designed code, security guarantees, careful deployment, production monitoring, and constant iteration.
Thinking of this system as a software factory feels pretty apt. We’ve already seen exciting examples of this working well within OAI, like the work @_lopopolo wrote about in the Harness Engineering blog post. I’ve also seen great examples from customers across startups and enterprises. However, doing this well requires a lot of laborious work to get right.
I think the model capabilities feel very close to supporting this end-to-end, though we'll make them even better. The limiting factor now is likely giving models and agents access to the right tools, and having them run at the right moments, so they can truly push the whole process forward.
Once we get there, I think we’ll be in an extremely exciting world where software can be both great and disposable.
When we do see that software explosion, I think things will feel significantly different from even what we've seen so far.