Qwopus 3.6 27b-Coder is now live!
It scores 67% on a full run of SWE-bench Verified with thinking completely disabled, at Q5_K_M!
This model is lightning fast for a dense model! With a natively fine-tuned MTP head, it achieves 100 tps on a single 5090! The biggest upgrade here, though, is its stability in programming and tool calling within @NousResearch Hermes agent, with thinking off!
Wall time is crazy fast this way, which makes Hermes feel "native" and snappy, like they were meant for each other. The freedom of running without thinking at all makes you part of the thinking process, and you never get caught waiting 15 minutes for it to finish a thought string, like with the base models.
With thinking on and a high temperature (0.9-1.0), it seems to produce really incredible design and SVG results. I reran the Boat Survival prompt through a few turns with thinking on, and it seemed to render fancier models in HTML canvas, but it was much more of a start-a-prompt-and-wait experience versus the snappy, active iteration with it disabled. It may be worth toggling thinking on and off throughout the build process if you want to get really creative with design.
Really looking forward to seeing how this one performs for y'all! Please post comments with your opinions and use cases below! As always with our fine-tunes, mess with the temperature setting, and run them much hotter than the base!
Please check out the Boat Survival game I posted yesterday, made in 12 turns using Hermes and this model, with thinking off. Link below!
Full SWE-bench repo-specific breakdown also posted in the comments for those interested!
Happy building, everyone! We're looking forward to your thoughts! Quants uploading now!
https://t.co/kxJE3C39ZZ
Meet Gemma 4 12B!
A unified, encoder-free multimodal model designed to bring high-performance intelligence directly to your laptop, and released under an Apache 2.0 license.
Bridging the gap between edge efficiency and advanced reasoning. Here is what’s new with Gemma 4 12B: 👇
Peter @steipete built OpenClaw. Omar @OmarShahine put a Scout uniform on it and shipped it at Microsoft. And I've been running it as my own digital chief of staff for months.
This is a two-sided revolution:
🏢 Corporate: Microsoft Scout is the next S-curve for workforce empowerment. Always-on Autopilot. Enterprise identity. Policy controls. Your team, amplified. Not a chatbot — an operating layer.
🏠 Personal: NVIDIA RTX Spark puts a 1-petaflop AI superchip in your laptop. Your AI PC, running OpenClaw/Scout and maybe even edge models like DeepSeek locally. Your own private intelligence engine.
One platform. Two worlds. Windows just became the OS for both.
The models change weekly — Gemini, Claude, GPT, DeepSeek. I use them all. Be model-agnostic. Build the layer.
From a side project to running Windows. Not bad, @steipete. Not bad at all.
@OpenClaw
Six open-source LLMs. One sliding puzzle. A brutal test of long-horizon reasoning and tool calling.
Five of them broke. One didn't.
I gave each model a move_tile tool and a scrambled 3×3 board, then asked it to solve the puzzle through pure turn-by-turn reasoning. The deeper the scramble, the more brutal the search.
Five runs per depth, best run kept. A model fails the round if it exceeds 6x the optimal move count.
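For anyone who wants to try something similar, here is a rough Python sketch of how a setup like this could look. It is not the harness used for these runs. The goal layout, the `move_tile` signature, the reading of "depth" as the number of random scramble moves, and the helper names are all assumptions made for illustration. It scrambles a 3×3 board, finds the optimal move count with breadth-first search, and applies the 6x cutoff.

```python
import random
from collections import deque

# Assumed goal layout; 0 marks the blank square.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def neighbors(state):
    """Yield every board reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + dr, col + dc
        if 0 <= r < 3 and 0 <= c < 3:
            swap = r * 3 + c
            board = list(state)
            board[blank], board[swap] = board[swap], board[blank]
            yield tuple(board)


def move_tile(state, tile):
    """Hypothetical stand-in for the tool: slide `tile` into the blank."""
    pos, blank = state.index(tile), state.index(0)
    (r1, c1), (r2, c2) = divmod(pos, 3), divmod(blank, 3)
    if abs(r1 - r2) + abs(c1 - c2) != 1:
        raise ValueError(f"tile {tile} is not next to the blank")
    board = list(state)
    board[pos], board[blank] = board[blank], board[pos]
    return tuple(board)


def scramble(depth, seed=0):
    """Apply `depth` random moves from the goal, never undoing the last one."""
    rng = random.Random(seed)
    state, prev = GOAL, None
    for _ in range(depth):
        options = [n for n in neighbors(state) if n != prev]
        prev, state = state, rng.choice(options)
    return state


def optimal_moves(start):
    """Breadth-first search for the shortest solution length."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        state, dist = queue.popleft()
        if state == GOAL:
            return dist
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("unsolvable board")


def passes_round(moves_used, solved, optimal, factor=6):
    """A run passes only if it solved the board within factor x optimal."""
    return solved and moves_used <= factor * optimal
```

One thing to watch for: a board scrambled with N random moves can have an optimal solution shorter than N, so in this sketch the cutoff is computed from the BFS result rather than from the scramble depth itself.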
> Depth 5:
Everyone solves it. Yawn.
> Depth 10:
GLM 5.1 melts down. 43 moves. Cut.
> Depth 12:
Gemma4 26B loses the plot, shuffling tiles in circles. Gone.
> Depth 15:
The wall. DeepSeek V4 Flash, out. DeepSeek V4 Pro, out. Gemma4, out again. GLM 5.1, out.
Two survivors: Qwen3.6 35B-A3B, and Kimi K2.6 with an 11-move solve that looked like cheating.
> Depth 18:
Same two. Everyone else hallucinating tiles that weren't there.
> Depth 22:
Final boss. Kimi, flawless for five rounds, finally cracks. 81 moves. Still scrambled. DeepSeek V4 Pro limps home at 90.
Qwen3.6 35B-A3B solves it in 36.
The smallest model in the room. ~3B active params. Fits on a single 3090. It beat everything.
Kimi was elegant. Qwen3.6 was unstoppable.