Ended up aborting bc it kept stopping too often, like right after saying it's going to do this and that. Not sure if it's the model or the harness (pi agent). Asked it to prepare a handoff doc to continue with a more capable model.
AFAICT, Gemma 4 QAT models were trained with MatFormer, which quantizes the forward pass to simulate inference with quantized weights and trains around the resulting errors. MatFormer has been around for at least two years, since the rise of Matryoshka embeddings.
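For reference, a minimal sketch of the "quantize the forward pass, train around the errors" idea, usually called fake quantization with a straight-through estimator. This is a toy, not Gemma's actual recipe: the one-weight model, the 4-bit grid, the `scale` of 0.25, and the names `fake_quantize` and `train` are all assumptions for illustration.

```python
def fake_quantize(w: float, scale: float = 0.25, bits: int = 4) -> float:
    """Snap w to the nearest value on a signed integer grid (assumed 4-bit)."""
    qmax = 2 ** (bits - 1) - 1   # largest integer code, 7 for 4 bits
    qmin = -(2 ** (bits - 1))    # smallest integer code, -8 for 4 bits
    q = max(qmin, min(qmax, round(w / scale)))
    return q * scale


def train(xs, ys, steps: int = 200, lr: float = 0.05) -> float:
    """Fit y = w * x while the forward pass only sees the quantized weight."""
    w = 0.0  # full-precision weight kept during training
    for _ in range(steps):
        wq = fake_quantize(w)  # forward pass uses the quantized weight
        # Gradient of the mean squared error with respect to wq.
        grad = sum(2 * (wq * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Straight-through estimator: pretend d(wq)/d(w) == 1 and update w.
        w -= lr * grad
    return w


xs = [1.0, 2.0, 3.0]
ys = [1.6 * x for x in xs]  # the true slope 1.6 is not on the 0.25 grid
w = train(xs, ys)
print("full-precision weight:", w)
print("weight seen at inference:", fake_quantize(w))
```

The full-precision weight ends up hovering near the boundary between the two grid points closest to the true slope, so the quantized model lands on one of the best values it can actually represent.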
The AI developer community as a whole has a pretty good sense of what frontier models are capable of and what they lack, but not of what the minimum is for simple tasks that require good reasoning and reliable tool use. I think we have a better picture now with Gemma 4 QATs.
it just learned: "Basically, I should treat GPT 5.4 as a tool for 'Knowledge Retrieval and Code Correction.' If I'm stumped or hit an error, I use that link to get the fix and then proceed autonomously."
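A rough sketch of that "ask only when stumped" policy, in case it helps picture it. Everything here is hypothetical: `run_with_escalation`, the `local` and `advisor` callables, and the retry limits are made-up stand-ins, not pi agent's or any model's real API.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces: the local worker takes (task, hint) and returns a
# result or raises; the advisor takes (task, error_text) and returns a hint.
Worker = Callable[[str, Optional[str]], str]
Advisor = Callable[[str, str], str]


def run_with_escalation(task: str, local: Worker, advisor: Advisor,
                        max_local_attempts: int = 2,
                        max_advisor_calls: int = 1) -> Tuple[str, int]:
    """Try the local model first; consult the advisor only after failures.

    Returns the result and how many advisor calls were needed.
    """
    hint: Optional[str] = None
    advisor_calls = 0
    while True:
        last_error = ""
        for _ in range(max_local_attempts):
            try:
                return local(task, hint), advisor_calls
            except Exception as exc:  # any failure counts as "stumped"
                last_error = str(exc)
        if advisor_calls >= max_advisor_calls:
            raise RuntimeError(f"gave up after {advisor_calls} advisor call(s): {last_error}")
        hint = advisor(task, last_error)  # escalate, then resume locally
        advisor_calls += 1


# Stubs standing in for the small local model and the stronger remote model.
def flaky_local(task: str, hint: Optional[str]) -> str:
    if hint is None:
        raise ValueError("unsure how to proceed")
    return f"{task}: done using hint '{hint}'"


def stub_advisor(task: str, error: str) -> str:
    return "check the config path"


result, calls = run_with_escalation("fix build", flaky_local, stub_advisor)
print(result, calls)
```

Capping the advisor calls keeps the expensive model as a fallback rather than the default path.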
moving in the right direction.
just asked the Gemma 4 12B QAT model to tackle writing a long-running RLM-based agent that delegates only complex tasks to GPT 5.4, using the task itself as the test case. And off it went. No idea how it'll get around its 256K context limit, if at all.
I'm not expecting it to succeed. Instead, I want to see how it behaves when met with challenges. It should be seeking advice from GPT 5.4 after some struggle.
@james_clark Same. I know when my code is AI slop and when it isn't. The difference is whether I use AI as a tool or a self-driving car. The former takes attention and craftsmanship.
that was a good test drive of pi agent. found the harness powerful and got a clear understanding of where leading SLMs are: quite capable but easily confused. They're not only smart enough to run an RLM loop but need it badly to stay focused.
Trying pi agent. Starts light then adds new capabilities as needed. Starting bare on local Qwen3.6, it was able to analyze an app's LocateAnything integration. Then when I asked for web browsing capability, pi noticed I had agent-browser installed and integrated that. Nice.
ofc, using an LLM for that is quite wasteful so I should've asked it to create a syntax checker tool instead. And it did, although I had to slap it many times as it kept getting confused by the context. I'll use a project dedicated to extending pi agent next time.
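For a sense of how small such a tool can be, here is a minimal sketch of a syntax checker, assuming the files being checked are Python (the post doesn't say which language it was). `check_syntax` and its return shape are hypothetical; only `ast.parse` and `SyntaxError` come from the standard library.

```python
import ast


def check_syntax(source: str, filename: str = "<input>") -> dict:
    """Parse Python source without running it and report the first syntax error."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return {
            "ok": False,
            "line": err.lineno,     # 1-based line of the error
            "column": err.offset,   # 1-based column, may be None
            "message": err.msg,
        }
    return {"ok": True, "line": None, "column": None, "message": ""}


print(check_syntax("x = 1\n"))
print(check_syntax("def f(:\n    pass\n"))
```

A deterministic check like this costs no tokens, and the agent only needs to read the small result dict.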
This made me want to build a harness to livestream SLM terminal sessions that lets people watch it tackle challenges and steer it when it gets into trouble. I think robotic challenges would be more exciting than coding challenges. True mob at work.