Today the NPC models crossed 1200 downloads on Hugging Face. Personally, the model I like the most is NPC Agentic V3. Try it here: https://t.co/hMpwlpi8SS
Ran NVIDIA's new Nemotron-3-Nano-30B-A3B-Reasoning at UD-IQ2_XXS (2 bits) on my RTX 4050 laptop. 6 GB VRAM.
Full GSM8K test set: 1234/1319 = 93.56%
14.4 tok/s gen, 122.5 tok/s prompt
11h23m, 487k completion tokens
30B reasoning on a thin-and-light. wild.
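Quick sanity check on those numbers, as a minimal sketch. It assumes "487k" means roughly 487,000 completion tokens and that 11h23m is wall-clock time for the whole run (prompt processing included), so the average it prints should land a bit under the 14.4 tok/s generation speed. Variable names are just for illustration.

```python
# Recompute the headline numbers from the figures in the post.
correct, total = 1234, 1319
accuracy = correct / total

# 11h23m of wall-clock time, in seconds.
wall_seconds = 11 * 3600 + 23 * 60

# "487k" is a rounded figure, so treat this as approximate.
completion_tokens = 487_000
avg_tok_per_s = completion_tokens / wall_seconds

print(f"accuracy: {accuracy:.2%}")
print(f"avg completion throughput over the run: {avg_tok_per_s:.1f} tok/s")
```

The average comes out lower than the peak generation rate because the wall-clock total also covers prompt processing and any other overhead between problems.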
I've been working on a small agentic model that runs on my laptop and could be a good fit for OpenClaw or any other assistant. Running it locally removes the cost of paying for tokens, and I'm targeting at least 25 tok/sec.
After researching and training NPC Agentic v1 on a Qwen-2.5 7B base (overfitted) and v2 (which exposed an EOS bug), I'm finally able to train the model on Hermes agent traces and Claude agentic traces.
v3 is a success, running at 25+ tok/sec. Agentic model: https://t.co/igiMO8Ljiq
Was just going through my feed and found the Nemotron Omni 30B model. It looks pretty interesting, since a 30B model can do almost anything.
Will benchmark it in a while and come back with my findings.
spent the past few years deep in agent infrastructure and crypto-adjacent ML.
stepping back from that. what I actually want to build is small specialized models and on-device reasoning — and that's the work going forward.
two things shipped so far:
▸ bottensor — small specialized models research
three papers:
• NPC Fast 1.7B — router LoRA, 16K context
https://t.co/enHcKg7QBD
• Fin-PRM 7B — process reward model, Spearman 0.92
https://t.co/ckTaQE6Rap
• NPC Fin 32B — multi-GPU QLoRA, 12× H100
https://t.co/6bXwvAc5GT
site: https://t.co/0NmCafPMoq
▸ polyrt — Python library for calling LLMs across local + cloud from one typed interface
v0.1 just shipped:
• MLX (Apple Silicon), Anthropic, OpenAI backends
• sync + async, schema enforcement
• Apache 2.0
pip install polyrt[anthropic,openai,mlx]
https://t.co/BpkEcTNfFT
more soon.