Rama Krishna Bachu

@ramkbachu

ML researcher. Small models, financial reasoning, on-device inference. Building Bottensor · polyRT.

Joined September 2013

185 Following

181 Followers

5 Posts

Rama Krishna Bachu

@ramkbachu

2 months ago

Today the NPC models crossed 1200 downloads on Hugging Face. Personally, the model I like the most is NPC Agentic V3. Try it here: https://t.co/hMpwlpi8SS


134

Rama Krishna Bachu

@ramkbachu

2 months ago

Ran NVIDIA's new Nemotron-3-Nano-30B-A3B-Reasoning at UD-IQ2_XXS (2 bits) on my RTX 4050 laptop. 6 GB VRAM.

Full GSM8K test set: 1234/1319 = 93.56%
14.4 tok/s gen, 122.5 tok/s prompt
11h23m, 487k completion tokens
30B reasoning on a thin-and-light. wild.

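A quick sanity check of the figures in the post above, as a minimal sketch. It assumes "487k" means roughly 487,000 completion tokens and that 11h23m is the wall-clock time for the whole run; neither is stated more precisely in the post. Under those assumptions the run-wide average comes out somewhat below the quoted 14.4 tok/s generation speed, which would be consistent with the wall clock also covering prompt processing and other overhead.

```python
# Figures quoted in the post; the token count is rounded there ("487k").
correct, total = 1234, 1319
completion_tokens = 487_000          # assumption: "487k" taken as 487,000
wall_clock_s = 11 * 3600 + 23 * 60   # 11h23m, assumed to be total wall-clock time

# Accuracy on the GSM8K test set, as a percentage.
accuracy_pct = 100 * correct / total

# Completion tokens averaged over the full run (not the peak generation speed).
avg_tok_per_s = completion_tokens / wall_clock_s

print(f"accuracy: {accuracy_pct:.2f}%")
print(f"average completion throughput: {avg_tok_per_s:.1f} tok/s")
```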

127

Rama Krishna Bachu

@ramkbachu

2 months ago

I have been working on a small agentic model that runs on my laptop and could serve OpenClaw or any other assistant. This avoids paying for tokens, and my target is at least 25 tok/sec. After researching and training NPC Agentic v1 on a Qwen-2.5 7B base (overfitted) and v2 (which exposed an EOS bug), I was finally able to train the model on Hermes agent traces and Claude agentic traces. v3 is a success at 25+ tok/sec. Agentic model: https://t.co/igiMO8Ljiq

Rama Krishna Bachu

@ramkbachu

2 months ago

Was going through my feed and found the Nemotron Omni 30B model. It seems pretty interesting, since a 30B model can do almost anything. Will benchmark it in a while and come back with my findings.

Rama Krishna Bachu

@ramkbachu

2 months ago

spent the past few years deep in agent infrastructure and crypto-adjacent ML. stepping back from that. what I actually want to build is small specialized models and on-device reasoning, and that's the work going forward.

two things shipped so far:

▸ bottensor: small specialized models research
three papers:
• NPC Fast 1.7B: router LoRA, 16K context https://t.co/enHcKg7QBD
• Fin-PRM 7B: process reward model, Spearman 0.92 https://t.co/ckTaQE6Rap
• NPC Fin 32B: multi-GPU QLoRA, 12× H100 https://t.co/6bXwvAc5GT
site: https://t.co/0NmCafPMoq

▸ polyrt: Python library for calling LLMs across local + cloud from one typed interface
v0.1 just shipped:
• MLX (Apple Silicon), Anthropic, OpenAI backends
• sync + async, schema enforcement
• Apache 2.0
pip install polyrt[anthropic,openai,mlx]
https://t.co/BpkEcTNfFT

more soon.
