Abheer Singh @abheer - Twitter Profile

1 day ago

Technical report: https://t.co/QMyAWmPmno
Hugging Face:
BF16 post-trained: https://t.co/f7mAvf6q45
NVFP4 quantized: https://t.co/LrsHVim852

0

1

0

10

Abheer Singh

@abheer

1 day ago

If you're a startup building agentic products, your economics live and die on inference. An agent makes many model calls over a single long trajectory. For agents that reason at length or make many tool calls, those calls generate far more tokens than they read, and the cost of generating tokens is the part that scales hardest. Chain enough of those requests together and output token cost becomes the dominant term in your unit economics. Two things bring that cost down. The first is activating fewer parameters on each token. The second is keeping the memory cost of long context low. Nemotron 3 Ultra, released by NVIDIA today, does both. It is a Mixture-of-Experts model that activates 55B of its 550B parameters per token. Its hybrid Mamba-Attention design keeps the KV cache footprint bounded as context grows, where a standard attention model would see it grow with every token. At the 8K-in, 64K-out setting it runs close to six times the throughput of GLM-5.1 and almost five times Kimi K2.6, at on-par accuracy.
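
A rough back-of-the-envelope sketch of the two cost drivers described in this post. Only the 55B-of-550B figure and the 8K-in, 64K-out setting come from the post; the layer count, head sizes, prices, and call count below are hypothetical placeholders, not Nemotron 3 Ultra's actual configuration or anyone's real pricing. The KV cache formula is for a plain attention-only model, to show the growth that a bounded-state design is meant to avoid.

```python
# Illustrative arithmetic only. Every number except the 55B/550B split and
# the 8K-in / 64K-out setting is a hypothetical placeholder.

ACTIVE_FRACTION = 55 / 550  # share of parameters active per token (from the post)


def kv_cache_bytes(context_tokens, n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """KV cache size for a standard attention-only model.

    The leading factor of 2 counts one key and one value vector per layer.
    The result grows linearly with the number of tokens in context.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value * context_tokens


def trajectory_cost(calls, in_tokens, out_tokens, price_in_per_m, price_out_per_m):
    """Return (input_cost, output_cost) for a trajectory of identical calls.

    Prices are per million tokens, in whatever currency the caller uses.
    """
    input_cost = calls * in_tokens * price_in_per_m / 1e6
    output_cost = calls * out_tokens * price_out_per_m / 1e6
    return input_cost, output_cost


if __name__ == "__main__":
    # Hypothetical attention-only config, not any real model.
    cfg = dict(n_layers=64, n_kv_heads=8, head_dim=128)
    at_prompt = kv_cache_bytes(8_000, **cfg)
    at_end = kv_cache_bytes(8_000 + 64_000, **cfg)
    print(f"KV cache after prompt:     {at_prompt / 2**30:.2f} GiB")
    print(f"KV cache after generation: {at_end / 2**30:.2f} GiB")

    # Hypothetical prices and call count for an 8K-in, 64K-out agent.
    cost_in, cost_out = trajectory_cost(
        calls=10, in_tokens=8_000, out_tokens=64_000,
        price_in_per_m=1.0, price_out_per_m=4.0,
    )
    print(f"input cost: {cost_in:.2f}, output cost: {cost_out:.2f}")
    print(f"active parameter fraction: {ACTIVE_FRACTION:.0%}")
```

With these placeholder inputs the output side dominates the bill, and the attention-only cache grows ninefold between the end of the prompt and the end of generation, which is the pressure the post describes.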

1

0

30

Abheer Singh

@abheer

1 day ago

The second thing worth noting is how much NVIDIA open-sourced. The base, post-trained, and NVFP4 quantized checkpoints are all released, along with the training data, the recipe, and the RL environments. In practice this means you can self-host the model. You can run inference in the same FP4 precision it was trained in. You can modify the training pipeline rather than treating it as a black box behind an API. The model was also post-trained for agentic work specifically. That training covered terminal use, end-to-end GitHub issue resolution, and search agents that compress their own context to keep operating past the window limit. An efficient open model paired with a fully released training pipeline is rare, and it is what makes owning your own inference stack realistic rather than aspirational.

1

0

16

Abheer Singh

@abheer

3 days ago

the pendulum is swinging local

Perplexity

@perplexity_ai

3 days ago

Today we're announcing that hybrid agentic inference is coming to Perplexity Computer. Computer can split tasks between a local model running on your machine and frontier models in the cloud. This keeps private data on your device and maximizes token efficiency. Coming soon.

145

2K

200

735

328K

0

1

0

58

Michael 🙃

@mikaledundee

I still see 😲 your shadows 💩in my room 🛏 can’t take back 👋 the love 💕 that I gave you 🙈

4 days ago

@Winterrose so underrated as a source of sales alpha. X needs to make a competitor to SalesNav and they might kill LinkedIn

0

4

0

23

Abheer Singh

@abheer

4 days ago

@Winterrose 🤙

1

2

0

35

Abheer Singh

@abheer

4 days ago

@xXshaurizardXx what are you most excited to use it for? just as a mbp replacement or more?

0

50

Abheer Singh

@abheer

4 days ago

I guess I'm commenting on your general stance that edge token generators will never be a thing in America, rooted in the opinion that Americans don't care where their data is held. I can tell you objectively that I speak to dozens of American AI workstation consumers every week who care deeply. I also think Americans care more given the "don't tread on me" values; distrust of a central authority holding your data is the same instinct.

1

0

36

Abheer Singh

@abheer

4 days ago

@mweinbach @Midnight_Captl @benitoz You’re underestimating how much people care. iCloud and Google Photos can’t be used as a generalization for people’s views on personal data governance. I talk to developers every day who buy workstations for the sake of data governance.

1

2

0

32

Abheer Singh

@abheer

4 days ago

1. This is a low-friction way to get hands-on with the full NVIDIA stack in a familiar form factor.
2. You can run Hermes Agent on device behind OpenShell and Microsoft’s new security primitives.

Microsoft Surface @surface

5 days ago

Introducing Surface Laptop Ultra. Built for world makers. Designed for what's next. The most powerful Surface laptop ever. Coming Fall 2026. Sign up to learn more: https://t.co/k8aEX2pTAy

563

12K

1K

3K

11M

0

2

0

120

abheer retweeted

Nous Research

@NousResearch

5 days ago

We have been working closely with @nvidia to ensure Hermes Agent works smoothly on their new @NVIDIARTXSpark superchip and integrates with the new OpenShell runtime, which connects Hermes to @Microsoft's security primitives. Watch our feature in the big announcement at Computex:

312

7K

638

2K

6M

abheer retweeted

NVIDIA AI Infrastructure

@NVIDIAAIInfra

6 days ago

AI is unlocking breakthroughs in quantum computing. 🤝 We collaborated with the University of Innsbruck to tackle a core challenge in building useful quantum supercomputers: automatically designing efficient quantum circuits. Using NVIDIA CUDA-Q, we built a multimodal diffusion model to synthesize quantum circuits from scratch, optimizing circuit structure and gate parameters simultaneously to produce shorter, lower-error circuits. 💡 The standout result: the AI model independently rediscovered the Quantum Fourier Transform — without being told what the solution should look like. 🔗 Learn more: https://t.co/y0nh9i7Tiu

29

634

104

70

50K

abheer retweeted

Prime Intellect @PrimeIntellect

5 days ago

Nemotron 3 Ultra is coming 💚
Frontier smart
5X faster
30% cheaper
Proud to be part of the coalition

11

382

27

35

38K

Abheer Singh

@abheer

20 days ago

@nicoleegong @spencer_linter 💚🤙

0

1

0

29

abheer retweeted

NVIDIA AI

@NVIDIAAI

21 days ago

@xeophon @arcee_ai Open > closed

59

2K

191

126

167K

Abheer Singh

@abheer

22 days ago

@sundeep some of his best rapping in a min tbh, firm friends and make them know was a surprisingly introspective end to the album (iceman)

0

2

0

137

abheer retweeted

NVIDIA RTX Spark

@NVIDIARTXSpark

23 days ago

Run powerful, self-improving AI agents from your desk. @NousResearch's Hermes Agent brings reliable, self-evolving agentic AI to NVIDIA RTX PCs and DGX Spark. Get started. 👇