Inferact @Inferact - Twitter Profile

2 days ago

🚀 Excited to collab with @NVIDIARTXSpark pushing local AI agents forward across RTX + DGX Spark! Sharing our hands-on #vLLM + #DGXSpark blog with the @vllm_project community. We showed it off with a live 20 Questions game—first at our office warming, then at #MLSys2026, where curious attendees took turns stumping the model. Why vLLM + DGX Spark? You get a familiar serving workflow on local hardware: streaming responses, memory-efficient KV-cache management, runtime controls for unified memory, and the metrics to deploy on real workloads. ⚙️📊 Read the full blog and try it on your Spark 👇 https://t.co/GzGDMqRfQm

NVIDIA RTX Spark

@NVIDIARTXSpark

3 days ago

Local AI Agents are leveling up across DGX Spark & RTX PCs.

NVIDIA OpenShell is coming to Windows alongside new agentic AI optimizations and creator app updates — including NVIDIA Broadcast 2.2, plus upcoming RTX acceleration for Adobe apps and Blender.

More 👇 https://t.co/GroTth3TaM

Inferact

@inferact

4 days ago

Honored to be on this list! 🎉 Cheers to AI infrastructure, @vllm_project, and building the future of inference 🚀

Redpoint @Redpoint

8 days ago

The Redpoint InfraRed 100 is now live.

These are the companies building the infrastructure that powers everything happening in AI right now, from world models and agent runtimes to the sandboxes, databases, and security tools agents depend on.

Congratulations to this year's honorees!

Read the full 2026 InfraRed Report: our state of the union on AI and cloud infrastructure 👉 https://t.co/Y1y94ZwI5B

Inferact

@inferact

8 days ago

Congrats @modal! 🚀 The shift toward teams owning their models is real, and the open inference layer is a big part of why. Excited to keep building alongside you.

Modal @modal

14 days ago

https://t.co/5DZKxwdGY5

Inferact

@inferact

9 days ago

🚀 Proud to see the Rust frontend land upstream in @vllm_project! Huge congrats to @BugenZhao for driving this work and introducing it at @PyTorch Meetup Singapore last week. A great milestone for the team and the vLLM community. 🦀 PR: https://t.co/gvV8eeKpni

vLLM

@vllm_project

9 days ago

🦀 The Rust frontend is officially merged into vLLM! As GPUs get faster, the frontend has become a real share of CPU time. The new Rust frontend is a drop-in alternative to the Python API server — same engine, same ZMQ boundary. Opt in with VLLM_USE_RUST_FRONTEND=1.

Early numbers: on a preprocess-heavy workload, ~837 req/s vs ~162 req/s for default Python — ~5x in a single process.

A few design choices we're excited about:
• Layered crates with clear boundaries
• Stream-native pipeline — non-streaming for free
• Builds on stable Rust

Huge thanks to @BugenZhao from @inferact for introducing the work at @PyTorch Meetup Singapore. https://t.co/Tw8PoIjbH9

Inferact

@inferact

12 days ago

That's a wrap on #MLSys2026 in Bellevue! 🚢

It was great meeting so many of you this past week — researchers, contributors, and friends of @vllm_project. The energy around inference systems right now is something else, and the conversations reminded us why this community matters.

A few highlights from our team:

🎤 @rogerw0108 (co-founder, vLLM core maintainer) gave an invited talk, "Rethinking Open Source Contribution in the Age of AI Agents" — a maintainer's-eye view of how AI-generated PRs are reshaping the economics of open source, with concrete examples from vLLM.

🎤 @yifandotqiao gave a Lightning Talk, "Rethink LLM Inference Abstractions: New Trends and Challenges in LLM Serving" — on the combinatorial explosion across models, hardware, and workloads, and why serving at scale is increasingly a distributed systems problem.

And of course — congrats to everyone who played 20 Questions with vLLM at our booth 🎯

Thanks to the MLSys organizers for putting on such a great week. If we missed you in Bellevue, our DMs are open — always happy to talk inference, vLLM, and what we're building.

On to the next one. 🛠️

Inferact

@inferact

14 days ago

Great cohosting this luncheon with @a16z and Mirendil at MLSys 2026 yesterday! 🙌 We brought together top researchers and AI systems engineers for an afternoon of rich conversations on @vllm_project, the frontier of inference, and where AI systems are headed next. Huge thanks to everyone who joined — the energy in the room was something else. This is exactly the kind of cross-pollination between labs, infra teams, and industry that pushes the whole stack forward. More to come. 👀 #MLSys2026 #vLLM

Inferact

@inferact

14 days ago

🚀 Command A+ is ready to serve on vLLM: day-0. Frontier open-source, production-ready. Huge congrats to the Cohere and vLLM teams! Read more 👇 https://t.co/VWAeYGDmAR

Cohere

@cohere

15 days ago

Introducing: Cohere Command A+

We’ve created our most powerful LLM yet, optimized it to run on as little hardware as possible, and released it open-source for all.

Inferact

@inferact

16 days ago

Shoutout to our co-founder @KaichaoYou for making this fix and writing up the full story. From a 2024 hackathon bug → in-tree workarounds in vLLM → PyTorch Foundation TAC → fix landed in PyTorch 2.11.0. This kind of unglamorous, multi-org debugging makes the whole stack better. 👇

PyTorch

@PyTorch

17 days ago

vLLM and PyTorch worked together to fix a long-standing aarch64 install headache — as of PyTorch 2.11.0, pip install torch on GB200 / GB300 / GH200 just works. 🎉

What changed: PyTorch 2.11.0 now publishes CUDA-enabled aarch64 wheels to the default PyPI index. No more custom --index-url flags. No more transitive dependencies silently swapping your GPU build for the CPU wheel. New users on Grace Hopper and Grace Blackwell systems can follow the standard install instructions and have vLLM work the first time.

In our latest blog, @KaichaoYou (co-founder @inferact, Lead Maintainer @vllm_project) shares the full story:
🐛 A 2024 hackathon bug bringing up vLLM on GH200
🔧 vLLM's in-tree workarounds (use_existing_torch.py and [tool.uv] build-isolation passthrough)
🤝 From GitHub issue to PyTorch Foundation TAC discussion
🚀 The fix landing in PyTorch 2.11.0, driven by NVIDIA and PyTorch core.

A great example of cross-project collaboration under the PyTorch Foundation umbrella — and a reminder that boring infrastructure wins compound.

Read the full story: https://t.co/JGnJ1X7sxl

✍️ : Piotr Bialecki (@nvidia) — @ptrblck_de, Alban Desmaison (@Meta), Andrey Talman (@Meta), Nikita Shulga (@Meta)

Inferact

@inferact

17 days ago

We’re at MLSys 2026 in Bellevue this week! ⛴️

Come find the Inferact team at Booth #2 in the Evergreen Ballroom.

Talks:
• @rogerw0108 (co-founder at Inferact) — “Rethinking Open Source Contribution in the Age of AI Agents”, Mon 5/18, 11:36 AM
• @yifandotqiao (vLLM core contributor) — YPS Sponsor Lightning Talk — Mon 5/18, 11:36 AM

At the booth:
• 20 Questions with vLLM — a game with vLLM running on DGX Spark, with prizes 🎯
• vLLM + Inferact swag 🧢
• Inferact team members, happy to talk inference and vLLM

If you’re attending, come say hi, chat about inference, or learn what we’re building!

Inferact

@inferact

20 days ago

We're onto Inferact's second office this year! Yesterday, we finally broke it in with an office warming.

It's amazing to see how far we've come. The vLLM ecosystem has been growing at lightning pace, and we've been lucky to scale alongside it: helping teams serve inference faster, cheaper, and at scale.

Thank you to everyone who made it out yesterday — customers, partners, friends, and the whole Inferact team. It meant a lot to celebrate this milestone together.

We're hiring across all teams. If you want to join one of the fastest-growing AI infra companies and power the next generation of AI, check out our careers page or DM us.

Excited for many more office warmings to come!

Inferact

@inferact

23 days ago

Proud of what the team has shipped here! And prouder that all this work is in vLLM main or heading upstream 🚀

vLLM

@vllm_project

23 days ago

vLLM tops the Artificial Analysis leaderboard 🎉

vLLM tops @ArtificialAnlys on DeepSeek V3.2 and ranks among the top deployments of MiniMax-M2.5 and Qwen 3.5 397B. The leading deployments of these models are now open source.

How each result was built:
🔹 DeepSeek V3.2 — Aggressive op fusion across the attention path collapsed ~33 per-layer kernels down toward ~10.
🔹 MiniMax-M2.5 — Custom EAGLE3 draft trained against the target's own token distribution via TorchSpec, plus a custom QK-norm fusion for MiniMax's TP-aware attention.
🔹 Qwen 3.5 397B — Targeted fusions plus a QK-norm fix for Qwen's linear-attention path.

Every optimization is in vLLM main or on its way upstream. Huge thank you to @inferact, @digitalocean, @nvidia, @RedHat_AI, and the vLLM community 🙏

Full breakdown 👇 https://t.co/MzxANVvhHQ

inferact retweeted

DigitalOcean

@digitalocean

about 1 month ago

Among the fastest DeepSeek V3.2, MiniMax-M2.5, and Qwen 3.5 397B inference in the market, per Artificial Analysis benchmarks (April 2026). ⚡️🤖 Sub-1-second TTFT. 230 tokens per second. Co-designed every layer of the stack with @Inferact, performance-optimized @vllm_project, all on @NVIDIA HGX B300. Live on DigitalOcean Serverless Inference now. Full breakdown in the comments. ⬇️

inferact retweeted

Roger Wang

@rogerw0108

about 1 month ago

🚀🚀🚀 https://t.co/SibONtoSCo just got merged to main! Huge shoutout to the entire team @inferact that worked on day-0 support of DeepSeek V4 and our partner @NVIDIAAI for the collaboration on day-0 large-scale serving enablement! More optimizations coming soon, stay tuned!

inferact retweeted

SemiAnalysis

@SemiAnalysis_

about 1 month ago

DAVIS, APRIL 25, 2026 — InferenceX has added DeepSeekv4 for @vllm_project's day 0 support for GB200 disagg! Great work to @flowpow123 @rogerw0108 @NVIDIAAIDev @inferact for the fast support and engineering! https://t.co/lQ6u1BI8nv
