Aurick Qiao

24 days ago

I will be at MLsys 2026 in Bellevue next week. We are hiring across ML infrastructure including pre-training, post-training, and inference. Come find me and chat!

25 days ago

People talk, listen, watch, think, and collaborate at the same time, in real time. We've designed an AI that works with people the same way. We share our approach, early results, and a quick look at our model in action. https://t.co/AFJZ5kH7Ku

461

16K

12K

aurickq retweeted

Long Lian

@LongTonyLian

24 days ago

🚀 Directionally very good 🚀

24 days ago

Very excited to share a preview of what we’ve been working on!

building @vllm_project at @meta | ex-openai | cs phd @ 🌁 uc berkeley | machine learning system | the real agi is the friends we made along the way

25 days ago

461

16K

12K

aurickq retweeted

Mira Murati

@miramurati

25 days ago

Today we're sharing our work on interaction models. A new class of model trained from scratch to handle real-time interaction natively, instead of gluing it onto a turn-based one. https://t.co/MoS5s4cm60

330

938

Who to follow

Zhuohan Li

@zhuohan123

Yiying Zhang

@yiying__zhang

Founder and CEO of GenseeAI, Associate Professor of Computer Science at UCSD. LLM serving, AI Workflows, Agents

Eric Xing

@ericxing

Researcher, educator, entrepreneur, and administrator in computer science, artificial intelligence, and healthcare.

aurickq retweeted

Ying Sheng

@ying11231

about 1 month ago

Congrats @radixark ! From SGLang @lmsysorg to Miles, and to future products, RadixArk is dedicated to building a crucible capable of repeatedly producing cutting-edge AI, bringing the best of AI into every household. We believe in a future of AI diversity and hope to drive the integration of AI into every aspect of production and daily life. In the future we envision, AI will become a partner to many companies and individuals, finding ways to self-evolve—in production, in daily companionship, and within virtual worlds. Everything we have experienced and will continue to experience in the SGLang and Miles open-source communities is unforgettable and highly anticipated. It has been both demanding and exhilarating, allowing us to see friendship, the world, and the boundaries. Over the past six months, I have witnessed for the first time how a united team moves forward hand in hand, and how deeply passionate they are about creation. Each of us has taken on our respective roles and numerous new tasks for the first time; we are all stepping out of our comfort zones, growing, and creating at a rapid pace. "It’s the step-by-step journey of a thousand miles that has carried us here today, and the same relentless march that will lead us into the tens of thousands of miles yet to come." In an era where AI has made ordinary productivity cheaper, relentless, day-to-day refinement has increasingly become the rare key that drives innovation and the future. We hope this will forever remain the soul of RadixArk's culture: focused, uncompromising, humble, and fearless. The underlying logic of creation is not the deliberate pursuit of novelty, but rather independent thinking that remains unswayed by temptation, paired with a meticulous drive for perfection.

211

19K

aurickq retweeted

Hao Zhang

@haozhangml

3 months ago

Real-time videogen has been something I have been pushing hard at FastVideo Team. And Today, we have a big update -- we just made it: Now you can create a 5s 1080p Video in 4.5s with FastVideo on a Single GPU I believe this is the fastest 1080p text-image-to-audio-video pipeline ever! Try our free demo to feel the speed and quality: https://t.co/xmJgRioM1K and give us feedback Blog: https://t.co/OnplmWsJ1n

109

16K

aurickq retweeted

Neal Wu

@neal_wu

3 months ago

I joined @thinkymachines to work with @johnschulman2, @miramurati, @soumithchintala, @dchaplot, @alexgartrell, and others on the future of collaborative AI! And I can confirm that we do have 45 lb weights. If you're interested in training weights and/or Tinkering with us, join us! https://t.co/4JuXQdCybv

neal_wu's tweet photo. I joined @thinkymachines to work with @johnschulman2, @miramurati, @soumithchintala, @dchaplot, @alexgartrell, and others on the future of collaborative AI!

And I can confirm that we do have 45 lb weights.

If you're interested in training weights and/or Tinkering with us, join us! https://t.co/4JuXQdCybv

206

171K

aurickq retweeted

3 months ago

We are partnering with @nvidia to power our frontier model training and platforms delivering customizable AI. https://t.co/JFlf9hJkKh

thinkymachines's tweet photo. We are partnering with @nvidia to power our frontier model training and platforms delivering customizable AI.

https://t.co/JFlf9hJkKh https://t.co/M23RB8dZxI

101

166

292

664K

aurickq retweeted

Woosuk Kwon

@woosuk_k

4 months ago

Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. The Challenge Inference is not solved. It's getting harder. Models grow larger. New architectures proliferate: mixture-of-experts, multimodal, agentic. Every breakthrough demands new infrastructure. Meanwhile, hardware fragments: more accelerators, more programming models, and more combinations to optimize. The capability gap between models and the systems that serve them is widening. Left this way, the most capable models remain bottlenecked and with full scope of their capabilities accessible only to those who can build custom infrastructure. Close the gap, and we unlock new possibilities. And the problem is growing. Inference is shifting from a fraction of compute to the majority: test-time compute, RL training loops, synthetic data. We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building. Why Us vLLM sits at the intersection of models and hardware: a position that took years to build. When model vendors ship new architectures, they work with us to ensure day-zero support. When hardware vendors develop new silicon, they integrate with vLLM. When teams deploy at scale, they run vLLM, from frontier labs to hyperscalers to startups serving millions of users. Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale. This ecosystem, built with 2,000+ contributors, is our foundation. We've been stewards of this engine since its first commit. We know it inside out. We deployed it at frontier scale—in research and in production. Open Source vLLM was built in the open. That's not changing. Inferact exists to supercharge vLLM adoption. The optimizations we develop flow back to the community. We plan to push vLLM's performance further, deepen support for emerging model architectures, and expand coverage across frontier hardware. The AI industry needs inference infrastructure that isn't locked behind proprietary walls. Join Us Through the open source community, we are fortunate to work with some of the best people we know. For @inferact, we're hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale. Come build with us. We're fortunate to be supported by investors who share our vision, including @a16z and @lightspeedvp who led our $150M seed, as well as @sequoia, @AltimeterCap, @Redpoint, @ZhenFund, The House Fund, @strikervp, @LaudeVentures, and @databricks. - @woosuk_k, @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05 and the rest of the founding team

$woosuk_k's tweet photo. Today, we're proud to announce @inferact, a startup founded by creators and core maintainers of @vllm_project, the most popular open-source LLM inference engine. Our mission is to grow vLLM as the world's AI inference engine and accelerate AI progress by making inference cheaper and faster. The Challenge Inference is not solved. It's getting harder. Models grow larger. New architectures proliferate: mixture-of-experts, multimodal, agentic. Every breakthrough demands new infrastructure. Meanwhile, hardware fragments: more accelerators, more programming models, and more combinations to optimize. The capability gap between models and the systems that serve them is widening. Left this way, the most capable models remain bottlenecked and with full scope of their capabilities accessible only to those who can build custom infrastructure. Close the gap, and we unlock new possibilities. And the problem is growing. Inference is shifting from a fraction of compute to the majority: test-time compute, RL training loops, synthetic data. We see a future where serving AI becomes effortless. Today, deploying a frontier model at scale requires a dedicated infrastructure team. Tomorrow, it should be as simple as spinning up a serverless database. The complexity doesn't disappear; it gets absorbed into the infrastructure we're building. Why Us vLLM sits at the intersection of models and hardware: a position that took years to build. When model vendors ship new architectures, they work with us to ensure day-zero support. When hardware vendors develop new silicon, they integrate with vLLM. When teams deploy at scale, they run vLLM, from frontier labs to hyperscalers to startups serving millions of users. Today, vLLM supports 500+ model architectures, runs on 200+ accelerator types, and powers inference at global scale. This ecosystem, built with 2,000+ contributors, is our foundation. We've been stewards of this engine since its first commit. We know it inside out. We deployed it at frontier scale—in research and in production. Open Source vLLM was built in the open. That's not changing. Inferact exists to supercharge vLLM adoption. The optimizations we develop flow back to the community. We plan to push vLLM's performance further, deepen support for emerging model architectures, and expand coverage across frontier hardware. The AI industry needs inference infrastructure that isn't locked behind proprietary walls. Join Us Through the open source community, we are fortunate to work with some of the best people we know. For @inferact, we're hiring engineers and researchers to work at the frontier of inference, where models meet hardware at scale. Come build with us. We're fortunate to be supported by investors who share our vision, including @a16z and @lightspeedvp who led our $150M seed, as well as @sequoia, @AltimeterCap, @Redpoint, @ZhenFund, The House Fund, @strikervp, @LaudeVentures, and @databricks. - @woosuk_k, @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05 and the rest of the founding team$

182

129

353

482K

aurickq retweeted

Woosuk Kwon

@woosuk_k

4 months ago

It still feels a little unreal to look back at how far @vllm_project has come. What started as a small research project that Zhuohan and I launched ended up receiving so much love and connecting me with people who are now some of my closest friends. In so many ways, I already feel incredibly lucky for what this journey has given me. To be honest, my path with vLLM hasn’t been perfectly straight. Over the past three years, my passion dipped at times, and I did spend my energy exploring things I thought were more interesting than vLLM and inference. vLLM is what it is today because of the community, and I’m truly grateful for their commitment. My view on inference also evolved a lot along the way. What once felt mostly “solved” turned out to be far from it. The rapid pace of new models, increasingly complex architectures, diverse hardware setups, and agents have made inference genuinely hard. The need for strong inference infrastructure has only kept growing, and it became clear just how much important work remains. Somewhere along that journey, I realized how special this work really is and how uniquely positioned vLLM is. Now, I’m committed to pushing it all the way. With that, I started @inferact with @simon_mo_, @KaichaoYou, @rogerw0108, @istoica05, and amazing founding team from both inside and outside the vLLM community. I’m deeply grateful to our investors, including @a16z and @lightspeedvp, for believing in us and giving us this opportunity. Excited for this next chapter, and looking forward to sharing more soon.

152

15K

aurickq retweeted

Kevin Kwok

@kevinakwok

5 months ago

Wow TML really *is* taking a different approach from the other AI labs

360

251K

aurickq retweeted

6 months ago

Tinker is now generally available. We also added support for advanced vision input models, Kimi K2 Thinking, and a simpler way to sample from models. https://t.co/nvaJHkGxc0

172

669

6 months ago

@jiangelaa @worktrace_ai @deepakv91 Congrats!!

202

6 months ago

@haozhangml Thanks Hao!

159

6 months ago

@ying11231 @radixark @lmsysorg @slime_framework @jxwuyi @xai Congratulations!

471

Gabriele Oliaro @GabrieleOliaro

6 months ago

After two amazing years at Snowflake AI Research, I have joined @thinkymachines! I am excited to work with the incredible team here and build world-class ML systems for the next generation of multimodal AI

187

25K

aurickq retweeted

Zhihao Jia

@JiaZhihao

6 months ago

Super excited about this work! 🔥 SuffixDecoding accelerates multi-round agent serving by reusing and optimizing over previous agent iterations—5x speedups on AgenticSQL. Come see @GabrieleOliaro’s #NeurIPS2025 Spotlight!

aurickq retweeted

6 months ago

🐢 Are your #LLM #agents too slow? 🚀 Introducing SuffixDecoding: make agentic workloads run up to 5.3x faster! 🎯 Emerging AI workflows suffer high latency. We fix this with extreme speculative decoding using suffix trees. 🌟 Come see our #NeurIPS2025 Spotlight!

GabrieleOliaro's tweet photo. 🐢 Are your #LLM #agents too slow?

🚀 Introducing SuffixDecoding: make agentic workloads run up to 5.3x faster!

🎯 Emerging AI workflows suffer high latency. We fix this with extreme speculative decoding using suffix trees.

🌟 Come see our #NeurIPS2025 Spotlight!

6 months ago

Suffix Decoding is adopted at @Snowflake to accelerate Text2SQL by 1.85x: https://t.co/t7kZnCD2cQ Suffix-tree based speculative decoding is adopted by several works to speed up sampling in RL training: https://t.co/TAYTamV4xx https://t.co/2SljvcLXFm https://t.co/DymZra3OFE

369

6 months ago

Suffix Decoding is at #NeurIPS2025 as a 🏅spotlight! It accelerates LLM inference for coding, agents, and RL. We also optimized its speculation speed by 7.4x and merged it into vLLM (incoming to SGLang). Talk to @GabrieleOliaro or me at poster #816 Friday 11am! Links in🧵

aurickq's tweet photo. Suffix Decoding is at #NeurIPS2025 as a 🏅spotlight! It accelerates LLM inference for coding, agents, and RL.

We also optimized its speculation speed by 7.4x and merged it into vLLM (incoming to SGLang).

Talk to @GabrieleOliaro or me at poster #816 Friday 11am!

Links in🧵 https://t.co/PbqixBBC8r

13K