Weili Xu @_weilix - Twitter Profile

Very sad news for the LLM research and open-source community. Does this mean PhD researchers in frontier LLMs, or contributors to open-source LLM infrastructure like Megatron, FSDP, Verl, SGLang, and vLLM, may be using a degraded Claude model in their daily work without being notified?

(Photo from guohao_li's tweet.)

Weili Xu @_weilix

11 days ago

As part of Dynamo 2.0, the program abstraction proposed in ThunderAgent is being standardized into the nvext.agent_context protocol in Dynamo. Inference scheduling and KV cache management with agent-lifecycle awareness aren't the future anymore; they're happening right now!

Hao Kang

@GT_HaoKang

11 days ago

Excited to share that ThunderAgent has been integrated into NVIDIA Dynamo as an experimental router for agentic workloads!

ThunderAgent was designed to schedule at the granularity of agent runs, making agentic serving/RL up to 4x faster!

Huge thanks to @0xishand , @KranenKyle , and the Dynamo team. They have been exceptionally efficient and proactive — the team had already started pushing this forward even before I officially joined @nvidia .

Looking forward to seeing ThunderAgent ideas further evolve within Dynamo. And thanks for the help from @togethercompute

Link: https://t.co/CzteUYO0JD
@simran_s_arora @Chenfeng_X @_weilix @yinfang_chen
#AI #MLsys #Agent #Nvidia

Weili Xu @_weilix

about 1 month ago

Fun part of the post: #OpenSourceAI 😅

Qwen

@Alibaba_Qwen

about 1 month ago

📢 Official Announcement: Qwen Partners with Fireworks AI to Accelerate Access to Qwen Family Models

We are pleased to announce a strategic partnership between Qwen and Fireworks AI to deliver optimized, production-ready deployment of Qwen's closed weights models via the Fireworks Platform. @FireworksAI_HQ

This collaboration empowers developers and enterprises to:
✅ Deploy Qwen models with lower latency and reduced fine-tuning and inference costs
✅ Leverage enterprise-grade reliability, security, and scalability
✅ Integrate seamlessly into modern AI workflows

🔹 Get started with Qwen on Fireworks: https://t.co/SEGxfJAGM4

#Qwen #FireworksAI #OpenSourceAI #LLM #AIInfrastructure #ResponsibleAI #DeveloperCommunity

_weilix retweeted

James Zou @james_y_zou

about 1 month ago

Big Update🤩: #paperclip now includes full papers from all of arXiv, PubMed Central and 150 million abstracts!🖇️

You can give your LLM all that knowledge in one line—all optimally indexed for AI agents. Much more thorough and ~100x faster than web search, and free. https://t.co/ZoHWcx7MXg

Weili Xu @_weilix

about 1 month ago

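@Barret_China One question: can Fork mode really reuse the parent session's prompt cache? Couldn't a forked agent end up with a different set of available tools (for example, it can no longer fork new agents), so that its context diverges from the parent's somewhere inside the system prompt?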
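
The concern in this question can be made concrete with a minimal sketch, assuming (hypothetically) that a prefix cache reuses only the longest run of identical leading tokens and that tool definitions are rendered into the system prompt. Tokens are approximated by whitespace-separated words, and the tool names `read`, `write`, and `fork` are illustrative placeholders, not any product's actual tools.

```python
def render_prompt(system: str, tools: list[str], history: list[str]) -> list[str]:
    """Flatten a prompt into 'tokens' (whitespace words stand in for real tokens).

    Assumption: tool definitions are rendered right after the system prompt,
    before the conversation history.
    """
    text = system + " TOOLS: " + " ".join(tools) + " HISTORY: " + " ".join(history)
    return text.split()

def reusable_prefix_len(cached: list[str], new: list[str]) -> int:
    """Count the leading tokens shared by both prompts: all a prefix cache could reuse."""
    n = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        n += 1
    return n

system = "You are a coding agent."
history = ["user: fix the bug", "assistant: reading files"]

# Parent session with three (hypothetical) tools.
parent = render_prompt(system, ["read", "write", "fork"], history)
# Fork that keeps the identical tool list and appends its own sub-task.
same_tools = render_prompt(system, ["read", "write", "fork"], history + ["sub-task"])
# Fork that is not allowed to fork again, so the tool list differs.
fewer_tools = render_prompt(system, ["read", "write"], history + ["sub-task"])

print(reusable_prefix_len(parent, same_tools), len(parent))  # → 17 17
print(reusable_prefix_len(parent, fewer_tools))              # → 8
```

Under these assumptions, a fork with an identical tool list reuses the whole parent prompt, while a fork that drops the `fork` tool can reuse only the tokens before the point where the tool lists differ.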

_weilix retweeted

Together AI @togethercompute

about 2 months ago

Introducing Kimi K2.6 from @Kimi_Moonshot, a multimodal agentic model with Agent Swarm scaling to 300 sub-agents and long-horizon coding stability. AI natives can now use Kimi K2.6 on Together AI and benefit from reliable inference for production-scale autonomous agent workflows. https://t.co/Fq3lz2vHkp

Weili Xu @_weilix

2 months ago

@MIT_CSAIL Will the code be released?

Weili Xu @_weilix

3 months ago

@GenAI_is_real Can you draft a roadmap for supporting SSD (speculative speculative decoding, https://t.co/hmSdwEaB8t) in SGLang?

_weilix retweeted

NICE AI Talk

@academic_nice

3 months ago

NICE Talk 141🌟 invites Hao Kang @GT_HaoKang, a Ph.D. student at Georgia Tech, to discuss ThunderAgent: 4× Faster LLM Agent Inference!

Time
⏰ PST 3.07 18:00–19:00
⏰ EST 3.07 21:00–22:00
⏰ Beijing 3.08 10:00–11:00

Watch live: https://t.co/4MUXa6HIKK
Register: https://t.co/vP7exZ3tRS

In this talk, the speaker will discuss:
🚀 How can we make LLM agent workflows faster, simpler, and more robust?
❌ Traditional request-level engines (vLLM, SGLang) struggle with KV cache thrashing, memory imbalance, and resource leaks.
✅ ThunderAgent introduces Program Abstraction, treating multi-step agent workflows as programs, unifying GPU, CPU, and remote tool scheduling.
With just two lines of code, ThunderAgent boosts inference throughput by 1.5–3.6×, rollout throughput by 1.8–3.9×, and saves 4.2× disk space, while ensuring high concurrency stability.

Join us to explore a principled, program-level approach to distributed agent inference and RL rollouts.

#AI #LLM #AgenticAI #ReinforcementLearning #DistributedSystems #ProgramAbstraction #ThunderAgent

Weili Xu @_weilix

4 months ago

Automata, we're so back!

Xinyu Yang

@Xinyu2ML

4 months ago

I used to be a strong believer in the “Bitter Lesson.” However, my view began to shift once I realized that real-world agentic systems inevitably need to call external tools because of limitations in knowledge acquisition, precise computation, and environment interaction.

An important observation is that LLMs, especially when deployed as agents, are not purely connectionist systems. Instead, they are better understood as a hybrid of connectionism and symbolism. While we encode discrete tokens into continuous representations through neural networks, we ultimately decode them back into symbolic forms to operate in the real world. For example, special tokens such as <EOS> serve as explicit symbolic markers that deterministically control termination. This illustrates that even within LLMs, symbolic structure plays a fundamental operational role.

This reflects something deeper: humans use discrete symbols to make sense of a continuous world. We impose structure, define rules, and create abstractions so that reasoning and coordination become possible. Symbolism is not a relic of pre-neural AI; it is a mechanism for control.

If we want LLMs to be controllable, we cannot ignore their symbolic layer. The question is not whether to use symbols, but how to use them more flexibly. We need better ways to integrate discrete symbolic structure with continuous neural computation, rather than pretending that scaling alone will dissolve the need for structure.

_weilix retweeted

Dan McAteer

@daniel_mac8

4 months ago

GPT-5.3-Codex + the Codex app is the best AI coding tool available right now.

Slept on it for a bit.

Likely going to move back to a ChatGPT Pro sub from Claude MAX because of how good it is.

It's so precise, accurate and excellent at following instructions. There are trade-offs in that it has a more "machine-like" personality than Claude.

I do still love Claude.

But for getting software dev work done, Codex is the best option right now.

It's two things:

1. OpenAI is clearly investing a lot of their human talent into making Codex better.

2. They are co-designing the model and harness together.

And I believe they have the most rapid post-training capabilities, which is why you've seen a new model iteration every month for the last few months.

Endorsing Codex.

_weilix retweeted

Chenfeng_X

@Chenfeng_X

4 months ago

Check out our ThunderAgent (https://t.co/fMko6C1M1i) and @GT_HaoKang's post 👇: 2 lines of code, up to 3.9x throughput improvement, and 4.2x disk memory savings on your agentic inference system 😉

_weilix retweeted

Simran Arora

@simran_s_arora

4 months ago

Check out ThunderAgent, led by @GT_HaoKang, an intern at @togethercompute! An agentic workflow involves multiple model and tool requests, but inference systems make scheduling decisions on a per-request basis. ThunderAgent introduces a simple "program abstraction" to track end-to-end workflow state and improve agentic inference throughput! 🔥

_weilix retweeted

Hao Kang

@GT_HaoKang

4 months ago

🔥 Modify 2 lines of code and make your agentic serving/rollout up to 3.9x faster, losslessly!
⚡️ Say hello to ThunderAgent, a fast, simple, and program-aware agentic inference system.
🥇 We propose a program abstraction to schedule all GPU and CPU resources, the first principled approach for distributed agentic inference and rollout.
🌐 Blog: https://t.co/PAcgTZzlhD
💻 Code: https://t.co/nr7XJj1L7B
📜 Paper: https://t.co/aCD6POzwkU
#AI #ThunderAgent #LLMAgent #Mlsys 1/n

_weilix retweeted

Cloudflare @Cloudflare

4 months ago

Time to consider not just human visitors, but to treat agents as first-class citizens. Cloudflare’s network now supports real-time content conversion to Markdown at the source using content negotiation headers. https://t.co/B7wYH4PtA8
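
On the client side, content negotiation means the agent states which format it prefers in the request headers. A minimal Python sketch, assuming the feature responds to a standard `Accept: text/markdown` header (the post only says "content negotiation headers", so the exact header value is an assumption; the URL in the usage note is a placeholder):

```python
import urllib.request

def build_markdown_request(url: str) -> urllib.request.Request:
    """Build a GET request that prefers Markdown via HTTP content negotiation.

    Assumption: the origin (or a proxy in front of it) honors `Accept: text/markdown`.
    HTML is listed with a lower q-value so servers without conversion still respond.
    """
    return urllib.request.Request(
        url,
        headers={"Accept": "text/markdown, text/html;q=0.9"},
        method="GET",
    )

def fetch_preferring_markdown(url: str, timeout: float = 10.0) -> tuple[str, str]:
    """Fetch `url` and return (Content-Type, body) so the caller can see what was served."""
    with urllib.request.urlopen(build_markdown_request(url), timeout=timeout) as resp:
        content_type = resp.headers.get("Content-Type", "")
        body = resp.read().decode("utf-8", errors="replace")
    return content_type, body
```

Calling `fetch_preferring_markdown("https://example.com/some-page")` should only be trusted to return Markdown when the returned Content-Type starts with `text/markdown`; otherwise the server ignored the preference and sent HTML.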

_weilix retweeted

alex zhang

@a1zhang

4 months ago

@sama can we start getting free boba if we order through chatgpt

_weilix retweeted

Jia-Bin Huang

@jbhuang0604

5 months ago

Beyond softmax attention

Linear attention and its variants enable faster inference without growing the KV cache.

Let’s learn the core ideas behind efficient sequence modeling. 👇
https://t.co/geNiBXKdlI https://t.co/tCDldDuida
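
Why linear attention avoids a growing KV cache can be shown with a minimal pure-Python sketch of causal linear attention, assuming the elu(x)+1 feature map (one common choice; the linked thread may cover other variants). Replacing softmax with a positive feature map lets the attention sums be accumulated into a fixed-size state. The two functions compute the same outputs, but only the first needs every past key and value.

```python
import math
import random

def phi(x: list[float]) -> list[float]:
    """Feature map elu(x) + 1, which keeps every feature strictly positive."""
    return [xi + 1.0 if xi > 0 else math.exp(xi) for xi in x]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def linear_attention_parallel(qs, ks, vs):
    """Causal linear attention over all past positions: needs every past k and v (a growing cache)."""
    d_v = len(vs[0])
    out = []
    for t in range(len(qs)):
        fq = phi(qs[t])
        weights = [dot(fq, phi(ks[s])) for s in range(t + 1)]
        total = sum(weights)
        out.append([sum(w * vs[s][j] for s, w in enumerate(weights)) / total
                    for j in range(d_v)])
    return out

def linear_attention_recurrent(qs, ks, vs):
    """Same outputs from a fixed-size state: S (d_k x d_v) and z (d_k), updated once per token."""
    d_k, d_v = len(ks[0]), len(vs[0])
    S = [[0.0] * d_v for _ in range(d_k)]  # running sum of phi(k) v^T
    z = [0.0] * d_k                        # running sum of phi(k), used to normalize
    out = []
    for q, k, v in zip(qs, ks, vs):
        fk, fq = phi(k), phi(q)
        for i in range(d_k):
            z[i] += fk[i]
            for j in range(d_v):
                S[i][j] += fk[i] * v[j]
        den = dot(fq, z)
        out.append([sum(fq[i] * S[i][j] for i in range(d_k)) / den for j in range(d_v)])
    return out

# Example: 6 tokens, d_k = 4, d_v = 3. The recurrent state is 4*3 + 4 numbers at any length.
rng = random.Random(0)
T, d_k, d_v = 6, 4, 3
qs = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
ks = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
vs = [[rng.gauss(0, 1) for _ in range(d_v)] for _ in range(T)]
parallel = linear_attention_parallel(qs, ks, vs)
recurrent = linear_attention_recurrent(qs, ks, vs)
```

The comparison is the point of the design: the parallel form must keep all past keys and values, while the recurrent form carries only `S` and `z` from token to token, so decoding memory stays constant as the sequence grows.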

_weilix retweeted

Together AI @togethercompute

5 months ago

Learn how @cursor_ai partnered with Together AI, the AI Native Cloud, to deliver real-time inference for AI-powered coding. Cursor's in-editor agents generate code while developers actively edit — requiring responses inside the editor's feedback loop. Together AI built the infrastructure to meet those strict latency targets at scale.
