christina

@luoluo

yap gear @mit reasoned @xai

Joined May 2021

1.4K Following

3.7K Followers

396 Posts

Pinned Tweet

christina

@luoluo

11 months ago

Grok 4 is live: https://t.co/78oPd90REk

Daniel

@nearlydaniel

11 months ago

War Room squad locked in

209

174

223

663K

257

98K

christina

@luoluo

about 18 hours ago

love this thesis!! humans are beautiful

carlo agostinelli

@carloagostinel2

about 19 hours ago

Been fun working on this over the past few weeks with the team. We wake up every day looking for people who are still unknown to the world but won’t be forever. The founders we back today will one day have biographies of their own. @novaholdings

12K

christina

@luoluo

3 days ago

@TianweiY Amazing!! will stay tune (hopefully a tech report?)

christina

@luoluo

3 days ago

"With that, we reframed multimodal generation as structured text/code generation" Text is ambiguous but code is not. Would love to see more results in having LLM natively think like its coding.

Tianwei Yin

@TianweiY

3 days ago

1/ Our new @reve image model is now #2 on the @arena text-to-image leaderboard — behind only GPT Image 2, ahead of Nano Banana Pro, Microsoft, xAI and everyone else. And it's a 125 point jump over Reve 1.5 from just 3 months ago. The research story behind it 🧵👇

510

130

Who to follow

Zining Zhu

@zhuzining

Assistant Professor @FollowStevens (2024-) PhD @UofT, @VectorInst Areas: #NLProc #Explainable #AI

luoluo retweeted

Kasey Zhang

@_WEEXIAO

4 days ago

https://t.co/GI1Q5J0rd5

216

254

32K

luoluo retweeted

John Schulman

@johnschulman2

8 days ago

Glad to see this -- renderers are a foundational component of the LLM stack. Renderers map between tokens and messages, which are invariant to tokenizer and formatting details. Most APIs, datasets, and RL environments are defined in terms of messages. Getting the details wrong leads to train-test mismatches, caching inefficiencies, and prompt injection vulnerabilities. We included a renderers module in Tinker Cookbook, but it makes sense as a standalone library.

665

379

75K

christina

@luoluo

14 days ago

@ti_morse wow rlly

127

christina

@luoluo

17 days ago

meow meow meow meow

Danny Lin

@kdrag0n

18 days ago

bouncy terminal

10K

christina

@luoluo

18 days ago

More to come ! Also it’s pretty fun to posttrain a capable 1B model

Sapient Intelligence @Sapient_Int

18 days ago

Download HRM-Text 🔗 Github: https://t.co/GKR8vFJZND Hugging Face: https://t.co/X7DW812tq2

272

258

35K

144

15K

christina

@luoluo

19 days ago

@BrendanFoody pretraining commoditizing soon (now)

152

christina

@luoluo

19 days ago

Been watching the team grinding on this - its revolutionary - its the neolab era.

Sapient Intelligence @Sapient_Int

19 days ago

Tomorrow, we will unveil a new path to general intelligence. Lean. Powerful. Efficient. The countdown is on⏳

606

220

45K

luoluo retweeted

Eric Jang

@ericjang11

22 days ago

For the last few months I've been working on a from-scratch implementation of AlphaGo, a 2016 AI breakthrough that inspired me to get into deep learning. My casual understanding of AlphaGo was "search-augmented deep neural networks trained with self-play", but I wanted to go deeper and understand it by creating it. Frontier deep learning research has always been expensive, but any given capability gets cheaper very quickly. In 2026, you no longer need DeepMind's resources to train a strong Go AI - you can vibe code all of it yourself for just a few thousand dollars of rented compute. It was a huge honor to be invited to teach this with @dwarkesh_sp on @dwarkeshpodcast I am an AlphaGo & Go apprentice, not a master, so all factual errors in the podcast are mine. Web version of tutorial: https://t.co/Xkf9VsgtuT Code: https://t.co/rWKOwclPDg Play the go bot here: https://t.co/aVglJXldVX

182

534K

christina

@luoluo

25 days ago

Visual Coding Full Circle Moment @SeongsikKi5837 🐐

Seongsik Kim

@SeongsikKi5837

25 days ago

1. (System design) - The Interaction Models see your screen and collaborates with you live. Here we're building a scalable system architecture together — no copy-pasting, no switching tabs, just thinking out loud and drawing on the screen together.

318

208

103K

luoluo retweeted

Jiayi Weng

@Trinkle23897

29 days ago

Codex grew programmatic policies with no neural nets: max score on Breakout, and SOTA-level scores on MuJoCo. Maybe heuristics were not too weak. Maybe they were just too expensive to maintain. Maybe it's the next paradigm. https://t.co/1ZaIneleuW

234

luoluo retweeted

RadixArk

@radixark

about 1 month ago

Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital. RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas. RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale. RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI. We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (Venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix among others. Thanks for the exclusive interview with @MeghanBobrowsky at @WSJ about our vision.

radixark's tweet photo. Today, we are thrilled to officially launch RadixArk with $100M in Seed funding at a $400M valuation. The round was led by @Accel and co-led by @sparkcapital.

RadixArk exists to make frontier AI infrastructure open and accessible to everyone. Today, the systems behind the most capable AI models are concentrated in a small number of companies. As a result, most AI teams are forced to rebuild training and inference stacks from scratch, duplicating the same infrastructure work instead of focusing on new models, products, and ideas.

RadixArk was founded to change that. We are building an AI platform that makes it easier for teams to train and serve the best models at scale.

RadixArk comes from the open-source community. We started with SGLang, where many of us are core developers and maintainers, and expanded our work to Miles for large-scale RL and post-training. We will continue contributing to both projects and working with the community to make them the strongest open-source infrastructure foundations for frontier AI.

We would like to thank our long-term partners, contributors, and the broader SGLang community for believing in this mission. We're also grateful to @Accel and @sparkcapital, NVentures (Venture capital arm of @nvidia), Salience Capital, A&E Investment, @HOFCapital, @walden_catalyst, @AMD, LDVP, WTT Fubon Family, @MediaTek, Vocal Ventures, @Sky9Capital and our angel investors @ibab, @LipBuTan1, Hock Tan, @johnschulman2, @soumithchintala, @lilianweng, @oliveur, @Thom_Wolf, @LiamFedus, @robertnishihara, @ericzelikman, @OfficialLoganK, and @multiply_matrix among others.

Thanks for the exclusive interview with @MeghanBobrowsky at @WSJ about our vision.

637

194

367K

christina

@luoluo

about 1 month ago

@stevenkplus1 @RichardDawkins lmao

845

christina

@luoluo

about 1 month ago

@cognition @Mokshit06 @3blue1brown devin is a good product, sir

christina

@luoluo

about 1 month ago

@blueemi99 try and share the result!

651

christina

@luoluo

about 1 month ago

system 2 should only exist if they make things feel magical

216

101

41K

christina

@luoluo

about 2 months ago

@emilyzsh sf miss u

341

christina

@luoluo

2 months ago

widespread problem of lack of co-design principles. its labs’ responsibility to close the feedback loop of hardware, model, inference, and harness meanwhile people who co-design their personal harness and infra will extract disproportionate value

Chayenne Zhao

@GenAI_is_real

2 months ago

We're Not Wasting Tokens — We're Wasting the Design Margin of the Entire Inference Stack A few days ago I read a post by Fuli Luo on Twitter, discussing Anthropic's decision to cut off third-party harnesses (OpenClaw) from using Claude subscriptions, and the design thinking behind MiMo's Token Plan pricing. Her core argument: global compute capacity is seriously falling behind the token demand created by agents. The way forward isn't selling tokens cheaper in a race to the bottom — it's the co-evolution of "more efficient agent harnesses" and "more powerful, efficient models." I read it several times over. People who build inference engines have long been frustrated by how wastefully agent frameworks burn through tokens. She articulated something the industry has tacitly acknowledged but rarely stated plainly — and she did it with precision and restraint: the compute allocation crisis we face today is not fundamentally about insufficient compute. It's about tokens being spent in the wrong places. I want to push this one layer deeper, from my own perspective. I'm a heavy user of Claude Code — I make no attempt to hide that. You can check that all the latest code in SGLang Omni was built with Claude Code powering my workflow. Its commercial success is beyond question; it genuinely gave many people (myself included) their first real experience of "coding with an agent." But I'm also an inference engine developer — my day job is figuring out how to push prefix cache hit rates higher, how to make KV cache memory layouts more efficient, how to drive down the cost of every single inference request. So when I plugged Claude Code into a local inference engine and started observing the actual request patterns it generates, my reaction was — how to put it — like a water engineer who spent months designing a conservation system, only to watch someone water their garden with a fire hose. I measured Claude Code's cache hit rate on my local serving engine over the course of a day. The numbers were painful. This isn't a case of "decent but room to improve." It's a case of "the prefix cache mechanisms we carefully engineered at the inference layer are being almost entirely defeated." Fuli Luo mentioned that OpenClaw's context management is poor — firing off multiple rounds of low-value tool calls within a single user query, each carrying over 100K tokens of context window. Frankly, Claude Code's own context management is nowhere near making proper use of prefix cache or any of the other optimizations we've built into inference engines. Many people have already noticed — for example, the resume feature has a bug that causes KV cache misses entirely, which is borderline absurd. I'll say it plainly: the way sessions construct their context was never seriously designed with cache reuse in mind from the start. Perhaps Anthropic has internal trade-offs we can't see — after all, they control both ends of the stack, model and inference, and can theoretically do optimizations at the API layer that are invisible to us. But from the external behavior I can observe, enormous volumes of tokens are being spent on: re-transmitting already-processed context, re-parsing already-confirmed tool call results, and maintaining an ever-inflating conversation history with extremely low information density. If this is merely to earn more on inference token charges, I find it genuinely regrettable. But many Claude Code users are on subscriptions — burning more tokens is fundamentally a cost burden for Anthropic, not revenue. I honestly don't understand what purpose such inefficient context management serves for Claude Code. Here's a bold hypothesis: for those long sessions that consume 700K+ tokens, there is certainly a way to restructure the session's context so it accomplishes the exact same task with 10% of the tokens. Not by sacrificing quality, but through smarter context compression, more rational prefix reuse strategies, and more precise tool call scheduling. This isn't theoretical speculation — anyone who has worked on inference engine optimization, upon seeing current agent framework request patterns, would arrive at a similar conclusion. Fuli Luo is right: global compute capacity can't keep up with the token demand agents are creating. But I'd add that a significant portion of that gap is an illusion of prosperity — artificial demand manufactured by the crude design of agent frameworks. Here's an analogy I keep coming back to. I've always liked bringing up RAM bloat — in 1969, 64KB of memory sent Apollo to the moon. In 2026, I open a single webpage and 500MB of memory usage is nothing unusual. Every generation of hardware engineers pushes memory capacity higher, and every generation of software engineers lavishly fills it to the brim. People have gotten used to this cycle, even come to see it as the normal cost of progress. But LLM inference is different. The cost of RAM bloat is your computer running a bit slower, spending a couple hundred bucks on a memory upgrade — users barely notice. The cost of token bloat is real money — GPU cluster electricity bills, user subscription fees, the industry's entire compute budget. And this cost scales exponentially as agent usage grows. If we don't establish the engineering discipline that "tokens should be used efficiently" in the early days of the agent era, the cost of catching up later, once scale kicks in, will be beyond imagination. Fuli Luo notes that Anthropic cutting off third-party harness subscription access is objectively forcing these frameworks to improve their context management. I agree with that assessment, but my gut feeling is that this shouldn't stop at "third-party frameworks need to be more frugal with tokens." It should trigger a more fundamental reflection: what kind of agent-inference co-design do we actually need? Right now, agent frameworks and inference engines are essentially fully decoupled — agent frameworks treat the inference engine as a stateless API, sending the full context with every request. Meanwhile, the inference engine does its best with prefix matching, caching whatever it can. This architecture is simple and general-purpose, but brutally inefficient for long sessions. If agent frameworks could be aware of the inference engine's cache state and proactively construct cache-friendly requests — if inference engines could understand the session semantics of agents and make smarter cache eviction decisions — once that information channel between the two opens up, the potential gains in token efficiency are enormous. Of course, maybe I'm overthinking this. Maybe the market's ultimate answer is: compute gets cheap enough, waste is fine. Just like the RAM story — in the end, everyone chose "memory is big enough, no need to optimize." But I don't think the token economy will follow the same path, at least not in the near term — because the supply elasticity of GPU compute is far lower than that of DRAM. Under compute constraints, token efficiency isn't a "nice to have" optimization — it's the core competitive advantage that determines who survives. Most people love hearing "we made the model bigger," "we stretched the context window to a million tokens," "we stacked HBM to new heights" — these narratives are sexy, shareable, fundable. But I seriously believe that "finding ways to reduce the reckless waste of tokens" is a profoundly underestimated direction. This isn't a defensive optimization. It's an offensive capability — whoever first achieves an order-of-magnitude reduction in token consumption at equivalent quality can serve ten times the users on the same compute budget, or deliver ten times the agent depth to a single user. The agent era doesn't belong to whoever burns the most compute. It belongs to whoever uses it most wisely. This line from Fuli Luo resonates deeply with me. But I want to press further: who gets to define "wisely"? The people building models? The people building inference engines? The people building agent frameworks? I think the answer is — all three must come to the table together. And right now, we're nowhere close.

220

170

38K

christina

@luoluo

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users