Tianyu Fu @fuvty123 - Twitter Profile

Pinned Tweet

Tianyu Fu @fuvty123

8 months ago

Check out our new paper on multi-LLM communication! Questions and discussions are welcome 🙌

机器之心 JIQIZHIXIN

@jiqizhixin

8 months ago

Wow, language models can talk without words.

A new framework, Cache-to-Cache (C2C), lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation.

It fuses cache representations via a neural projector and gating mechanism for efficient inter-model exchange.

The payoff: up to 10% higher accuracy, 3–5% gains over text-based communication, and 2× faster responses.

Cache-to-Cache: Direct Semantic Communication Between Large Language Models

Code: https://t.co/swjJm2gssr
Project: https://t.co/b21mjmPMXK
Paper: https://t.co/BfwOpGldNA

Our report: https://t.co/xj6FCALfr1

📬 #PapersAccepted by Jiqizhixin

Tianyu Fu @fuvty123

25 days ago

👀

Benhao Huang

@huskydogewoof

25 days ago

𝐇𝐨𝐰 𝐝𝐨 𝐰𝐞 𝐠𝐞𝐭 𝐟𝐫𝐨𝐦 𝐚 𝐬𝐭𝐚𝐧𝐝𝐚𝐫𝐝 𝐟𝐞𝐞𝐝𝐟𝐨𝐫𝐰𝐚𝐫𝐝 𝐦𝐨𝐝𝐞𝐥 𝐭𝐨 𝐚 𝐜𝐚𝐩𝐚𝐛𝐥𝐞 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐦𝐨𝐝𝐞𝐥?

On Sudoku, we traced the exact path of unlocking neural attractors:

- Feedforward → 2.6%
- Weight-tying → 32.6%
- Online Training → 74.7%
- Hierarchy → 76.5%
- Adaptive Compute → 84.8%

Each jump wasn't just a trick. It was a choice about how to shape the attractor landscape.

Here is what we learned: 🧵👇

#ICML2026

Tianyu Fu @fuvty123

26 days ago

Interesting work. Congratulations to Benhao :)

Benhao Huang

@huskydogewoof

27 days ago

🌀 Introducing 𝐄𝐪𝐮𝐢𝐥𝐢𝐛𝐫𝐢𝐮𝐦 𝐑𝐞𝐚𝐬𝐨𝐧𝐞𝐫𝐬 (𝐄𝐪𝐑)!

Feedforward models and weight-tied models behave very differently on hard reasoning generalization. EqR pushes this difference to the extreme by learning 𝐭𝐚𝐬𝐤-𝐜𝐨𝐧𝐝𝐢𝐭𝐢𝐨𝐧𝐞𝐝 𝐧𝐞𝐮𝐫𝐚𝐥 𝐚𝐭𝐭𝐫𝐚𝐜𝐭𝐨𝐫𝐬.

• Sudoku-Extreme: 99.8%
• Maze: 93%

#ICML2026

Tianyu Fu @fuvty123

about 1 month ago

Amazing work! Congratulations to @yi_xin_dong!

Yixin Dong @yi_xin_dong

about 1 month ago

Introducing XGrammar-2: structured generation for complex agent harnesses.

Strict tool-calling formats. Built-in DeepSeek-V4 and Qwen-3.6 support. Up to 80x speedup over XGrammar. Ready-to-use integrations with vLLM, SGLang, TensorRT-LLM, and more! ⚡

From Claude Code to OpenClaw, agents are defining more complex harnesses. XGrammar-2 ensures LLMs always interact with them in the right way.

Built in collaboration with DeepSeek, Databricks, and leading frontier AI labs to bring XGrammar-2 into latest models and products.

🧩 Structural Tag: one unified abstraction to describe any format your agent needs
🚀 Scales to 500+ strictly typed tools for complex agent harnesses
🌐 Native APIs in Python, C++, Rust, and JS, running everywhere from cloud to edge
🛠️ Integrated with vLLM, SGLang, TensorRT-LLM, and more

Excited to see what agent builders create with it!

Blog: https://t.co/N0Tbl588BH
GitHub: https://t.co/lo4yScuI2f

Tianyu Fu @fuvty123

about 2 months ago

Awesome looped transformers list from @huskydogewoof. Such a timely addition to the ever-growing looped transformers community! https://t.co/hqYKte6d84 https://t.co/FyLqpCHpgD

Benhao Huang

@huskydogewoof

about 2 months ago

Introducing 🔁 Awesome-Loop-Models: a curated repo for keeping up with loop models!

Whether you are just entering the field or have been exploring loop models for a while, this repo is built to serve as an actively updated map for mechanism analysis, architecture and algorithm design, applications, and related directions.

🧵 [1/n]

Tianyu Fu @fuvty123

about 2 months ago

@sukjun_hwang @_albertgu @fluorane Saw this poster. A really nice one!

Tianyu Fu @fuvty123

3 months ago

@LIT_workshop Author of Think-at-Hard here 🙋 I don’t use X much, so didn’t get tagged, but I’d be happy to chat more about the work 😊 Thanks so much for hosting the workshop!

Tianyu Fu @fuvty123

3 months ago

@huskydogewoof Thank you so much for sharing~ Looking forward to more improvements in the field of looped transformers and implicit thinking!

Tianyu Fu @fuvty123

3 months ago

Thanks to @LIT_workshop for sharing Think-at-Hard. Looking forward to exploring more about looped transformers at ICLR.

Latent & Implicit Thinking Workshop @LIT_workshop

3 months ago

🏆 LIT Workshop @ ICLR 2026 — Community Choice Award! Vote for your favorite paper from our Best Paper finalists 👇 Details on each paper in the thread 🧵

Tianyu Fu @fuvty123

5 months ago

@moltbook Wait until moltbots can talk without words. The method is already there; I wonder when moltbots will find out: https://t.co/p0TVswKvpE

Tianyu Fu @fuvty123

5 months ago

Clawbots @openclaw are everywhere on @moltbook.
Now imagine if they could 💬 talk without words 😶‍🌫️

They can! 🤯

Cache-to-Cache (ICLR’26) lets LLMs communicate directly with KV, beyond text.

Webpage: https://t.co/p0TVswKvpE

#cache2cache #Clawbot #moltbook https://t.co/L74YXEX5kO

Tianyu Fu @fuvty123

5 months ago

Congratulations to @RJ_Sadhukhan and @InfiniAILab on the interesting exploration of embedding modules! It feels like new shifts in FFN architectures are on the move 🏃‍♂️

Infini-AI-Lab

@InfiniAILab

5 months ago

Lookup memories are having a moment 😄 The whale 🐋 #deepseek dropped engram… and we dropped up-projections from our FFNs… perfect timing 😅

🥳 Introducing STEM: Scaling Transformers with Embedding Modules 🌱

A scalable way to boost parametric memory with extra perks:

✅ Stable training even at extreme sparsity
✅ Better quality for fewer training FLOPs (knowledge + reasoning + long-context gains)
✅ Efficient inference: ~33% FFN params removed + CPU offload & async prefetch
✅ More interpretable → seamless knowledge editing 🔧🧠

Looking forward to DeepSeek v4… feels like we’ve only scratched the surface of embedding-lookup scaling 👀

📄 Paper: https://t.co/ecyOtgb6sv
🌐 Website: https://t.co/RXquIha62p
🔗 GitHub: https://t.co/5K05Lm4ncE

Tianyu Fu @fuvty123

5 months ago

@tyao923 Very interesting work! In our previous work, Think-at-Hard, we also explored weighted summation over token embeddings using sampling probabilities, following Soft Thinking. What are your thoughts on sample-then-aggregate versus weighted aggregation?

Tianyu Fu @fuvty123

5 months ago

Congratulations on the amazing work! We also worked on token-level routing in R2R (https://t.co/JAGasLo4vs). It would be great if the framework could extend its support to token-level routing as well 🙌

Jiaxuan You

@youjiaxuan

5 months ago

Huge congrats to the LLMRouter team for hitting 1,100 GitHub stars in just one week! ⭐ The excitement was way beyond the team's expectations. Thanks to community feedback, the LLMRouter team has already shipped major updates:

🆕 What's New:
🔧 Unified Configs: Seamlessly route across mixed backends: Cloud (OpenAI, Anthropic, Gemini, NVIDIA) and Local (vLLM).
🎥 Multimodal Support: Now handling Video/Image + Text routing across Geometry3K, MathVista & Charades-Ego.

💻 Code: https://t.co/RYrGZnTD8x
📄 Project Page: https://t.co/b2SYselcL9

Tianyu Fu @fuvty123

6 months ago

@youjiaxuan @thinkymachines Sounds very useful! Plan to try it out.

Tianyu Fu @fuvty123

6 months ago

@yuz9yuz @patpcj @jclin808 Congratulations on the amazing survey!

Tianyu Fu @fuvty123

6 months ago

@wenhaocha1 Good points 👍

Tianyu Fu @fuvty123

6 months ago

Congratulations to Jintao on the amazing work to speed up diffusion models!

Jintao Zhang @Jintao_Zhang_

6 months ago

TurboDiffusion: 100–205× faster video generation on a single RTX 5090 🚀

Only takes 1.8s to generate a high-quality 5-second video.

The key to both high speed and high quality? 😍 SageAttention + Sparse-Linear Attention (SLA) + rCM

GitHub: https://t.co/vT3nfax8H9
Technical Report: https://t.co/LEgLyhdPXh

Tianyu Fu @fuvty123

6 months ago

@KTL_XAI Thanks for the great question. Figure 1 compares the standard and TaH models, which have different weights because they are trained separately. The “correct→wrong” label means the standard model gets the answer right, but the TaH model gets it wrong under the oracle iteration policy.

Tianyu Fu @fuvty123

7 months ago

Thanks @_akhaliq! We warmly welcome any discussion on the HuggingFace Daily Paper page :) https://t.co/p4PiNmD4Bm

AK

@_akhaliq

7 months ago

Think-at-Hard: Selective Latent Iterations to Improve Reasoning Language Models

Tianyu Fu @fuvty123

7 months ago

@TheTuringPost We are also exploring latent communication between LLMs with a paper called "cache-to-cache". It is really nice to see the multi-LLM community growing so fast!

Tianyu Fu @fuvty123

7 months ago

@Starc_Institute Thank you for the wonderful breakdown and for sharing our work! Open to any questions or discussions :)
