Wow, language models can talk without words.
A new framework, Cache-to-Cache (C2C), lets multiple LLMs communicate directly through their KV-caches instead of text, transferring deep semantics without token-by-token generation.
It fuses cache representations via a neural projector and gating mechanism for efficient inter-model exchange.
The payoff: up to 10% higher accuracy, 3–5% gains over text-based communication, and 2× faster responses.
Cache-to-Cache: Direct Semantic Communication Between Large Language Models
Code: https://t.co/swjJm2gssr
Project: https://t.co/b21mjmPMXK
Paper: https://t.co/BfwOpGldNA
Our report: https://t.co/xj6FCALfr1
#PapersAccepted by Jiqizhixin
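As a rough illustration of the projector-plus-gate idea in the Cache-to-Cache post above, the sketch below fuses one cache vector from a "sharer" model into a "receiver" model's cache. Everything here is an assumption made for illustration: the function names, the single linear projector, the scalar sigmoid gate, and the residual-style update are hypothetical and are not taken from the C2C code or paper.

```python
import math


def matvec(matrix, vector):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fuse_cache_entry(receiver_kv, sharer_kv, w_proj, w_gate, b_gate=0.0):
    """Fuse one sharer cache vector into one receiver cache vector.

    Hypothetical design (not the C2C implementation):
      1. project the sharer vector into the receiver's dimension,
      2. compute a scalar gate from [receiver; projected],
      3. add the gated projection to the receiver vector.
    """
    projected = matvec(w_proj, sharer_kv)
    gate_input = receiver_kv + projected  # list concatenation
    gate = sigmoid(sum(w * v for w, v in zip(w_gate, gate_input)) + b_gate)
    fused = [r + gate * p for r, p in zip(receiver_kv, projected)]
    return fused, gate


# Toy example: sharer dimension 3, receiver dimension 2.
receiver = [1.0, 2.0]
sharer = [1.0, 0.0, 1.0]
w_proj = [[1.0, 0.0, 0.0],
          [0.0, 0.0, 2.0]]
w_gate = [0.0, 0.0, 0.0, 0.0]  # zero weights give a gate of exactly 0.5

fused, gate = fuse_cache_entry(receiver, sharer, w_proj, w_gate)
print(fused, gate)
```

With a strongly negative gate bias the receiver's cache passes through almost unchanged, which is the point of gating: the receiver can ignore the sharer where its cache does not help.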
How do we get from a standard feedforward model to a capable iterative model?
On Sudoku, we traced the exact path of unlocking neural attractors:
- Feedforward → 2.6%
- Weight-tying → 32.6%
- Online Training → 74.7%
- Hierarchy → 76.5%
- Adaptive Compute → 84.8%
Each jump wasn't just a trick. It was a choice about how to shape the attractor landscape.
Here is what we learned: 🧵
#ICML2026
Introducing Equilibrium Reasoners (EqR)!
Feedforward models and weight-tied models behave very differently on hard reasoning generalization.
EqR pushes this difference to the extreme by learning task-conditioned neural attractors.
• Sudoku-Extreme: 99.8%
• Maze: 93%
#ICML2026
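To make the "attractor" framing in the posts above concrete, here is a toy sketch of a weight-tied model: the same step function is applied repeatedly, conditioned on the input, until the state stops changing. This is not EqR or any method from the thread. The scalar `tanh` update, the weight `w=0.5`, and the stopping tolerance are all assumptions chosen so that the iteration is a contraction and provably settles on a fixed point.

```python
import math


def step(z, x, w=0.5):
    """One weight-tied update: the same w is reused at every iteration."""
    return [math.tanh(w * zi + xi) for zi, xi in zip(z, x)]


def iterate_to_fixed_point(x, max_iters=100, tol=1e-8):
    """Iterate z <- step(z, x) until the state moves by less than tol.

    Stopping early once the state settles is a simple stand-in for
    adaptive compute: easy inputs use fewer iterations.
    """
    z = [0.0] * len(x)
    iters = 0
    for iters in range(1, max_iters + 1):
        z_next = step(z, x)
        delta = max(abs(a - b) for a, b in zip(z_next, z))
        z = z_next
        if delta < tol:
            break
    return z, iters


x = [0.3, -1.0]  # the "task" input that conditions the attractor
z_star, used = iterate_to_fixed_point(x)
residual = max(abs(a - b) for a, b in zip(step(z_star, x), z_star))
print(z_star, used, residual)
```

Because each update shrinks differences between states by at least a factor of `w`, different inputs `x` lead to different fixed points, which is one simple reading of "task-conditioned" attractors.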
Introducing XGrammar-2: structured generation for complex agent harnesses.
Strict tool-calling formats. Built-in DeepSeek-V4 and Qwen-3.6 support. Up to 80x speedup over XGrammar. Ready-to-use integrations with vLLM, SGLang, TensorRT-LLM, and more!
From Claude Code to OpenClaw, agents are defining more complex harnesses. XGrammar-2 ensures LLMs always interact with them in the right way.
Built in collaboration with DeepSeek, Databricks, and leading frontier AI labs to bring XGrammar-2 into the latest models and products.
- Structural Tag: one unified abstraction to describe any format your agent needs
- Scales to 500+ strictly typed tools for complex agent harnesses
- Native APIs in Python, C++, Rust, and JS, running everywhere from cloud to edge
- Integrated with vLLM, SGLang, TensorRT-LLM, and more
Excited to see what agent builders create with it!
Blog: https://t.co/N0Tbl588BH
GitHub: https://t.co/lo4yScuI2f
Awesome looped transformers list from @huskydogewoof. Such a timely addition to the ever-growing looped transformers community!
https://t.co/hqYKte6d84
https://t.co/FyLqpCHpgD
Introducing Awesome-Loop-Models: a curated repo for keeping up with loop models!
Whether you are just entering the field or have been exploring loop models for a while, this repo is built to serve as an actively updated map for mechanism analysis, architecture and algorithm design, applications, and related directions.
🧵 [1/n]
@LIT_workshop Author of Think-at-Hard here. I don't use X much, so didn't get tagged, but I'd be happy to chat more about the work. Thanks so much for hosting the workshop!
LIT Workshop @ ICLR 2026: Community Choice Award!
Vote for your favorite paper from our Best Paper finalists.
Details on each paper in the thread 🧵
Clawbots @openclaw are everywhere on @moltbook.
Now imagine if they could talk without words.
They can! 🤯
Cache-to-Cache (ICLR'26) lets LLMs communicate directly with KV, beyond text.
Webpage: https://t.co/p0TVswKvpE
#cache2cache #Clawbot #moltbook
Congratulations to @RJ_Sadhukhan and @InfiniAILab on the interesting exploration of embedding modules! It feels like new shifts in FFN architectures are on the move.
Lookup memories are having a moment.
The whale #deepseek dropped engram... and we dropped up-projections from our FFNs... perfect timing.
Introducing STEM: Scaling Transformers with Embedding Modules
A scalable way to boost parametric memory with extra perks:
- Stable training even at extreme sparsity
- Better quality for fewer training FLOPs (knowledge + reasoning + long-context gains)
- Efficient inference: ~33% FFN params removed + CPU offload & async prefetch
- More interpretable → seamless knowledge editing
Looking forward to DeepSeek v4... feels like we've only scratched the surface of embedding-lookup scaling.
Paper: https://t.co/ecyOtgb6sv
Website: https://t.co/RXquIha62p
GitHub: https://t.co/5K05Lm4ncE
@tyao923 Very interesting work! In our previous work, Think-at-Hard, we also explored weighted summation over token embeddings with sampling probabilities, following Soft Thinking. What are your thoughts on sample then aggregate versus weighted aggregate?
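One way to read the question above is as a choice between two estimators of the same quantity. The sketch below contrasts them on toy numbers; the function names, the toy embeddings, and the interpretation of "sample then aggregate" as averaging the embeddings of sampled tokens are assumptions, not the procedure from Think-at-Hard or Soft Thinking.

```python
import random


def weighted_aggregate(probs, embeddings):
    """Probability-weighted sum of token embeddings (deterministic)."""
    dim = len(embeddings[0])
    return [sum(p * e[d] for p, e in zip(probs, embeddings)) for d in range(dim)]


def sample_then_aggregate(probs, embeddings, num_samples, seed=0):
    """Sample token ids from probs, then average the sampled embeddings.

    This is a Monte Carlo estimate of weighted_aggregate: noisy for
    small num_samples, converging to it as num_samples grows.
    """
    rng = random.Random(seed)
    ids = rng.choices(range(len(probs)), weights=probs, k=num_samples)
    dim = len(embeddings[0])
    return [sum(embeddings[i][d] for i in ids) / num_samples for d in range(dim)]


probs = [0.5, 0.25, 0.25]                        # next-token probabilities
embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-d token embeddings

exact = weighted_aggregate(probs, embeddings)
few = sample_then_aggregate(probs, embeddings, num_samples=4)
many = sample_then_aggregate(probs, embeddings, num_samples=20000)
print(exact, few, many)
```

With a single sample the second function reduces to ordinary discrete sampling, so the number of samples interpolates between hard token choice and the fully weighted mixture.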
Congratulations on the amazing work! We also worked on token-level routing in R2R (https://t.co/JAGasLo4vs). It would be great if the framework could extend the support to token-level routing as well.
Huge congrats to the LLMRouter team for hitting 1,100 GitHub stars in just one week! The excitement was way beyond the team's expectations.
Thanks to community feedback, the LLMRouter team has already shipped major updates:
What's New:
- Unified Configs: Seamlessly route across mixed backends: Cloud (OpenAI, Anthropic, Gemini, NVIDIA) and Local (vLLM).
- Multimodal Support: Now handling Video/Image + Text routing across Geometry3K, MathVista & Charades-Ego.
Code: https://t.co/RYrGZnTD8x
Project Page: https://t.co/b2SYselcL9
TurboDiffusion: 100–205× faster video generation on a single RTX 5090
Only takes 1.8s to generate a high-quality 5-second video.
The key to both high speed and high quality?
SageAttention + Sparse-Linear Attention (SLA) + rCM
GitHub: https://t.co/vT3nfax8H9
Technical Report: https://t.co/LEgLyhdPXh
@KTL_XAI Thanks for the great question.
Figure 1 compares standard and TaH models, which have different weights because they are trained separately. The "correct→wrong" label means the standard model gets the answer right, but the TaH model gets it wrong with the oracle iteration policy.
@TheTuringPost We are also exploring latent communication between LLMs with a paper called "cache-to-cache". It is really nice to see the multi-LLM community growing so fast!