Can we turn part of an LLM's weights into long-term memory that continuously absorbs new knowledge?
We took a small step toward this with In-Place Test-Time Training (In-Place TTT), accepted as an Oral at ICLR 2026.
The key idea: no new modules, optional pretraining. We repurpose the final projection matrix in every MLP block as fast weights. With an objective aligned to next-token prediction (NTP) and efficient chunk-wise updates, the model adapts on the fly, complementing attention rather than replacing it.
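To make the fast-weight idea concrete, here is a rough NumPy sketch, not the paper's implementation. It treats the MLP's final (down) projection as a fast weight and updates it once per chunk with a single gradient step. The squared-error loss against a hypothetical per-token target stands in for the NTP-aligned objective, whose exact form is not given here. The function name, `chunk_size`, `lr`, the ReLU activation, and the shifted-input targets are all assumptions for illustration.

```python
import numpy as np

def chunked_fast_weight_mlp(x, targets, w_up, w_down, chunk_size=4, lr=0.5):
    """Toy MLP whose down projection is adapted chunk by chunk.

    x:       (T, d) token representations entering the MLP
    targets: (T, d) hypothetical next-token targets (an assumption; a
             stand-in for whatever the NTP-aligned objective regresses to)
    w_up:    (d, d_ff) slow weight, never updated here
    w_down:  (d_ff, d) initial value of the fast weight
    """
    w_fast = w_down.copy()  # leave the pretrained matrix untouched
    outputs, losses = [], []
    for start in range(0, x.shape[0], chunk_size):
        xc = x[start:start + chunk_size]
        tc = targets[start:start + chunk_size]
        h = np.maximum(xc @ w_up, 0.0)  # ReLU hidden activations
        y = h @ w_fast                  # output uses the pre-update weights
        outputs.append(y)
        err = y - tc
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1)))
        # Gradient of the chunk loss w.r.t. w_fast; only later chunks
        # see the effect of this step.
        w_fast -= lr * (h.T @ err) / xc.shape[0]
    return np.concatenate(outputs), w_fast, losses

rng = np.random.default_rng(0)
d, d_ff, T = 8, 16, 12
x = rng.normal(size=(T, d))
targets = np.roll(x, -1, axis=0)  # toy "next token" target: the next input
w_up = 0.1 * rng.normal(size=(d, d_ff))
w_down = 0.1 * rng.normal(size=(d_ff, d))
y, w_fast, losses = chunked_fast_weight_mlp(x, targets, w_up, w_down)
```

Each chunk's output is computed before its update, so the adapted weights only influence later chunks. That is one simple way to keep the sketch causal; the paper may handle this differently.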
Paper: https://t.co/mtfkbptevk
with amazing @Guhao_Feng @Roger98079446 Kai @GeZhang86038849 Di @HuangRubio