Excited to share that I've started my summer internship at SystemsResearch@Google in Sunnyvale, working on agentic environment generation!
Always happy to chat about coding agents or LLM memory too. If you're around the Bay Area, would love to meet up.
Love this. The Fourier addition circuit isn't just for math. Llama reuses the same base-10 mechanism for cyclic concepts, then maps the sum back into concept space. Mechanism generalizes further than we showed in https://t.co/M9Q0hKNIEf. Congrats to the team! 🎉
Excited to share that I’ve started @GoogleResearch as a student researcher today. I'll be working on tabular foundation models. Come and chat if you are around at Google or at the Bay Area.
@EkdeepL@GoodfireAI Honestly feels way harder to me. Numbers have rich structures like orderings and cyclic groups under mod (Z/nZ). Higher-order cognitive concepts don’t have any obvious group or symmetry to start with. It’s obvious how to decode the geometry. Curious if you have any intuitions.
@GoodfireAI Cool work!!!
It shows that Fourier features are also used for cyclic tasks beyond computing addition (something we studied in our NeurIPS ‘24 paper, https://t.co/IOYEiCI0DO). In our recent preprint, we tried to understand how this behavior emerge (https://t.co/f7u7dsXVBV).
Glad to share that this paper is accepted to #ICML 2026 @icmlconf with an updated title "Transformers Provably Learn Algorithmic Solutions for Graph Connectivity, But Only with the Right Data". 🥳
Why do Transformers fail at algorithmic reasoning? We find it's not a lack of power, but a capacity mismatch.
Our new preprint proves a tight, non-asymptotic bound: an L-layer model can only solve graph connectivity on graphs with a diameter up to exactly 3^L. https://t.co/JwKKl64709
🧵(1/N)
[CL] Convergent Evolution: How Different Language Models Learn Similar Number Representations
D Fu, T Zhou, M Belkin, V Sharan… [University of Southern California & UC San Diego] (2026)
https://t.co/zoXVMn8cYL
I'm at #ICLR2026 and will present two posters on Day 3:
1⃣ On how Transformer-based language models disentangle and compose latent concepts for in-context learning
🕙 Poster session 5, 10:30am-12pm
📍Pavilion 4 #4008
> Where am I going? the use
Then the "actionable interpretability" question: how can a phenomenon lead to improvement? We show that representing numbers with Fourier features teaches models better numeracy (Zhou et al., ICLR 2026).
https://t.co/CfnGTUJIQu
fantastic work! it's also great to see new multimodal models using our Zebra-CoT dataset for interleaved text and image generation.
speaking of which, Zebra-CoT will appear this Friday at @iclr_conf#ICLR2026
After two months of teamwork, we’re excited to share our team’s latest achievement — LLaDA2.0-Uni, InclusionAI’s first multimodal LLaDA.
A unified discrete diffusion LLM built for both understanding and generation across text and images.
Highlights:
● One paradigm for VQA, doc understanding, and image generation
● Efficient inference with a new decoding strategy + 8-step distilled decoder
● Interleaved text-image generation enabled by unified discrete representations
(SGLang support soon)
🤗 Hugging Face: https://t.co/bDiucSDEN7
📷 ModelScope: https://t.co/qnztdVyl7U
🧵 1/8
What should an LLM assistant remember across conversations?
Existing memory work studies this one task at a time. But real-world assistants see all kinds of conversations, and that changes the problem.
Introducing BEHEMOTH 🦣 + CluE 🌱: a benchmark & self-evolving method for heterogeneous memory extraction.
📄 Paper: https://t.co/szLIOdA4bm