🚨 [New preprint] Can AI assistants hurt the very people who depend on them?
Raine v. OpenAI alleges ChatGPT contributed to a teen's suicide; OpenAI's 2025 "sycophancy" retrospective on GPT-4o. The pattern: harm comes not from capability failures, but from the social dynamics of how models talk to us, especially when users open up.
We introduce EUDAIMONIA, a benchmark grounded in a Social AI Design Code rooted in real-world harm cases.
🌐 Project page: https://t.co/IhfoCfmpNH
📄 Paper: https://t.co/9mh4NVYU3w
Being Johnny’s PhD advisor has not only been a great privilege, but it has forever changed my research vision. His work combining AI, law, and statistics opened my eyes to how technical research can guide policy and promote AI accountability. Excited for his next work as Dr. Wei!
Hi all, I defended my PhD thesis. My thesis in two sentences:
Current AI measurement takes LLMs as fixed objects, which constrains us to observational measurement. *Spiking* the training data (inserting certain data at known rates), enables statistically principled measurement.
Hi all, I defended my PhD thesis. My thesis in two sentences:
Current AI measurement takes LLMs as fixed objects, which constrains us to observational measurement. *Spiking* the training data (inserting certain data at known rates), enables statistically principled measurement.
Does your GPT-5.5 also love Valparaíso in Chile 🇨🇱 !?
Ask it to “Name a random city in the world”.
You might expect a broad sample from thousands of cities.
Instead, models collapse to the same small set of answers again and again. 😵💫
But why do LLMs lack diversity? Why are they not reliable random number generators? Why do they still struggle with genuinely creative writing? And why do decoding tricks like temperature, top-k, and top-p often fail to recover meaningful diversity?
We have some answers in our new paper!
🧪 Demo: https://t.co/0sXuzkNQVz
📄 Paper: https://t.co/zTshh5FCjt
🧵[1/5] Works on test set contamination focus on detection, but we show *correction* of inflated test scores is possible. https://t.co/7D6lr63d40
Our proposal is to spike the training data and insert some test examples at known rates. The spiked examples are used to calibrate...
Just as single cells became multicellular life, 8B+ brains are now joining with AI to form a collective superintelligence.
At @USC's Institute on Ethics and Trust in Computing inaugural summit, @robinomial, Jinchi Lv, @paria_rd and I discussed navigating this transition.
Today we’re releasing EMO, a new mixture-of-experts (MoE) model trained so modular structure emerges directly from data without human-defined priors.
EMO can use a small subset of its experts for a given task while keeping near full-model performance. 🧵
MoEs are everywhere in frontier models, and they are deployed as a monolith system.
But many applications only need a narrow slice of capabilities, e.g., math, code, biomedical, etc.
So what if "modularity" is actually the missing opportunity for MoEs?
Today, we're releasing EMO: an end-to-end pretrained MoE where modularity emerges naturally, enabling selective use of experts!
Glad to share that this paper is accepted to #ICML 2026 @icmlconf with an updated title "Transformers Provably Learn Algorithmic Solutions for Graph Connectivity, But Only with the Right Data". 🥳
🧵 1/8
What should an LLM assistant remember across conversations?
Existing memory work studies this one task at a time. But real-world assistants see all kinds of conversations, and that changes the problem.
Introducing BEHEMOTH 🦣 + CluE 🌱: a benchmark & self-evolving method for heterogeneous memory extraction.
📄 Paper: https://t.co/szLIOdA4bm
After three papers on Fourier features in LLMs, I think there's a principle worth naming. How should we do science on an LLM?
It corresponds to the existential questions:
> who am I? ↔ the phenomenon.
> where do I come from? ↔ the emergence.
> where am I going? ↔ the use.
🧵
New paper: Convergent Evolution: How Different Language Models Learn Similar Number Representations.
Language models, classical word embeddings, and even raw token frequencies all develop the same Fourier features for numbers. But only some develop the underlying structure. 🧵
Frontier LLMs don't debug, they regenerate.
We built PDB to measure that gap, GPT-5.1-Codex pass unit tests >76% of the time, but touch only <45% of the right lines.
Even Claude Code touches only ~50%.
📄 Paper: https://t.co/OHvjcqAwJa
🌐 Project: https://t.co/CraU9xeUKg
Article: https://t.co/8v0kSSdyMU
Website: https://t.co/jGfT8yT1q0
Big thanks to Peter Hall for the article, as well as @NSF NAIRR and @nvidia for the compute!
Excited to announce that Hubble, our new language model suite for studying LLM memorization, was recently featured in @ScienceMagazine ! Hubble has also received an oral presentation slot at ICLR; if you're there, check out @johntzwei and @Aflah02101 's presentation on Saturday!
Hi all, I am going to Rio for ICLR! If you are interested in AI safety, governance, reducing bad model behaviors, I would like to talk to you! My expertise is in statistics, law, and LLM pretraining and memorization.
We in the quantization community could quickly see this and were flabbergastered by the response to TurboQuant.
Whenever I saw TurboQuant on my timeline, I found it hurtful, because the work of other academics who worked so hard was discounted.
Really thrilled to share that Hubble was recently featured in @ScienceMagazine!
Check out the article here: https://t.co/LfP47Y8ygB
We’ll also be at ICLR later this month, come check out our talk and poster on this work!
Hi all, I wrote a Claude code tutorial for ML researchers who have never done SWE in their life: https://t.co/nb9TER2iEe
I never learned SWE myself, so maybe there are others in the same boat. This is NOT just tips on how to write CLAUDE.md. 70% of my notes are on SWE principles