.@NitCal will be presenting "Empty Shelves or Lost Keys? Recall Is the Bottleneck for Parametric Factuality" at ICML 2026 next month.
(tl;dr: we show encoding is near-saturated on frontier LLMs, but models still struggle to recall encoded facts.)
One recurring piece of feedback we've gotten since posting the paper: "you show LLMs struggle with factual recall, but does that even matter when today's agents can use external retrieval?"
Here's how I currently think about this, and more broadly about the role of parametric knowledge in today's systems:
The theoretical argument for why knowledge matters (true in principle, but I don't know of work that measures this in practice): parametric knowledge is important for making efficient use of search and for knowing how to properly integrate retrieved information. Imagine finding some weird pizza recipe online: can you trust it without knowing a lot about cooking, chemistry, etc.? I think this is going to become a bigger issue moving forward, the "sloppier" the internet becomes.
The realistic case for why knowledge matters: today's agents are far from producing responses that are fully grounded in external evidence. Even when search triggers properly — which it often doesn't — only the "big" claims tend to be grounded, while models still volunteer a lot of extra information from their parametric knowledge.
Since models are still poor at "knowing what they know" (more on that in my next post, about our other ICML paper...), our best bet is making models actually more knowledgeable — and our paper reveals where the headroom for that actually lies.
Today at 11:30 EST / 16:30 GMT we'll be presenting our poster about our work “On Calibration and Out-of-domain Generalization” at #NeurIPS2021, come visit!
https://t.co/vYIdjeRMHh
@wald_yoav @amir_feder @d_greenfeld
"What is it like to be a bat?" Thomas Nagel's thought experiment about consciousness is still as relevant today as it was when it was first published in 1974. https://t.co/abesYXI7n1
Took a while (don't ask) but here they are: Notes from "Science of Deep Learning" class co-taught with @KonstDaskalakis now available: https://t.co/0H1SKClPUf. More coming soon (promise!). Feedback very welcome! Thanks to @andrew_ilyas for heroic effort on doing final revisions.
Today we are excited to release video recordings of lectures from "Advanced Deep Learning and Reinforcement Learning", a course on deep RL taught at @UCL earlier this year by DeepMind researchers:
https://t.co/znsWtTxQcN
Enjoy!
Our latest work on ‘Measuring abstract reasoning in neural networks’ has just been published at #icml2018.
As always, it was a privilege to collaborate with @santoroAI, Felix Hill, @arimorcos and Tim Lillicrap.
Paper: https://t.co/lUZKpxvfd2
Blog post: https://t.co/LQaf5MpV04
In 1951, Bertrand Russell took to the @nytimes to argue that the best answer to fanaticism was a calm search for truth. His Ten Commandments of Liberal Inquiry could not be more relevant today.
(Number 6 will blow your mind! ;) )
Thread.
Check out https://t.co/cBigTutSGn, my work with @prafdhar on improving flow-based generative models with invertible 1x1 convolutions. https://t.co/znKj0LnCxm
New paper analyzing sample-based metrics for evaluating generative models, from the Cornell group. Tests if they can detect things like overfitting and mode collapse. Should be required reading for everyone working on generative models.
https://t.co/oPzFYkAZ8o
@gstsdn @goodfellow_ian Great paper. Out of curiosity - why use spectral normalization on the generator and not Jacobian clamping like in the previous paper?
By learning to write programs that generate images our artificial agents can reason about how digits, characters and portraits are constructed. Read the blog: https://t.co/zNDRMAEdOW
Simple GAN inversion experiments easily show that ***all*** real images (except for a zero-measure subset) have 0 probability of being generated by a GAN (off the manifold). What does this say about the promise (or lack thereof) of training models based on GAN-generated datasets?
Check out Adversarial Logit Pairing, the new state of the art defense against adversarial examples on ImageNet, by @harinidkannan, @alexey2004 and me: https://t.co/2JIT1t3ApO
Nesting probabilistic programs allows us to model agents reasoning about other agents, but current inference engines typically give invalid estimates. Check out how to do things correctly in my new paper https://t.co/4EQqwxUdDt
GESTALT PRINCIPLES THREAD!
Gestalt is the idea that we see the whole of something before the individual parts.
PROXIMITY (1/8)
When objects are close to each other, they tend to be perceived together in a group. Use white space to separate groups. Reduce it to group elements.
Encoder-decoder GAN architectures still don't fix the theoretical problems in the GAN framework, such as mode collapse. Encoders may produce nonsense codes and the discriminator is none the wiser. Blog post https://t.co/oQBxIaVEri and ICLR'18 paper https://t.co/64nmaNsef3