@JacobHHilton Are you interested in scientific problems where you can map interpretability of parameters or compositions of parameters to ground truth?
Super interesting to see this. I've been testing how this can play out in science and scientific decision making — we really aren't vigilant about how wrong stuff in the scientific literature might create self-fulfilling loops that extend stagnation
It's so great that there are now multiple orgs doing transparent, rigorous testing of basic premises about how LLMs work and how their behavior can be influenced. So glad that Geodesic exists and excited to work with them on more like this!
I wish more people asked themselves
"What would John Tukey do?"
He sure as hell would have been working on exciting things, not stuck on old problems. I mean, he worked on information retrieval in his retirement in the 90s.
I found myself similarly disoriented. I suspect we haven't found new abstractions that actually make sense for theory to be cool again.
But you have to develop a taste for different research problems.
🌶️ Some (perhaps) spicy thoughts. It’s been a while since my last tweet, but I wanted to write about how disorienting it has been to move from academia to an LLM lab 😅
The kind of research I was trained to do during my PhD almost doesn’t exist here. The obsession with mathematical elegance and novelty is mostly gone. Everything is about scaling data and compute. For a while, that really got to me. At my lowest point, I felt like I’d lost interest in building LLMs altogether. I didn’t feel intellectually challenged anymore.
What made this even stranger was that, at a technical level, things worked. If there was a capability I wanted to teach a model, scaling the right data and compute always got me there, no exception (so far).
But recently, I found a way to reconcile with myself..
I realized the real competition isn’t in the ML recipe anymore. Most teams do roughly the same thing. What actually matters is how fast you can iterate, test ideas, and recover from mistakes. And that speed is mostly backed by infrastructure 🏗️ Faster loops, fewer bugs, better tooling.
Seeing this made me excited again! Infra is its own deep, hard, and intellectually fun problem space.
In 2026, I want to become an ML researcher who’s really good at infra. And I'll come back to ML problems with that edge, and will be excited to share what I find 😌
🧵
Personal Hamming Problem #1:
Raise the bar for quantifying risk-benefit tradeoffs in drug development.
By risk-benefit tradeoff I mean instantiation of the therapeutic index at some stage of the pipeline.
#Neurips2025
But modern ML and statistics have a much richer set of solutions I am excited about.
Easier solution: Use generalized probability indices that trade off two outcomes both of which have their own sources of measurement error
Even better: modern multi-multicalibration
More 👇
@NeuroStats 🔥 Exactly! Genetic variation is comparatively clean to interpret, but dynamic biological measurements (transcriptomic, epigenetic, imaging, physiological) are deeply entangled with life-course factors: environment, development, health status, social, & reverse causation...
1/3
One can start with what the mathematics of a proper scoring criterion ought to be for a problem, what kinds of properties a transformation ought to have, etc.
I've seen this problem emerge in many bio/health competitions over 10 years across Kaggle, DREAM, etc.
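A minimal sketch of what "proper" buys you, assuming a single binary outcome with a made-up true probability of 0.7 (the function names and numbers are illustrative, not from any competition): the expected Brier score is minimized by reporting the true probability, while expected absolute error, which is not proper, rewards an overconfident forecast of 1.0.

```python
import numpy as np

def expected_brier(q, p):
    """Expected Brier score of forecast q when the event occurs with probability p."""
    return p * (1.0 - q) ** 2 + (1.0 - p) * q ** 2

def expected_abs_error(q, p):
    """Expected absolute error of forecast q under true probability p (not a proper score)."""
    return p * (1.0 - q) + (1.0 - p) * q

# Hypothetical true event probability, chosen only for illustration.
p_true = 0.7

# Candidate forecasts on a grid from 0 to 1 in steps of 0.01.
grid = np.linspace(0.0, 1.0, 101)

# Forecast that minimizes each expected score.
best_brier = grid[np.argmin(expected_brier(grid, p_true))]
best_abs = grid[np.argmin(expected_abs_error(grid, p_true))]

print(f"Brier-optimal forecast: {best_brier:.2f}")
print(f"Absolute-error-optimal forecast: {best_abs:.2f}")
```

Running the same kind of check on a leaderboard metric before launch (what forecast or transformation minimizes it in expectation?) is one cheap way to catch metrics that trivial submissions can game.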
@AnthropicAI @Google The recent very welcome @Arcinstute challenge made this painfully clear: defining evaluation metrics is hard. In some cases, trivial data transformations, and even random data, can score astonishingly high.
Great AI performance ≠ biological meaning. 4/6
OMG yes. Glad to see someone else making this point.
Measurements of dynamic biological processes are subject to more novel kinds of confounding and selection bias than genetic markers. 'omics/imaging in biology ignores these challenges of life-course epidemiology
@anshulkundaje I was just making that point in a 3-tweet thread here. In addition to my closing suggestions there, I would mention the need for life course molecular (omics) epidemiology - high powered.
Being fast has advantages when high-quality feedback is really quick. But surely deep thinking / pondering / working from first principles has its place for problems that have long-horizon or low-quality feedback.
Where is the Jim Simons of pharmaceutical forecasting?
@ltrd_ You also don’t need to be a gold medalist of the IMO to be a great mathematician. He isn’t drawing a contrast between smart and stupid. It’s a contrast between shallow and deep thinking.