Good advice, but one key correction: AI will shine at connecting far-flung ideas.
Humans specialize after college, much like athletes specialize in sports. Career incentives reward depth, not breadth.
Yet many insights reflect cross-domain knowledge, "aha" moments from bridging two seemingly unrelated ideas.
CRISPR exemplifies this. A bacterial immune system was transformed into a foundational technology for biology and drug development only after researchers connected microbiology, RNA biology, and molecular engineering.
Generally, only exceptional humans develop expert knowledge across fields, but AI has already broken the breadth constraint.
While AI today still struggles to consistently discover on its own, this is a limitation of current harnesses, models, and compute.
Barbell strategy for killing it in an age of superhuman AI:
Simultaneously get as close to AND stay as far away from AI as humanly possible.
1. Get close — play with AI models, use them to help you think, ask them to teach you about the world, get them to help you create, work with them to write code, understand what makes them tick, embed them into your everyday life, have fun.
2. Stay far away — learn to tell stories, make eye contact, build a team, lead with courage, connect far-flung ideas, build lifelong friendships, debate persuasively, think forbidden thoughts, handwrite ideas, confess your fears, fall in love.
Spend less time trying to master mental transformations that are purely mechanical — building spreadsheets, analyzing trades, balancing accounts, writing code by hand, following playbooks, searching for needles in haystacks. These are the emerging no-man's land, squarely the domain of AI.
Venture to the extremes. That’s where all the fun is anyway.
The stated conclusion has long been true in biomedicine and sparse-data domains.
Consider medical images, where general scaling recipes transfer less readily. In natural images, a cat can be recolored or cropped and stay a cat, but these transformations may distort clinical meaning.
Whether a round opacity is malignant, for example, may depend on factors such as proximity to blood vessels or the chest wall.
Now take cancer genomics.
Viruses are already known to cause multiple types of cancer, including cervical cancer (HPV), some Burkitt lymphomas (EBV), and liver cancer (HBV).
How about other cancer types? TCGA and PCAWG are extremely valuable for viral studies and recapitulate many known associations.
However, the underlying assays and protocols were naturally not optimized for every viral signal. Non-polyadenylated RNA is one example.
Scale cannot recover what the dataset never included.
This story is sadly too common.
What’s the best way to publish such information?
The method must be fair to the accused but transparent enough to protect founders and coach VCs on the need to end this practice.
At a minimum, VCs have an obligation to share who else they are considering and let founders assess the risk of disclosure.
VCs are soccer stars, but founders play basketball.
Capital confers authority, but authority is not expertise.
Sans capital, would you hire the VC to lead your startup?
Good VCs accelerate and enable great outcomes.
Hire wisely.
You can’t fire them, but they can fire you.
A junior partner at a top VC told me he was worried I wasn't coachable. I said, I don't see what you can teach me given you've never founded or run a successful company.
Play idea: “Wemby curls” where he moves like Curry around screens or runs pick-and-pops -- but in the paint.
Lob passes not for dunks, but for in-the-paint catch-and-shoots or skyhooks.
With his height and agility, these are good shots from 3-8 feet away and reduce how much defenders can body him.
This is right for preserving the status quo, but wrong for changing healthcare. The recurring playbook throughout history is inventing technology to slash costs, from which lower prices and new entrants naturally follow.
The right question is how to develop life-changing therapies at a fraction of the cost, not to preserve existing cost structures.
This is not a criticism of Eli Lilly. It is the 800-lb gorilla, an exceptional company creating and delivering critical drugs to patients. Expecting it to disrupt the very industry it leads is not reasonable. Nor is this a criticism of Dave Ricks. He is a terrific CEO, someone who strikes the right balance between crisp execution and bold innovation.
It is incumbent on outsiders to spearhead change.
Following @cremieuxrecueil’s post 🧵👇 about $LLY’s excellent readout of its LDL gene-editing program, VERVE-102, many people were wondering about the high price tag of such a treatment. During his excellent conversation with Stripe co-founder John Collison on the “Cheeky Pint” podcast, Eli Lilly CEO Dave Ricks described VERVE-102, the gene-editing asset Lilly acquired as part of the Verve Therapeutics acquisition. He also presented a new and interesting business model: a multi-year, success-based licensing model for genetic medicines, which may be not only more efficient but also commercially favourable for patients, insurers, and the companies themselves. Dave Ricks’ licensing model is highly interesting and could bridge the gap between innovation, patient well-being, and the much-needed commercial upside for biotech and pharma companies that is a fundamental pillar of any drug R&D process. $XBI
Our research suggests EBV could drive more diseases, particularly cancer, than what is known today.
Negative studies may reflect poor protocols more than viral absence.
Not only can AI accelerate drug discovery here, but hopefully this exit will embolden more labs.
Early in 2022, I emailed someone from a prestigious institute to ask why she hadn’t published follow-ons to a promising study that had hinted at potential mechanistic links between EBV and breast cancer.
Her response?
No funding.
EBV wasn’t perceived as an issue for American populations, despite the global burden, theory, and empirical results suggesting otherwise.
If someone with elite credentials couldn’t secure funding after a promising experiment in a high-profile disease, who can?
As incredible as the NIH is, the unavoidable biases of centralization and efficiency create key gaps in the research ecosystem.
First, minimizing waste and maximizing innovation are generally competing objectives. Every organization, whether public or private, grapples with this challenge. No single budget can optimize the equation on behalf of a nation.
On the shortcomings of centralization, look no further than the fraud in neurological disorders. Academic misconduct has set back the field years, if not decades.
Even worse, these decisions delay advancements for patients who so desperately need them.
Yet if thought leaders claim X and Y are worth investigating but not Z, who can overrule them? The blame certainly isn’t with the NIH. After all, it’s merely following the science and respecting mandates to wisely spend taxpayer dollars.
The solution, naturally, isn’t to reduce NIH grants but rather to couple centralized funding with other thoughtful methods.
Outcomes like this help immensely.
Congratulations to ARCH, Vaccine Company, and all involved on this most impressive achievement.
ARCH founded Vaccine Company because we believed better vaccines could be created. Almost no one believed in the vision except the amazing Vaccine Company and ARCH team who stayed with it, and Luma Group, Pfizer Ventures, Wellcome Trust, and GHIC.
This is why OAI could win long-term, even if Anthropic leads in research. OAI made hard tradeoffs long ago. Those choices should yield a better "car" (model + infra + product, model = engine).
But models remain the product for AI images and drug discovery.
With AI images, current models are too expensive to serve 1B users.
With drug discovery, which must cover clinical trial strategy and broader drug development, models still struggle to generalize and demonstrate deeper biomolecular understanding -- revealing a much larger gap.
Rather than force AI on arXiv, which is a scientific treasure and should follow its ideals, this is a chance for a complementary platform, one inspired by nanoGPT where AI-driven research is embraced.
Human papers themselves fall on a spectrum and often reflect a divergence in resources and priorities, not nefarious intent.
Should you benchmark against 5, 15, or 25 models? Should you report all negative results or none? Is the goal to PoC a novel idea or validate against every use case?
Advancing science and discovering knowledge can manifest in multiple forms. There is no objective standard.
Even small experiments, tiny bricks in the winding scientific road, can save time for others and accelerate progress if conducted properly and shared openly.
Under a new model, the site could revolve around the scientific ideal: reproducibility.
1. Any paper can be published -- no gates -- but all papers are initially labeled "Unvalidated" until others reproduce the results.
2. An open-source model would vet experimental methods, highlighting novel contributions and areas for improvement.
3. A separate model would automatically tag related work, replacing the related work section that no one does perfectly anyway.
4. Another pass would flag unsubstantiated claims and surface an alternative discussion -- not to replace the authors' words, but to provide a standardized review, the same evidentiary lens applied to every paper.
5. All papers must include a reproducibility manifest -- model weights, hparams, hardware specs, random seeds -- verified on submission, so reproduction is possible by default rather than by request.
6. Authors can push updates -- new baselines, extended datasets, bug fixes -- and the paper's validation status resets, creating a living document rather than a frozen snapshot.
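Items 1, 5, and 6 above can be sketched in a few lines of Python. This is a rough illustration, not a spec: the field names, the "Validated" label, the one-reproduction threshold, and the `submit` helper are all assumptions made for the example, and the check only confirms the manifest fields are present, not that the weights or seeds actually reproduce anything.

```python
from dataclasses import dataclass

# Hypothetical manifest fields, taken from the list above (names are assumptions).
REQUIRED_FIELDS = ("weights_uri", "hparams", "hardware", "seeds")


@dataclass
class Manifest:
    weights_uri: str
    hparams: dict
    hardware: str
    seeds: list

    def missing(self) -> list:
        """Return the names of required fields that are empty."""
        return [name for name in REQUIRED_FIELDS if not getattr(self, name)]


@dataclass
class Paper:
    title: str
    manifest: Manifest
    version: int = 1
    status: str = "Unvalidated"  # every paper starts here
    reproductions: int = 0

    def record_reproduction(self, threshold: int = 1) -> None:
        """Count an independent reproduction; the threshold is an assumed policy."""
        self.reproductions += 1
        if self.reproductions >= threshold:
            self.status = "Validated"  # assumed label, not from the proposal

    def push_update(self, manifest: Manifest) -> None:
        """An author update bumps the version and resets validation."""
        self.manifest = manifest
        self.version += 1
        self.reproductions = 0
        self.status = "Unvalidated"


def submit(title: str, manifest: Manifest) -> Paper:
    """No gate on content, but the manifest must be complete on submission."""
    missing = manifest.missing()
    if missing:
        raise ValueError(f"incomplete manifest: {missing}")
    return Paper(title=title, manifest=manifest)
```

The only hard check is manifest completeness; everything else is a status label that moves as others reproduce the work or the authors push changes.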
For ML, the natural beachhead is small models capable of local inference. Look at the activity emerging from nanoGPT. Reproducing larger papers would require open-source compute, an initiative I outlined earlier but one still far from reality.
In some companies, the CEO imparts minimal value. These are the ideal Buffett companies.
In others, the CEO imparts most of the value. These are startups.
The more deeply the CEO understands the company's differentiating technology, the better the company generally performs.
This is because deeper understanding enables the CEO to hire more discriminately and set priorities more prudently (i.e., allocate resources in Investorese).
The dynamic is unfolding in real-time with the public AI frontier labs, with the worst decisions generally correlating with the CEOs with the weakest ML/LLM background.
Biotech and drug development are nearing a tipping point.
Semiconductor companies were once vertically integrated, combining design and manufacturing. While the model produced incredible advancements, it also restricted design to a few organizations and locked out potential innovators.
The rise of fabless firms and pure-play foundries like TSMC in the late 1980s disrupted the industry. This approach decoupled design from manufacturing, empowering design specialists -- companies like NVIDIA -- to iterate faster on breakthroughs for AI, gaming, and other applications.
A similar shift is coming to biotech. Today, pharma outsources heavily to CROs and CDMOs for trials and manufacturing, but the most valuable intellectual steps remain locked inside large organizations: target identification, therapeutic design, and clinical trial design.
Although design is only a small part of drug development, the same was said of chip design before the fabless revolution.
On the surface, chips represent a collection of transistors, but they actually represent the embodiment of how to organize compute to solve a specific problem. Similarly, biologics and small molecules do not simply represent a collection of atoms but rather the embodiment of which biological problem to solve.
The therapeutic hypothesis -- which target, which mechanism, which patient population, which biomarkers to enrich for -- is the architectural framework that dictates whether the billions spent downstream on trials and manufacturing are prudently invested.
Most drug failures reflect problems of design, not manufacturing. AI-native companies already demonstrate the promise of computational approaches to identify targets, design molecules, and stratify patient populations faster than traditional teams.
Of course, much work remains. Frontier models like AlphaFold3 richly deserve their acclaim and are catapulting structural biology forward, but they still generalize poorly and lack deep biochemical understanding. Unfortunately, the worst critics simultaneously claim biology is difficult and yet demand out-of-the-box perfection.
The progress is undeniable and the trajectory clear. The missing piece is validating ideas against reality. Countless hypotheses go untested because researchers and startups lack wet lab capacity.
Automated and cloud labs will break open this bottleneck. As robotic experimentation becomes available on demand, a new class of company will emerge.
These companies will work backward from disease biology, excelling in the design of therapeutic strategies targeting specific pathophysiology, while automated labs, CROs, and CDMOs handle wet work, trials, and manufacturing.
The parallel to semiconductors isn't merely outsourcing, but expanding the pool of people who can attack problems via the decoupling of design and execution.
Correct. And not just interviews. All articles. Open-source news, effectively. To restore trust, increase transparency. Or as Brandeis said, “Sunlight is the best disinfectant.”
There is absolutely no good reason that every press interview — whether for newspaper, magazine, TV show, documentary, blog, whatever — isn’t published online, in its entirety, every single time.
Cooking show idea: fusion competition.
Three teams, each with two chefs of different ethnicities, tasked with inventing a fusion dish reflecting their cultures in 60 minutes.
Could also start with just three individuals, all tasked with inventing a fusion dish anchored to one culture.
Hi Omar. Could you share whether Gemma 4's training data included medical imaging, and which modalities?
We’d like to open-source a medical version focused on human anatomy understanding.
MedGemma 1.5 underperformed, but Gemma 4's vision stack seems promising.
Any high-level detail on CT/MRI/X-ray/ultrasound exposure or image preprocessing would be helpful. I couldn’t find this in the public docs.
NanoGPT and more benchmarks like this will accelerate ML innovation. They offer an approachable substrate for experiments, enabling anyone to ease into ML with real-world relevance. This is great work.
Other ideas to lower barriers:
1. Open-source job scheduler for shared GPU pools, mirroring how Google and frontier labs multiplex compute across researchers.
2. Monthly community compute donated by AWS, Runpod, and others motivated to see open research match the frontier. Frontier labs also arguably benefit.
Scheduling policy is tricky. Since everything is open, perhaps start simple with community voting and fair-share defaults tied to pseudonymous-verified accounts.
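To make "fair-share defaults" concrete, here is a minimal sketch, assuming one shared pool, jobs measured in GPU-hours, and a queue that always runs the job whose owner has consumed the least so far. The `Job` fields, the `vote_weight` knob for community voting, and the function names are hypothetical; a real scheduler would also need preemption, quotas, and account verification.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str          # pseudonymous account name (assumed identifier)
    gpu_hours: float   # requested compute
    votes: int = 0     # community votes for this job (assumed signal)


def pick_next(queue, usage, vote_weight=0.0):
    """Pick the job whose owner has used the least compute.

    Votes lower a job's priority score by vote_weight per vote.
    Ties go to the job that was queued first.
    """
    def priority(job):
        return usage.get(job.user, 0.0) - vote_weight * job.votes

    return min(queue, key=priority)


def run_schedule(jobs, vote_weight=0.0):
    """Drain the queue in fair-share order; return run order and per-user usage."""
    queue = list(jobs)
    usage = {}
    order = []
    while queue:
        job = pick_next(queue, usage, vote_weight)
        queue.remove(job)
        usage[job.user] = usage.get(job.user, 0.0) + job.gpu_hours
        order.append(job)
    return order, usage
```

With `vote_weight=0.0` this is pure fair share: a user who submits many jobs cannot starve everyone else. Raising `vote_weight` lets community votes pull a job forward, which is one simple way to combine the two defaults.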
"Sunlight is the best disinfectant" feels apt. Transparency will promote trust and fairness, and should be the default.
Once the job scheduler is built, community compute could PoC this with something like $100K ($1M?) of shared compute. At COGS, this is a rounding error against the dev-relations budget for any cloud provider.
New modded-NanoGPT optimization benchmark result: @wen_kaiyue has improved upon both the Muon and AdamW baselines, by replacing their weight decay with hyperball optimization. The new record is 3325 steps.
@alexrives Amazing! Thanks for leading this.
Could you kindly consider phosphoproteomics, non-polyadenylated RNA-seq, and ubiquitylomics?
TCGA and other datasets generally lack these modalities, but they would be deeply insightful for cancer and other diseases as you know.
Use cases -> inference requirements -> data -> model.
If a 10M CNN is required, build that.
If an 888B MoE model is required, build that.
Critics love bashing OpenAI, but they have generally pursued the right strategy and have the right team mix to win.
Really excellent work by the inference team to serve this model so efficiently!
To a significant degree, we have to become an AI inference company now.
Tim Cook faced the unenviable task of following Steve Jobs. Despite doubts by many a critic, he did exceptionally well and led Apple with grace and a clear vision. Kudos to one of the greatest CEO tenures of all time.
The most trusted journalism acts like a judge: it neither hypes nor dooms, but contextualizes and lets the jury -- readers -- decide.
Like good software, good journalism can come from anyone on any topic, provided the individuals are sufficiently objective and well-informed.