These are challenging times for science, but I'm happy to share a bright moment: our work is on the cover of Cell today!
We built a compendium of human gut microbiomes integrating 168K worldwide samples, revealing patterns of microbiome variation across the globe
The new issue is out👉https://t.co/DWf0VlDvUt
Featuring an engineered Newcastle disease virus that selectively and safely lysed tumors in refractory cancer patients, the identification of rhodoquinone as a mammal mitochondrial electron carrier, and a metagenome-informed metaproteomics approach unravelling the interplay between the host, gut microbiome, and diet to identify disease biomarkers.
📷credit: SciStories
Reminder: giving an AI the prompt "You are a world expert in X" doesn't make it a world expert.
It doesn't give the model additional knowledge or improve its reasoning.
All it does is change the *style* of the output, so it sounds like an expert, even when it's wrong.
1. An embryo can't have an IQ score. This is like saying the embryo has an 800 credit score or a 2000 chess Elo. It's nonsense
2. These polygenic scores have negligible predictive value for an individual's IQ - they explain a small proportion of variance at the population level
I've said it before: the bottleneck is not speed.
This DNA model is 275x faster, but it creates a tree of life that groups drosophila with plants. How is AI going to solve cancer if it can't get the most basic biology right?
We are releasing Carbon: a crazy fast DNA model
Carbon is 275x faster than the next best model. So fast you can process the whole human genome on a single GPU in <2 days.
Here are the tricks we used:
When modelling DNA sequences a lot of the performance comes down to tokenizing the sequences in a smart way. BPE tokenizer struggle because there are no whitespaces and character (called base in DNA) level tokenizers waste a lot of compute on too many tokens.
Carbon is built with a unique tokenizer: we split sequences in chunks of 6 bases, but during both training and inference we can work with single base resolution. That's similar to having word tokens but resolving them at the character level. All possible thanks to the DNA tokens unique structure.
The architecture combined with the tokenizer makes the model 275x faster than the previous SoTA (Evo2) at this size.
We built an interactive demo so you can explore how the model can generate DNA sequences, investigate the structure of genes, predict the effect of mutations, generate and fold proteins and even reconstruct parts of the tree of life.
https://t.co/OWEUoxAFjG
Some thoughts on where AI in genomics stands right now, building on a symposium we recently hosted.
Seven points, ranging from why scaling DNA models hasn't delivered, to why metadata is the real bottleneck, to the weird backlash against AI in academia
https://t.co/rMQZxP5S4o
This is genuinely funny. ArXiv exists because peer-review is slow, arbitrary, and not trustworthy.
Now arXiv is making peer-review its judge, treating it as the only real trust signal.
The penalty is a 1-year ban from arXiv followed by the requirement that subsequent arXiv submissions must first be accepted at a reputable peer-reviewed venue. 4/
🧬 AI in Genomics Symposium next week!
We're bringing together leading scientists from across academia and industry pushing the boundaries of AI in genomics research
Join us May 8 at the University of Chicago - KCBD Auditorium, 9:30–5:00, reception to follow
🔬How do clinical exposures shape the gut microbiome, and how do longitudinal clinical–microbiome signatures predict patient outcomes?
Come find out at my poster #194 today at 7:30 pm at @cshlmeetings Biology of Genomes 2026! #bog26@blekhman
Looking forward to #DDW2026!
I will be presenting new work on host–microbiome interactions at single-cell resolution, led by Liz Gibbons in my lab
Sunday, 8-9:30, W190a, session "AI-based approaches to unraveling gut microbiome complexity"
Come find me - happy to talk!
🧬 AI in Genomics Symposium next week!
We're bringing together leading scientists from across academia and industry pushing the boundaries of AI in genomics research
Join us May 8 at the University of Chicago - KCBD Auditorium, 9:30–5:00, reception to follow