Arguably the most boring step in genomics is the first one: normalization. Settled science. Scale + log. Move on.
Except that there's been a huge blind spot in the field. And it matters for AIxBio. A 🧵 about what I think may be one of the most important papers I've written. 1/
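For readers who haven't met the "scale + log" step, here is a minimal sketch of the conventional version. The thread doesn't spell out the recipe, so the target sum of 10,000 and the use of log(1 + x) are assumptions based on common practice, and `scale_log_normalize` is a hypothetical name used only for illustration.

```python
import math

def scale_log_normalize(counts, target_sum=10_000):
    """Scale each cell's counts to a common total, then apply log(1 + x).

    counts: list of cells, each a list of raw gene counts.
    target_sum: assumed per-cell total after scaling (10,000 is a common choice).
    """
    normalized = []
    for cell in counts:
        total = sum(cell)
        if total == 0:
            # An empty cell stays all zeros rather than dividing by zero.
            normalized.append([0.0 for _ in cell])
            continue
        factor = target_sum / total
        normalized.append([math.log1p(c * factor) for c in cell])
    return normalized

# Two cells with the same composition but 10x different sequencing depth.
cells = [[1, 0, 3], [10, 0, 30]]
norm = scale_log_normalize(cells)
```

After scaling, the two example cells get the same values, since they differ only in depth.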
We saw an eccentric allergist with my son when he was 3 months old, and he recommended that we immediately start throwing as many allergens at him as possible, as often as possible. We did and now he's not allergic to anything 👐
Happy to see the work, GigaTIME, from my close friend, Hoifung Poon, highlighted by @satyanadella !
This is the @CellCellPress paper from Dec 2025 (paper: https://t.co/4fW1PFBam3). It's now open source on Hugging Face + Azure Foundry Labs!
A bit more technical background for GigaTIME:
Spatial proteomics tells you which proteins are active, where, at single-cell resolution. It's the closest thing we have to reading the immune "grammar" of a tumor. It also costs thousands of dollars per sample and takes days to run.
GigaTIME just broke that constraint.
Input: a routine H&E pathology slide. ~$5. Every hospital has millions of them.
Output: a full 21-channel spatial proteomics map. In seconds.
How it works:
— Trained on 40 million cells with paired H&E + mIF images (Providence health system)
— Learns cross-modal translation: cell morphology → cell state, at single-cell spatial resolution
— Not CycleGAN — beat prior SOTA by a wide margin on both Dice score and Pearson correlation
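For context on the two metrics named above, here is a rough sketch of how Dice score and Pearson correlation are usually computed when comparing a predicted channel against the measured one. This is not the paper's evaluation code. The function names and the flat-list inputs are assumptions made for illustration.

```python
import math

def dice_score(pred, truth):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks given as flat 0/1 lists."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of intensities."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when either signal is constant.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)

# Toy example: a 4-pixel predicted mask vs. a measured mask.
example_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
example_r = pearson_r([1, 2, 3], [2, 4, 6])
```

Dice rewards spatial overlap of thresholded signal, while Pearson tracks whether predicted intensities rise and fall with the measured ones, so reporting both covers "where" and "how much".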
What they did with it:
— Applied to 14,256 cancer patients across 51 hospitals
— Generated ~300,000 virtual mIF images spanning 24 cancer types, 306 subtypes
— Discovered 1,234 statistically significant protein-biomarker associations (novel ones: KMT2D, KRAS pan-cancer links)
— Validated externally on 10,200 TCGA patients: Spearman r = 0.88 across subtypes
The real unlock: population-scale tumor immune microenvironment (TIME) analysis was previously infeasible. mIF data is too scarce and expensive to collect at scale. GigaTIME generates a virtual population — validated against real mIF — from what every clinic already has.
One modality (cheap morphology) implies another (expensive cell state). The model learns the grammar connecting them.
Next step they hint at: integrating with LLaVA-Med so you can literally query the spatial proteomics map conversationally. "Talk to the tumor."
Dad: *offers me a handful of walnuts*
8-year-old me: I can’t have those, I’m allergic.
Dad: What will happen?
Me: They make my mouth itch.
Dad: That’s just what walnuts do.
And that’s when my dad learned that he’s also allergic to walnuts.
Congratulations to Kaiyue Zhang on the publication in @ScienceMagazine. In the paper, we built an “RNA factory” on muscle for heart repair. https://t.co/DRDhOLKbLl