Bivas Nag @thinking_cortex - Twitter Profile

bravo_abad's tweet photo. No scaling laws for single-cell foundation models: when bigger atlases stop teaching the model anything

In language and vision, the recipe has been simple: more data, bigger models, better performance. Single-cell biology borrowed that playbook. Foundation models for transcriptomics jumped from 1 million cells to atlases of over 100 million, on the assumption that scale would unlock the same gains. Alan DenAdel and coauthors put that assumption to the test, and the result is sobering.

Working from a 22.2-million-cell corpus, they pretrained 400 models across five architectures (from PCA and a variational autoencoder up to the Geneformer transformer) and ran 6,400 evaluation experiments. They varied not just dataset size (1% to 75% of the corpus) but also diversity, using cell-type re-weighting and geometric sketching to deliberately enrich rare cell types and transcriptional states.
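
A minimal sketch of what cell-type re-weighting could look like, assuming simple inverse-frequency weights: each cell is drawn with probability proportional to 1 / (number of cells of its type), so rare types end up over-represented compared with a uniform draw. The helper `reweighted_sample` and the toy labels are hypothetical and this is not the authors' code; geometric sketching, which samples to cover expression space rather than label counts, is a different technique and is not shown.

```python
import random
from collections import Counter


def reweighted_sample(labels, n, seed=0):
    """Draw n cell indices (with replacement), weighting each cell by
    1 / (number of cells of its type), so every cell type carries the
    same total sampling weight regardless of how common it is."""
    counts = Counter(labels)
    weights = [1.0 / counts[label] for label in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=n)


# Hypothetical toy corpus: 990 common cells and 10 rare cells.
labels = ["T cell"] * 990 + ["rare progenitor"] * 10
picked = reweighted_sample(labels, n=1000)
share = Counter(labels[i] for i in picked)
print(share)  # the two types come out near-equal, not 990 vs 10
```

Because the draw is with replacement, each of the ten rare cells is repeated roughly fifty times: the rare type gains weight in the training mix, but no new transcriptional states are added.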

The finding: performance saturates almost immediately. On cell-type classification, batch integration, and perturbation prediction, most models hit their ceiling at roughly 1% of the corpus, about 200,000 cells. Beyond that, adding millions more cells changed essentially nothing. More diversity didn't help. Even spiking in genome-scale Perturb-seq data, to give the models perturbed phenotypes rather than just healthy ones, failed to move the needle. Larger models did score better overall, but they too plateaued early on data.
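
To put the subset fractions in absolute terms, the arithmetic below uses the 22.2-million-cell corpus size quoted above. `subset_size` is a hypothetical helper, and the 10% and 25% points are illustrative rather than the authors' exact sampling grid.

```python
CORPUS_CELLS = 22_200_000  # corpus size quoted in the summary


def subset_size(fraction, corpus=CORPUS_CELLS):
    """Number of cells in a subset holding `fraction` of the corpus."""
    return round(corpus * fraction)


# 1% and 75% are the endpoints quoted above; 10% and 25% are illustrative.
for frac in (0.01, 0.10, 0.25, 0.75):
    print(f"{frac:>4.0%} of corpus -> {subset_size(frac):,} cells")
```

1% of the corpus works out to 222,000 cells, consistent with the "about 200,000" figure, while the 75% subset is roughly 16.7 million cells, about 75 times more data for essentially no gain.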

Two points stood out. Simple baselines (PCA, logistic regression) often matched or beat the transformers. And the strongest model, SCimilarity, won not because of size but because its contrastive training objective is aligned with the downstream task. For single-cell data, what you train on and how you frame the objective matter far more than how much you collect.

This reframes a quiet but expensive habit. In drug discovery, biotech, and any pipeline leaning on cell atlases, the instinct to keep scaling pretraining corpora may be burning compute for no return. The real leverage sits elsewhere: curating high-quality, task-relevant data and matching the training objective to the actual question you're trying to answer.

Paper: DenAdel et al., journal license | https://t.co/X7GxoxF5U5

Bivas Nag @thinking_cortex

3 months ago

@nimivashi15 Which place is this?

thinking_cortex retweeted

Alex Pollen @brainevodevo

5 months ago

1/ Our new study, led by @ding5066, examines the role of transcription factors during human neurogenesis to identify gene regulatory networks influencing cell fate, maturation, and subtype specification https://t.co/GDtKk45kFt

Bivas Nag @thinking_cortex

7 months ago

@nimivashi15 I do think that for someone who is new, it feels overwhelming with everything moving at such a pace

Bivas Nag @thinking_cortex

9 months ago

@_onlyscott Ronaldinho, maybe

thinking_cortex retweeted

UCSF Neurosurgery @NeurosurgUCSF

10 months ago

Congratulations to Tomasz Nowakowski, PhD, on being named a finalist for the prestigious Blavatnik National Award for Young Scientists! 🎉 He was selected for his groundbreaking research shaping the future of neuroscience and medicine. @BlavatnikAwards https://t.co/KLOkD0em1P

Bivas Nag @thinking_cortex

10 months ago

@VarunSuresh007 @UtdKobi @grok 😂

thinking_cortex retweeted

UC San Francisco @UCSF

10 months ago

Not only does UCSF's NIH-funded research advance health care and improve patients' lives, it has an estimated $18.7B ripple effect on the economy. The result is more innovative startups, more jobs, and stronger companies that hire workers nationwide. https://t.co/zi7zp0LFlG

Bivas Nag @thinking_cortex

12 months ago

@ScienceYael @NatureNeuro @LehtinenLab Congratulations!!!

thinking_cortex retweeted

UC San Francisco @UCSF

12 months ago

What gave human brains the edge over apes? UCSF researchers found that tiny DNA changes helped neurons form more connections, driving complex thinking. But this evolution may also impact neurodevelopment. https://t.co/2hHUievtP5

Bivas Nag @thinking_cortex

about 1 year ago

Excited to join @brainevodevo lab for my #PhD thesis!!! Excited to learn and work in the field of brain evolution and development. #immigrantscientist #gradschool #firstgen

Bivas Nag @thinking_cortex

about 1 year ago

@precigenetic Would love to get in touch to learn more about it!!!

thinking_cortex retweeted

Ed Jabbari @Ed_Jabbari

about 1 year ago

A fantastic afternoon spent talking to the Arsenal Parkinson’s Walking Football squad. Such an honour to have contributed to our club’s rich history of impactful work in the community! @ParkinsonsUK @WhitHealth @uclh @Arsenal

thinking_cortex retweeted

bioRxiv Neuroscience @biorxiv_neursci

about 1 year ago

Mapping early patterning events in human neural development using an in-vitro microfluidic stem cell model https://t.co/yTtGX8csLB #biorxiv_neursci

Bivas Nag @thinking_cortex

about 1 year ago

@WallaceUcsf so cool!!!

Bivas Nag @thinking_cortex

about 1 year ago

Really enjoying the class!!! Enjoyed making my first XOR gate✨

Wallace Marshall @WallaceUcsf

about 1 year ago

Starting week 2 of the UCSF Cellular Electronics Minicourse - building logic gates with transistors, in order to better understand how logic can be implemented with proteins and DNA.