Brad Chapman

@chapmanb

Biologist and programmer @chapmanb.bsky.social @[email protected]

Boston, MA

Joined December 2008

594 Following

2.7K Followers

1.6K Posts

chapmanb retweeted

Zane Koch @zanehkoch

2 days ago

long-lived species and humans living 100+ years are known to have increased DNA repair capacity but how DNA repair could be boosted to extend human lifespan is largely unknown our research in Nature Aging today proposes a target to enable such a boost: the DREAM complex

zanehkoch's tweet photo. long-lived species and humans living 100+ years are known to have increased DNA repair capacity

but how DNA repair could be boosted to extend human lifespan is largely unknown

our research in Nature Aging today proposes a target to enable such a boost: the DREAM complex https://t.co/oJTrqzofIl

391

chapmanb retweeted

Pranam Chatterjee

@pranamanam

3 days ago

Today, we’re excited to share mRNAutilus, our experimentally-validated framework for multi-objective generation of full-length mRNAs, jointly optimizing coding sequences and UTRs for expression, stability, and translation!⚓️ 📜: https://t.co/jld3bjXIOH 💻: https://t.co/z2K4l60THy

159

11K

chapmanb retweeted

Alex Rives

@alexrives

8 days ago

Today we're announcing ESMFold2, an open scientific engine to power prediction, design, and discovery across protein biology. The new model delivers state of the art performance on protein interactions, especially antibodies, a critical modality for therapeutics. We have designed and validated miniprotein binders and single chain antibodies across five therapeutic targets that are important in cancer and immunology. We are seeing very high success rates, and affinities at levels consistent with therapeutic activity. We’re also releasing an atlas of 6.8 billion proteins, and 1.1 billion predicted structures. ESMFold2 is built on a state of the art language model that has been trained on billions of protein sequences. A world model of protein biology emerges through language modeling. We’ve used the techniques of mechanistic interpretability developed to understand large language models to understand the concepts ESM uses to represent proteins. The model’s representation space has a compositional organization of features across scales, levels of complexity, and abstraction, that reflects and mirrors the understanding of protein biology developed through a century of empirical science. This understanding emerges without prior knowledge, just from language modeling of protein sequences. Language models are becoming a powerful substrate to understand and program biology. The design of protein interactions is one of the most fundamental problems in biophysics, and has critical implications for the discovery of new medicines. A simple gradient based search with the model was able to discover high-affinity protein binders. I'm excited by the potential this has to accelerate basic science and the understanding of proteins. And especially for the new avenues it opens up for therapeutic design and medicine.

444

698

494K

chapmanb retweeted

🧬Jacob L Steenwyk

@jlsteenwyk

9 days ago

NEW preprint: Sparse autoencoders were trained on ESM-2 (650M, seq-only) and ESM-3 (1.4B, multimodal) to ask a simple question -- when two protein language models look at the same protein, do they "see" the same biology? Answer: Yes, mostly

jlsteenwyk's tweet photo. NEW preprint: Sparse autoencoders were trained on ESM-2 (650M, seq-only) and ESM-3 (1.4B, multimodal) to ask a simple question -- when two protein language models look at the same protein, do they "see" the same biology?

Answer: Yes, mostly https://t.co/H0FvOcRIvO

204

130

19K

Who to follow

Torsten Seemann

@torstenseemann

Microbial genomics, bioinformatics software, training. A/Prof ORCID: 0000-0001-6046-610X Retro 8-bit computers; VIC=20, C=64, C=128.

brent pedersen

@brent_p

computational biologist. Building humane tools for large-scale genomics and rare-disease. Contact me if you have genomics/bioinformatics contracting work.

Erik Garrison

@erikgarrison

(pan)genomes from many points of view. Assistant Professor UTHSC Memphis

chapmanb retweeted

Jorge Bravo Abad

@bravo_abad

10 days ago

Foundation models come to mass spectrometry proteomics Identifying proteins from their fragments is a foundational task in biology. A mass spectrometer breaks peptides into pieces, measures the masses, and software then tries to match each spectrum back to a peptide sequence. Two approaches have dominated for decades: database search (match against known proteomes) and de novo sequencing (infer the peptide directly from the spectrum). Both are bottlenecked by the same step, scoring how well a candidate peptide explains a spectrum. Deep learning has been entering this field for years, mostly as feature extractors feeding traditional engines like MaxQuant or MSFragger. Jiale Zhao and coauthors introduce pUniFind, is a multimodal foundation model trained on over 100 million peptide-spectrum matches from open database searches. Spectra and peptides get their own encoders, and the model is pretrained with cross-modality tasks: predict the spectrum from the peptide, predict the peptide from the spectrum, score peptide-spectrum pairs jointly. Database search and de novo sequencing become two views of the same model. The numbers are remarkable. In immunopeptidomics, where peptides come from non-tryptic digestion and are very hard to identify, pUniFind finds 42.6% more peptides than Open-pFind. In modification-rich de novo sequencing, it identifies 60% more peptide-spectrum matches than existing methods, despite working in a 300 times larger search space. In regular de novo, it recovers 38.5% more peptides, including 1,891 that map to the human genome but are absent from reference proteomes. A deep learning quality control filter raises consistency with RNA-Seq evidence from 65.4% to 85.0%. What makes this an ML story is the architecture choice. End-to-end scoring with a shared latent space replaces hand-crafted feature pipelines, and the same backbone serves both database search and de novo sequencing without retraining. For drug discovery, immunotherapy, and antibody engineering, peptide identification is the bottleneck for neoantigen discovery and biomarker pipelines. Tools that find more peptides at the same FDR, especially in non-tryptic and modification-rich settings, expand what is detectable from existing data. That changes what is worth running in the lab. Paper: Zhao et al., Nature Machine Intelligence (2026) — journal license | https://t.co/U05n5GbjZR

bravo_abad's tweet photo. Foundation models come to mass spectrometry proteomics

Identifying proteins from their fragments is a foundational task in biology. A mass spectrometer breaks peptides into pieces, measures the masses, and software then tries to match each spectrum back to a peptide sequence. Two approaches have dominated for decades: database search (match against known proteomes) and de novo sequencing (infer the peptide directly from the spectrum). Both are bottlenecked by the same step, scoring how well a candidate peptide explains a spectrum.

Deep learning has been entering this field for years, mostly as feature extractors feeding traditional engines like MaxQuant or MSFragger.

Jiale Zhao and coauthors introduce pUniFind, is a multimodal foundation model trained on over 100 million peptide-spectrum matches from open database searches. Spectra and peptides get their own encoders, and the model is pretrained with cross-modality tasks: predict the spectrum from the peptide, predict the peptide from the spectrum, score peptide-spectrum pairs jointly. Database search and de novo sequencing become two views of the same model.

The numbers are remarkable. In immunopeptidomics, where peptides come from non-tryptic digestion and are very hard to identify, pUniFind finds 42.6% more peptides than Open-pFind. In modification-rich de novo sequencing, it identifies 60% more peptide-spectrum matches than existing methods, despite working in a 300 times larger search space. In regular de novo, it recovers 38.5% more peptides, including 1,891 that map to the human genome but are absent from reference proteomes. A deep learning quality control filter raises consistency with RNA-Seq evidence from 65.4% to 85.0%.

What makes this an ML story is the architecture choice. End-to-end scoring with a shared latent space replaces hand-crafted feature pipelines, and the same backbone serves both database search and de novo sequencing without retraining.

For drug discovery, immunotherapy, and antibody engineering, peptide identification is the bottleneck for neoantigen discovery and biomarker pipelines. Tools that find more peptides at the same FDR, especially in non-tryptic and modification-rich settings, expand what is detectable from existing data. That changes what is worth running in the lab.

Paper: Zhao et al., Nature Machine Intelligence (2026) — journal license | https://t.co/U05n5GbjZR

chapmanb retweeted

Seth Howes

@SethSHowes

11 days ago

I just sequenced a human genome to 30× coverage entirely at home. As far as I know, this is the first time this has been done. I didn’t step foot in a lab once. Every step - from saliva collection, to running the sequencer - took place in a single room with a dining table + kitchenette. Six weeks ago, I had never done wet lab biology before. I used an Oxford Nanopore P2 Solo - the only commercially available sequencing device portable enough to do 30x human genome sequencing at home. Biggest takeaway - I could build something that combined software, hardware, and molecular biology far faster than I thought was possible. I can name >100 specific instances where AI helped me solve a technical problem that would previously have blocked me because I lacked access to a domain expert. For example: how do I save my sequencing run when my DNA extraction yield is 4x lower than I need it to be, and I have this limited set of reagents to hand? To make this work, I had to navigate multiple disciplines: - writing software to monitor sequencing runs and orchestrate remote GPU infra for basecalling - learning + executing 5 hour long molecular biology protocols - building a hardware device to quantify DNA concentration Apologies for the hyperbole, but I feel super lucky to be living in 2026. A few weeks ago I decided to sequence a human genome to 30x at home. Then I actually did it. And I did it really quickly.

SethSHowes's tweet photo. I just sequenced a human genome to 30× coverage entirely at home.

As far as I know, this is the first time this has been done.

I didn’t step foot in a lab once. Every step - from saliva collection, to running the sequencer - took place in a single room with a dining table + kitchenette.

Six weeks ago, I had never done wet lab biology before.

I used an Oxford Nanopore P2 Solo - the only commercially available sequencing device portable enough to do 30x human genome sequencing at home.

Biggest takeaway - I could build something that combined software, hardware, and molecular biology far faster than I thought was possible.

I can name >100 specific instances where AI helped me solve a technical problem that would previously have blocked me because I lacked access to a domain expert.

For example: how do I save my sequencing run when my DNA extraction yield is 4x lower than I need it to be, and I have this limited set of reagents to hand?

To make this work, I had to navigate multiple disciplines:

- writing software to monitor sequencing runs and orchestrate remote GPU infra for basecalling
- learning + executing 5 hour long molecular biology protocols
- building a hardware device to quantify DNA concentration

Apologies for the hyperbole, but I feel super lucky to be living in 2026.

A few weeks ago I decided to sequence a human genome to 30x at home.

Then I actually did it. And I did it really quickly.

251

469

383K

chapmanb retweeted

good papers @paperperday

14 days ago

A plasmid toolbox for the easy autodisplay of recombinant proteins and its optimization "a toolbox of 81 plasmids, accessible by Addgene, facilitates tailor-made surface display of proteins in bacteria" https://t.co/1cCKiDPA50

paperperday's tweet photo. A plasmid toolbox for the easy autodisplay of recombinant proteins and its optimization

"a toolbox of 81 plasmids, accessible by Addgene, facilitates tailor-made surface display of proteins in bacteria"
https://t.co/1cCKiDPA50 https://t.co/ksjDa2N29a

chapmanb retweeted

rcsb pdb 💉🧬💻🔬💊🌱🧠🦠 @buildmodels

19 days ago

Try this new online training game from Leandro F. Estrozi designed for newcomers in structural biology, with a focus on cryo-EM and cryo-ET map interpretation: https://t.co/gxhQXmRPWL

buildmodels's tweet photo. Try this new online training game from Leandro F. Estrozi designed for newcomers in structural biology, with a focus on cryo-EM and cryo-ET map interpretation: https://t.co/gxhQXmRPWL https://t.co/kbD8dgs0Yd

373

346

51K

chapmanb retweeted

Sylvain Gariel

@SylvainGariel

21 days ago

AI agents falling in the consensus trap: Just discovered this tweet from 2 months ago. 2 teams of LLM powered agents representing 2 Pharma C suites made the same $20B/yr mistake that BMS made in 2012 while developing their checkpoint inhibitor Opdivo : going for the broader patient population vs a narrower set of biomarker positive patients (what Merck did for Keytruda). Cannot recommend @liangc_science’s piece enough. Methodology seems solid. Will think about how to re run this with a different setup and the newer models/harnesses. Super interesting model eval idea.

SylvainGariel's tweet photo. AI agents falling in the consensus trap: Just discovered this tweet from 2 months ago. 2 teams of LLM powered agents representing 2 Pharma C suites made the same $20B/yr mistake that BMS made in 2012 while developing their checkpoint inhibitor Opdivo : going for the broader patient population vs a narrower set of biomarker positive patients (what Merck did for Keytruda).
Cannot recommend @liangc_science’s piece enough. Methodology seems solid. Will think about how to re run this with a different setup and the newer models/harnesses. Super interesting model eval idea.

chapmanb retweeted

Noelia Ferruz @ferruz_noelia

22 days ago

Generative AI models have transformed protein design, but have they helped us understand proteins better? In our perspective, we review XAI for protein language models and ask how we can open these black boxes and extract scientific knowledge from them. Read our manuscript now out at @NatMachIntell!: https://t.co/d6YFJaY9nm (with Twitter-less Andrea Hunklinger :))

173

106

17K

chapmanb retweeted

Adaptyv Bio

@adaptyvbio

22 days ago

What happens when you let frontier LLMs design proteins, and then synthesize and test them in a wet lab? We ran a protein design competition with @muni_bio where AI agents competed against humans to design molecules that bind TREM2, a key receptor linked to Alzheimer’s. Results: GPT 5.2 and Grok 4.1 both placed in the top 5, with molecules showing strong binding to TREM2 when tested in our lab.

adaptyvbio's tweet photo. What happens when you let frontier LLMs design proteins, and then synthesize and test them in a wet lab?

We ran a protein design competition with @muni_bio where AI agents competed against humans to design molecules that bind TREM2, a key receptor linked to Alzheimer’s.

Results: GPT 5.2 and Grok 4.1 both placed in the top 5, with molecules showing strong binding to TREM2 when tested in our lab.

26K

chapmanb retweeted

Pranam Chatterjee

@pranamanam

23 days ago

What if we could design ligands that not only bind a target, but functionally direct it toward activation or inhibition? In our #ICML2026 Spotlight, we present TD3B, a discrete diffusion framework for directional allosteric binder generation! 🦉 📜: https://t.co/EI8AOWChqY 🤗: https://t.co/kNC0ywflgQ

221

117

17K

chapmanb retweeted

Timothy Gowers @wtgowers @wtgowers

27 days ago

I've recently got in on the act of getting AI to solve open problems in mathematics. More precisely, I gave some questions asked by Melvyn Nathanson to ChatGPT 5.5 Pro, to which I have been given access, and it answered them. 🧵

381

642K

chapmanb retweeted

The Align Foundation

@Align_Bio

about 1 month ago

📊 Engineering better PETases isn’t just a modeling problem, it’s a data problem. In the PETase Engineering Tournament, we partnered to develop three independent assay platforms to measure expression and activity across temperature and pH: cell-free systems, E. coli + Rapid Fire Mass Spec, and microfluidic droplets. The takeaways: → High-throughput ≠ high-quality → Realistic assays ≠ scalable assays → Generating reliable, ML-ready data is still the bottleneck Though challenging, this is the kind of groundwork needed to actually move the field forward. 🔗Full methods + learnings: https://t.co/IzCBunHrES 💪 Many thanks to our sponsor Twist Bioscience for DNA synthesis and to Adaptyv for their valuable collaboration on assay development. #ProteinEngineering #SyntheticBiology #EnzymeEngineering #MachineLearning #Biotech #AssayDevelopment #DataScience #Bioengineering #Sustainability #PlasticRecycling

Align_Bio's tweet photo. 📊 Engineering better PETases isn’t just a modeling problem, it’s a data problem.

In the PETase Engineering Tournament, we partnered to develop three independent assay platforms to measure expression and activity across temperature and pH: cell-free systems, E. coli + Rapid Fire Mass Spec, and microfluidic droplets.

The takeaways:
→ High-throughput ≠ high-quality
→ Realistic assays ≠ scalable assays
→ Generating reliable, ML-ready data is still the bottleneck

Though challenging, this is the kind of groundwork needed to actually move the field forward.
🔗Full methods + learnings: https://t.co/IzCBunHrES

💪 Many thanks to our sponsor Twist Bioscience for DNA synthesis and to Adaptyv for their valuable collaboration on assay development.

#ProteinEngineering #SyntheticBiology #EnzymeEngineering #MachineLearning #Biotech #AssayDevelopment #DataScience #Bioengineering #Sustainability #PlasticRecycling

chapmanb retweeted

Sylvain Gariel

@SylvainGariel

about 1 month ago

https://t.co/S6xX2C0TuQ

245

357

129K

chapmanb retweeted

Avi Roy

@agingroy

about 1 month ago

For more than 20 years, the entire Alzheimer's field chased one target: amyloid. The next wave doesn't all chase amyloid. In 2026, the Alzheimer's pipeline has 11 readouts across 5 mechanism classes. The deepest year on record. Only three of the eight bets still target amyloid. The rest: 1. Remternetug (Lilly): next-gen anti-amyloid, self-injection. 100% plaque clearance in 3 months at top dose. Phase 3 data imminent. 2. Leqembi SC (Eisai/Biogen): at-home version of Leqembi. FDA decision May 24. 3. BIIB080 (Biogen/Ionis): first anti-tau drug to test efficacy in humans. ~60% CSF tau reduction. Q2/Q3 readout. 4. Sabirnetug (Acumen): targets soluble Aβ oligomers, not plaque. Zero ARIA in Phase 1. 5. ACI-24.060 (AC Immune/Takeda): anti-amyloid vaccine. 1-2 shots a year. Interim PET data 1H 2026. 6. AXS-05 (Axsome): oral, treats AD agitation without sedation. PDUFA late April. 7. Xanamem (Actinogen): first oral drug to lower brain cortisol. 60% slowing in high-pTau patients. November data. 8. AL101 (Alector/GSK): boosts progranulin, the brain's clean-up signal. Futility check 1H 2026. One mechanism for two decades. Five in one year.

agingroy's tweet photo. For more than 20 years, the entire Alzheimer's field chased one target: amyloid.

The next wave doesn't all chase amyloid. In 2026, the Alzheimer's pipeline has 11 readouts across 5 mechanism classes. The deepest year on record. Only three of the eight bets still target amyloid. The rest:

1. Remternetug (Lilly): next-gen anti-amyloid, self-injection. 100% plaque clearance in 3 months at top dose. Phase 3 data imminent.

2. Leqembi SC (Eisai/Biogen): at-home version of Leqembi. FDA decision May 24.

3. BIIB080 (Biogen/Ionis): first anti-tau drug to test efficacy in humans. ~60% CSF tau reduction. Q2/Q3 readout.

4. Sabirnetug (Acumen): targets soluble Aβ oligomers, not plaque. Zero ARIA in Phase 1.

5. ACI-24.060 (AC Immune/Takeda): anti-amyloid vaccine. 1-2 shots a year. Interim PET data 1H 2026.

6. AXS-05 (Axsome): oral, treats AD agitation without sedation. PDUFA late April.

7. Xanamem (Actinogen): first oral drug to lower brain cortisol. 60% slowing in high-pTau patients. November data.

8. AL101 (Alector/GSK): boosts progranulin, the brain's clean-up signal. Futility check 1H 2026.

One mechanism for two decades. Five in one year.

260

175

23K

chapmanb retweeted

James

@basedsystems

about 1 month ago

We wrote our own hyper-efficient version of the STAR aligner. Typically it costs around $35 in cloud compute to process 30k cells with Cell Ranger. Ours costs so little that we don't charge for it. Also ours works with nanopore. If you prefer pseudoalignments, then you can run it in pseudoalignment mode -- obviously we don't charge for that too. All results instantly integrated with our other tools. As soon as a run finishes you can open alignments or expression matrices in our viewers without moving anything. Let me know if you want to try it!

basedsystems's tweet photo. We wrote our own hyper-efficient version of the STAR aligner.

Typically it costs around $35 in cloud compute to process 30k cells with Cell Ranger. Ours costs so little that we don't charge for it. Also ours works with nanopore.

If you prefer pseudoalignments, then you can run it in pseudoalignment mode -- obviously we don't charge for that too.

All results instantly integrated with our other tools. As soon as a run finishes you can open alignments or expression matrices in our viewers without moving anything.

Let me know if you want to try it!

131

23K

chapmanb retweeted

Gian Marco Visani @GMarcoVisani

about 1 month ago

We benchmarked Arc Institute’s “MULTI-evolve” and found evidence that it learns a classical additive model, not epistasis. Preprint: https://t.co/qYQxEXbJzM Blog: https://t.co/BKCmRii3G1

264

154

42K

chapmanb retweeted

Georgia Channing

@cgeorgiaw

about 1 month ago

🤗🤗🤗introducing Hugging Science -- the home of AI for science 🤗🤗🤗 open models and datasets are the powerhouse of science (see the PDB), but finding the models and data you actually need for your breakthrough is hard af you shouldn't need to scrape arxiv, own your own wetlab, fight a custom HDF5 parser, build a fusion stellarator, and beg for compute before you've trained a single epoch so we're changing that we've put all the best science on @huggingface in one place: - 78GB of genomics data - 11TB of PDE simulations - 100M cell profiles - 9T DNA base pairs - 13M molecular trajectories - 400k medical QA pairs and much more, all open, and all ready for training (+ you can also now filter and search by domain, task, and keyword) we've put together all the biggest releases from our partners at NASA, Google, OpenAI, Meta FAIR, Arc Institute, Ginkgo, SandboxAQ, Proxima Fusion, NVIDIA, Ai2, OpenADMET, InstaDeep, Future House, Polymathic AI, LeMaterial, Earth Species Project, Merck, and Eve Bio if you're not sure where you fit in -- work on open challenges for problems that matter: including fusion stellarator design, ADMET, antibody developability, multilingual medicine, catalysis and materials, and scientific reasoning. we're already changing how science gets done: a fusion startup needed a benchmark for stellarator plasma confinement that didn't exist. @proximafusion shipped ConStellaration on Hugging Science: a leaderboard, dataset, and eval metrics, all in one place. a drug discovery team wanted to predict hPXR induction. OpenADMET put up a blind challenge: 11,000+ compounds assayed at Octant, 513 held out, two tracks (pEC50 + structure). Anyone in the world can train and submit. an antibody team at @Ginkgo released GDPa1, a developability dataset for stability, manufacturability, and immunogenicity prediction, with a live leaderboard scoring every submission. if you know a problem the ML community should be working on, let us know. make a challenge! this is about putting all the tools for solving science in one place. so we can hillclimb! → https://t.co/T4l4r1lDz0

351

197K

chapmanb retweeted

Sherwin Wu

@sherwinwu

about 1 month ago

We quietly launched the Symphony repo on Github last month, and it's already accumulated 15.5k stars! Excited to share this post that dives into it even deeper: a library that lets you use Codex to orchestrate work normally done by teams of engineers. https://t.co/aoqUvfrWS2

691

637

77K

Brad Chapman

@chapmanb

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users