We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader benchmarking effort across drug-discovery stages and therapeutic modalities.
TxBench-PP tests whether agents can recover accurate conclusions from realistic assay artifacts rather than memorized facts from the literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy.
The strongest model-harness configuration was Claude Opus 4.8 + Pi at 59.3%, followed by GPT-5.5 + Pi at 55.3%.
While experiments are rate-limited by natural processes, human decisions and organizational consensus often make up significant components of program timelines in drug discovery. Agents promise to accelerate discovery, development, and translation by compressing these interpretation and decision-making loops.
However, the practical use of agentic systems in industrial workflows requires standardized and trusted methods of evaluating performance. This is especially challenging in drug discovery because the ecosystem is a sprawling landscape of assay categories, development stages, therapeutic modalities, and decision types.
Benchmarks must therefore measure realistic tasks while providing focused treatment of the many local scientific judgments that make up the biotech ecosystem.
We evaluated 16 model-harness configurations, comprising 11 models across three agent harnesses, on 100 preclinical pharmacology tasks. Each configuration was run three independent times per task, yielding 4,800 agent trajectories.
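As a quick check on the numbers above: 16 configurations × 100 tasks × 3 runs gives the 4,800 trajectories. The sketch below reproduces that count and shows one plausible way to pool pass/fail outcomes into a per-configuration accuracy. The record layout `(config, task_id, run_idx, passed)` and the function names are illustrative assumptions, not the benchmark's actual scoring code.

```python
from collections import defaultdict

# Figures stated in the post.
N_CONFIGS = 16   # model-harness configurations
N_TASKS = 100    # preclinical pharmacology tasks
N_RUNS = 3       # independent runs per configuration per task


def total_trajectories(n_configs: int, n_tasks: int, n_runs: int) -> int:
    """Every configuration attempts every task n_runs times."""
    return n_configs * n_tasks * n_runs


def accuracy_by_config(records):
    """Pool all runs of all tasks into one accuracy per configuration.

    `records` is an iterable of (config, task_id, run_idx, passed) tuples.
    This layout is hypothetical and only serves the illustration.
    """
    passed = defaultdict(int)
    total = defaultdict(int)
    for config, _task_id, _run_idx, ok in records:
        total[config] += 1
        passed[config] += int(ok)
    return {config: passed[config] / total[config] for config in total}


if __name__ == "__main__":
    print(total_trajectories(N_CONFIGS, N_TASKS, N_RUNS))
    # Toy records for two made-up configurations, one task, three runs each.
    toy = [
        ("config-a", 0, 0, True), ("config-a", 0, 1, False), ("config-a", 0, 2, True),
        ("config-b", 0, 0, False), ("config-b", 0, 1, False), ("config-b", 0, 2, True),
    ]
    print(accuracy_by_config(toy))
```

Pooling across runs weights every trajectory equally. Averaging per task first would give the same number here because each task has the same number of runs.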
Performance varied by program stage: accuracy ranged from 27% in screening and hit prioritization to 55% in drug response. Difficult program stages involved decisions across QC, statistics, and chemical or biological judgment of molecular candidates.
Trajectory analysis revealed gaps in scientific judgment. Failures included incorrect perception of assay outputs, reliance on literature priors over supplied evidence, and assay-specific reasoning mistakes.
Manuscript, results, and a subset of evals and trajectories are available below:
Have welcomed some talented drug hunters + former biotech founders to our team to start a multi-month benchmarking and agent engineering project a bit closer to development. Releasing our first intermediate result tomorrow.
Introducing SpatialBench-Long, a benchmark for long-horizon spatial biology. Agents must recover biological claims from raw data and realistic experimental context without prescribed methods.
24 evals span primary tumors, organoids, xenograft models, lineage-tracing systems, and aging/intervention biology. The best agents score 11.1%.
Technical talks on engineering challenges with AI and single-cell data in Mission Bay, SF, next Thursday.
Material covering emerging analysis methods for new kits, benchmarks and evaluations for frontier models, and practical AI for drug screening.
Harihara Muralidharan — Technical Staff @ LatchBio
Valentine Svensson — Principal Computational Biology Scientist @ Tahoe Therapeutics
Mikaela Koutrouli — Core Developer @ scverse
Zhen Yang — Technical Staff @ LatchBio
Link below:
Please join us for our free Technology Live Virtual Symposium on Spatial Biology this October 29th! https://t.co/UE5dJb24PK Speakers include Rong Fan, Andrea Radtke, Long Cai, Sarah Teichmann and many more!
Attending #BMES2024 this week! I’ll be presenting my research, "DBiT-seq Spatial Proteomics Analysis of Human Lung Fibrosis Samples," at the BMES Annual Meeting in Baltimore.
Looking forward to reconnecting with colleagues and meeting new researchers in the field.
Thrilled to report Patho-DBiT, just published in Cell 😊. It allows us to directly “see” all kinds of RNA species on the same clinical FFPE tissue slide, including mRNA, miRNA, snRNA, snoRNA, tRNA, etc, and splicing isoforms, genetic alterations (SNV, CNV, etc). Really a fun, cool, and powerful tool to explore human biology 🤩🤩
@CellCellPress
https://t.co/JUMiK16P5O
Check out this exciting webinar!! Leili and colleagues are combining single-nucleus sequencing of cardiac tissues and tissue engineering (in vitro organoid models) with synthetic biology to understand and potentially target human heart diseases such as cardiac fibrosis!
If you are interested in the limitations of adoptive cellular therapies (ACT), including engineered T cell receptor (TCR) therapy and chimeric antigen receptor (CAR) T cell therapy, read our recently published review paper.
It explains recent findings on the mechanism of T cell suppression by tumor stroma and the engineering approaches currently used to dissect how various stromal mechanical factors suppress T cells.
Immigrants hired by US corporations must have equal rights to others, which means a green card NOT the temporary H1B visa.
This will afford them mobility in case of employer abuse and protect others from wage suppression.
The Biden bill fixes this: https://t.co/OboVB13vDv
Our #Science2Art reveal party is underway!
Reveal the 2020 #SciArt video ✅ Don't worry, we'll show it again at our annual event. Are you registered yet?
Meet the artists ✅
Learn about #STEAM organizations in our region ✅
Bid on your favorite piece ▶️ https://t.co/UjgbeaIGAi
Really awesome piece from @MhsYzdn of @UMKC! Her work focuses on testing ways to prevent significant environmental damage due to oil spills! I love how #Science2Art presents important research in a way that allows more people to understand it, with @BioNexusKC.
Our 2020 #Science2Art auction benefits #STEAM in KC!
This image by Mahsa Yazdani, @UMKC displays droplets covered w/ water similar to how continents are separated w/ water on Earth. It's used to test prevention of damage done by crude oil spills.
Bid Now https://t.co/UjgbeaIGAi
Science to Art at BioNexus Kansas City is a platform for regional scientists to display and describe their research through the visual arts. All proceeds from the Science to Art auction will be donated to STEAM education in KC.