🚀 Thrilled to see Valence researchers recognized with a Spotlight presentation at @icmlconf's GenBio workshop!
VCR-Agent is designed for explainability, to explain the relationship between perturbation and cellular response. It’s an area that’s underexplored in the research but critically important for practitioners.
If you want to build on this work, we’ve released VC-TRACES, an initial collection of mechanistic traces on the Tahoe dataset.
Links in the thread below 👇
Shoutout to our research team: @yunhuijang_, @LuZhu66 , @FawkesJake, @AlisandraDenton, @dom_beaini and @ENoutahi! 👏
We can’t wait to connect with the GenBio community and anyone else pushing the boundaries of drug discovery at @icmlconf in Seoul!
🧬 Bridging the gap in AI drug discovery.
@AlisandraDenton, Staff Machine Learning Scientist at Recursion and one of the authors on our recent paper in @NatureBiotech, explains how the AI model TxPert predicts how a cell will respond to perturbations.
Predicting a cell’s RNA activity, or transcriptome, is key to bridging the gap between cellular changes and clinical outcomes and advancing the potential for AI drug discovery. As Ali says, “with hundreds of cell types and so much disease variation, the total possibilities are too vast to measure in a lab.”
She describes how TxPert allows us to perform a “Virtual Assay,” taking the mathematical signature of a healthy cell called the Basal State and adding the perturbation’s embedding to deliver a highly accurate prediction of what the cell’s transcriptome will look like after treatment.
TxPert uses layered graph-based models that integrate phenomics — or how a cell looks — and transcriptomics — which genes are expressed — along with massive public biological knowledge resources.
The model can even predict how a perturbation will work in entirely new cell lines it hasn’t seen before as well as accurately forecast the effects of “double perturbations,” consistently identifying "unknown unknowns" that traditional models — and even massive general-purpose AI — often miss.
Ali notes that TxPert is currently predicting genetic perturbations, but more flexible models — including those predicting drug effects — are in the works.
👉 Check out the full paper in Nature Biotech: https://t.co/4bkJhZj2tr
Shocking new result 🙃: telling a deep learning model what we already know about gene interactions helps it predict biology it hasn't seen before. Until high quality biological data catches up, scaling might not be the only path forward.
1/ Our most recent Inside Valence blog post delves into TxPert, a SOTA model published in @NatureBiotech.
The key finding: scale alone is insufficient, even smaller model architectures can achieve top performance when coupled with biological priors.
@IhabBendid35780
Announcing TxPert, a SOTA model for perturbation prediction in transcriptomics, which we just published in Nature Biotechnology. TxPert shows promising progress towards predicting perturbation outcomes in entirely unseen cell lines where no perturbations were observed during training.
Kicked off day 1 at @iclr_conf with our MarS-FM poster presentation.
We're introducing a new class of generative models delivering 600x speedup compared to traditional molecular dynamics simulations, without sacrificing structural accuracy.
If you want to connect with the team and others in the AI for drug discovery space, register for our TechBio social: https://t.co/ym5GSMvMJJ
👉 MarS-FM code: https://t.co/SqMw942l7J
👉 MarS-FM paper: https://t.co/6QhYfwGLmn
@KKapusniak1@CristianGabell1@mmbronstein@TOSSOUPrudencio@Francesco_dgv
☀️See you in Rio!
Recursion and our AI research engine, @valence_ai, will be at @ICLR April 23-27 in Rio to share some of our latest breakthroughs in generative modeling for AI drug discovery.
🔹 Check out our poster presentations on:
▪️ TxFM: Effective Biological Representation Learning by Masking Gene Expression. A state-of-the-art transcriptomics foundation model that combines Recursion’s proprietary dataset with a highly customized Masked Autoencoder (MAE) architecture to represent genes as they appear in nature (i.e., unordered). It provides a much more accurate representation of how genes interact as a system, connects experimental results directly to patient biology, and outperforms models up to 100x larger in terms of data size.
👉 Paper here: https://t.co/ewDJadKuyt
👉 Workshop here: Foundation Models for Science - https://t.co/iiihrUDVX4
▪️ MarS-FM: Generative Modeling of Molecular Dynamics via Markov State Models. We introduce a new class of generative models – MarS-FM (Markov Space Flow Matching) – which offers a a 100x+ speedup compared to traditional MD simulations, without sacrificing structural accuracy.
👉 Code here: https://t.co/C7cm1UveSG
👉 Paper here: https://t.co/NVEzi4WNwg
🎉 And join us for a TechBio Social on April 26, 6-9pm, co-hosted with ICLR’s Learning Meaningful Representations of Life (LMRL) Workshop, featuring a rooftop bar, waterfront views and engaging conversations about how we can advance the frontier of AI-powered drug discovery.
👉 RSVP here: https://t.co/WzwTL13Txq
#ICLR #ICLR26
Modern, large-scale microscopy datasets are sparse, noisy, and subject to batch effects. Downstream analyses and models—such as Virtual Cells—require representations that distinguish biological signal from technical noise. This is where foundation models come in.
However, in our ICLR workshop paper we find that on biologically meaningful tasks, most state-of-the-art foundation models for microscopy perform no better than untrained models.
https://t.co/MdG58OLdOu
@_Suresh2@valence_ai@yunhuijang_@LuZhu66 The hypotheses remain falsifiable in form. Noise can generate wrong hypotheses or weaken the falsification signal, but that is still much better than removing falsifiability altogether.
@anshulkundaje@valence_ai@yunhuijang_@LuZhu66 More generally, this is not that surprising. Recent benchmarks keep showing that most FM for transcriptomics are still a bit short compared to standard baselines across several tasks and datasets, especially in few-shot settings.
@anshulkundaje@valence_ai@yunhuijang_@LuZhu66 Short answer: STATE here is evaluated via its predicted perturbed transcriptomic profile from control. We then call DEGs from that predicted profile and compare them to ground truth.
Connecting molecular mechanisms to phenotypic outcomes is the missing piece for truly useful virtual cells. This work led by @yunhuijang_ shows we can structure LLM bioreasoning into verifiable, grounded mechanistic explanations that improve perturbation response prediction.
To realize our vision of better and more effective drug discovery, we must turn predictions into testable hypotheses and experimental designs.
Dive into the Explain pillar of our Virtual Cell framework: why it matters and how we adapt LLM reasoning to biology.
https://t.co/dDOn2IZLHA
We're excited to announce that MarS-FM has been accepted at ICLR!
We propose a new class of generative models that learn to sample state transitions of biomolecular systems, reproducing the statistics of Molecular Dynamics (MD) with drastic speedups.
https://t.co/mfGm6Kb4bx
Even with today's industrialized labs that can run millions of experiments each week, we can only cover a tiny fraction of all the experiments we may want to run.
In this week's Inside Valence, learn more about virtual assays.
https://t.co/EWgupc2WNN
We’re launching “Inside Valence”—a new series taking you behind-the-scenes of our research, exploring new ways to predict, explain, and ultimately decode biology.
In our inaugural post, we discuss why Virtual Cells are making a comeback and why there’s reason to be excited.
Mauricio Alvarez and I have a fully funded PhD position open to study representation learning and abstraction questions. Deadline next Friday (5 December). Please reach out if you're interested. https://t.co/iYzLoUQgz2
@apsarathchandar Nicely done, congrats to the team! My experience with FCD has been the same (useless metric).
FYI, we explored improving SAFE generative models to get the best of both worlds (out of box fragment-constrained design, and SMILES-level efficiency) here https://t.co/wT6HfZGXzM