Excited to announce ScienceMachine's partnership with Inoviv to bring AI automation into proteomics workflows.
Together, we've reduced time spent on QC and reporting by ~80% and increased lab capacity by 20%, giving scientists more time to focus on science, not admin.
How capable are AI agents at bioinformatics research?
I tested 9 LLMs on BixBench: 205 complex tasks from real-world research, the kind of work I do as a compbio PhD.
Results:
- 63% accuracy (3x higher than the SOTA 10 months ago)
- The harness mattered as much as the model
So we're asking for your advice: try out our platform with your most tedious scientific tasks, and let's schedule a call about how we can improve it!
But with her and others' expert feedback, we went heads-down building and shipped a fine-tuned product, fit-for-purpose for complex scientific work. And we're only getting started!
Studies suggest that writing a thesis or a paper increases your risk of sleep deprivation by 100%
Or at least that was the case for me...
I would pull all-nighters in the days before the big deadline
And nothing was more nerve-wracking than finding a bit of contradictory evidence or a bug in the data pipeline at the eleventh hour
It was this personal trauma that motivated @SciMac, so it's super fun to see folks using it to win back those precious hours of sleep!
PhD students, postdocs, and PIs have used our platform to create publication-ready plots, interpret biological mechanisms, and automate painful workflows.
We've upped the free usage tier, so feel free to give it a spin here: https://t.co/eF641zTWTq
[pic is of me after submitting my master's thesis w/ 0h of sleep]
@lorenzosani_ @rekatronics @robbiemccorkell
From Richard Littlewood: "ScienceMachine transformed our research process at Flow Health Science Inc., streamlining the complex data analytics needed to launch our first product, Klario™"
🧬
ScienceMachine powering real research!
Flow Health Science used SM to analyse and interpret their data, which helped them get published much faster!
Read the paper here: https://t.co/PACjMWHGL1
This is just the beginning. And we need your help to make it even better.
So here's the ask: try out our platform with your most tedious scientific tasks!
https://t.co/KAYbVaaUIP
🧬 Today, we're launching the next generation of work in life sciences R&D
Over the past months we have been heads-down building, working closely with domain experts and leading biotech companies to define how AI will augment research. And today we're making it public!
To this end, we:
• Built a faster, smarter, and more accurate AI agent, using the latest models and techniques
• Added enhanced support for scientific workflows, such as ELISA, flow cytometry, and proteomics
• Made it easier to work where your data already lives
Over the past year, there's been a surge of excitement around agentic AI: systems that don't just answer questions but can act, reading instructions, running code, designing pipelines, and making decisions.
In biomedicine, this raises a provocative question:
Could the next member of your ML team be an AI agent?
The honest answer: not yet.
Today, we share BioML-bench, a new open benchmark to measure how far today's agents are from this vision, and what it will take to get there.
Paper: https://t.co/MI6Wxq3CWK
Code: https://t.co/yIG7JIOKjm
Why this matters
Biomedical discovery doesn't happen in a single step.
It's messy, iterative, and deeply interdisciplinary: cleaning data, choosing models, validating results, integrating diverse domains like genomics, imaging, and clinical records.
Existing evaluations, mostly Q&A or coding challenges, don't capture this complexity.
We needed a testbed that reflects the real work of biomedical ML.
What we built
BioML-bench is a suite of 24 real biomedical ML tasks where agents must:
--Parse nuanced task descriptions
--Build and train models end-to-end
--Compete against human leaderboards populated by domain experts
It's the first benchmark designed to ask: Can an agent truly operate like a biomedical data scientist?
What we learned
Our experiments with four different agents, from general-purpose systems to biomedical specialists, reveal a sobering truth:
--Current agents operate at ~35% of human expert performance.
--Domain specialization alone isn't enough. Success comes from flexible, creative strategies, not rigid pipelines.
--Even on imaging tasks, deep learning was underutilized, highlighting a gap between human and agent intuition.
Looking ahead
The promise of agentic AI isn't to replace human scientists; it's to amplify them.
Imagine a future where an agent can set up a first-pass analysis overnight, freeing a scientist to focus on questions, not debugging scripts.
We're not there yet. But with BioML-bench, we now have a shared yardstick to track progress, spark innovation, and bring accountability to this emerging field.
Grateful to our amazing team, led by @Henrymiller2012, with contributions from Matthew Greenig and Benjamin Tenmann, and support from @SciMac.
This work is a small but necessary step toward a future where AI becomes a true partner in biomedical discovery.
#AI #Biomedicine #Agents #MachineLearning #BioML
Could an AI agent be your biotech's newest ML Scientist?
We built a benchmark to find out.
TLDR: The open source agents we tried don't quite hit the mark yet. But now we've got a benchmarking suite to tell us when we're on the right track as we continue trying new agents.
What we found
• We benchmarked two biomedical specialists (Biomni, STELLA) and two ML generalist agents (AIDE, MLAgentBench).
• Biomedical specialization did not confer a consistent advantage: Biomni (a specialist) and AIDE (a generalist) generally performed better than STELLA and MLAgentBench.
• Agents consistently underperformed human baselines (on average, 34-37% of human leaderboard performance).
• The best-performing agents tried more diverse ML strategies (feature engineering, model selection, stacking) rather than sticking to a single approach.
• Agents rarely used deep learning, even on imaging tasks where human leaderboards were dominated by DL models.
What we built
• BioML-bench: a benchmarking suite for agentic BioML.
• Built on MLE-bench: agents must parse task descriptions, build & train models, and submit predictions that are graded against human leaderboards.
• 24 biomedical-specific ML tasks spanning multiple biomedical domains, with human leaderboards mostly populated by experts.
• A software package to lower benchmarking barriers and enable reproducible evaluation.
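For readers wondering how "graded against human leaderboards" might turn into a single number, here is a minimal Python (3.9+) sketch. It assumes a percentile-style metric: the fraction of human entries an agent matches or beats on each task, averaged over tasks. That metric, the `Leaderboard` class, and the `leaderboard_percentile` / `mean_percentile` helpers are hypothetical illustrations, not BioML-bench's actual grading code or API, and the example scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Leaderboard:
    """Hypothetical container for one task's human leaderboard scores."""
    scores: list[float]
    higher_is_better: bool = True  # e.g. AUROC (True) vs. RMSE (False)


def leaderboard_percentile(agent_score: float, board: Leaderboard) -> float:
    """Fraction of human leaderboard entries the agent's score matches or beats.

    Illustrative metric only; BioML-bench's real grading may be defined differently.
    """
    if not board.scores:
        raise ValueError("leaderboard is empty")
    if board.higher_is_better:
        beaten = sum(1 for s in board.scores if agent_score >= s)
    else:
        beaten = sum(1 for s in board.scores if agent_score <= s)
    return beaten / len(board.scores)


def mean_percentile(results: list[tuple[float, Leaderboard]]) -> float:
    """Average the per-task percentiles across a suite of tasks."""
    if not results:
        raise ValueError("no task results given")
    return sum(leaderboard_percentile(s, b) for s, b in results) / len(results)


if __name__ == "__main__":
    # Two made-up tasks: one scored by AUROC, one by RMSE.
    results = [
        (0.86, Leaderboard([0.91, 0.88, 0.85, 0.80])),
        (1.8, Leaderboard([1.2, 1.5, 2.0, 2.5], higher_is_better=False)),
    ]
    print(mean_percentile(results))  # 0.5
```

Tracking the metric direction per task matters because biomedical tasks mix metrics like AUROC and RMSE; averaging raw scores across them would be meaningless, while a percentile puts every task on the same 0-1 scale.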
Why it matters
Agentic AI holds promise for automating biomedical R&D. However, realizing this promise will require agents that can reliably complete end-to-end data analysis workflows and build predictive models. Currently, most agents are evaluated on text-based question answering, so practical evaluations of end-to-end BioML capability are missing. BioML-bench is the first benchmarking suite built to fill this gap. As agents become a stronger focus for biomedical researchers in the coming years, we hope BioML-bench will become the standard benchmark for evaluating their biomedical ML capabilities.
Dive in
• Paper: https://t.co/hQezVbGeNb
• Code & docs: https://t.co/kckFkfDwze
Congrats to my co-authors Matthew Greenig, Benjamin Tenmann, and @BoWang87.
Thanks to @g27182818 for invaluable feedback. And thanks to @SciMac for providing compute and LLM API resources for this work.
Nothing is more annoying than a plot that's just not quite right 🤬
We've added simple *plot editing* for your most important figures, so you can visually tweak plots that are almost perfect!
https://t.co/KAYbVaamTh
Thanks @MartinJBCoulter for writing about ScienceMachine and how we accelerate life sciences research with AI agents! 🧬🧪
@Siftedeu @lorenzosani_
https://t.co/zrLlBCFPZR