Principal AIDD 💊🤖Scientist at Insilico Medicine 🧬, Retrosynthesis🧪CADD expertise TL // right neo-hegelian, e/acc // All posts reflect my personal views only
Final countdown ⏳ for the Top-K accuracy metric: it should be forgotten forever. Welcome the 🧪 ChemCensor 🔎 metric for evaluating single-step retrosynthesis models, and LLMs specifically. Read our preprint https://t.co/5cCTnDeQUS and be prepared for new retrosynthesis benchmarks🚀.
Day 13 of #ScienceAIBench!
🧪 Today we are moving from the biomedical domain to organic chemistry, specifically single-step retrosynthesis (SSRS). We are assessing how well top-tier LLMs can suggest plausible reactions 🔮 to get a compound from plausible reactants.
In contrast to the conventional USPTO-50k-test, which uses a ground-truth-based Top-K accuracy metric, the proposed new benchmark assesses model outputs for chemical plausibility. The framework is built on the broader chemical context within reaction centers and on functional group compatibility, mimicking the way chemists🧑🔬 review reactions for plausibility.
The key data-driven metric for chemical plausibility assessment is ChemCensor, which is part of the URSA (Utilitarian RetroSynthesis Assessment) family of retrosynthesis benchmarks. The brand new URSA-expert-2026 out-of-distribution (OOD) benchmark set of target molecules is proposed for realistic assessment on real-world medicinal chemistry cases.
📄 Read the ChemCensor for LLMs Preprint: https://t.co/u2Z1UOo5N3
📋 Benchmark Specifications:
· Datasets:
📑 USPTO-50k-test: 4972 target molecules from the conventional USPTO-50k set for SSRS model evaluation
🔥 URSA-expert-2026: 100 novel synthetically accessible target molecules assessed by experts
· Metric: max ChemCensor value, average per target (↑), {Av. PT max CC}
· Metric version: ChemCensor-U2, based on the publicly available full USPTO set by D. Lowe
· Models Evaluated: GPT 5.1, GPT 5.2, Claude Sonnet 4.5, Claude Opus 4.5, DeepSeek 3.2, Gemini 2.5 Flash, Gemini 3 Flash and Grok 4.1
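As a rough illustration of the {Av. PT max CC} aggregation named in the specifications above, here is a minimal sketch: take the maximum ChemCensor value among the reactions proposed for each target, then average those maxima over all targets. The function name `av_pt_max_cc`, the input layout (target ID mapped to a list of scores), and the choice to count a target with no scored reactions as 0.0 are assumptions made for this example, not details from the preprint. The ChemCensor scoring itself is not reproduced here; the scores are treated as given.

```python
from statistics import mean


def av_pt_max_cc(scores_per_target, empty_score=0.0):
    """Average over targets of the per-target maximum ChemCensor value.

    scores_per_target: dict mapping a target ID to the list of ChemCensor
        values of the reactions a model proposed for that target.
    empty_score: value used for a target with no scored reactions
        (ASSUMPTION for this sketch; the benchmark may handle this differently).
    """
    if not scores_per_target:
        raise ValueError("need at least one target")
    per_target_max = [
        max(scores) if scores else empty_score
        for scores in scores_per_target.values()
    ]
    return mean(per_target_max)


# Hypothetical scores, for illustration only (not benchmark data).
example = {
    "target_A": [0.5, 2.0, 1.0],  # best proposal scores 2.0
    "target_B": [1.0],            # single proposal
    "target_C": [],               # no scored proposal -> empty_score
}
result = av_pt_max_cc(example)  # (2.0 + 1.0 + 0.0) / 3 = 1.0
```

Unlike Top-K accuracy, this aggregation needs no ground-truth reactants: a target contributes through its best-scored proposal, whether or not that proposal matches the recorded reaction.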
📊 Observed Performance:
· OOD leader: Gemini 3 Flash achieved the highest {Av. PT max CC} of 1.82, demonstrating superior plausibility of proposed reactions on the OOD URSA-expert-2026 set.
· LLM versions progress: newer LLM versions show substantial progress on both public and OOD sets (GPT 5.2 over 5.1, Gemini 3 over 2.5).
· Performance Gap: All models perform worse on the OOD URSA-expert-2026 benchmark set than on the well-reported data. Claude Sonnet 4.5, the best performer on USPTO-50k-test, is only 3rd best on the OOD benchmark.
· Proprietary models win: top-tier proprietary models show much more reliable performance than open-source models (DeepSeek 3.2). Some open-source models (like Kimi K2) were not even included in the chart due to poor (~0) performance.
🔄 Our daily series continues tomorrow.
#ScienceAI #InsilicoBench #MMAI #MMAIGym #DrugDiscovery #Retrosythesis #AIBenchmarks #Biotechnology
The genAI template for virtual screening: generate → screen → enumerate → validate at scale
e.g. @InSilicoMeds's LEGION searches NLRP3 target space: 34k pharmacophore scaffolds ➜ 110M candidates in days, not years.
🚀~60% of a 375k sample survive full 3D SBDD vs 8-26% for combinatorics
I am really impressed by the great work of my colleagues. I was fortunate to see how the TNIK saga progressed, and now it has stepped into Phase II. I hope there will be more Nature papers by @InSilicoMeds. Congrats, folks! https://t.co/5iElzzSNiA #drugdiscovery
In a new microperspective in @ACSPublications Medicinal Chemistry Letters, researchers from Insilico analyze various AI/ML generative chemistry approaches to produce novel and synthetically feasible molecular structures and provide recommendations. https://t.co/lFzeIUrmiK
Check out the latest issue of #ACSMedChemLett
https://t.co/KskBNistch
▶️Read the cover article by Zhavoronkov et al. about recent developments in AI-driven drug design (AIDD) using medicinal chemistry.