Multimodal AMIE now published @NatureMedicine!
This is from past work @GoogleDeepMind where we studied patients uploading images during diagnostic dialogue. We found that a multimodal reasoning harness that tracks a patient’s state greatly improves history taking and clinical accuracy. We also surpassed doctors across many evaluation axes in diverse primary care settings.
https://t.co/kn71IF9cNN
AI co-clinician is our new research initiative to help explore how multimodal agents could better support healthcare workers and patients. 🩺
Here’s a snapshot of our progress 🧵
GPT-5.5 early access results have been impressive.
25% lift in clinical quality.
30% less verbose.
We orchestrate hundreds of AI tasks, some powered by our proprietary data flywheels, others by frontier models like GPT-5.5.
We test rigorously for clinical accuracy, completeness, reasoning, real-world because benchmarking is a big deal in healthcare.
Grateful to the OpenAI team for the early access and partnership.
Doctors are increasingly using ChatGPT to supercharge their workflows in delivering care.
To support this high impact domain, we take steps to make ChatGPT more accessible, accurate, and measurable for clinical work.
https://t.co/L6onbKavNT
Today we’re introducing two big steps for health at OpenAI:
- ChatGPT for Clinicians, a free version of ChatGPT designed for clinical work
- HealthBench Professional, a new benchmark to evaluate real clinician chat tasks
We’re excited about what this can unlock for care. ❤️
Yesterday we reached an agreement with the Department of War for deploying advanced AI systems in classified environments, which we requested they make available to all AI companies.
We think our deployment has more guardrails than any previous agreement for classified AI deployments, including Anthropic's. Here's why: https://t.co/k1Ge2MqqPr
There is this narrative that up until this week, Anthropic had this wonderful contract that prevented the U.S. government from doing mass domestic surveillance or autonomous lethal weapons, and now all hell will break lose.
As I wrote, I am not a fan of accelerating AI specifically in the national security space. If I had been an Anthropic employee at the time they signed their original deal with the DoW, I would have probably opposed it, especially given the reduced control since they worked through Palantir. And I don't think having some terms of use in the contract is what we can rely on to protect us.
I believe the drama of the last week about these terms of use is more about politics than substance.
The substance is about the details, which I hope more of which will come out soon. But it is wrong to present the OAI contract as if it is the same deal than Anthropic rejected, or even as if it is less protective of the red lines than the deal Anthropic already had in place before.
Obviously I don't know all details of what Anthropic had before, but based on what I know, it is quite likely that the contract OAI signed gives *more* guarantees of no usage of models for mass domestic surveillance or autonomous lethal weapons than Anthropic ever had.
230M people use ChatGPT for health & wellness questions every week. 📈
A recent RCT showed that it improves patient outcomes. ⚕️
And soon... it will be FREE for ALL 👏 (with no ads!)
- @thekaransinghal, @OpenAI Health Lead, on "Universal Medical Intelligence"
@SRSchmidgall@taotu831 It’s near instant but also a prerequisite that has near term impact in health AI. What are your thoughts on how medical AGI evals should evolve? Cc @taotu831
We’re at an inflection point of AGI evaluation where verification gets much harder. We started with multiple choice (instant verification) to research-grade problems (multi-day by a few experts). And soon it will be multi-month and multi-year (e.g., tape-out, clinical trials).
We went from AI systems that struggled to do grade school math to AI systems that can solve research-level math problems in just a few years.
I agree with Jakub this is perhaps the most important eval now.
I am also pretty sure the main reaction will be "it's not that hard" :)
Very excited about the "First Proof" challenge. I believe novel frontier research is perhaps the most important way to evaluate capabilities of the next generation of AI models.
We have run our internal model with limited human supervision on the ten proposed problems. The problems require expertise in their respective domains and are not easy to verify; based on feedback from experts, we believe at least six solutions (2, 4, 5, 6, 9, 10) have a high chance of being correct, and some further ones look promising.
We will only publish the solution attempts after midnight (PT), per the authors' guidance - the sha256 hash of the PDF is d74f090af16fc8a19debf4c1fec11c0975be7d612bd5ae43c24ca939cd272b1a .
This was a side-sprint executed in a week mostly by querying one of the models we're currently training; as such, the methodology we employed leaves a lot to be desired. We didn't provide proof ideas or mathematical suggestions to the model during this evaluation; for some solutions, we asked the model to expand upon some proofs, per expert feedback. We also manually facilitated a back-and-forth between this model and ChatGPT for verification, formatting and style. For some problems, we present the best of a few attempts according to human judgement.
We are looking forward to more controlled evaluations in the next round!
https://t.co/jtLCOhJftv #1stProof
From how the team operates, I always thought Codex would eventually win. But I am pleasantly surprised to see it happening so quickly.
Thank you to all the builders; you inspire us to work even harder.
Scientific discovery and clinical medicine are often treated as distinct phases. But for patients with rare, complex, and undiagnosed diseases, this separation is a luxury they cannot afford. The timeline from understanding a genetic mechanism to accessing subspecialist care is often too long and too fragmented.
Two new @GoogleDeepMind@GoogleResearch collaborations with @StanfordMed , published in Advanced Science and @NatureMedicine respectively last week, demonstrate how AI can bridge this gap.
1. Accelerating discovery (the science)
In Advanced Science, we present one of the first wet-lab validated examples of AI-assisted genetic discovery.
Our AI identified a novel genetic factor for hearing loss (Crym) in mice, which Dr Gary Peltz and team validated using CRISPR knock-in experiments to restore the wild-type gene and rescue the phenotype.
We applied this agentic AI scaffold to human patients with complex, undiagnosed conditions in a retrospective manner. The system analyzed genomic data for rare diseases, such as IRAK4 deficiency and ODC1 mutations, successfully identifying causative variants that matched expert clinical assessments.
2. Scaling expertise (the medicine)
Discovery is only the first step; patients then need access to specialized care.
As we note in our Nature Medicine paper, hypertrophic cardiomyopathy (HCM) is a leading cause of sudden cardiac death, yet ~60% of patients remain undiagnosed due to a lack of specialist centers .
In our RCT (one of the first of its kind) using our research AI system AMIE, we showed AI could help bridge this gap. General cardiologists using AMIE reported the system helped their assessments in 57.0% of cases, missed no clinically significant findings in 93.5% of cases and reduced assessment time in 50.5% of cases. This suggests the AI can act as a helpful co-pilot and help generalists bridge the gap to specialists.
Worth noting that these studies used models like Med-PaLM 2, Gemini 2.0 Flash, and Gemini 2.5 Pro with simple agentic scaffolds.
The potential for Gemini 3 and AI co-scientist to accelerate both the biology of discovery and the delivery of care is profound and we will share more soon.
Its a true privilege to collaborate with @euanashley , Jack W O'Sullivan, Dr Gary Peltz and their teams at Stanford Medicine.
With incredible team mates at Google including @taotu831@apalepu13 , @alan_karthi , @Mysiak and many more.
Advanced Science paper - https://t.co/lkj0soEGuj
Nature Medicine paper - https://t.co/G38tExCakN
AI co-scientist blog - https://t.co/Br5JegvcF6
AMIE blog - https://t.co/iLErUQRsSW
Excited to share our latest research published today in @NatureMedicine, demonstrating how Large Language Models (LLMs) can help bridge the critical shortage of subspecialist medical expertise. https://t.co/5OE5tYw3dE