Multimodal AMIE now published @NatureMedicine!
This is from past work @GoogleDeepMind where we studied patients uploading images during diagnostic dialogue. We found that a multimodal reasoning harness that tracks a patient’s state greatly improves history taking and clinical accuracy. We also surpassed doctors across many evaluation axes in diverse primary care settings.
https://t.co/kn71IF9cNN
Very happy (and relieved) to see our work on multimodal conversational medical AI accepted in @NatureMedicine
https://t.co/HgaVwaBxKx
In the published version, we have substantially expanded on the analysis and evaluation. Kudos to @_cjpark @timstro @JanFreyberg @_khaledsaab
This work also formed an important precursor for our more recent work where we explored a similar problem but in real-time interaction: https://t.co/LjKswtYFrN
Both modes of UX (synchronous and asynchronous) are useful but in different ways.
Also a nice reminder that prospective evaluation remains important future work.
AI co-clinician is our new research initiative to help explore how multimodal agents could better support healthcare workers and patients. 🩺
Here’s a snapshot of our progress 🧵
Sir Demis sharing at the Google IO stage:
- AI Co-scientist - our OG @GoogleDeepMind Gemini agents for accelerating scientific discovery and helping find cures for complex diseases (acute myeloid leukemia, liver fibrosis and counting)
- AMIE - our research AI doctor system with Nature papers and real-world clinical deployment
More soon :)
Working at @GoogleDeepMind is such a privilege - the chance to convert a lifelong mission & passion into two projects advancing the frontier of AI in medicine. As shared by the boss @demishassabis at #GoogleIO #googleio25 :
- The art of the possible for conversational diagnostic AI: following our 2 full Nature papers with prospective validation studies…
- Accelerating biomedical discovery with AI co-scientist (wet-lab validations of novel hypotheses in multiple diseases including liver fibrosis, AML and antimicrobial resistance…)
Super thankful to our amazing team. And we are just getting started… ❤️
It was great to speak to @Nature news w/ @RyutaroTanno about AMIE’s progress: our research conversational diagnostic AI now reasoning accurately in multimodal medical dialogue - handling common issues like photos of skin lesions, clinical docs & labs. More steps to real utility
New paper alert: AMIE gets vision!
Blog: https://t.co/PHX5MrfkE3
Paper: https://t.co/Kkd9irweBU
Remote healthcare often involves interpreting medical images and documents in addition to patient conversations. (1/n)
Out-of-the-box Gemini 2.0 Flash in an agentic framework surpasses doctors in multimodal medical conversations.
This is a pretty big deal for medicine.
Huge congrats to @KhaledSaab11 @RyutaroTanno @JanFreyberg + the whole AMIE team @GoogleDeepMind @GoogleAI on this accomplishment!
Excited to share a big update on AMIE, our research AI doctor from @GoogleDeepMind and @GoogleAI
Now, AMIE can “see” and interpret visual medical data within a diagnostic conversation.
Yes, AMIE exceeded human doctors (PCPs) in many key metrics like diagnostic accuracy and multimodal reasoning in a simulated clinical exam (OSCE) study.
But crucially, how we realised this upgrade was different from our previous works, which relied very much on domain-specific (pre- or post-) training.
We demonstrate that, with no finetuning, the combination of
(1) natively multimodal Gemini 2.0 Flash
(2) domain-specific inference-time algorithm
can result in a capable conversational diagnostic AI.
Through this year-long project, we felt the power of the evolving frontier foundation models in this important domain, though lots of work still remains to be done.
See the thread from @KhaledSaab11 to learn more:
https://t.co/qcj2RMgzGg
Adding more details and pointers to deep dives from my colleagues below!
Our multimodal AMIE study is out!
This is another exciting step for conversational medical AI – many congrats to the AMIE team!
A brief 🧵on what this means for both doctors and patients:
https://t.co/LzZVeXdZqi
AMIE with vision! Patients can now share visual content such as skin images or ECG tracings with AMIE. In a multimodal OSCE study, we show that AMIE outperforms PCPs in requesting and interpreting that information -- a thread 🧵: https://t.co/yx2GtrSiXm
✨ New paper from our team at @GoogleAI @GoogleDeepMind - AMIE goes multimodal 👀
Our research conversational diagnostic AI now fluently interprets visual photos/tests in multimodal instant messaging. More info at: https://t.co/rg7NoefDde (1/n)
Sharing progress: Our research AI agent, AMIE, now interprets visual medical information (images, test results) within diagnostic conversations.
We introduce a multimodal state-aware reasoning framework, built on @GoogleDeepMind's Gemini models, that aims to better handle complex clinical information.
In simulated clinical evaluations (OSCEs), AMIE met or exceeded human physicians on a broad range of benchmarks, including visual reasoning, diagnostic accuracy, management reasoning, and empathy.
Crucially, these results are from a controlled simulation using patient actors (see paper for full limitations). Proving safety, reliability, and utility requires rigorous testing in real-world settings. Our upcoming study with Harvard BIDMC is the first step in that essential validation.
Blog: https://t.co/MSodxG64aZ
Paper: https://t.co/bqALRUsaNs
A foundational step by a dedicated team.
@GoogleAI, @GoogleDeepMind: @RyutaroTanno, @alan_karthi, @vivnat, @AdamRodmanMD, @KhaledSaab11, @taotu831, @hardyshakerman, @JanFreyberg, @_cjpark, @yasharmaa, @apalepu13, @arkitus, @weballergy, @valentinlievin, @ckbjimmy, @davidstutz92, @dgtbarrett, @yongcheng16, @SaraM66905, @dr2w, @ymatias
Gemini powers our multimodal health research! 💙
In our new paper on multimodal AMIE, we're pushing conversational diagnostic AI beyond text to handle images such as skin photos, ECGs, and clinical docs, which provide crucial context in healthcare.
Blog: https://t.co/VAlKoR53Il
Paper: https://t.co/2zHQT0H5Pv
How do we make an AI reason like a clinician during a dynamic, multimodal conversation? One of our key contributions is multimodal state-aware reasoning, built on @GoogleDeepMind Gemini 2.0 Flash.
Instead of just reacting turn-by-turn, AMIE maintains an internal "understanding" of the consultation:
✅ What is known about the patient?
✅ What are the likely diagnoses?
✅ What information (text or visual) is missing?
This internal state allows AMIE to:
👉 Intelligently guide the conversation through phases like history-taking & diagnosis.
👉 Strategically ask for relevant images (like skin photos or screenshots of ECGs/docs) when its internal state shows uncertainty.
👉 Accurately interpret multimodal data and weave the findings back into the ongoing dialogue and diagnostic process.
Essentially, it mimics the adaptive reasoning clinicians use, leading to a more structured and effective consultation.
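To make the idea concrete, here is a toy Python sketch of a state-aware dialogue loop. It is only an illustration of the three questions above, not AMIE's actual implementation: the names `ConsultationState`, `is_uncertain`, and `next_action`, the confidence numbers, and the 0.6 threshold are all assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ConsultationState:
    """Hypothetical running summary of a consultation (not AMIE's real state)."""
    known_facts: list[str] = field(default_factory=list)           # what is known about the patient
    differential: dict[str, float] = field(default_factory=dict)   # diagnosis -> assumed confidence
    missing_info: list[str] = field(default_factory=list)          # text or visual gaps, e.g. a photo


def is_uncertain(state: ConsultationState, threshold: float = 0.6) -> bool:
    """Treat the state as uncertain when no diagnosis reaches the assumed threshold."""
    return max(state.differential.values(), default=0.0) < threshold


def next_action(state: ConsultationState) -> str:
    """Pick the next conversational move from the current state."""
    if is_uncertain(state) and state.missing_info:
        # Uncertain and a known gap exists: ask for it (possibly an image).
        return f"request: {state.missing_info[0]}"
    if is_uncertain(state):
        # Uncertain but no specific gap identified: keep taking history.
        return "continue history-taking"
    # Confident enough: move to the diagnosis phase with the top candidate.
    top = max(state.differential, key=state.differential.get)
    return f"discuss diagnosis: {top}"


state = ConsultationState(
    known_facts=["itchy rash for 3 days"],
    differential={"eczema": 0.4, "contact dermatitis": 0.35},
    missing_info=["photo of rash"],
)
first_move = next_action(state)

# After the (hypothetical) photo is interpreted, the state is updated.
state.missing_info.clear()
state.differential = {"contact dermatitis": 0.8, "eczema": 0.15}
second_move = next_action(state)
```

In this toy run, `first_move` asks for the photo because no diagnosis is confident enough, and `second_move` shifts to discussing the diagnosis once the image has raised confidence. A real system would have a model produce and update the state rather than hand-set numbers.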
We evaluated multimodal AMIE against primary care physicians (PCPs) in a demanding, blinded OSCE study using 105 diverse multimodal scenarios.
The results demonstrate clear progress: AMIE achieved similar or superior performance when compared to PCPs across a wide range of metrics, including diagnostic accuracy, empathy, and critically, the handling and reasoning about multimodal data.
While the OSCE results are very promising, it's important to remember this was a test environment with patient actors! Real-world care is more complex. Making sure it's safe, reliable, and actually helpful in the real world needs more work, starting with our upcoming study with Harvard BIDMC.
The work would not have been possible without an amazing team @GoogleAI, @GoogleDeepMind: @RyutaroTanno, @alan_karthi, @vivnat, @AdamRodmanMD, @timstro, @taotu831, @hardyshakerman, @JanFreyberg, @_cjpark, @yasharmaa, @apalepu13, @arkitus, @weballergy, @valentinlievin, @ckbjimmy, @davidstutz92, @dgtbarrett, @yongcheng16, @SaraM66905, @dr2w, @ymatias
Building on Articulate Medical Intelligence Explorer (AMIE), our research diagnostic conversational AI agent, today on the blog we share a first-of-its-kind demonstration of a multimodal conversational diagnostic AI agent, multimodal AMIE. Learn more → https://t.co/SdRA5mn6oh
✨New study from our team @GoogleDeepMind @GoogleAI - AMIE goes Multimodal✨
Our research conversational diagnostic AI now fluently considers visual photos/tests. In a randomized OSCE study, AMIE outperformed PCPs in simulated consultations in which patients uploaded photos of skin concerns, ECG tracings or lab tests. Medical dialogue can hinge critically on multimodal tests like these, so AI systems need to expertly reason about this complex information during a diagnostic conversation. 👀More here: https://t.co/UglrGlueSY (1/n)
Delighted to share ✨Med-Gemini✨ - our new family of multimodal models for medicine unlocking new possibilities for health - https://t.co/7Vqpw33yrK
More accurate multimodal conversations about medical images🩻, surgical videos📽️, genomics🧬, ultra-long health records📚, ECGs🫀 & more with state-of-the-art performance across multiple benchmarks
More accurate, up-to-date answers to medical questions with advanced reasoning and intelligent use of web-search
Long-context abilities. Summaries or referral letters from long health records, analyses of dozens of long research PDFs & more (1/6)
What unprecedented opportunities can 1M+ context open up in medicine?
Introducing 🩺Med-Gemini, a family of multimodal medical models, that excel in advanced reasoning 🧠, multimodal understanding 👁️🗨️ and long-context processing 📃.
👉https://t.co/i2c1DSD7jX