🧵1/ Our new study on AI and physician reasoning just came out in @ScienceMagazine. As co-senior author, I'm excited about our findings, and I do think AI will reshape medicine. But after seeing some of the discussions, I'm also worried about how our findings may be misinterpreted.
Navigating Gigapixel Pathology Images with Large Multimodal Models
GPT-5 with the right agentic scaffold for navigating whole slide images outperforms slide-level pathology models on the novel MultiPathQA benchmark.
Manrai x 2 (nepo much?) + a boatload of graduate students from the @harvarddbmi AIM program for GIANT pathology slide images https://t.co/e6YG6RtKCU Scanning images like a pathologist?
I had previously only skimmed the @NEJM case of a master clinician vs. an AI chatbot from @tabuckley_ & @arjunmanrai
Reading closely, two things come to mind
@Gurpreet2015 is a freak of nature (in the best way)
And
Dr. CaBot & other AI like it are generationally transformative today
Great fun to attend Dr. CaBot's presentation today at the @BrighamWomens Clinical Reasoning Conferences (CRCs) led by @tabuckley_, with special thanks to @JamesADiao for making this happen!
Just listen to this AI running a difficult differential diagnosis meant to challenge the best and the brightest. Only 3 years after GPT-4 was released. Props to @arjunmanrai @tabuckley_ @HarvardDBMI and to @NEJM for recognizing the milestone this represents.
The Dr. CaBot AI system was created to generate differential diagnoses in the style of an expert discussant from the Case Records of the Massachusetts General Hospital. The system produces both a written differential diagnosis and a video of a slide-based presentation. As seen in this video, the Dr. CaBot AI system can interpret both text and images.
Learn more in “A 36-Year-Old Man with Abdominal Pain, Fever, and Hypoxemia,” a Case Record of the Massachusetts General Hospital, by G. Dhaliwal et al., from @SFVAMC, @ucsf, Massachusetts General Hospital (@MassGeneralNews), and @harvardmed: https://t.co/op9URHA9Lv
Wow! The @NEJM CPC now has a differential diagnosis provided by AI, Dr. CaBot, created by the @arjunmanrai lab at @HarvardDBMI, alongside the human expert's. The times they are a-changin'!
I’ve been obsessed with the @NEJM CPCs since I was in grad school. Now, with @tabuckley_, it’s surreal to see @NEJM publish the first AI differential diagnosis in the 100+ year history of the series, generated by our AI system Dr. CaBot, alongside the human expert’s.
Are language models overconfident in their clinical reasoning? Have reasoning optimizations made this problem worse?
Our latest in NEJM AI - we adapt a human-validated, trivially scalable automated benchmark to get to the heart of clinical reasoning in AI systems
How good is AI at making medical diagnoses? Good enough to help—and to hurt.
For @NewYorker, I spent months talking to patients, doctors, and researchers about how to make the most of a powerful new technology and how to minimize the side effects:
https://t.co/FXxaBy80l7
It was surreal (and awesome!) to see @DhruvKhullar cover our AI reasoning system "Dr. CaBot" in the New Yorker!
How does it work? How does it reason? And why does this matter for the future of AI in medicine?
A 🧵explaining the paper and piece ⬇️
https://t.co/T6MW1nMmYp
“For a long time, when I’ve tried to imagine A.I. performing the complex cognitive work of doctors, I’ve asked, How could it?” Dhruv Khullar writes. But a demonstration of a new A.I. bot forced him to confront the opposite question: How could it not? https://t.co/gnQljJxlaG
Lots of fun collaborating with this team on historic clinical reasoning work! 🤖
Give @tabuckley_ @arjunmanrai @AdamRodmanMD a follow for more work to come ...
A few months ago, we were both excited and nervous to stage a public face-off at @Harvard between “Dr. CaBot,” our AI system that simulates an @NEJM expert discussant, and a real @NEJM discussant & expert diagnostician.
Today, @DhruvKhullar tells the story in The New Yorker.
Updated paper by physicians at Harvard, Stanford, and other academic medical centers testing o1-preview for medical reasoning & diagnosis tasks: “In all experiments—both vignettes and emergency room second opinions—the LLM displayed superhuman diagnostic and reasoning abilities.”