Daniel Milad, M.D. @DanielMiladMD - Twitter Profile

Daniel Milad, M.D. @DanielMiladMD

5 months ago

Take a look at our latest work on new LLMs in ophthalmology 👇 Thanks to @FaresAntaki for spearheading this project!

Ophthalmology

@AAOjournal

5 months ago

Performance of GPT-5 Frontier Models in Ophthalmology Question Answering https://t.co/JhbGy1HgUi @FaresAntaki @DavidMikhail01 @DanielMiladMD @SumitSharmaMD @pearsekeane, @YihTham @Renaudduval1 #ophthalmology


Daniel Milad, M.D. @DanielMiladMD

9 months ago

Check out our new article in JAMA Ophthalmology! @DavidMikhail01 @AndrewMihalache @PopovicMM @Renaudduval1 👇

JAMA Ophthalmology @JAMAOphth

9 months ago

DeepSeek-R1 demonstrated superior performance compared to #OpenAI o1 in clinical diagnosis and management across subspecialties, while also reducing operating costs. https://t.co/FomIrl2uTw @DanielMiladMD @FaresAntaki @theMichaelBalas @pearsekeane


Daniel Milad, M.D. @DanielMiladMD

10 months ago

⬇️⬇️

Rohan Paul

@rohanpaul_ai

10 months ago

GPT-5 delivers near‑perfect ophthalmology answers, and the mini‑low mode gives the best accuracy per dollar.

The study pits 12 GPT‑5 configurations against o1, o3, and GPT‑4o on 260 closed American Academy of Ophthalmology Basic and Clinical Science Course questions, then checks accuracy and explanation quality.

Questions were answered with no examples in the prompt, and each reply had to be a single letter plus a 1‑sentence justification, so grading stayed strict and simple.

GPT‑5 exposes a “reasoning effort” control, from low to high, that increases the model’s private thinking tokens before it speaks, the minimal setting underperformed and was dropped.

Top result, GPT‑5‑high hit 96.5% accuracy, o3‑high scored 95.8%, o1‑high 92.7%, GPT‑4o 86.5%, while GPT‑5‑nano‑low trailed at 77.3%.

Head‑to‑head strength was estimated with a Bradley‑Terry model, which turns pairwise wins into a single “skill” score, GPT‑5‑high was 1.66x stronger than o3‑high and 5.10x stronger than o1‑high on accuracy, and 1.11x stronger than o3‑high on rationale quality.

Rationales were graded by an LLM judge that compared each 1‑sentence explanation to the official reference text and picked the closer one, which scales cleanly beyond small human panels.

Cost mattered, plotting accuracy against mean cost per question showed a Pareto frontier from GPT‑5‑nano‑low to GPT‑5‑high, and GPT‑5‑mini‑low sat on that frontier as the best low‑cost high‑performance point, meaning nothing else was both cheaper and more accurate.

Practical read, GPT‑5‑high fits settings where every point of accuracy matters, GPT‑5‑mini‑low fits budgeted scale, and GPT‑5‑medium tracks close to o3‑high on performance and cost.

----

Paper – arxiv. org/abs/2508.09956

Paper Title: "Performance of GPT-5 Frontier Models in Ophthalmology Question Answering"
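As a rough aid to reading the "1.66x stronger" figures quoted above: in a standard Bradley-Terry model, the probability that A beats B is strength_A / (strength_A + strength_B). The sketch below assumes the quoted multipliers are ratios of Bradley-Terry strength parameters (the tweet does not say so explicitly), and the function name `bt_win_probability` is made up for illustration. It is not the paper's code.

```python
def bt_win_probability(strength_ratio: float) -> float:
    """Probability that A beats B in a Bradley-Terry model,
    given strength_ratio = strength_A / strength_B."""
    if strength_ratio <= 0:
        raise ValueError("strength ratio must be positive")
    # P(A beats B) = s_A / (s_A + s_B) = r / (1 + r), with r = s_A / s_B
    return strength_ratio / (1.0 + strength_ratio)


# Multipliers quoted in the tweet (GPT-5-high relative to each opponent).
# ASSUMPTION: these are ratios of Bradley-Terry strength parameters.
quoted_ratios = {
    "o3-high, accuracy": 1.66,
    "o1-high, accuracy": 5.10,
    "o3-high, rationale quality": 1.11,
}

for opponent, ratio in quoted_ratios.items():
    p = bt_win_probability(ratio)
    print(f"GPT-5-high vs {opponent}: P(win) = {p:.3f}")
```

Under that assumption, 1.66x corresponds to winning about 62% of decisive head-to-head comparisons, 5.10x to about 84%, and 1.11x to about 53%, which is why the rationale-quality gap over o3-high reads as small.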


Daniel Milad, M.D. @DanielMiladMD

10 months ago

Check it out!

elvis

@omarsar0

10 months ago

GPT-5 (with high reasoning effort) achieves near-perfect accuracy on a high-quality ophthalmology question-answering dataset. Based on these other reports, GPT-5 seems to be a very strong model at medical reasoning.



DanielMiladMD retweeted

Fares Antaki

@FaresAntaki

10 months ago

🚨 Excited to share our new preprint benchmarking OpenAI’s GPT-5 series for ophthalmology question answering.

Using the AAO BCSC dataset, we tested GPT-5 (including mini & nano) across four reasoning levels vs three older LLMs. GPT-5 with high reasoning scored an impressive 96.5%, ranking first in our LLM arena for both accuracy and justification quality. The most cost-efficient configuration was GPT-5-mini with low reasoning. We also introduce a scalable new method for evaluating long-form answers using LLM-as-a-judge autograding.

🔗 https://t.co/syLfmzHPFd

@DanielMiladMD @SumitSharmaMD @pearsekeane @YihTham


Daniel Milad, M.D. @DanielMiladMD

about 1 year ago

Excited to share our systematic review + meta-analysis published in @AJOphthalmology evaluating AI models for epiretinal membrane (ERM) diagnosis. Check it out! https://t.co/Zu1YGmrCbq


Daniel Milad, M.D. @DanielMiladMD

over 1 year ago

🙏

Ophthalmology

@AAOjournal

over 1 year ago

Code-Free Deep Learning Glaucoma Detection On Color Fundus Images https://t.co/BpbFGShqSH @DanielMiladMD @Renaudduval1 @DavidMikhail01 @FaresAntaki @pearsekeane @thedurreffect #ophthalmology


DanielMiladMD retweeted

Ophthalmology

@AAOjournal

over 1 year ago

The Role of Artificial Intelligence in Epiretinal Membrane Care: A Scoping Review https://t.co/X0AxVdrUAI @DavidMikhail01 @DanielMiladMD @FaresAntaki @qiancynthia @Renaudduval1


Daniel Milad, M.D. @DanielMiladMD

over 2 years ago

@FaresAntaki @dominicwllmsn Very nice work, congratulations!


Daniel Milad, M.D. @DanielMiladMD

over 2 years ago

7/7 💡 Conclusion: #LLMs may present a promising tool for medical decision-making in the future, helping guide clinicians in complex cases. #FutureOfWork #Innovation


Daniel Milad, M.D. @DanielMiladMD

over 2 years ago

1/7 We pushed #GPT-4 to its limits by assessing its medical reasoning skills in complex #Ophthalmology cases. Take a look at what we found 👇 @BMJ_Ophth @FaresAntaki @ciusss_estmtl @chumontreal https://t.co/gxUhPmiKiN


Daniel Milad, M.D. @DanielMiladMD

over 2 years ago

6/7 🆚 Interestingly, comparing GPT-4 to human experts revealed no significant difference in decision-making skills, though senior residents excelled in accuracy. #HumanVsAI


Daniel Milad, M.D. @DanielMiladMD

over 2 years ago

Take a look at our latest project, RetinaVR — an immersive retinal surgery experience @MetaQuestVR @Apple @RtoVR @UploadVR @MetaQuestGaming

Fares Antaki

@FaresAntaki

over 2 years ago

🚀 Apple Vision Pro is here! A VR revolution is coming. 1/ Introducing RetinaVR - our standalone simulator for vitreoretinal training. Immersive, affordable, portable. A step towards democratising surgical education globally. Details of our pilot work: https://t.co/zr0XFXPdWm