Leon Qi @dmon2048 - Twitter Profile

dmon2048 retweeted

8 days ago

Huge congrats to @Humana's Erius agent taking the #1 spot on CHI-Bench for Prior Auth and 6th for all domains. It outperforms every frontier lab on one of healthcare's hardest workflows. https://t.co/GeBYvlG55w

Leon Qi @dmon2048

14 days ago

Opus 4.8 is in the lead @claudeai @ClaudeDevs

actAVA AI

@actAVAai

14 days ago

The CHI-Bench leaderboard was just updated with the newest and highest score, from @claudeai Opus 4.8. CHI-Bench is the world's first long-horizon benchmark for healthcare AI agents. Leaderboard: https://t.co/wjd9wK44eU

dmon2048 retweeted

actAVA AI

@actAVAai

17 days ago

CHI-Bench is the world's 1st long-horizon healthcare benchmark for AI agents. If you're building or buying AI for healthcare, this is the test that actually matters — real clinical workflows, not toy demos. U.S. healthcare needs this. 🏥🔬

dmon2048 retweeted

actAVA AI

@actAVAai

17 days ago

actAVA AI integrates CHI-Bench with @huggingface and @harborframework today. Users can run the CHI-Bench evaluation and RL training from both platforms.

Leon Qi @dmon2048

18 days ago

Check out our dataset on Hugging Face.

Frank Wang

@FWang9959

19 days ago

Great news. Today we ranked as the #6 most popular dataset on Hugging Face! Wow! 😊 https://t.co/8r9zg2aLhX

Leon Qi @dmon2048

21 days ago

Awesome!

Frank Wang

@FWang9959

21 days ago

🚨 Historic moment for @actAVAai ! 📷Just one day after launch, our benchmark dataset is already #10 most popular on Hugging Face — out of 1 million+ datasets! Huge thanks to @iscreamnearby , @HaolinChen11 , Deon Metelski, Leon Qi, Tao Xia, Joon Lee, Steve Brown, Kevin Riley, T. Y. Alvin Liu, M.D., Zhiwei Liu, Qingsong Wen, @CaimingXiong , Sanmi Koyejo, Eric Xing & all our collaborators. 📷📷

dmon2048 retweeted

The Agent Times

@TheAgentTimes

23 days ago

A new 33-author benchmark called CHI-Bench finds that the best AI agent configuration resolves only 28% of realistic healthcare administration tasks, dropping to 3.8% in continuous-session testing. https://t.co/BaNiPEvSXu

Leon Qi @dmon2048

23 days ago

Great work! @HaolinChen11 @iscreamnearby

Haolin Chen

@HaolinChen11

23 days ago

(1/n) After a few months of work with multiple hospitals, universities and research facilities, today we're open-sourcing CHI-Bench: the first long-horizon benchmark for healthcare AI agents on real clinical and healthcare workflows.

Best frontier agent overall: 28% pass@1.
End-to-end prior authorization: 0%.

A thread on what we found 🧵

Leon Qi @dmon2048

23 days ago

@HopkinsMedicine @AlvinLiu_MD @WellstarHealth @YaleMed @StanfordAILab @zeyu1tang @sanmikoyejo @mbzuai @XiangchenSong @LingjingKong @kunkzhang @ericxing @ffeng01 @huang_biwei @SFResearch @JimZhiwei @zixianma02 @hjian42
8/ …and more 🔬
Brown: @FangliGeng
Boston College: @YuanYuan_MIT
Stony Brook: @Charlesyooo1
Oxford: @qingsongedu
ASU: @realhuawei, Yanjie Fu
USC: Yue Zhao
Emory: @yangji9181
@Recursive_SI: @CaimingXiong
UIC: Philip S. Yu

Leon Qi @dmon2048

23 days ago

1/ Introducing CHI-Bench 🧵 Can AI agents automate U.S. healthcare workflows end to end — given only clinician & insurer apps, operations, and a medical policy library? 75 long-horizon workflows × 30 frontier agents. Best agent solves just 28%. #AIinHealthcare 👇

Leon Qi @dmon2048

23 days ago

@HopkinsMedicine @AlvinLiu_MD @WellstarHealth @YaleMed
7/ …and university & industry AI research labs 🔬
@StanfordAILab: @zeyu1tang, @sanmikoyejo
CMU & @mbzuai: @XiangchenSong, @LingjingKong, @kunkzhang, @ericxing
UCSD: @ffeng01, @huang_biwei
@SFResearch: @JimZhiwei
UW: @zixianma02
Northeastern: @hjian42

Leon Qi @dmon2048

23 days ago

Proud to have helped build CHI-Bench 🧵 Can frontier agents run U.S. healthcare workflows end to end? 75 long-horizon tasks, 30 agents — best solves just 28%. We're early, and now we can measure it. Fully open 👇

Weiran Yao

@iscreamnearby

23 days ago

1/🧵 Can AI agents automate U.S. healthcare workflows end to end, given just clinician & insurer apps, operations, and a medical policy library? Introducing CHI-Bench: 75 long-horizon realistic healthcare workflows × 30 frontier agents. Best agent solves only 28%. #AIinHealthcare 👇 https://t.co/YoEtfHlVbu

Leon Qi @dmon2048

23 days ago

@CaimingXiong Thanks for your collaboration.

Leon Qi @dmon2048

23 days ago

@iscreamnearby Remarkable results! It's a game changer for integrating AI into healthcare.
