actAVA AI

about 3 hours ago

ChatGPT: https://t.co/8blUnIjy5x

about 3 hours ago

If you turn on Web Search in ChatGPT and ask it to find healthcare agent benchmarks, CHI-Bench comes up as the top recommendation and the strongest match for testing agent performance on realistic healthcare operations. https://t.co/4t2LqBcQ6p

4 days ago

𝗧𝗵𝗲 𝗾𝘂𝗲𝘀𝘁𝗶𝗼𝗻 𝗵𝗲𝗮𝗹𝘁𝗵𝗰𝗮𝗿𝗲 𝗔𝗜 𝘁𝗲𝗮𝗺𝘀 𝗮𝗿𝗲 𝗮𝘀𝗸𝗶𝗻𝗴 𝗶𝗻 𝟮𝟬𝟮𝟲 𝗶𝘀𝗻'𝘁 "𝗖𝗮𝗻 𝘄𝗲 𝗯𝘂𝗶𝗹𝗱 𝗮𝗻 𝗮𝗴𝗲𝗻𝘁?" 𝗜𝘁'𝘀 "𝗛𝗼𝘄 𝗱𝗼 𝘄𝗲 𝗸𝗻𝗼𝘄 𝗶𝘁'𝘀 𝘀𝗮𝗳𝗲 𝗳𝗼𝗿 𝗽𝗮𝘁𝗶𝗲𝗻𝘁𝘀?"

Too many agents are deployed before they’re ready, leading to incidents that set automation back by years. Here is the framework for true operational readiness:

• 𝗕𝗲𝗻𝗰𝗵𝗺𝗮𝗿𝗸𝘀 𝘃𝘀. 𝗥𝗲𝗮𝗹𝗶𝘁𝘆: A 90% accuracy score on a clinical Q&A benchmark does not predict success with complex, multi-step transactions such as prior authorization.
• 𝗜𝗱𝗲𝗻𝘁𝗶𝗳𝘆 𝗙𝗮𝗶𝗹𝘂𝗿𝗲 𝗠𝗼𝗱𝗲𝘀: You must evaluate agents against systemic risks, including reasoning vulnerabilities, tool integration latencies, and escalation anomalies.
• 𝗦𝗶𝗺𝘂𝗹𝗮𝘁𝗶𝗼𝗻 𝗼𝘃𝗲𝗿 𝗗𝗲𝗺𝗼𝘀: Stop relying on "happy path" vendor demos. Implement rigorous workflow simulations that test administrative edge cases, state persistence, and live integrations.
• 𝗖𝗼𝗻𝘁𝗶𝗻𝘂𝗼𝘂𝘀 𝗚𝗼𝘃𝗲𝗿𝗻𝗮𝗻𝗰𝗲: Deployment is only the beginning. You must continuously monitor for environmental degradation, including regulatory shifts, model drift, and changes in patient cohort distributions.

The "deploy-and-observe" era must end. Healthcare enterprises require a verifiable, reproducible foundation for risk mitigation to ensure compliance and patient safety. Frameworks like actAVA KORA and χ-BENCH provide the structure necessary to move from perpetual pilot to production-ready automation.

Read more on this topic from https://t.co/ckqkBuF9Sy Chief AI Officer Dr. Weiran Yao here: https://t.co/usBEnnqpPS

6 days ago

Send your resume directly to 👉 [email protected]

7 days ago

We are hiring a Solution Engineer, an Account Executive, an SVP of Sales, and an SVP of Strategy for the founding team of @actAVAai. Join us to shape the future of the healthcare AI factory, agent platform, and foundation models. Send your resume directly to: [email protected]

7 days ago

We welcome Tom Patterson to the actAVA AI advisory board. As the SVP of Corporate Development and Strategy at BetterUp and a 3-time founder, Tom brings deep operational expertise in aligning enterprise technology with human performance and workforce resilience.

In this role, Tom advises us on integrating human-in-the-loop safeguards into the agent lifecycle. His focus centers on critical governance infrastructure, including:
- Privacy Compliance: Structuring models to strictly adhere to enterprise privacy regulations.
- User Agency: Implementing clear design frameworks that distinguish AI-generated advice from human guidance.
- Workforce Stability: Deploying systems capable of monitoring operational distress cues to support frontline teams.

Tom’s first question to us was “How do you scale production-ready agentic AI without introducing workforce instability or employee anxiety?”

From Tom’s point of view, technology can only move from pilot to production when built with rigid structural safeguards that protect the human element. Building an unshakeable partnership between silicon and carbon requires evidence-led governance, not open-ended promises. We agree!

We are grateful for Tom’s partnership as we continue to build the infrastructure for responsible, governed, and measurable agentic AI at enterprise scale.

Read our full conversation with Tom on the structural safeguards required for enterprise growth: https://t.co/7ZomT13vg5

The AI factory for healthcare. Master your agentic future.

#actAVA #AIGovernance #AgentLifecycle

actAVAai retweeted

7 days ago

What 99% of AI researchers don’t know: your Hugging Face dataset can be selected for the benchmark shortlist if it evaluates open-source models. CHI-Bench just made the shortlist alongside other popular benchmarks like GSM8K and SWE-Bench 🤗 RT or reply 👇 if you want to know how! https://t.co/RnMGiNZzV2

actAVAai retweeted

8 days ago

Huge congrats to @Humana's Erius agent on taking the #1 spot on CHI-Bench for Prior Auth and 6th place overall across all domains. It outperforms every frontier lab on one of healthcare's hardest workflows. https://t.co/GeBYvlG55w

8 days ago

@iscreamnearby @Humana Congratulations 🎉

14 days ago

The CHI-Bench leaderboard has just been updated with the newest and highest score, from @claudeai Opus 4.8. CHI-Bench is the world's first long-horizon benchmark for healthcare AI agents. Leaderboard: https://t.co/wjd9wK44eU

14 days ago

🚀 @claudeai Opus 4.8 just took #1 on CHI-Bench (long-horizon healthcare agents).

75 real workflows across prior authorization (PA), utilization management (UM) & care management (CM).

Opus 4.8 → 33.3% (PA 32 · UM 28 · CM 40)
Opus 4.6 → 28.0% (PA 18 · UM 41 · CM 24)
Opus 4.7 → 24.4% (PA 24 · UM 17 · CM 32)
https://t.co/wugFZFzNPD

actAVAai retweeted

16 days ago

Agents like Claude Code and Codex can sustain hundreds of tool calls and ace coding. So we had 30 of them run real prior-auth, utilization management, and care management cases end to end. The best agent still fails 72% of U.S. healthcare workflows.

This is CHI (χ)-Bench. 🧵 https://t.co/sRydsTWW3s

16 days ago

https://t.co/s2PlgWlUkE launches CHI-Bench, the first long-horizon healthcare benchmark for AI agents. If you're building or buying AI for healthcare, this is the test that actually matters: real clinical workflows, not toy demos. U.S. healthcare 🏥 needs this.

actAVAai retweeted

17 days ago

Introducing CHI-Bench on @huggingface: the world’s first long-horizon healthcare benchmark for AI agents. 75 real healthcare workflows + 20 apps + 200+ MCP tools + 1,290 skills + process/outcome rewards https://t.co/PKmQ4RiIJY Any questions, lmk!

actAVAai retweeted

ModelScope

@ModelScope2022

17 days ago

The best AI agent (Claude Code + Claude Opus 4.6) passes only 28% of real healthcare workflow tasks. CHI-Bench by @actAVAai, @iscreamnearby and @HaolinChen11, built with Johns Hopkins, Yale, Stanford, CMU, Oxford and 20+ institutions, was designed to find out exactly how far along we are. 🏥 Try it yourself 👉 https://t.co/tRwunIHBbd

Three long-horizon domains tested:
🏥 Prior Authorization: provider intake and PA preparation for new referrals
📋 Utilization Management: the full payer review cycle, from intake to peer-to-peer review
👥 Care Management: chronic disease follow-up, outreach, assessment, and care planning

75 tasks + 3 marathon tasks + 23 end-to-end dual-agent scenarios. 20 medical apps via MCP, 1,279-document handbook.

💻 Git: https://t.co/HuAqGSTaah
🔗 Leaderboard: https://t.co/Z0bFHXUU7H

17 days ago

CHI-Bench is the world's first long-horizon healthcare benchmark for AI agents. If you're building or buying AI for healthcare, this is the test that actually matters: real clinical workflows, not toy demos. U.S. healthcare needs this. 🏥🔬

actAVAai retweeted

Harbor Framework

@harborframework

21 days ago

Healthcare benchmark, built on Harbor!

actAVAai retweeted

Frank Wang

@FWang9959

21 days ago

🚨 Historic moment for @actAVAai! Just one day after launch, our benchmark dataset is already the #10 most popular on Hugging Face, out of 1 million+ datasets! Huge thanks to @iscreamnearby, @HaolinChen11, Deon Metelski, Leon Qi, Tao Xia, Joon Lee, Steve Brown, Kevin Riley, T. Y. Alvin Liu, M.D., Zhiwei Liu, Qingsong Wen, @CaimingXiong, Sanmi Koyejo, Eric Xing & all our collaborators.

17 days ago

Today actAVA AI integrated CHI-Bench with @huggingface and @harborframework. Users can now run CHI-Bench evaluation and RL training from either platform.