Introducing CHI-Bench on @huggingface: the world’s first long-horizon healthcare benchmark for AI agents.
75 real healthcare workflows + 20 apps + 200+ MCP tools + 1,290 skills + process / outcome rewards
https://t.co/PKmQ4RiIJY
Any questions, lmk!
We are building Aether AI. #AetherAI
Scaling has made AI powerful. But scaling pattern recognition alone will not deliver real-world intelligence.
The next paradigm requires causal world models and causal agentic systems — systems that uncover mechanisms, reason about interventions, and improve through the consequences of their own actions.
Our first proving ground is Physical AI.
#Causality #AI
In real healthcare operations, agents must do far more than answer medical questions. They need to read charts, interpret clinical and operational policies, verify coverage, route referrals, draft P2P scripts, and finalize care plans — where a single policy violation can mean a denied claim or missed patient outcome.
@actAVAai @iscreamnearby led and developed CHI-Bench (Clinical Healthcare In-situ Benchmark), the first long-horizon, policy-rich benchmark for AI agents operating across end-to-end U.S. healthcare workflows.
Key highlights:
▶️ High-fidelity simulators for Provider Prior Authorization, Payer Utilization Management, and Population Health Care Management, all exposed as MCP servers over patient, clinician, and insurer records.
🧪 Each trial runs 60–80 agent steps across 4–6 clinical stages, with access to 21 healthcare apps, 200+ MCP tools, and a 1,279-document operations handbook.
Leaderboard results across 30 frontier agents:
• Claude Code + Opus 4.6: 28% pass@1
• Codex + GPT-5.5: 21%
• Utilization review: 41%
• Care management: 32%
• Prior authorization: 29%
Reliability remains a major challenge: no agent exceeds 20% when the same case is repeated three times.
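To make the gap between pass@1 and repeated-trial reliability concrete, here is a minimal sketch. It assumes "repeated three times" means a case counts only if all three trials succeed; the post does not spell out CHI-Bench's exact definition, and the function names and toy outcomes below are hypothetical, not benchmark data.

```python
from typing import Dict, List


def pass_at_1(results: Dict[str, List[bool]]) -> float:
    """Fraction of all individual trials that succeed."""
    total_trials = sum(len(trials) for trials in results.values())
    total_passes = sum(sum(trials) for trials in results.values())
    return total_passes / total_trials


def pass_all_k(results: Dict[str, List[bool]], k: int = 3) -> float:
    """Fraction of cases whose first k trials ALL succeed (assumed reliability metric)."""
    eligible = [trials[:k] for trials in results.values() if len(trials) >= k]
    return sum(all(trials) for trials in eligible) / len(eligible)


# Hypothetical outcomes for four cases, three trials each (not CHI-Bench data).
toy_results = {
    "case_a": [True, True, True],
    "case_b": [True, False, True],
    "case_c": [False, False, False],
    "case_d": [True, False, False],
}

print("pass@1:", pass_at_1(toy_results))
print("all-3-of-3:", pass_all_k(toy_results, k=3))
```

In this toy data half of the individual trials succeed, yet only one case in four is solved on every attempt, which is the kind of drop the reliability number describes.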
1/🧵 Can AI agents automate U.S. healthcare workflows end to end, given just clinician & insurer apps and a library of operations and medical policies? Introducing CHI-Bench: 75 long-horizon, realistic healthcare workflows × 30 frontier agents. The best agent solves only 28%. #AIinHealthcare 👇
Stop restarting your long-running agents.
Enterprise Deep Research (EDR) lets you steer mid-run—like driving a car.
It can save you hours or even days of work. Open-source, enterprise-ready, built by @SFResearch.
Try it & drop your use case below 👇
GitHub: https://t.co/5elr3XBrCG
MBZUAI Machine Learning Winter School 2026: Representation Learning & GenAI (https://t.co/voU5FqSZE3)
on Feb. 9-13, 2026, in Abu Dhabi, UAE.
Application Deadline: Oct. 20, 2025!
Join us for an exciting 5-day program with world-class researchers! Funding available! #MBZUAI
🧵 Your SAE learns different features each time? Struggling to convince people to trust your interpretations? Maybe you're only one architecture choice away from a solution.
We formulate this as a Feature Consistency problem and show that high consistency is achievable!
We present 🧩Retroformer🧩, which iteratively improves LLM agents by learning a plug-in retrospective model that, through policy gradient optimization, automatically refines the agent's prompts using env-specific rewards.
arXiv: https://t.co/zITi65Z14q
#LanguageAgents #LLM
The registration deadline for #UAI2023 (39th Conf. on Uncertainty in #Artificialintelligence) is July 24! It will take place @CarnegieMellon, Pittsburgh, from 07/31 to 08/04. Check out the beautiful @PhippsNews for the banquet: https://t.co/otjYYqwWWZ
Four days left for early registration for #UAI2023: https://t.co/qye8lWgtCO. UAI 2023 will take place at Carnegie Mellon University, Pittsburgh, PA, USA, Jul 31-Aug 4, with the banquet at @PhippsNews (Phipps Conservatory and Botanical Gardens)!
We are organizing a @UncertaintyInAI workshop on the #History and #Development of Search Methods for #CausalStructure. Welcome submissions of "Case Studies of Applied Causal Discovery", either successful or not. For details see https://t.co/c4konY5tJJ
Registration for UAI 2023 is now open! https://t.co/W1qXhYuAF1 #UAI23 @UAI2023 will take place at Carnegie Mellon University, Pittsburgh, PA, USA, Jul 31-Aug 4, with the banquet at @PhippsNews (Phipps Conservatory and Botanical Gardens)!
Early bird deadline is June 22. See you there!
UAI 2023 looks forward to seeing you at Carnegie Mellon University from July 31 to Aug. 4, 2023. Thanks to our local team and CMU for making things happen!
The CLeaR society is delighted to announce that we are organizing the 2023 edition of CLeaR in Tübingen, Germany. The submission deadline will be around mid-October. Details will be released shortly. Please stay tuned!
We've just released Betty, a PyTorch library for generalized meta-learning (GML) and multilevel optimization (MLO)!
Betty gives a unified programming interface for applications including HPO, NAS, MAML, RL, and more.
Code: https://t.co/LTU20uDFOR
Paper: https://t.co/IwIZP1OInI
We are happy to announce that the UAI 2022 program committee is carefully reviewing the 730 submissions to the conference! We are looking forward to seeing you in Eindhoven, The Netherlands on August 1-5, 2022!
Excited to serve as a workflow chair for UAI 2022 with Petar Stojanov. The paper submission deadline is February 25, 2022 (23:59 UTC). #UAI2022 @UncertaintyInAI
https://t.co/ldtPGXomkH
We are excited to release the Python causal-learn package for causal discovery! See the package (https://t.co/6fVGoEAkqV) and documentation (https://t.co/HA5r0Tf8Rg). Any feedback is welcome.
We are excited to release the Python causal-learn package for causal discovery! See the package (https://t.co/D0YK6ZqMjs) and documentation (https://t.co/kA2bwYtU1l). Any feedback is welcome.