Show Codex a workflow once. Reuse it as a skill.
Record & Replay lets you show Codex a recurring task, like filing an expense report or submitting a time-off request.
Codex turns that demo into an inspectable, editable skill.
You control when recording starts and stops.
Between rooms. On rounds. Walking the corridor outside an OR. Charting one-handed during a phone call. This is where clinical questions happen.
Today we're launching Voice Mode. OpenEvidence is the first multimodal medical AI: physicians can type, speak, or listen, all on the same evidence base.
The clinician asks a clinical question out loud. Voice Mode waits when you pause, stops when you interrupt. The answer comes back concise, peer-reviewed, and verifiable against the source.
Conversation with a colleague. That was the bar.
For years we've focused on the intelligence: curation, retrieval, citations. Voice Mode is the interface catching up to where physicians practice.
The evidence quality doesn't change with the modality. Voice answers are shorter and shaped for listening; the references and the full written form stay in the conversation.
Voice Mode is now in OpenEvidence web and mobile.
Until now, physicians using AI in clinic had to assemble the patient’s context themselves. Allergies, comorbidities, medications, prior procedures: all copy-pasted in from the chart.
Today we’re announcing a partnership with @CedarsSinai. OpenEvidence now works directly inside Epic, drawing on the patient’s full record and interpreting the medical literature through the lens of that specific patient.
Cedars-Sinai is the first academic health system to deploy patient-aware clinical intelligence at enterprise scale. The clinician asks a complex question in natural language. The answer reflects both the best available evidence and the patient in front of them.
Patient data is never stored after the clinical session or used for any other purpose.
We're releasing Medmarks v0.1, the largest completely open-source automated evaluation suite for assessing the medical capabilities of LLMs!
Developed in our @MedARC_AI community, w/ support from @PrimeIntellect
So far we’ve evaluated 46 models to find out which performs best!
We now have causally-informative data on the effects of AI scribe adoption on how doctors spend their time.
They work well!
AI scribes help doctors spend less time in the EHR and on documentation, allowing them to spend more time with patients.
First day using AI-powered smart glasses in clinic.
Real-time EHR. No turning to the screen. Just eye contact and conversation. All the data I need, when I need it, dynamically served up and projected into the room. Even differential diagnosis!
Early… but unbelievably good 👇
This may be the most controversial thing I’ve posted. But I think it needs to be said.
We are having an urgent conversation about slowing AI in medicine. Instituting rigorous safety measures. Thorough vetting before deployment. Many respected voices are calling for caution, and their instincts are grounded in a tradition of patient safety that I deeply respect.
But I want to pose a question that I haven’t seen anyone ask.
What is the cost of slowing down?
Not the cost to technology companies. The cost to patients.
We talk about AI safety as if the alternative is a well-functioning system. It isn’t. The current system produces error rates that have barely improved in decades. M&M cases that rarely lead to broad physician education. Community hospital physicians reliant on self-education of highly variable quality. Emergency physicians who never receive feedback on whether their practice patterns are calibrated — whether they order too many CTs or too few, admit too aggressively or too conservatively. Practice patterns that drift with fatigue across a single shift and across an entire career. These aren’t hypothetical harms. They’re the measured, documented, persistent background rate of medical error that we have normalized.
Half of chronic disease medications aren’t taken correctly. Twenty percent of prescriptions are never filled. Up to half of adverse drug reactions are preventable. Thirty to eighty percent of hypertension patients discontinue treatment within the first year.
This is the baseline. This is what we’re protecting when we slow AI deployment.
Calculus measures continuous change. If we model the rate of improvement in patient care as a function of time, slowing AI adoption doesn’t just delay improvement by a fixed amount. It changes the integral. The cumulative patient harm prevented shrinks. Every month of delayed deployment represents ongoing harm from errors that a more capable system could have caught.
We aren’t comparing AI with safety checks versus AI without safety checks. We’re comparing AI deployed in 2028 after rigorous vetting versus the current system continuing to produce the same error rates it has produced since 2000. The question is whether the cumulative harm from that delay exceeds the harm AI might introduce.
I am not arguing against safety measures. I’m arguing that the cost of delay must be measured against a baseline that is far worse than most people acknowledge. We apply compassionate use and emergency authorization frameworks to drugs when the background mortality rate justifies accelerated deployment. We should at least ask whether AI in medicine has reached that threshold.
The instinct to slow down feels responsible. But if slowing down means patients continue dying from errors that AI could prevent — errors we’ve failed to fix for decades through every other means — then the calculus of caution isn’t as simple as it appears.
Sometimes the most dangerous thing you can do is nothing.
Can someone make an AI tool that takes all medical records, in the EMR and outside it, and forms a concise note that meets documentation/billing standards and incorporates OpenEvidence AI to formulate the plan and discussion?
Once this tool is made my job just becomes communicating with patients.
For those who have never worked in healthcare, I made a simulation to give you a sense of how difficult it is to do something even as simple as ordering Tylenol.
WARNING: this may be infuriating to some providers
GLP-1 medicines are expanding beyond diabetes into obesity, cardiovascular, liver, and inflammatory disease care.
Prevention still matters most, and AI will help identify risk earlier and personalize treatment decisions.
https://t.co/scPptqVENw
#MedTwitter #Cardiology #AIinHealthcare #GLP1