A multimodal agentic AI model that integrates electronic medical records, lifestyle, and layers of biologic omics data to predict health outcomes and, with perturbations, run "what if" scenarios: what happens if a person improved lifestyle or took a medication
@Cell_Metabolism
https://t.co/qgVomjkipS
UK Biobank accelerometry in “non-exercisers”: just ~3.4 min/day of ≤1-min vigorous bursts was associated with lower incident cancer (total HR 0.83; PA-related HR 0.72). 
Still observational and vulnerable to healthy-user bias and reverse causation. But the clinical unit is actionable: prescribe intensity in micro-bouts (stairs, fast carries, uphill walk), not “exercise,” for patients who will not train. 
Yes. PT/INR is often the first lab to look abnormal because factor VII has the shortest half life, but 1.2 to 1.4 alone rarely means clinically meaningful coagulopathy. In inpatients it is usually a marker of nutrition, antibiotics, liver stress, or acute illness. The number gets over-read.
This is exactly why consumer autonomous triage is the wrong deployment target.
High-risk edge cases are where safety matters most, and they are also where general-purpose systems drift, anchor, and fail inconsistently.
The better model is clinician-facing decision support: source-grounded retrieval, explicit citations, transparent uncertainty, and a physician in the loop.
That is the lane we’re building in with Astra.
@JAMA_current AI may improve efficiency, but medicine is not only pattern recognition and documentation. Patients still need judgment, context, accountability, and a physician willing to bear uncertainty with them. The tools will change. The obligation does not.
@JAMA_current Sepsis is many failures wearing one name. This trial treated the biology, not the label. SOFA improved by day 9 in 35.1% vs 17.9%. Mortality did not move. Not a cure. Not a victory. But the direction is right: phenotype first, then treat.
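For scale, here is the arithmetic behind those two SOFA percentages as a minimal sketch. The function names are mine, not the trial's, and the result describes the organ-dysfunction endpoint only, not mortality.

```python
def absolute_difference(p_treated: float, p_control: float) -> float:
    """Absolute difference in response proportions (treated minus control)."""
    return p_treated - p_control


def number_needed_to_treat(p_treated: float, p_control: float) -> float:
    """Reciprocal of the absolute difference; undefined when the arms are equal."""
    diff = absolute_difference(p_treated, p_control)
    if diff == 0:
        raise ValueError("No difference between arms; NNT is undefined.")
    return 1.0 / abs(diff)


# Proportions with SOFA improvement by day 9, as quoted above.
arr = absolute_difference(0.351, 0.179)
nnt = number_needed_to_treat(0.351, 0.179)
print(f"{arr * 100:.1f} percentage points, NNT ~ {nnt:.1f}")
# prints "17.2 percentage points, NNT ~ 5.8"
```

Roughly one extra patient with SOFA improvement for every six treated, on a surrogate endpoint.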
The interesting part isn’t the word quantum, it’s the prospect of genetically encoded intracellular sensors that can pull signal out of noisy tissue. If this works at physiologic conditions, imaging could move from static structure toward functional microenvironment mapping. Very early, but conceptually important.
@NEJM Prurigo nodularis secondary to chronic excoriation would be on my differential, but given the 1-year history of a crawling sensation (formication) localized to the chin with this well-demarcated, ulcerated nodule — this is classic for trigeminal trophic syndrome (TTS).
DSM treats diagnoses as separate boxes. Genomics keeps insisting they share wiring.
PGC cross-disorder GWAS (1,056,201 cases; 14 disorders) finds 5 latent genomic factors explaining ~66% of genetic variance, tied to 238 pleiotropic loci.
Tightest clusters: SCZ+BIP; MD+PTSD+anxiety.
Signals diverge: excitatory-neuron enrichment for SCZ/BIP vs oligodendrocyte/glial biology for internalizing.
Limitation: common variants, mostly EUR ancestry, not a bedside classifier.
Principle: comorbidity is often biology. Think dimensions, not silos.
DANFLU-2 is a good reminder that pragmatic effectiveness can look nothing like immunogenicity.
In 332k adults ≥65, high-dose vs standard-dose had 0.68% vs 0.73% hospitalization for influenza or pneumonia (rVE 5.9%, CI crosses 0), so no clean win on the prespecified broad clinical endpoint.
The trial likely got kneecapped by low event rates and a noisy endpoint (influenza or pneumonia coded hospitalization is a blunt instrument that dilutes a true influenza signal).
Principle: stop treating single-study VE as a binary verdict. We need a triangulation stack: lab-confirmed outcomes where possible, multiple seasons, strain level context, and real-world platforms that can update estimates fast when the virus changes.
Cardio-onc in 3 buckets you can actually remember:
BTK inhibitors (ibrutinib class): think AF + HTN + CTRCD. Screen with ECG, then ambulatory monitoring if symptomatic. Manage AF with beta blocker, anticoag as indicated, and if it is intolerable consider switching to acalabrutinib/zanubrutinib/pirtobrutinib rather than forcing a stop.
ICIs: the problem is myocarditis (plus pericarditis, arrhythmias). Symptoms are often vague. If suspected: ECG + troponin/BNP (often with CK, AST/ALT), echo, CMR, and EMB when needed. Treat early with high dose IV steroids. This is the one toxicity where continuing therapy is usually not the move.
VEGF inhibitors: think hypertension first, then CTRCD and ATE/VTE. Diagnose with repeated BP checks. Treat with ACEi/ARB and/or a dihydropyridine CCB, and if BP ≥160/100 start dual therapy. This is classic permissive cardiotoxicity: control BP, keep the cancer drug going when feasible.
Modern cancer therapies heighten the importance of managing CV treatment-related complications. A new @ACCinTouch CCG report outlines best practices for diagnosing & managing CV adverse effects from Bruton’s TKIs, ICIs & VEGF inhibitors. https://t.co/3Hl4XXN052 #JACC #CardioOnc
JAMA Research of the Year, condensed.
1. GLP-1s in obesity + HFpEF + T2D: sema and tirz associated with >40% lower HF hospitalization or all-cause death vs sitagliptin proxy. No clear HF advantage of tirz over sema. Observational signal, not a class mandate.
2. Shingles vaccine and dementia: quasi-random age eligibility showed ~2 percentage point absolute reduction in new dementia diagnoses over ~7 years. Strong natural experiment, mechanism still unclear.
3. LLMs in medicine: review of 500+ studies. Only ~5% used real patient data. Most test exam knowledge, not workflow, bias, or outcomes. We are massively overclaiming readiness.
4. Newborn genome sequencing: ~72% consent, ~4% actionable positives. ~90% of positives would be missed by standard newborn screening. Raises yield, raises ethics, raises cost questions.
5. US POINTER trial: multidomain lifestyle intervention improved cognition in at-risk adults. No true no-intervention control and not powered for incident dementia. Signal yes, policy leap no.
6. TRAIN neurocritical care: liberal transfusion threshold (Hgb 9 vs 7) improved neurologic outcomes and reduced ischemic events. One of the few transfusion trials that actually changes practice in a defined population.
7. COMET DCIS: active monitoring noninferior to usual care for low-risk DCIS. Massive reduction in mastectomy rates. Overtreatment finally getting randomized data.
8. BEe-HIVe: HepB-CpG vaccine in HIV nonresponders achieved ~95–99% seroprotection, faster and more reliable than alum formulations. This should quietly become standard.
9. Launch-HTN: lorundrostat add-on dropped SBP ~17 mmHg at 6 weeks with minimal discontinuation. Aldosterone synthase inhibition looks real, durability and safety still the question.
Theme: host biology, prevention, and systems matter as much as new drugs. Timing, selection, and endpoints are everything.
Introducing JAMA’s Research of the Year.
Chosen by JAMA editors, the inaugural roundup highlights 9 of the most impactful, newsworthy, and novel studies published in the journal over the past year. 🧵
#JAMAROTY25
🔗 https://t.co/zvpFrvH0Os
Clean mechanism, but the key point people miss:
This is not aspirin acting on tumor COX-2 directly. It is platelet COX-1 inhibition upstream.
Low dose aspirin shuts down platelet TXA2. That removes thromboxane mediated suppression of CD8 T cells during early metastatic seeding. Tumor COX-2 and PGE2 fall secondarily because platelets are no longer licensing the metastatic niche.
Implication: timing and dose matter. This is about peri-metastatic immune escape, not late stage tumor cytotoxicity or blanket cancer prevention.
All of that is downstream of incentives. When your revenue model is growth plus institutional deals, the shortest path is a tool that looks “good enough” in a pilot, passes a few synthetic safety checks, and makes clinicians feel faster. There is no commercial pressure to slow the user down with uncertainty, to highlight missing data, or to admit “we actually do not know.”
Astra is built on the opposite premise.
Start with capability first. Use a frontier class model that can actually handle long contexts, nuanced reasoning, and subtle edge cases. Then restrict it. Feed it only from whitelisted journals, guidelines, registries, and other vetted sources. Work at full text level as much as possible, not just abstracts. Treat each trial, meta analysis, or guideline as a structured object, not just a blob of text.
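What "structured object" could mean in practice, as a rough sketch. This is not Astra's actual schema: the class names, field names, and `missing_fields` helper are hypothetical, chosen only to illustrate the idea. The example record reuses the DANFLU-2 details quoted earlier in this feed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubgroupResult:
    """One prespecified subgroup finding (hypothetical shape)."""
    label: str
    effect_estimate: float
    ci_low: float
    ci_high: float


@dataclass
class EvidenceRecord:
    """A trial, meta analysis, or guideline held as fields, not a text blob."""
    source_id: str
    kind: str  # e.g. "rct", "meta_analysis", "guideline"
    design: str
    population: str
    intervention: str
    comparator: str
    primary_endpoint: str
    follow_up_months: Optional[float] = None  # None means not yet extracted
    subgroups: List[SubgroupResult] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Names of fields still empty, so gaps stay visible downstream."""
        return [name for name, value in vars(self).items() if value is None]


record = EvidenceRecord(
    source_id="DANFLU-2",
    kind="rct",
    design="pragmatic randomized trial",
    population="adults >=65",
    intervention="high-dose influenza vaccine",
    comparator="standard-dose influenza vaccine",
    primary_endpoint="hospitalization for influenza or pneumonia",
)
print(record.missing_fields())
# prints "['follow_up_months']"
```

The point of the explicit `None` is that an unextracted field is reported as missing rather than quietly filled in by the model.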
On top of that, build a retrieval layer that cares about both recall and discriminative value. It should be able to say “these are the three trials that actually move the needle on this exact question, here are their designs, here are the key subgroup results, here is how they conflict.” Not just “here are ten vaguely relevant papers.”
Then force the system to show its work.
If there is no solid evidence, Astra should say that outright and default to guideline level or expert consensus, clearly labeled as such. If there is a gap in the literature, that should be visible, not silently papered over.
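One way to make that fallback explicit, sketched under my own assumptions. The tier names, their ordering, and the label wording are illustrative, not a description of how Astra is implemented.

```python
from enum import IntEnum
from typing import Iterable


class EvidenceTier(IntEnum):
    """Hypothetical ordering: higher value means stronger direct evidence."""
    NO_EVIDENCE = 0
    EXPERT_CONSENSUS = 1
    GUIDELINE = 2
    TRIAL = 3


LABELS = {
    EvidenceTier.TRIAL: "Trial-level evidence",
    EvidenceTier.GUIDELINE: "Guideline level only: no direct trial evidence found",
    EvidenceTier.EXPERT_CONSENSUS: "Expert consensus only: no trial or guideline found",
    EvidenceTier.NO_EVIDENCE: "No evidence found: this is a gap in the literature",
}


def select_tier(found: Iterable[EvidenceTier]) -> EvidenceTier:
    """Strongest tier retrieved, or NO_EVIDENCE when nothing was retrieved."""
    return max(found, default=EvidenceTier.NO_EVIDENCE)


def answer_label(found: Iterable[EvidenceTier]) -> str:
    """Label attached to the answer so the evidence level is never implicit."""
    return LABELS[select_tier(found)]


print(answer_label([EvidenceTier.GUIDELINE, EvidenceTier.EXPERT_CONSENSUS]))
# prints "Guideline level only: no direct trial evidence found"
print(answer_label([]))
# prints "No evidence found: this is a gap in the literature"
```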
Finally, design the product so that exam performance is a side effect, not the objective. If you can reason correctly over real trials and guidelines, you will crush multiple choice questions by accident. But the reverse is not true. Optimizing for test questions does not magically produce a trustworthy clinical engine.
The gap between “LLM that feels like a smart resident” and “system you can safely lean on for decisions” is wide. Most current vendors live in the first category. They are useful, sometimes impressive, and occasionally dangerous. Astra is aimed at the second category. It treats evidence as the primary unit, builds transparency into the core workflow, and accepts that sometimes the honest answer is uncertainty, not a confident paragraph with pretty citations.
That is the entire point. Not to win a leaderboard, but to build something that actually respects how high the stakes are when words on a screen turn into orders on a patient.
Most medical LLM tools look impressive until you ask a simple question: what, exactly, are they optimized for?
Most are optimized for passing exams, sounding fluent, and keeping liability low. That is a very different objective than helping a real clinician make a hard decision at 2 a.m. with incomplete data and a fragile patient.
The current crop of “evidence assistants” all follow the same recipe. Take a relatively weak or older base model. Fine tune it on abstracts, exam banks, and a narrow slice of PubMed. Wrap it with a vector search index over PDFs. Slap a clean UI on top with citations in the footer. Market it as a specialist brain.
On paper, it looks safe. In practice, you are talking to a glorified pattern matcher whose field of view is whatever its proprietary curation pipeline decided to include.
The abstract problem is real. Many of these systems work mostly on abstracts and open text, not consistently on full articles. That means the model is “summarizing” trials it has never seen in detail. Methods, subgroup analyses, exclusions, adverse event tables, and time horizons are compressed into a few sentences. For teaching vignettes that is fine. For anticoagulation in a cirrhotic patient with renal failure, it is not.
Retrieval is usually shallow. Vendors talk about “semantic search over millions of papers,” but they rarely publish precision, recall, or any real retrieval metrics. You paste in a question, the system fetches a handful of vaguely related articles, and the model hallucinates connective tissue between them. Because the underlying model is weak, it falls back to generic language and generic recommendations. The citations at the bottom give you the illusion of rigor, but most users will never click through to see how loosely those papers actually relate to the claim.
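The retrieval metrics in question are not exotic. A minimal sketch of precision and recall at k, assuming you have a clinician-labeled set of truly relevant papers for a question; the document IDs below are made up for illustration.

```python
from typing import List, Set, Tuple


def precision_recall_at_k(
    retrieved: List[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Precision@k and recall@k for one query.

    precision@k = relevant hits in the top k / k
    recall@k    = relevant hits in the top k / total relevant
    """
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: three papers truly matter, the tool returns five.
retrieved = ["trial_a", "review_x", "trial_b", "editorial_y", "case_report_z"]
relevant = {"trial_a", "trial_b", "trial_c"}

precision, recall = precision_recall_at_k(retrieved, relevant, k=5)
print(f"precision@5 = {precision:.2f}, recall@5 = {recall:.2f}")
# prints "precision@5 = 0.40, recall@5 = 0.67"
```

A vendor could publish numbers like these per specialty and per question type. The labeled relevance set is the expensive part, not the computation.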
Curation is opaque. You have no idea which journals are included, how often the index updates, how preprints are handled, or whether certain sponsors and specialties are overrepresented. The bias is invisible until you run into it. One day the tool pushes an aggressive practice pattern because it saw a high profile trial in a top journal and never surfaced the negative follow up data from a smaller, less glamorous venue. The UI will still show tidy bullet points, as if the world were settled.
There is also the exam trap. Many vendors anchor their story on board scores. “Our model beats residents on XYZ question bank.” That sounds strong until you remember what those exams actually measure. They reward pattern recognition on stylized cases with a single correct answer baked into the stem. They do not test messy multimorbidity, conflicting guidelines, insurance constraints, or patient preferences. Building your product and brand around test performance fundamentally biases the system toward short, confident answers that look good on multiple choice, not long messy outputs that expose uncertainty and tradeoffs.
Weak base models amplify all of this. Smaller or older LLMs hallucinate more, struggle with long chains of reasoning, and collapse under genuine ambiguity. To keep them “safe,” vendors add aggressive filters and templates. The result is a system that sounds cautious and responsible, but actually just avoids committing to anything precise. You get hedged generalities. You do not get the specific ARR, NNH, trial name, follow up duration, and key subgroup caveat that would actually change behavior.
The common failure modes are predictable:
- Outdated guidelines quietly trump newer RCTs because the system is tuned to respect certain document types above all else
- Rare but important safety signals never surface because the index is brittle or the model compresses away anything that does not fit the most common pattern
- Conflicting evidence gets smoothed over into bland prose instead of being exposed as a real conflict the clinician needs to see
A focused bedside pathway for syncope prioritizes early identification of cardiac and other high-risk causes using history, exam, orthostatics, and 12‑lead ECG, followed by targeted testing. Use validated short-term risk tools to support—but not replace—clinical judgment, and reserve admission or expedited workup for high-risk features.
https://t.co/uEfXKS85Dx
Nice to see WGS at this scale in breast cancer. 1,364 tumors with RNA and outcomes. WGS HRD signatures tag 23% of cases (72% of basal like) and in TNBC HRD had far better DFS after AC chemo (HR ~0.10). They also see ecDNA driven ERBB2 amps and copy number/ITH metrics tracking response to CDK4/6i and anti HER2 therapy, which makes this feel like the new reference map for WGS based biomarkers in breast oncology.