Faculty at UKY. Views my own, not of my employer(s). Work: #BioNLP, #NLProc, medical informatics, machine learning, LLMs, AI & fairness, health+socialdata
Today was my last session as a standing panel member on the CDMA study section. It was highly time-consuming but also rewarding. I reviewed nearly 100 grants (mostly R01s) & scored over 400 grants in clinical NLP/AI over the past 5 years. Thanks to the NIH & my department.
Today we’re introducing Gemma 4 12B — our latest open model that brings advanced agentic reasoning, vision and audio directly to your laptop.
It delivers performance nearing our larger Gemma models with a much smaller total memory footprint, while being small enough to run locally with just 16GB of VRAM. It’s open and accessible for everyone to use under a permissive Apache 2.0 license.
This is all made possible by our new, unified architecture that removes separate multimodal encoders. Here’s how we did it 🧵
@yoavgo It's a count of anything: papers, grants, invited talks, conference PC roles, whatever aligns with "productivity." When I write letters for others' green cards or tenure, I put those stats in there (as a brief bio) as that matters to those who read the letters :-)
I don't penalize anything (grants, papers) based on whether I suspect it is AI generated, for this reason. I don't dunk on people saying their tweets are AI generated. Focus on the content's value/merit; don't obsess over running Pangram on it. Peace! (Yes, AI slop is annoying.)
After a minor detour in Paris, I am back in KY as the inaugural DGS of our new PhD program in biomedical informatics and data science. Exciting and busy times ahead!
Your RL post-training may be sabotaging your LLM’s test-time scaling!
Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*.
We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test time search performance, even on the original scalar.
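To make the "collapse upfront into a single scalar" point concrete, here is a minimal sketch of fixed-weight scalarization. It is not VPO itself (the post does not describe the algorithm); the candidate names, the two reward dimensions, and all numbers are made up for illustration.

```python
# Hypothetical candidates, each scored on two illustrative reward dimensions.
# All values are invented for this sketch; they come from no paper or dataset.
candidates = {
    "a": (0.9, 0.2),
    "b": (0.5, 0.8),
    "c": (0.7, 0.7),
}


def scalarize(reward, weights):
    """Collapse a reward vector into one scalar via a fixed weighted sum."""
    return sum(r * w for r, w in zip(reward, weights))


def best(weights):
    """Return the candidate name with the highest scalarized reward."""
    return max(candidates, key=lambda name: scalarize(candidates[name], weights))


# Different fixed weightings crown different winners, so committing to one
# weighting upfront discards information that a later search step could use.
for weights in [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]:
    print(weights, "->", best(weights))
```

Keeping the full reward vector around defers that choice of weights, which is the general motivation the post gives for optimizing vector-valued rewards.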
Excited to share our new paper!
“Forecasting Downstream Performance of LLMs With Proxy Metrics”
w/ my amazing advisors @sivareddyg, @mariusmosbach, @DBahdanau
Cross-entropy loss is a poor predictor of how models perform on downstream tasks (esp. reasoning). We propose something better: proxy metrics computed over expert reasoning traces.
🧵 Thread below 👇
Hope they all find a new job soon. Also, it has never been clear to me what Meta’s main product is. Most people I know are not on Facebook or Instagram. Many use WhatsApp but I don’t see ads there. Maybe it is popular in some demographics.
Adaptive Chunking is out!
It presents a framework that selects the most suitable chunking strategy for each document based on a set of five novel intrinsic, document-based metrics.
Code: https://t.co/fFKxvFSLpT
Paper: https://t.co/kK8n2InRk3
🔥 Excited to share our latest work - Med-V1: Small Language Models for Zero-shot and Scalable Biomedical Evidence Attribution
Recent reports in @Nature and @TheLancet on fabricated citations have drawn substantial attention, but even real citations may fail to support the statements attached to them.
This makes evidence attribution — verifying if the citations really support the claims — essential for auditing both human- and AI-generated texts. As AI generates billions of medical references every day, we need a scalable model for this task.
In this work:
- We generated MedFact-Synth, a high-quality dataset of 1.5M synthetic claim-article pairs.
- Using MedFact-Synth, we trained and open-sourced Med-V1, a family of 3B-parameter LLMs.
- Med-V1 surpasses its backbone models by 27-71%, matching the performance of GPT-5.
- Med-V1 can be used to identify high-stakes misattributions and detect LLM hallucinations.
🔗 Paper: https://t.co/R5KmjDbjdJ
🔗 Model: https://t.co/aBu14MDMhI
🙌 Kudos to all our great collaborators: Yin Fang, Lauren He, Yifan Yang, Guangzhi Xiong, Zhizheng Wang, Nicholas Wan, Joey Chan, Donald Comeau, Robert Leaman, Charalampos Floudas, Aidong Zhang, Michael F. Chiang, Yifan Peng & Zhiyong Lu
#MedicalAI #HealthAI #LLMs #Hallucination #EvidenceBasedMedicine #ChatGPT
After being busy with work in Mallorca, I took a couple of days to explore Barcelona. Besides all the hot spots, the Catholic monastery in Montserrat up in the mountains is the best place to visit if you love nature and spirituality. The main church is beautiful and the singing so ethereal.
🚨 New Paper! 🚨
One of my first Ph.D. papers found that LLMs can answer multiple-choice questions without seeing the question 🤔
At #ACL2026, I'm presenting a follow-up showing that current reasoning LLMs can still do this! And quite similarly to a clever test-taker 🧑‍🎓🧵
Bye bye #LREC2026! It was a refreshing event focused on (multilingual) resources and old school comp ling and NLP. This was my first time and I really enjoyed it. Found some interesting papers at the intersection of knowledge graphs and LLMs.
I support this. If it comes back and bites me, I will deserve it. This will reduce the AI slop that we will be asked to cite/compare in peer reviews. It will also induce much-needed comeuppance for PIs who just “bless” their lab’s papers and don’t do enough due diligence.
Attention @arxiv authors: Our Code of Conduct states that by signing your name as an author of a paper, each author takes full responsibility for all its contents, irrespective of how the contents were generated. 1/
🚀 What happens if you temporarily train a bidirectional encoder like a decoder?
Surprisingly: better biomedical encoders 🧬
We release on @huggingface:
• ModernBERT-bio (EN)
• ModernCamemBERT-bio (FR)
• Base + Large
• 8192-token context
Thread 👇
1/
To train better open models, we need predictable scaling.
Delphi is Marin’s first step: we pretrained many small models with one recipe, then extrapolated 300× to predict a 25B-param / 600B-token run with just 0.2% error.
Getting there took some work 🧵
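As a rough illustration of "fit on small runs, then extrapolate", the sketch below fits a power law in log-log space and evaluates it far beyond the fitted range. It is not Marin's Delphi recipe: the functional form, the choice of model size as the scaling axis, and every data point are assumptions made for this example.

```python
import math


def fit_power_law(sizes, losses):
    """Least-squares fit of log(loss) = log(a) - b * log(size); returns (a, b)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(v) for v in losses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict(a, b, size):
    """Evaluate the fitted power law loss = a * size^(-b)."""
    return a * size ** (-b)


# Synthetic points generated from loss = 10 * size^-0.1 (not real measurements).
sizes = [1e7, 3e7, 1e8, 3e8]
losses = [10.0 * s ** -0.1 for s in sizes]

a, b = fit_power_law(sizes, losses)
target = 300 * sizes[-1]  # extrapolate 300x past the largest fitted size
predicted = predict(a, b, target)
print(f"a={a:.3f}, b={b:.3f}, predicted loss at {target:.0e}: {predicted:.4f}")
```

On noise-free synthetic data the fit recovers the generating curve almost exactly; real runs are noisy, which is presumably where the "took some work" part comes in.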
The data science revolution is here now.
TabPFN-3 is live, taking tabular foundation models to enterprise scale 🤩
1M training rows on a single H100. No training. No tuning. Load and predict.
🧵 1/5
#tabpfn #tabularfoundationmodels #priorlabs