Shivi Bhatia @Shivipmp - Twitter Profile

Pinned Tweet

4 days ago

Released ClinAuthBench v1: a synthetic inpatient health authorization benchmark for testing whether LLMs can reason over dense chart packets without inventing unsupported claims. HF dataset: https://t.co/QMxkjndnAf GitHub: https://t.co/eM8NBqTn0k

1

8

0

3

82K

Shivi Bhatia

@Shivipmp

about 1 hour ago

@MParakhin Unless the code you wrote was bad, clumsy, and AI-generated, in no serious way can Claude find these issues in a large codebase unless you babysit it and talk to it like a toddler. It actually sucks on a large codebase.

Shivipmp's tweet photo: https://t.co/ofvpZZJ5Rs

0

61

Shivi Bhatia

@Shivipmp

about 7 hours ago

@gdb It would also help if you could break it down by industry, like healthcare and agriculture: how droughts are being caused, more insights into calamities. How do we share what we do using Codex?

0

30

Shivi Bhatia

@Shivipmp

1 day ago

@_philschmid Isn't this where the model should be intelligent enough to do it itself, rather than training the model for specific behavior? After 6 months something different comes in; do we keep retraining the model, and the cycle always continues?

0

78

2 days ago

@deepakshenoy Why is this bad? They have taken the tax off completely.

0

4K

Shivi Bhatia

@Shivipmp

2 days ago

@larsencc Physical ai Markov decision process

0

13

Shivi Bhatia

@Shivipmp

3 days ago

@vijayshekhar lol yeah it’s 2022 when gpt was launched

0

83

Shivi Bhatia

@Shivipmp

3 days ago

@OfficialLoganK Great initiative. Since these are fresh brains, new ideas should be explored based on what they think is a challenge.

0

1

0

28

Shivi Bhatia

@Shivipmp

3 days ago

@yuyinzhou_cs This is interesting. I also released a dataset on a similar healthcare topic: https://t.co/QMxkjndnAf The idea is similar: final-answer accuracy is not enough. You need workflow-stage evaluation, validation checks, deterministic scoring, and error diagnosis.

2

1

0

60

Shivi Bhatia

@Shivipmp

3 days ago

@dwarkesh_sp @srush_nlp What happens if the second model has bias, is not as SOTA as the first one, hallucinates, and gives an incorrect recommendation? I understand trajectory → reward = 0. But your bet is: “A slightly wrong local signal is better than an extremely sparse global signal.”

0

666

Shivi Bhatia

@Shivipmp

4 days ago

No real patient data. No payer/provider/facility/EHR affiliation. This also includes a notebook. This is evaluation-scale, not training-from-scratch scale. The goal is to support reproducible work on authorization reasoning, evidence-grounded summarization, contradiction handling, and hallucination control in dense clinical-style documentation.

0

33

Shivi Bhatia

@Shivipmp

4 days ago

The differentiator is evidence discipline. Each case includes: dense 72-hour data, structured gold label, evidence anchors, constraints, documentation-challenge labels, MDP trajectory metadata. V1 includes 180 synthetic cases: 108 continued-stay, 72 safe/LLOC-ready, 22 contradiction/conflicting-documentation cases. Cases 121-180 include probabilistic MDP traces.

1

0

90

Shivi Bhatia

@Shivipmp

5 days ago

You could have a Ford Raptor R for off-roading and a CT5-V Blackwing as your daily driver and sports vehicle, right? So why not 2 models? In the research space, for finding a needle in the haystack, Codex is way better. Claude sucks and breaks down on extremely long, complex cases, but the UI is hands down better.

0

1

0

278

Shivi Bhatia

@Shivipmp

5 days ago

@StockMKTNewz He's been saying it for donkey's years, and the same for US dedollarization.

0

46

Shivi Bhatia

@Shivipmp

5 days ago

@Midnight_Captl L O L

0

41

Shivi Bhatia

@Shivipmp

5 days ago · Woodinville

Some really weird error occurring while using an OpenAI model in Codex. Not sure what this is; I'm not using any image model or anything in my repo.

Shivipmp's tweet photo: https://t.co/Vyu0L1809a

0

54

Shivi Bhatia

@Shivipmp

6 days ago

@npjDigitalMed Do we also do something from industry folks

0

48

Shivi Bhatia

@Shivipmp

6 days ago

@AravSrinivas lol, so burn more tokens vs. something that could have been done more efficiently and cost-effectively

0

21

Shivi Bhatia

@Shivipmp

7 days ago

@bentossell Codex is the best tool, hands down; all OpenAI models are way better

0

66

Shivi Bhatia

@Shivipmp

7 days ago

@DrDatta_AIIMS I have 2 poster papers selected in healthcare, at the Berkeley AI summit and AMIA. Not sure if someone from the US on a visa can collaborate. This is more on behavioral health: suicide, opioids, clinical depression, trials, clinical forms, labs, etc.

0

2

0

403
