Kirk Willmarth @kirkwillmarth - Twitter Profile

about 2 months ago

We're excited to share TARIO-2, a new bio foundation model architecture for generative spatial whole transcriptomes from simple pathology images! Read more here:


kirkwillmarth retweeted

Alex Prompter

@alex_prompter

6 months ago

This paper from Harvard and MIT quietly answers the most important AI question nobody benchmarks properly:

Can LLMs actually discover science, or are they just good at talking about it?

The paper is called “Evaluating Large Language Models in Scientific Discovery”, and instead of asking models trivia questions, it tests something much harder:

Can models form hypotheses, design experiments, interpret results, and update beliefs like real scientists?

Here’s what the authors did differently 👇

• They evaluate LLMs across the full discovery loop: hypothesis → experiment → observation → revision
• Tasks span biology, chemistry, and physics, not toy puzzles
• Models must work with incomplete data, noisy results, and false leads
• Success is measured by scientific progress, not fluency or confidence

What they found is sobering.

LLMs are decent at suggesting hypotheses, but brittle at everything that follows.

✓ They overfit to surface patterns
✓ They struggle to abandon bad hypotheses even when evidence contradicts them
✓ They confuse correlation for causation
✓ They hallucinate explanations when experiments fail
✓ They optimize for plausibility, not truth

Most striking result:

`High benchmark scores do not correlate with scientific discovery ability.`

Some top models that dominate standard reasoning tests completely fail when forced to run iterative experiments and update theories.

Why this matters:

Real science is not one-shot reasoning.

It’s feedback, failure, revision, and restraint.

LLMs today:

• Talk like scientists
• Write like scientists
• But don’t think like scientists yet

The paper’s core takeaway:

Scientific intelligence is not language intelligence.

It requires memory, hypothesis tracking, causal reasoning, and the ability to say “I was wrong.”

Until models can reliably do that, claims about “AI scientists” are mostly premature.

This paper doesn’t hype AI. It defines the gap we still need to close.

And that’s exactly why it’s important.



Kirk Willmarth @kirkwillmarth

about 1 year ago

This is a worthwhile and wonky read on some of the potential (existential) impacts to the FDA based on recent events

Alexander Gaffney

@AlecGaffney

about 1 year ago

This piece is now available to read on our free site due to public interest: https://t.co/Qff2lqiFeg


Kirk Willmarth @kirkwillmarth

over 1 year ago

@LizzyLaw_ Interesting that AI in devices seems directly targeted. Any sense for whether the impacted people were involved in Neuralink interactions?



kirkwillmarth retweeted

Larry Levitt @larry_levitt

over 1 year ago

Starting today, Medicare beneficiaries will have their annual out-of-pocket drug costs capped at $2,000 as a result of the Inflation Reduction Act.



kirkwillmarth retweeted

Ron Alfa

@Ronalfa

over 1 year ago

Announcing OCTO-VirtualCell (vc), a multi-scale, multimodal transformer trained to predict gene expression for a virtual cell in cellular contexts within patient tissue samples. Complete with the Celleporter demo app to explore the data! 1/


kirkwillmarth retweeted

Berna Sozen @BernaSozen_

over 1 year ago

Our latest out today @Nature! We revisit a century-old theory and show that metabolic gradients work hand-in-hand with genetic & signalling instructions to shape the embryo. Glucose isn't just an energy source, it acts as a major conductor of the body plan 🍬 https://t.co/mfx6Hk7nma


kirkwillmarth retweeted

Mushtaq Bilal, PhD

@MushtaqBilalPhD

over 1 year ago

50% of scientists stop publishing within a decade of starting their career. A recent study looked at the careers of 140,000 scientists and concluded that less than half were still publishing 15 years later.



Kirk Willmarth @kirkwillmarth

almost 2 years ago

I had this exact experience. Our automated rent increase put our rent above that of the open unit down the hall. We asked them to match it, they said they were not allowed to, so we moved units for a lower rent and even got a new-tenant incentive on top (a gift card).

Scott Santens

@scottsantens

about 2 years ago

Housing by algorithm is so much nonsense. I'm experiencing it directly in a way I hadn't known about before. My apartment building uses one of these legal price-fixing algorithms.

Our lease is up soon and we saw that identical units to ours were going for much cheaper than what we were being offered to stay in our unit. We told them we'd be happy to stay if they lowered our rent to the market price, but they said their hands were tied. They must obey the Algorithm.

The almighty Algorithm appears to mandate that all rent must rise if the current renter wants to renew a lease. The increase must be at least ~1.5% even if a new renter will pay much less.

So we decided to move into a unit slightly larger than our existing unit and pay $600/mo LESS. Our existing unit is now on the market for $600/mo less too. We would have preferred to just stay where we are and pay less, but Our Lord Algorithm does not allow that.

It's weird to think that what we could have done was not renew our lease, and then sign a new lease in the same unit the minute it went on the market for less. But we weren't allowed to just renew our lease at the lower rate. Because of the Algorithm.

So keep that in mind, fellow renters. When your lease is up, make sure and check to see what the Algorithm God is saying the rent should be for similar places, not what it says you should pay for your current place. And consider the savings of moving within the same building if that's an option, or even risk not renewing and picking up your same place with a lower market rate lease.

All hail the Algorithm.


kirkwillmarth retweeted

Spencer Greenberg 🔍

@SpencrGreenberg

almost 2 years ago

Does astrology work? We tested the ability of 152 astrologers to see if they could demonstrate genuine astrological skill. Here is how the study was designed and what we found (including a result that really surprised me): 🧵



kirkwillmarth retweeted

Scott Gottlieb, MD 🇺🇸

@ScottGottliebMD

almost 2 years ago

SCOTUS decision is very significant for FDA. Courts will continue to defer to FDA on product-review decisions, where Congress gave FDA discretion to make fact-based decisions based on careful process often outlined in guidance. But we'll see material changes in other areas 1/n


kirkwillmarth retweeted

NCHS @NCHStats

almost 2 years ago

#STATOFTHEDAY The overall number of Americans without health insurance dropped by 8.2 million from 2019 to 2023 https://t.co/4jf6zz3Kgm



kirkwillmarth retweeted

STAT

@statnews

almost 2 years ago

There was mounting excitement about the FDA's review of Lykos' MDMA-assisted therapy. But former employees said the company committed a series of lapses. https://t.co/ahLmvgaYUp


kirkwillmarth retweeted

Paras Sharma @paras_biotech

about 2 years ago

VC Fund size to DPI plot - 168 funds covered & Vintage year 2014-2016

1. Only ~20% are in the carry
2. >$2B barely returned deployed capital back to LPs (not including fees/hurdle)
3. Median fund size of $200M returned >1.5x DPI

Will post other thematic observations later



kirkwillmarth retweeted

NYT Science

@NYTScience

about 2 years ago

Daniel Kahneman helped pioneer a branch of economics that exposed hard-wired mental biases in people’s economic behavior. The work led to a Nobel. He has died at 90. https://t.co/KTfB1XBLFh


kirkwillmarth retweeted

Ethan Mollick

@emollick

about 2 years ago

Write up of our paper: https://t.co/MqXtAdn3bh


kirkwillmarth retweeted

Eric Topol

@EricTopol

about 2 years ago

When #AI support was provided to 140 radiologists, there was marked variability and unpredictability as to its impact on performance https://t.co/YGXUcagIWi @NatureMedicine @pranavrajpurkar @feiyangkathyyu



kirkwillmarth retweeted

Carlos D. Bustamante 🇻🇪🇺🇸

@cdbustamante

about 2 years ago · Coral Gables

We at @GalateaBio are thrilled to be partnering with @illumina to build the Biobank of the Americas, a 10 million person participatory cohort to enable precision health at scale for all. https://t.co/JpakUj9U8y


kirkwillmarth retweeted

Bruce Booth

@LifeSciVC

about 2 years ago

Crowding is happening: HER2 has 14 active programs in the clinic, FRa and CLDN18.2 have 7 each.


kirkwillmarth retweeted

Larry Levitt @larry_levitt

over 2 years ago

New: Somewhat remarkably, just 39% of adults know that the Affordable Care Act prohibits insurers from denying coverage due to pre-existing conditions. https://t.co/kbxSnxdCgE


