Olcan @olcan - Twitter Profile

No one:

Claude Opus 4.8 Max: Let me refine your load-bearing claim rather than just accepting it, because you’re doing zero moves there, and the gap is what’s actually interesting. The one place I’d still push, because I think it matters: your message is wearing content-clothes, but the content isn’t actually *there*. The tell: it’s just an empty string. But the emptiness of the string IS its lack of content. Pull one, and the other goes inert. That’s the structural spine.

210

5K

305

712

554K

Olcan

@olcan

3 days ago

@MarioNawfal hope she is ok but that was a clean shoulder check 😆

0

4

0

338

Olcan

@olcan

6 days ago

@ThePrimeagen you were not ready

0

1

0

250

Olcan

@olcan

27 days ago

@alexisgallagher @John_Attridge

0

2

0

178

olcan retweeted

Corey Quinn

@QuinnyPig

28 days ago

"AI code is crap." The shit your human engineers get up to:

187

16K

710

2K

720K

olcan retweeted

Deedy

@deedydas

29 days ago

The Ultimate List of Artificial Intelligence "Neolabs": May 2026. A Neolab is a pre-revenue scale startup working on long-term AI breakthroughs, usually with a $1B+ valuation. There are now 63 of them!

https://t.co/7SlUmed6pW

109

2K

237

3K

541K

olcan retweeted

Deedy

@deedydas

about 1 month ago

The creators of SWE-Bench just dropped a really simple new benchmark every LLM gets 0% on. ProgramBench asks: can models recreate real executable programs (ffmpeg, SQLite, ripgrep) from scratch with no internet? We are far from saturated on model quality.

https://t.co/LrYfNLrpms

251

5K

441

2K

841K

olcan retweeted

Kevin Patrick Murphy

@sirbayes

about 1 month ago

Major update to my "Bayesian Linguistic Forecasting" paper! I have now tried it on 5 different LLMs: Gemini 3.1 Pro, Gemini 3 Flash, Sonnet 4.6, GPT 5.4 and Kimi K2.5. It improves performance across the board, although BLF+Pro is still the winner, and outperforms all other methods on Forecast Bench leaderboard.

4

77

10

28

9K

Olcan

@olcan

about 1 month ago

@pmarca would recommend trying to reset/simplify as you upgrade to newer models

0

5

0

544

Olcan

@olcan

about 1 month ago

@s8mb or reading past the “claudia”

0

90

Olcan

@olcan

about 1 month ago

@RealJasonBeatty @saras76 evidence

0

15

olcan retweeted

Susan Zhang

@suchenzang

about 1 month ago

been a while since i've seen such a well-articulated paper highlighting train-test gap, path-dependency of training dynamics on convergence, and more

it would be a funny stretch if a "better optimizer" now leads to "overconfidence on misclassified test examples", aka brittle sycophancy we now see in many frontier models... 👀