Yishan

@yishan

I run Terraformation, and I was once the CEO of Reddit. Both are very interesting challenges. AMA in a subscriber-only newsletter!

Made on Earth by Humans

Joined April 2008

535 Following

106K Followers

26.4K Posts

Pinned Tweet

Yishan

@yishan

4 months ago

Do you miss the old Quora answers of yore? I'm trying out a subscriber-only newsletter where I answer questions: https://t.co/uO15AoN9O6 Secrets to tech, AI, climate, social platforms, and more!

38K

Yishan

@yishan

about 7 hours ago

When I was in high school I had a girlfriend who, when I was over at her house and her parents were out, would suggest giving each other backrubs. I was generally unenthused, thinking “this is boring, why are we doing this?” Years later I recounted this to my wife, and she laughed and laughed.

Yishan

@yishan

about 7 hours ago

@sri_srikrishna Also thank the original study authors who made it possible by open-sourcing!

397

Yishan

@yishan

about 9 hours ago

A big problem with research studies on AI models is that, given how long the peer review process is, the results are always out-of-date by the time the paper is published. This time, we have something better!

The typical reaction to research results like this roughly goes "You're just testing on old models. Today's models are way better and surely can do it now!"

But the best solution is for these papers to also open-source all of their testing framework so that upon publication, others can reproduce their results, as well as run it on the newest models of the day - and into the future. After all, "this is the worst they'll ever be" so what really matters is determining when they DO pass the threshold.

As it turns out, the authors of this paper DID open-source their evaluation framework! Here: https://t.co/iXLwmItKwu

So I figured... let's re-run the tests on the latest models! A summary of our results is here: https://t.co/1Dzj0UcJUQ

One drawback is that, unfortunately, the authors didn't (or weren't legally able to) open-source ALL the testing data, since apparently some of it is copyrighted by JAMA/NEJM etc. That's a separate problem with the medical research publishing industry for another time. However, we were able to reproduce the test on the public datasets they did include!

First, we re-ran the same tests (as closely as we could) on the old models the paper claimed to use, in order to establish a baseline and determine how much "drift" there would be. (Answer: not too much)

Then we ran those tests on the newest frontier models we could find. The results are: the most capable models today (GPT-5.5 Pro) did outperform the best models from before (79/100 vs 69/100), but did not improve enough to be considered sufficient for reliable medical use.

In fact, the paper's criterion for "fit for reliable medical use" is more stringent, requiring the models to be robust under perturbation and bad data, to know when to say there's not enough information, to give clinically valid reasoning rather than hallucinations, etc. Those sound pretty reasonable to me. I wasn't able to reproduce that kind of qualitative evaluation, but even on the basic pass/fail test of interpreting radiology images from the public datasets, the newest models are better, but not yet quite good enough.

Nevertheless, I would like to praise the paper's authors for at least open-sourcing what they could, enabling me to (fairly quickly) attempt to reproduce their results. This is definitely a step in the right direction! While my reproduction couldn't be comprehensive, it certainly gave me useful directional info and - perhaps more importantly - allowed me (a random dude on the internet) to directly reproduce the results in their paper and validate them.

I would like to encourage ALL authors of research papers on AI models to similarly open-source their experimental frameworks!

Eric Topol

@EricTopol

1 day ago

We stress tested many frontier AI models for multimodal medical reasoning (including GPT-5, Claude 3.5, Gemini 2.5 Pro). They’re not ready. Faulty reasoning, use of inappropriate shortcuts, hallucinations. Published today @NatureMedicine https://t.co/P6eHZEmfbW

109

286

481

159K

128

37K

about 11 hours ago

@shayy_lior @grok Codex 5.5 xhigh said 6270

Yishan

@yishan

about 12 hours ago

@mickeyxfriedman @cantrell Simultaneously non-technical founder and also a 10x engineer

Yishan

@yishan

about 17 hours ago

@inferredbylisa I apparently can't do 3d spatial manipulation and talk at the same time. I was flying a drone once and someone started talking to me and I realized that I couldn't respond because my speech centers were occupied.

775

Yishan

@yishan

about 20 hours ago

@TheZvi The majority of people do not actually understand the concept of rule of law when it is stated. This is a top-10%ile concept, and most people who DO understand it don’t realize this.

Yishan

@yishan

about 21 hours ago

@aykutuz @EricTopol @NatureMedicine Oh HELL YEAH this is exactly what I was asking for! And they already did it!

238

Yishan

@yishan

about 21 hours ago

Yes, the displaced Judeo-Christian impulse in secular society is why Europe and the US are two sides of being unable to effectively combat climate change: they view emissions as “sin” and the US rebels against it while Europeans submit to it. Neither simply views it as a technical problem to accept and solve.

982

Yishan

@yishan

about 23 hours ago

@sc2btc @sudoingX I got your reference! https://t.co/j4ZBgpyqrT

130

Yishan

@yishan

about 23 hours ago

It messed up my sleep schedule for over a year before I forced a fix by pushing my sleep schedule in the OTHER direction (going to sleep even later), and being progressively more inverted for maybe a month or so until it came all the way back around to a reasonable bedtime and then I pinned it there with melatonin. Maybe just skipping to the melatonin will work?

The key is two-fold:

- take 1mg, not the 2-5mg the bottle usually says
- take it 2 hours before you want to feel sleepy (rather than at bedtime)

Higher doses lead to more brain activity (disturbance due to crazy dreams) and taking it and going to bed right away also makes it harder to sleep. Doing it 2 hours prior signals to your brain that bedtime is coming and to start settling down.

178

Yishan

@yishan

about 23 hours ago

@sc2btc @sudoingX Technology versus… horse

Yishan

@yishan

1 day ago

@bscholl Literally a revolutionary new type of motor.

Yishan

@yishan

1 day ago

@martyrdison @MurrayHillGuy1 I have found this to be personally true as well

Yishan

@yishan

2 days ago

@theramblingfool @CharlesFLehman Yes, truth is an absolute defense to

Yishan

@yishan

2 days ago

@omgsidewalks Maybe you want to look at how much China has done to end poverty and realize you may want to re-organize your worldview to look for heroes and villains elsewhere.

976

yishan retweeted

Tsung Xu

@tsungxu

2 days ago

Restrained ground testing begun. The speed at which we can identify and fix issues with both the prototype and our processes is just unreal. At least 10x if not 100x faster than how fast I was fixing issues on my own VTOL builds back in pre-agentic dev days.

108

30K
