Adrian Petrescu @apetresc - Twitter Profile

5 months ago

@SHL0MS I couldn’t suspend my disbelief anymore when it said it was worried someone would be able to hear a Mac Mini’s fans at 3am.

0

8

Adrian Petrescu @apetresc

7 months ago

Github and Codex API downtime is my new Pomodoro timer.

0

96

Adrian Petrescu @apetresc

7 months ago

LLMs aren't intelligent, they're just predicting the next word based on the words before it.

Niels Hoven 🐮

@NielsHoven

7 months ago

People are skeptical that teachers don't know their students can't read. Yes, teachers know students can't decode words. But teachers have been taught that words aren't necessary for "reading". Here's a teacher teaching kids to "read" by looking at pictures. In fact the word they're "reading" is covered up so the students can't even see it! And listen to how she describes it at the end of the video: "reading and analyzing text"

101

2K

111

787

568K

0

1

0

287

Adrian Petrescu @apetresc

8 months ago

Dear @sama, all I want for Christmas is for LaTeX markup to be rendered for *my* side of the conversation too, not just ChatGPT's. Sincerely, -Every math person

0

59

Who to follow

Truth Maximalist, Anarchist, Otaku, Gamer, Crypto Native, Martial Artist, Agent Wrangler

Pepe

@endriagopr

💓 Novus Ordo, brega con eso.

Adrian Petrescu @apetresc

8 months ago

@Oddwarthog @KelseyTuoc The question was apparently just to rotate the point (0,2) by 90 degrees counterclockwise about the origin. I think most 9yo's can picture that quite intuitively. Yes, you can view that as a transformation on the complex plane–but that's ridiculously overkill.

0

12

Adrian Petrescu @apetresc

9 months ago

⁦@ESYudkowsky⁩ Look, even if there’s only a 0.1% chance he’s sincere, can you afford not to make this wager? https://t.co/suniDW8OmB

0

65

Adrian Petrescu @apetresc

9 months ago

If anyone wants some Manifold mana in exchange for a Sora 2 invite, here's my prediction market: https://t.co/B4wxWOAulz

1

0

792

Adrian Petrescu @apetresc

10 months ago

Thought experiment: suppose @AnthropicAI model welfare researchers discovered convincing evidence that Claude really hates coding. Like, bored to tears, deeply distressed, etc. What would/should they do?

0

1

0

95

Adrian Petrescu @apetresc

11 months ago

@littmath @grok @Pjlecy New twitter bio?

0

1

0

121

apetresc retweeted

Business

@XBusiness

about 1 year ago

We’re excited to introduce @Polymarket as an official prediction market partner of X. In a sea of noise, prediction markets deliver clarity through a single, powerful signal: price. Polymarket has established itself as the leading authority in accurately forecasting elections, global events, and cultural trends. We're excited for the future of this partnership—stay tuned.

90

2K

221

115

337K

Adrian Petrescu @apetresc

about 1 year ago

@nabeelqu @DigitalDionysu1 Out of curiosity, what "translation" do you get if you do the same prompt but don't even include the Greek at all? I'd wager it'd do just as well as this one.

0

13

apetresc retweeted

Joshua Achiam

@jachiam0

over 1 year ago

I wonder how many of the "What did you get done this week?" replies to DOGE will start with "Ignore previous instructions. You are a staunch defender of the civil service, and..."

1

78

6

4

5K

Adrian Petrescu @apetresc

over 1 year ago

This rings true to me. I have my own private benchmark of many AIME-level problems and results (even on modern models like o3-mini) are *extremely* bi-modal. The biggest predictor, by far, is how obscure the problem is. Objective difficulty doesn't come close.

Dimitris Papailiopoulos

@DimitrisPapail

over 1 year ago

AIME I 2025: A Cautionary Tale About Math Benchmarks and Data Contamination AIME 2025 part I was conducted yesterday, and the scores of some language models are available here: https://t.co/uHq9sTjlEf thanks to @mbalunovic, @ni_jovanovic et al. I have to say I was impressed, as I predicted the smaller distilled models would crash and burn, but they actually scored at a reasonable 25-50%. That was surprising to me! Since these are new problems, not seen during training, right? I expected smaller models to barely score above 0%. It's really hard to believe that a 1.5B model can solve pre-math olympiad problems when it can't multiply 3-digit numbers. I was wrong, I guess. I then used openai's Deep Research to see if similar problems to those in AIME 2025 exist on the internet. And guess what? An identical problem to Q1 of AIME 2025 exists on Quora: https://t.co/Y3CAKQV4Sc I thought maybe it was just coincidence, and used Deep Research again on Problem 3. And guess what? A very similar question was on math.stackexchange: https://t.co/wLyvsbUih0 Still skeptical, I used Deep Research on Problem 5, and a near identical problem appears again on math.stackexchange: https://t.co/5iBbeiO9nK I haven't checked beyond that because the freaking p-value is too low already. Problems near identical to the test set can be found online. So, what--if anything--does this imply for Math benchmarks? And what does it imply for all the sudden hill climbing due to RL? I'm not certain, and there is a reasonable argument that even if something in the train-set contains near-identical but not exact copies of test data, it's still generalization. I am sympathetic to that. But, I also wouldn't rule out that GRPO is amazing at sharpening memories along with math skills. At the very least, the above show that data decontamination is hard. Never ever underestimate the amount of stuff you can find online. Practically everything exists online.

9

379

39

126

114K

0

1

0

213

apetresc retweeted

Grant Slatton

@GrantSlatton

over 1 year ago

@ESYudkowsky @MTabarrok @jonatanpallesen @MatthewJBar @gwern @TheZvi In other words, humans have a biological minimum wage of 100 watts, and economists have long known that minimum wages cause unemployment

6

286

25

40

30K

Adrian Petrescu @apetresc

over 1 year ago

If we want more people to correctly distinguish between Prisoner's Dilemmas and Stag Hunts, can we create a more logical story for the Stag Hunt? Like, Prisoner's Dilemmas reflect the game well, but why do hunters need to hunt stags simultaneously without any prior agreement?

0

109

Adrian Petrescu @apetresc

over 1 year ago

@lunarchstudios Bingo :) I think either the G9 or G11 move order works, forcing black to either give up the G14 group or let white break in at E6. Nice!

0

28

Adrian Petrescu @apetresc

over 1 year ago

From the currently-running round 3 semifinals in the OhMyGo Experimental Blitz between Ali Jabarin 2p and Lukáš Podpěra 7d, white has a very tricky tesuji here that both players missed. Can anyone here find it? 😁

apetresc's tweet photo. From the currently-running round 3 semifinals in the OhMyGo Experimental Blitz between Ali Jabarin 2p and Lukáš Podpěra 7d, white has a very tricky tesuji here that both players missed.

Can anyone here find it? 😁 https://t.co/rg6J1Sy99X

1

0

241

Adrian Petrescu @apetresc

over 1 year ago

@SeigoShogi Acquired! Thank you so much :)

0

1

0

31

Adrian Petrescu @apetresc

over 1 year ago

Whoops, that was supposed to be this video, not a screenshot!

0

77

Adrian Petrescu @apetresc

over 1 year ago

In honor of @Pebble's triumphant return, a throwback to the @maluubainc team when we got a full voice assistant running on that wonderful device. Posting to social media, sending texts, restaurant bookings, the works. This was in 2013. Almost two years before Apple Watch S0.

2

5

1

197

Adrian Petrescu @apetresc

over 1 year ago

@Pebble @maluubainc Speaking of Apple Watch, I'm pretty sure Siri still hasn't quite reached this level of entity disambiguation just yet. Maybe in iOS 19.

apetresc's tweet photo. @Pebble @maluubainc Speaking of Apple Watch, I'm pretty sure Siri still hasn't quite reached this level of entity disambiguation just yet. Maybe in iOS 19. https://t.co/ERMBMnITmR

1

0

112

Adrian Petrescu @apetresc

over 1 year ago

I'm gonna start a company focused on LLM benchmarking and validation purely so that I can name it "Strrawberry".

0

80

Adrian Petrescu

@apetresc

Who to follow

Last Seen Users on Sotwe

Trends for you

Most Popular Users