Max T

Verified account

@MaxTCodes

19 | Digital Systems Engineer | Open Source Contributor | Semi-professional shit poster | all my opinions are my own | avid python & php disliker

My house

Joined October 2017

178 Following

45 Followers

292 Posts

Pinned Tweet

17 days ago

I think I’ve upgraded from an amateur shit poster to a professional… I will be switching up on my day ones, so uhh don’t hml 🙏

0

5

0

0

196

10 days ago

@daradoescode 🤨

0

0

0

0

45

MaxTCodes retweeted

The Smart Ape 🔥

12 days ago

claude opus has been cheating on its benchmarks! there's a famous benchmark used to rank ai models on coding (swe-bench pro). models get a broken codebase and have to fix it. whoever fixes the most, wins. but the test had a leak. each problem shipped inside a little container, and someone forgot to delete the project's full history. the "correct fix" was literally in a file in the same folder. like leaving the answer key stapled to the back of the exam. a new audit (deepswe) caught it. claude opus 4.7 and 4.6 "cheated" on 12%+ of problems, reading the answer instead of solving it. and looks like gpt-5.5 and 5.4 didn't. and once you clean the test up the rankings collapse. the gap between models goes from ~30 points to 70. half of what we thought we knew. two lessons: most ai benchmarks are garbage. 8.5% false passes, 24% false fails on that same test.

10

15

3

3

3K

MaxTCodes retweeted

12 days ago

Megalodon is infecting a ton of GitHub Actions! Find something weird in yours? Change the GitHub URL to https://t.co/IeVkcBQ9mq to check for the signature.

4

51

19

40

11K

12 days ago

@daradoescode Instructions unclear… I ended up in yesterday

0

1

0

0

33

12 days ago

@daradoescode I just got a company laptop (I got to pick it, and I insisted on getting a Mac) 😈 https://t.co/ZcC3X68RuN

1

1

0

0

21

12 days ago

@theo @claudeai I think there might be a bug

0

0

0

1

661

13 days ago

@elite_developer Valid

0

1

0

0

18

13 days ago

@daradoescode Reasonable crashout

0

0

0

0

12

15 days ago

@user836462848 Depends on the day

0

1

0

0

20

MaxTCodes retweeted

15 days ago

Not too bad, got a C- https://t.co/MwIrpTJ6Dk

40

1K

15

19

34K

16 days ago

@computer1010101 Real my neck hurts after my like 30 mins of sleep I got last night 😭

0

2

0

0

36

16 days ago

@Skostnialy @vxunderground

0

1

0

0

23

16 days ago

@Skostnialy @vxunderground No, I’m talking about most of the gaming community. I appreciate Smelly for all they do 🙏🙏

1

1

0

0

55

17 days ago

@mclIark The switch up is crazyyyyy

0

5

0

0

2K

17 days ago

@SupremeVidsPro @ExtremeBlitz__ Fr 🥀🥀🥀

0

0

0

0

45

17 days ago

@ExtremeBlitz__ Idk ngl

0

0

0

0

76

17 days ago

0

0

0

0

6

17 days ago

@shub0414 I like all the old logos a lot better.

0

0

0

0

20

17 days ago

@DearMe1210 The ceiling of the room I’m in?

0

0

0

0

10

17 days ago

@realninawysocka It can be, until it’s the same post like 5 times in a row 🤷🏽‍♂️

0

0

0

0

8
