Emirhan Erkan @permaximum88 - Twitter Profile

Pinned Tweet

7 days ago

Claude Opus 4.8 is an incremental but noticeable improvement and leads the Singularity Gate with 20.47%. But still no model fully predicts a discovery. Opus 4.7 is 2nd, GPT-5.5 is 3rd.

1

0

1

999

Emirhan Erkan

@permaximum88

5 days ago

@VictorTaelin @fchollet He totally used an LLM for that one lol :D

0

1

0

202

Emirhan Erkan

@permaximum88

5 days ago

@haider1 Anthropic co-founder Jack Clark also thinks the end of 2028 is the point where we reach recursive self-improvement. https://t.co/DKKLYLORTb

Jack Clark

@jackclarkSF

about 1 month ago

I've spent the past few weeks reading 100s of public data sources about AI development. I now believe that recursive self-improvement has a 60% chance of happening by the end of 2028. In other words, AI systems might soon be capable of building themselves.

289

4K

502

2K

2M

0

140

Emirhan Erkan

@permaximum88

5 days ago

@scaling01 It seems the first jump was related to the Department of War fiasco that both the Trump administration and OpenAI caused at the beginning of March, though.

0

73

Emirhan Erkan

@permaximum88

5 days ago

@bindureddy They should also make the model smarter and more agentic. Opus and Claude Code feel more like an actual smart agent that knows you and your intent.

0

2

0

528

Emirhan Erkan

@permaximum88

5 days ago

@arcprize No surprises here as my own benchmark, the Singularity Gate, has shown Claude Opus 4.8 is clearly smarter than other models as well. And it's at "high" effort. Not max or even xhigh. https://t.co/x4Bl6jjwow

Emirhan Erkan

@permaximum88

7 days ago

Claude Opus 4.8 is an incremental but noticeable improvement and leads the Singularity Gate with 20.47%. But still no model fully predicts a discovery. Opus 4.7 is 2nd, GPT-5.5 is 3rd.

1

0

1

999

0

784

Emirhan Erkan

@permaximum88

5 days ago

No surprises here as my own benchmark, the Singularity Gate, has shown Claude Opus 4.8 is clearly smarter than other models as well. And it's at "high" effort. Not max or even xhigh.

ARC Prize

@arcprize

5 days ago

Anthropic Opus 4.8 is new SOTA on ARC-AGI-3

Score: 1.5%, ~$10K

ARC-AGI-3 analysis notes:
* Opus 4.8 read the environment an abstraction *above* Opus 4.7, as objects & systems, not pictures
* Opus 4.8 succeeded on early levels, but still committed to a wrong sub-goal

53

1K

115

168

126K

0

97

Emirhan Erkan

@permaximum88

6 days ago

@fbneistersen Crystal Palace is about to sign Pierre Sage. Do whatever it takes to step in and bring him here. He would 100% win the championship, and comfortably. Right now he is by far the best manager in the world. At my AI company we already make recommendations for managers, center forwards, and goalkeepers.

0

23

Emirhan Erkan

@permaximum88

6 days ago

@yagosabuncuoglu The man says he met with him on April 23. Where did he announce that he was going to meet? If Hakan Safi wins the election, you should be banned from everything related to Fenerbahçe, you petty schemer. @fbneistersen

0

2

0

288

Emirhan Erkan

@permaximum88

7 days ago

In the Singularity Gate, Claude Opus 4.8 at 'xhigh' performed worse than at its 'max effort' setting, matching Opus 4.7's max effort. Since the benchmark only tracks each model's best configuration (highest effort, agentic harness & tool use allowed), the 'xhigh' results have been excluded.

Emirhan Erkan

@permaximum88

7 days ago

Claude Opus 4.8 is an incremental but noticeable improvement and leads the Singularity Gate with 20.47%. But still no model fully predicts a discovery. Opus 4.7 is 2nd, GPT-5.5 is 3rd.

1

0

1

999

0

80

Emirhan Erkan

@permaximum88

7 days ago

Claude Opus 4.8 leads in four of five scientific fields; GPT-5.5 leads in Physics & Astronomy. Per-field breakdown is below. For more information about the Singularity Gate head over to the site or check the paper: Paper: https://t.co/miqCc6YPq9 Website: https://t.co/E4w7puXzFa

0

1

0

81

Emirhan Erkan

@permaximum88

7 days ago

Claude Opus 4.8 is an incremental but noticeable improvement and leads the Singularity Gate with 20.47%. But still no model fully predicts a discovery. Opus 4.7 is 2nd, GPT-5.5 is 3rd.

1

0

1

999

Emirhan Erkan

@permaximum88

7 days ago

We've seen a steady improvement with Claude Opus models in the Singularity Gate. They're getting closer to fully predicting a discovery. We'll probably see the first ones with Mythos.

1

0

88

Emirhan Erkan

@permaximum88

8 days ago

The Singularity Gate results for the new Claude Opus 4.8 are coming in the next 24 hours! The contamination audit flagged a few discoveries for Opus 4.8, so those have been removed from the corpus. Because of that, small score changes should be expected for all models.

0

33

Emirhan Erkan

@permaximum88

8 days ago

@fbneistersen If you are too late, Liverpool will sign him within the next month, just so you know.

0

17

Emirhan Erkan

@permaximum88

8 days ago

@fbneistersen I have mentioned my profession before. Pierre Sage is currently the best manager in the world for Fenerbahçe (and many other teams). Convince him at whatever price it takes. His impact is large enough to multiply the squad's value by 1.5. He would get 450 million euros' worth of performance out of a 300 million euro squad.

1

0

92

Emirhan Erkan

@permaximum88

8 days ago

@petergostev I was expecting this. The model is incredibly honest and grounded.

0

196
