Kеvіn Rіchаrd @512banque - Twitter Profile

the big trend for the past few years has been buying the S&P 500 to diversify, but the top of the index is heavily exposed to AI and is inbred. this in a context where retail companies (starbucks, pizza hut) are cancelling their AI projects because they turned out to be a complete mess, and where Chinese open-weight models are close on the heels of the American "frontier" models.

Kеvіn Rіchаrd

@512banque

3 days ago

@dsampaolo karpathy released his "llm council", but in practice it generates more noise than anything else, doesn't it?

Kеvіn Rіchаrd

@512banque

3 days ago

I really am a skeptic

Jeffrey Snover

@jsnover

4 days ago · Wiesbaden

In my Harvard fellowship I study the views of AI accelerationists, safetyists and skeptics. What I have come to realize is that both the Accelerationists and the Safetyists believe that we are creating an AI God. The difference is that Accelerationists believe that it is the god of the New testament. A god of loving kindness. The Safetyists believe that it is the god of the Old testament. The jealous one who told Abraham to kill his son, destroyed Sodom and Gomorrah, and killed everybody in the flood. The Skeptics think it's just a damn toaster with more knobs.

Kеvіn Rіchаrd

@512banque

3 days ago

@LuKg0v1 try pi. it's not a silver bullet, but I've sometimes gotten better results with it (and above all better cache usage). and paradoxically, give ds4 flash a shot: it sometimes gets through where the pro doesn't (yes, really, wtf).

Kеvіn Rіchаrd

@512banque

4 days ago

test it on your own cases, you'll see

Tim Jayas

@TimJayas

4 days ago

DeepSeek just EMBARRASSED Claude Opus 4.7

Just switched to DeepSeek V4 Pro for a few days

Cost:
DeepSeek V4 Pro = $2.02
Claude Opus 4.7 = $265.21

Same quality for most of the medium tasks with no noticeable difference in output

I know it’s hosted in China with cheap electricity but at this point western labs are getting cooked on price

Kеvіn Rіchаrd

@512banque

4 days ago

@LuKg0v1 which harness are you using with it?

Kеvіn Rіchаrd

@512banque

6 days ago

@DamienLusson there you goooooo

Kеvіn Rіchаrd

@512banque

6 days ago

this Opus 4.8 is a disaster, a real goody two-shoes. they want to push Mythos hard, those little rascals

Kеvіn Rіchаrd

@512banque

6 days ago

@ZamHaberAjans which GTA mod is this, my brother?

Kеvіn Rіchаrd

@512banque

7 days ago

ok, completely free no less

OpenCode

@opencode

7 days ago

OpenCode x MiMo V2.5 - Free for a limited time 1M context • reasoning • text • image

Kеvіn Rіchаrd

@512banque

8 days ago

@Alex_Car12 @bmercusot or else hosting deepseek with a European cloud provider

Kеvіn Rіchаrd

@512banque

8 days ago

I got tired of abstract AI benchmarks that rank models in isolation.

Users don't run a model. They run a full loop: model + harness + tools + retries + cache + prompts.

So I ran 27 tasks that look like my real work across different coding-agent harnesses, 5 times each to reduce variance. I also wanted to create my own tasks to avoid the problem of benchmaxxing.

Result: near-identical pass rates, wildly different bills.

Codex/Claude costs are API-equivalent because I use subscriptions. But at public API prices, one Codex setup charts at ~420× the cost of Pi + DeepSeek V4 Flash for the same strict score.

The lesson: the harness is a huge part of the value you feel as a user. And when some loops are this cheap, the optimal strategy changes: you can afford retries, parallel attempts, and verification passes instead of betting everything on one expensive first shot.

Don't trust my tasks. Run it on yours.
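
For anyone who wants to run the same kind of comparison on their own tasks, here is a minimal sketch of the bookkeeping: one record per run, then a pass rate and a total cost per setup. The `Run` record, the setup labels and every number below are hypothetical placeholders invented for illustration. They are not the results or the tooling from the benchmark described above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    setup: str       # hypothetical label for a harness + model pair
    task: str        # task identifier
    passed: bool     # did this run satisfy the strict check?
    cost_usd: float  # API-equivalent cost of this single run


def summarize(runs):
    """Return {setup: (pass_rate, total_cost_usd)} over all recorded runs."""
    passed = defaultdict(int)
    total = defaultdict(int)
    cost = defaultdict(float)
    for r in runs:
        total[r.setup] += 1
        passed[r.setup] += int(r.passed)
        cost[r.setup] += r.cost_usd
    return {s: (passed[s] / total[s], cost[s]) for s in total}


def cost_ratio(summary, expensive, cheap):
    """How many times more the `expensive` setup costs than the `cheap` one."""
    return summary[expensive][1] / summary[cheap][1]


# Placeholder data: 2 tasks x 2 repeats per setup, invented for illustration.
runs = [
    Run("setup-a", "t1", True, 4.0), Run("setup-a", "t1", True, 4.0),
    Run("setup-a", "t2", True, 4.0), Run("setup-a", "t2", False, 4.0),
    Run("setup-b", "t1", True, 0.5), Run("setup-b", "t1", True, 0.5),
    Run("setup-b", "t2", False, 0.5), Run("setup-b", "t2", True, 0.5),
]
summary = summarize(runs)
```

With this placeholder data both setups reach the same pass rate while one costs several times more, which is the shape of result the post describes. Repeating each task several times, as the post does, is what makes the pass rate less sensitive to a single lucky or unlucky run.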

Kеvіn Rіchаrd

@512banque

8 days ago

@lludol i displayed it on the repo with this chart 😅

Kеvіn Rіchаrd

@512banque

8 days ago

@bmercusot

Kеvіn Rіchаrd

@512banque

8 days ago

@nextgenai_fr the harness is a huge multiplier, but yes, it's a product of two things. I'm not saying to use haiku in claude code, mind you

Kеvіn Rіchаrd

@512banque

8 days ago

the harness is so much the keystone of everything that you can see these guys backing off and trying to hide it more and more: for example with claude design, everything is hosted online, which means we don't have visible access to the harness the way we do with claude code. I'm convinced it's possible to get the same results as claude design with the right prompt, the right harness, and the right looped iterations. "All that's left" is to reverse engineer it until you land on the right iterations. by handing us the claude code CLI on our own computers, Anthropic blew up but at the same time shot itself in the foot over the long term. It's literally the most important thing right now. Good days are ahead for users...

Kеvіn Rіchаrd

@512banque

8 days ago

The thing to remember is that the harness is even more important than the model. With models that cost virtually nothing anymore, the strategy changes completely, and it becomes worthwhile to spam attempts if the prompt and the debug/retry logic behind it are good.
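
A rough way to see why cheap attempts change the strategy is to work out the expected cost of a retry loop. This is only a sketch under strong assumptions that are mine, not the post's: attempts are independent, each succeeds with the same probability `p`, and a verification step reliably tells you when an attempt succeeded so the loop can stop. The function names and the numbers are hypothetical.

```python
def success_within(p, k):
    """Probability that at least one of k independent attempts succeeds."""
    return 1.0 - (1.0 - p) ** k


def expected_cost(p, cost_per_attempt, k):
    """Expected spend when stopping at the first success, capped at k attempts.

    Attempt i (0-based) is only paid for if the previous i attempts all
    failed, which happens with probability (1 - p) ** i.
    """
    return cost_per_attempt * sum((1.0 - p) ** i for i in range(k))


# Hypothetical cheap loop: 50% success per attempt, 1 cost unit, up to 3 tries.
cheap_success = success_within(0.5, 3)    # 0.875
cheap_cost = expected_cost(0.5, 1.0, 3)   # 1.75 cost units

# Hypothetical expensive single shot: 100 cost units, one attempt only.
pricey_cost = expected_cost(0.8, 100.0, 1)  # 100.0 cost units
```

Under these assumptions the retry loop reaches a higher overall success rate than a single 80% shot while spending a small fraction of the cost. The conclusion depends entirely on the verification step being trustworthy, which is where the debug/retry logic mentioned above comes in.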

Kеvіn Rіchаrd

@512banque

8 days ago

i was shocked by the "135 turns" by codex too. but it's a different way of counting: it will basically go through all the tasks and use multiple tools but count only 1 turn. I have no way to count the actual "turns" in the same fashion as the others. but codex was slow af (the larger the bar, the slower).

Kеvіn Rіchаrd

@512banque

8 days ago

@Capetlevrai I got worse results with GLM than with deepseek, but that's on my own use cases. https://t.co/0lu0Vr6Zn6

Kеvіn Rіchаrd

@512banque

9 days ago

you have no idea of the mess that's coming with deepseek...
