Pierre ✺ @pgllmt - Twitter Profile

Pierre ✺ @pgllmt

9 days ago

@UlrichRozier Thanks for sharing this quality content 👌

0

1

0

137

Pierre ✺ @pgllmt

11 days ago

Loop engineering: from manual prompting to verifiable loops https://t.co/1TrLO5HmvW via @LinkedIn

0

27

pgllmt retweeted

Midjourney

@midjourney

11 days ago

Announcing a new division of Midjourney called "Midjourney Medical"

3K

40K

6K

11K

18M

Pierre ✺ @pgllmt

13 days ago

Fable 5: the day access to AI became a matter of sovereignty https://t.co/plu4WNSEnn

0

58

pgllmt retweeted

Artificial Analysis

@ArtificialAnlys

17 days ago

We've updated the Artificial Analysis Coding Agent Index, replacing SWE-Bench Pro with Datacurve's DeepSWE benchmark - the swap lifts Codex with GPT-5.5 (xhigh) above Claude Code with Opus 4.8 (max), while the newly released Claude Fable 5 (max) in Claude Code debuts at the top.

DeepSWE, built by @datacurve, writes its tasks from scratch rather than adapting them from public GitHub issues or pull requests, so no model has seen the solutions during training. That matters because SWE-Bench Pro, the benchmark it replaces in our Coding Agent Index, had grown gameable, with some models recovering the fix from the repository's commit history instead of solving the task.

The swap reorders the index: Codex with GPT-5.5 (xhigh) rises from 65 to 76, overtaking Claude Code with Opus 4.8 (max) at 73. Claude Code with Fable 5 (max), which enters directly on the refreshed index, leads at 77. SWE-Bench Pro had been flattering some combinations and penalizing others.

More below.

114

2K

184

414

570K

Pierre ✺ @pgllmt

17 days ago

Claude Fable 5: a real generational leap, but under what conditions? https://t.co/QcOFucl3Dr

0

15

Pierre ✺ @pgllmt

20 days ago

Are AI coding benchmarks misleading us? https://t.co/p250NAX8CS via @LinkedIn

0

1

0

26

Pierre ✺ @pgllmt

24 days ago

8/ Source 👇 https://t.co/blGPsTNC9g #AI #LLM #AgenticAI #Nemotron

0

4

Pierre ✺ @pgllmt

24 days ago

1/ NVIDIA releases Nemotron 3 Ultra: an open model designed to orchestrate long-running agents. MoE with 550B parameters (55B active), context up to 1M tokens. The angle is not "one more big model" but "the model that makes the hard decisions in an agent system". 🧵

1

0

21

Pierre ✺ @pgllmt

24 days ago

7/ Verdict: an important release for the open-weight agentic ecosystem, MOPD in particular. But it still needs validating in practice: independent benchmarks, real cost per task, quality on real codebases, stability in long sessions, latency via providers.

1

0

6

pgllmt retweeted

Kaiba

@KaibaRH

25 days ago

folks @nousresearch are the only lab that thinks software should be beautiful. been in my head, so i did something about it @Teknium ↓ https://t.co/tFWYup6WYU

31

684

27

222

58K

Pierre ✺ @pgllmt

26 days ago

Concrete action for today: open your AGENTS.md / CLAUDE.md / system prompt files. Ask yourself: is this still true? Still useful? If you're reading this thread, you probably already have prompt debt to deal with. Start now. 🧹

0

11

Pierre ✺ @pgllmt

26 days ago

Think your technical debt is your legacy code? You're wrong! There's a type of debt that nobody monitors and that is already costing you. 🧵 Thread:

1

0

34

Pierre ✺ @pgllmt

26 days ago

Unix philosophy applied to AI: start with the minimum. Add only what solves a real problem you have right now. Remove as soon as you can. Write your prompts yourself. And delete them as soon as possible.

1

0

12
