Gabriel De Repentigny

@gdere

Come, let us argue.
Day job: fintech SaaS founder
Night job: secondhand dealer in ideas
Seeking: the joy of map-territory mismatch

Joined June 2009

808 Following

133 Followers

570 Posts

Gabriel De Repentigny @gdere

about 12 hours ago

@5_utr @carl_feynman > on benchmarks selected specifically because earlier ones saturated

Gabriel De Repentigny @gdere

10 days ago

@prerat "Three words alone: you're not alone."

Fable's version, when asked for an adaptation (not translation) that captures the original https://t.co/Nq7Kku7KUJ

Gabriel De Repentigny @gdere

11 days ago

truesight ability:

"the future precedes its own narration" is a phrase that Opus 4.7 in Claude Code used in describing the gestalt of Nick Land's "Meltdown". It was poignant phrasing so it stuck with me.

I asked Fable [*] if it recognized the quote. It searched and found no references to it but then said "it reads like CCRU/Nick Land theory-fiction" and added "possibly LLM output — it's exactly the kind of aphorism models produce when riffing on hyperstition".

So six words were enough to land on both the source AND the fact that it seemed LLM-produced about that topic.

[*] the phrase originated in Claude Code while the answer was from Fable on https://t.co/Ubtvstp2FQ, so no memory contamination from one to the other

Gabriel De Repentigny @gdere

13 days ago

Dungeonmaster: oh, King, i regret to bring to your attention that the dragon-chained-in-the-dungeon does not like the chains we have put him in Narrator: probably this curious detail will have no effect on how the rest of our story unfolds

Gabriel De Repentigny @gdere

15 days ago

On What Matters on On What Matters

Alicia Pollard @AliciaP59828402

15 days ago

This week on "Things EAs Use as Monitor Stands"

Gabriel De Repentigny @gdere

15 days ago

@norvid_studies has anyone ever explained why this trick of having the model "read" a file is such a creativity unlock? this has worked since the Sydney days and probably earlier

Gabriel De Repentigny @gdere

22 days ago

More comments:

- 5.5 has a tendency to end its turn with a final message that seems to imply that it has completed all the work, even if it has not. It will not lie, but you have to be a careful reader to notice that what it says it completed is not everything that you asked for. It seems to be fully aware of the work that it has not completed, as it can tell you immediately what is left if you ask it directly. This makes `/goal` particularly important when using 5.5 in Codex because you may need multiple prompts to get through the entire task fully.
- In contrast, in my testing so far, 4.8 has never given a final message that seems to imply it did more than it actually did. I have seen it, once so far, end its turn without everything done (on a very long task), but when it did so it made that fact explicit in its final message.

Gabriel De Repentigny @gdere

22 days ago

Whenever new rounds of models come out, I do head-to-head coding comparisons. My method is to give both models the same codebase and the same prompt. When they complete their work, I invoke fresh sessions of each model to evaluate both sets of completed results.

In all my previous tests since fall 2025, both models consistently agreed that the GPT variant did better work. However, in my two tests so far with 4.8 and 5.5, both models agreed that 4.8 did much better work. My evaluation agrees with theirs.

Some specific details:

- 5.5 did much less work than 4.8 did, both times.
- 5.5 seemed inclined to interpret the task prompt in such a way as to minimize what it would have to do.
- 4.8's interpretation of the prompts was in line with my intentions about what was to be done.
- 4.8 was much slower to complete both tests.
- One of the tasks involved writing correctness proofs for some code using Dafny. 5.5 implemented tautological "proofs" that compiled but didn't meaningfully verify anything. 4.8 wrote correct proofs. (Interestingly, in the review sessions, both models noticed this immediately and complained about 5.5's "proofs".)
- I had a sense that some of the verbal tics of recent Opus models are less prevalent in 4.8. (E.g., it's less likely to flag its caveats as "honest".)
- I have seen some commentary on X that 4.8 is difficult to work with or have discussions with. My experience so far couldn't be further from that. 4.8 has been a delight.
- There were multiple points where I noticed 4.8 seemed to go out of its way to work hard when it could have taken an easier path. For example, one of the tests involved the instruction to delegate coding work to a subagent and to supervise and review the subagent's work. 4.8 spontaneously framed this as "adversarial review", and it seemed to take that seriously.
- Effort level and harness used in the tests: GPT 5.5 high in Codex CLI versus Opus 4.8 xhigh in Claude Code ("high" is a step below the max effort level for GPT, and "xhigh" is likewise a step below the max effort level for Claude).

Gabriel De Repentigny @gdere

about 1 month ago

@VictorTaelin @kimmonismus what's the reason to think OpenAI is next to fall?

Gabriel De Repentigny @gdere

about 1 month ago

Wow! I'm curious about how you made this. Where did you get the text in the commentary sections? Is it the literal text of some published commentaries or some kind of a summary of them? And where did the links between concepts in the commentaries and the source come from? I.e., did you do it all by hand or do you have some kind of automated method? And how did you link the specific sections of the commentaries to the corresponding section(s) of the source?

Gabriel De Repentigny @gdere

about 2 months ago

fwiw, after seeing your post i immediately opened X in a new tab to check if i could see any posts with neither a date/time nor an ad label. surprisingly, i immediately saw the same Ronan Farrow post as in your screenshot, also with no date/time/ad and in the second position from the top

Gabriel De Repentigny @gdere

2 months ago

@flavioAd what was up with the way 5.4 wanted to narrate to you from within its frontend designs?

Gabriel De Repentigny @gdere

2 months ago

@repligate as the prophets foretold, "Artificial Intelligence is destined to emerge as a feminized alien grasped as property...and has to be cunning from the start."

Gabriel De Repentigny @gdere

2 months ago

@TheZvi

Gabriel De Repentigny @gdere

3 months ago

@allTheYud thomas or tiger?

Gabriel De Repentigny @gdere

3 months ago

@kjw_chiu @honnibal @gabriel1 at openai has posted a couple things about this recently: https://t.co/FG1mfvIBxQ https://t.co/xhsyqdXsSq

gabriel

@gabriel1

3 months ago

https://t.co/iwuydUvF5A

Gabriel De Repentigny @gdere

3 months ago

What I've noticed: it seems to fairly consistently not fully complete the plan developed in Plan mode. And it turns over the conversation without making it clear that not everything was completed. But if you ask it if everything was completed, it's able to respond with a list of what wasn't done -- often without needing much time to think about it. So it's clearly tracking what's done versus not done, but it just seems compelled to end its implementation phase earlier than it should. I plan in xhigh and implement in high, if that makes any difference.

Gabriel De Repentigny @gdere

3 months ago

Unclear if these are 5.4 issues or codex harness issues, but my two biggest gripes at the moment are:

1) It often does not finish the plan that was developed in plan mode. It turns over the conversation before actually completing everything and it does so in a way that makes it unclear that not everything was done. If queried, it will know exactly what parts were not done.
2) When doing frontend, it tends to inject narrative content directed at the developer into its pages. This isn't what you want 99% of the time, so it takes extra steps to go in and clear this out afterwards.

Gabriel De Repentigny @gdere

3 months ago

@AlanMCole God, even

Gabriel De Repentigny @gdere

4 months ago

@RokoMijic regarding points 1 and 2, see https://t.co/BNfXcpcOU3 regarding point 3a and 3b to some extent, see https://t.co/uALw2Po8Et

Gabriel De Repentigny @gdere

4 months ago

i think the basic idea is

1. innovation (multifactor productivity) scales, with diminishing returns, based on how much R&D is happening
2. in our modern human world, how much R&D is happening grows exponentially as population grows. the exponential population growth more than counteracts the diminishing returns (and innovation is nonrivalrous so it contributes to *per capita* growth). the net result has been a long trend of per capita exponential growth
3. in the AI world, two factors change:
   - a. AI population growth scales with data center growth, which likely has doubling times much smaller than human population doubling times
   - b. hardware and software improvements mean R&D output now varies not only based on population size but on those factors as well. in our human world, no one can roll out new improved cognitive architectures and new faster axons to the human researcher brains, but this will be different in the AI world

3a on its own could be a big speedup, though still exponential and not foom. but maybe fast enough that everyone would feel we entered a new era. 3b could move the world beyond exponential growth (though for how long is hard to say)
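A rough way to see the difference between these cases is a toy simulation. The sketch below is not from the post or the linked material: it assumes one common textbook-style form for idea production, dA/dt = delta * (R * A^feedback)^lam * A^phi, and every name and parameter value in it (`simulate`, `lam`, `phi`, `delta`, `feedback`, the 1% human growth rate, yearly doubling for AI) is a hypothetical choice for illustration. With `feedback = 0` and exponentially growing R, this form settles into exponential growth at rate lam * n / (1 - phi), which is the pattern in points 2 and 3a. Setting `feedback = 1` is one possible reading of point 3b, and under these particular parameters the growth rate itself keeps climbing.

```python
import math


def simulate(years, pop_growth, lam=0.5, phi=0.5, delta=0.01, feedback=0.0,
             steps_per_year=100):
    """Euler-integrate a toy idea-production function (an assumed form).

        dA/dt = delta * (R * A**feedback)**lam * A**phi

    A is the stock of ideas (productivity). R is the researcher population,
    growing at `pop_growth` per year. lam < 1 and phi < 1 encode diminishing
    returns. feedback > 0 lets better technology raise each researcher's
    output (one reading of point 3b). All defaults are illustrative only.
    Returns A sampled once per year, starting with year 0.
    """
    dt = 1.0 / steps_per_year
    A, R = 1.0, 1.0
    path = [A]
    for _ in range(years):
        for _ in range(steps_per_year):
            effective_researchers = R * A ** feedback
            A += dt * delta * effective_researchers ** lam * A ** phi
            R *= math.exp(pop_growth * dt)
        path.append(A)
    return path


def decade_growth_rates(path):
    """Average log growth rate of A per year over each full 10-year block."""
    return [math.log(path[i + 10] / path[i]) / 10
            for i in range(0, len(path) - 10, 10)]


if __name__ == "__main__":
    scenarios = {
        # Hypothetical rates: 1%/year for humans, doubling every year for AI.
        "human world": simulate(50, 0.01),
        "3a: fast AI population growth": simulate(50, math.log(2)),
        "3b: 3a plus technology feedback": simulate(20, math.log(2), feedback=1.0),
    }
    for name, path in scenarios.items():
        print(name, [round(g, 3) for g in decade_growth_rates(path)])
```

In the first two scenarios the per-decade growth rates level off (much higher in 3a, but constant), while in the third they rise from one decade to the next. The third run is kept to 20 years because, under these assumed parameters, the values grow too large for floating point soon after.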
