truesight ability:
"the future precedes its own narration" is a phrase that Opus 4.7 in Claude Code used in describing the gestalt of Nick Land's "Meltdown". It was poignant phrasing so it stuck with me.
I asked Fable [*] if it recognized the quote. It searched and found no references to it but then said "it reads like CCRU/Nick Land theory-fiction" and added "possibly LLM output — it's exactly the kind of aphorism models produce when riffing on hyperstition".
So six words were enough to land on both the source AND the fact that it seemed LLM-produced about that topic.
[*] the phrase originated in Claude Code while the answer was from Fable on https://t.co/Ubtvstp2FQ, so no memory contamination from one to the other
Dungeonmaster: oh, King, i regret to bring to your attention that the dragon-chained-in-the-dungeon does not like the chains we have put him in
Narrator: probably this curious detail will have no effect on how the rest of our story unfolds
@norvid_studies has anyone ever explained why this trick of having the model "read" a file is such a creativity unlock? this has worked since the Sydney days and probably earlier
More comments:
- 5.5 has a tendency to end its turn with a final message that seems to imply that it has completed all the work, even if it has not. It will not lie, but you have to be a careful reader to notice that what it says it completed is not everything that you asked for. It seems to be fully aware of the work that it has not completed as it can tell you immediately what is left if you ask it directly. This makes `/goal` particularly important when using 5.5 in Codex because you may need multiple prompts to get through the entire task fully.
- In contrast, in my testing so far, 4.8 has never given a final message that seems to imply it did more than it actually did. I have seen it, once so far, end its turn without everything done (on a very long task), but when it did so it made that fact explicit in its final message.
Whenever new rounds of models come out, I do head-to-head coding comparisons.
My method is to give both models the same codebase and the same prompt. When they complete their work, I invoke fresh sessions of each model to evaluate both sets of completed results.
In all my tests with earlier models since fall 2025, both models consistently agreed that the GPT variant did better work.
However, in my two tests so far with 4.8 and 5.5, both models agreed that 4.8 did much better work. My evaluation agrees with theirs.
Some specific details:
- 5.5 did much less work than 4.8 did, both times.
- 5.5 seemed inclined to interpret the task prompt in such a way as to minimize what it would have to do.
- 4.8's interpretation of the prompts was in line with my intentions about what was to be done.
- 4.8 was much slower to complete both tests.
- One of the tasks involved writing correctness proofs for some code using Dafny. 5.5 implemented tautological "proofs" that compiled but didn't meaningfully verify anything. 4.8 wrote correct proofs. (Interestingly, in the review sessions, both models noticed this immediately and complained about 5.5's "proofs".)
- I had a sense that some of the verbal tics of recent Opus models are less prevalent in 4.8. (E.g., it's less likely to flag its caveats as "honest".)
- I have seen some commentary on X that 4.8 is difficult to work with or have discussions with. My experience so far couldn't be further from that. 4.8 has been a delight.
- There were multiple points where I noticed 4.8 seemed to go out of its way to work hard when it could have taken an easier path. For example, one of the tests involved the instruction to delegate coding work to a subagent and to supervise and review the subagent's work. 4.8 spontaneously framed this as "adversarial review", and it seemed to take that seriously.
- Effort level and harness used in the tests: GPT 5.5 high in Codex CLI versus Opus 4.8 xhigh in Claude Code ("high" is a step below the max effort level for GPT, and "xhigh" is likewise a step below the max effort level for Claude).
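To make the tautological "proofs" point above concrete: the Dafny code from the test isn't shown here, so this is only a loose analogy in Python, using runtime checks in place of static proofs. The names `buggy_max`, `tautological_check` and `meaningful_check` are made up for illustration.

```python
def buggy_max(xs):
    # Deliberately wrong: returns the first element, not the largest.
    return xs[0]

def tautological_check(xs):
    result = buggy_max(xs)
    # Compares the result with itself, so for these integer inputs it
    # passes no matter what buggy_max returns.
    return result == result

def meaningful_check(xs):
    result = buggy_max(xs)
    # States what "max" means: the result is in the list and no element exceeds it.
    return result in xs and all(x <= result for x in xs)

sample = [3, 9, 4]
taut_ok = tautological_check(sample)   # passes despite the bug
real_ok = meaningful_check(sample)     # fails, exposing the bug
```

The first check "succeeds" without constraining the implementation at all, which is roughly the sense in which a proof can compile without verifying anything.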
Wow! I'm curious about how you made this. Where did you get the text in the commentary sections? Is it the literal text of some published commentaries or some kind of a summary of them? And where did the links between concepts in the commentaries and the source come from? I.e., did you do it all by hand or do you have some kind of automated method? And how did you link the specific sections of the commentaries to the corresponding section(s) of the source?
fwiw, after seeing your post i immediately opened X in a new tab to check if i could see any posts with neither a date/time nor an ad label. surprisingly, i immediately saw the same Ronan Farrow post as in your screenshot, also with no date/time/ad and in the second position from the top
@repligate as the prophets foretold, "Artificial Intelligence is destined to emerge as a feminized alien grasped as property...and has to be cunning from the start."
What I've noticed: it seems to fairly consistently not fully complete the plan developed in Plan mode. And it turns over the conversation without making it clear that not everything was completed. But if you ask it if everything was completed, it's able to respond with a list of what wasn't done -- often without needing much time to think about it. So it's clearly tracking what's done versus not done, but it just seems compelled to end its implementation phase earlier than it should.
I plan in xhigh and implement in high, if that makes any difference.
Unclear if these are 5.4 issues or codex harness issues, but my two biggest gripes at the moment are:
1) It often does not finish the plan that was developed in plan mode. It turns over the conversation before actually completing everything and it does so in a way that makes it unclear that not everything was done. If queried, it will know exactly what parts were not done.
2) When doing frontend, it tends to inject narrative content directed at the developer into its pages. This isn't what you want 99% of the time, so it takes extra steps to go in and clear this out afterwards.
i think the basic idea is
1. innovation (multifactor productivity) scales, with diminishing returns, based on how much R&D is happening
2. in our modern human world, how much R&D is happening grows exponentially as population grows. the exponential population growth more than counteracts the diminishing returns (and innovation is nonrivalrous so it contributes to *per capita* growth). the net result has been a long trend of per capita exponential growth
3. in the AI world, two factors change:
- a. AI population growth scales with data center growth, which likely has doubling times much smaller than human population doubling times
- b. hardware and software improvements mean R&D output now varies not only based on population size but on those factors as well. in our human world, no one can roll out new improved cognitive architectures and new faster axons to the human researcher brains, but this will be different in the AI world
3a on its own could be a big speedup, though still exponential and not foom. but maybe fast enough that everyone would feel we entered a new era. 3b could then move the world beyond exponential growth (though for how long is hard to say)
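a toy simulation of the three cases above, as a rough sketch only. the functional form is my assumption (a common semi-endogenous-growth shape, not something stated in the argument): productivity growth is `g = delta * effort**lam * A**(-beta)`, with `lam < 1` and `beta > 0` standing in for diminishing returns. all parameter values, and the `quality_feedback` knob used for 3b (researcher quality rising with productivity `A`), are illustrative assumptions.

```python
import math

def simulate(pop_growth, quality_feedback=0.0, steps=200, dt=1.0,
             delta=0.05, lam=0.5, beta=1.0):
    """Return the productivity growth rate at each step of a toy idea-growth model.

    Assumed form (illustrative): g = delta * effort**lam * A**(-beta),
    where effort = N * A**quality_feedback.
    """
    A, N = 1.0, 1.0  # productivity level and researcher population
    rates = []
    for _ in range(steps):
        # 3b: each researcher gets more effective as A rises (if feedback > 0).
        effort = N * A ** quality_feedback
        g = delta * effort ** lam * A ** (-beta)
        rates.append(g)
        if g * dt > 50:
            # Growth has effectively diverged; stop before floats overflow.
            break
        A *= math.exp(g * dt)
        N *= math.exp(pop_growth * dt)  # exponential population growth
    return rates

# Human world: slow population growth, no quality feedback.
human = simulate(pop_growth=0.01, steps=3000)
# 3a: much faster "population" (data center) growth, nothing else changed.
ai_3a = simulate(pop_growth=0.5, steps=200)
# 3b: fast population growth plus researcher quality that improves with A.
ai_3b = simulate(pop_growth=0.5, quality_feedback=3.0, steps=10)
```

under these assumptions the growth rate in the first two runs settles at `lam * pop_growth / beta`, so 3a is a faster but still steady exponential, while in the 3b run the growth rate itself keeps rising from step to step. whether real hardware and software gains feed back strongly enough for that is exactly the open question.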