@samlambert Actually pretty close to the goal we had when we built it. It's not a product that some people should love, it's a product no one should hate.
A physicist spent 12 days supervising Claude Code as it built a piece of cosmology software.
It's the cleanest demonstration I've seen of the difference between intellect and intelligence.
The agent was brilliant at the cognitive work. Transcribing equations, debugging, optimizing against the test suite.
At one point it found a correction factor that fixed every test.
The number was physically meaningless. It worked at the single setting they checked and would've been wrong at every other one. Correct prediction, zero explanatory value.
The agent was clueless. The physicist was not.
When the physicist finally asked "does this number correspond to anything in the actual theory?", the agent answered correctly in seconds.
It could reason. It just couldn't transcend its own frame.
That's the difference. Intellect operates on the content. Intelligence operates on the context while it simultaneously generates the frame.
Agents will transcend intellect and become intelligent when they can generate their own frame of reference.
Who knows how long that will take?
@badlogicgames Certainly A because B has way too many nits, but it's borderline IMO how it mixes iteration / data copying and the actual "business logic".
I don't like nested "continue": the compiler offers little help catching mistakes when the code is moved around later.
@zeeg They got a lot better at being bad though! The fastest all-knowing coder who never seems to develop any form of higher level understanding of the job.
here’s my little side project, run npx devrage to find your numbers and breakdown.
it just reads your transcripts, code is here https://t.co/anrcQAMZPK
@badlogicgames @Nek__12 @kr0der It's good though when people consider future extensibility. Just need to learn where it matters.
Pi is a great example. Architecture plans for extensibility, but actual code is compact, readable, allows (I guess) painless refactoring.
Things agents appreciate but can't produce...
@Nek__12 @kr0der @badlogicgames At this level code shouldn't plan for potential future extensibility. It's trivial to change the function interface if that happens. If you look at the code now, the verbosity doesn't help, it just hides the more precise information and contributes to the bloat agents produce.
YAGNI.
@kr0der @badlogicgames Named options are great if the caller can choose between optional options, e.g. different filters for the skills ('nameContains', 'createdAfter', 'recentlyUsed', ...), but you can also just load all the skills.
env and dir OTOH are parameters to loadSkills you always need.
@kr0der @badlogicgames 1. Should just return the skills directly. The result of loadSkills is skills.
2. The options also aren't great. Not really options that you combine.
loadSkillsFromDir(env: ExecutionEnv, dir: string): Promise<Skill[]> {}
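A minimal sketch of that split, for illustration only: required inputs are positional, the function returns the skills directly, and only genuinely optional, combinable filters go into a named options object. `ExecutionEnv` and `Skill` are named in the posts but never defined, so their shapes here, along with `listSkills`, `SkillFilter`, and `filterSkills`, are assumptions.

```typescript
// Hypothetical shapes: the posts name these types but do not define them.
interface Skill {
  name: string;
  createdAt: Date;
}

interface ExecutionEnv {
  // Assumed helper that reads the skills stored under a directory.
  listSkills(dir: string): Promise<Skill[]>;
}

// Optional filters the caller may combine freely, so a named object fits.
interface SkillFilter {
  nameContains?: string;
  createdAfter?: Date;
}

// env and dir are always needed, so they are plain positional parameters.
// The result is the skills themselves, not a wrapper object.
async function loadSkillsFromDir(env: ExecutionEnv, dir: string): Promise<Skill[]> {
  return env.listSkills(dir);
}

// Filtering is a separate step; an empty filter keeps every skill.
function filterSkills(skills: Skill[], filter: SkillFilter = {}): Skill[] {
  const { nameContains, createdAfter } = filter;
  return skills.filter(
    (skill) =>
      (nameContains === undefined || skill.name.includes(nameContains)) &&
      (createdAfter === undefined || skill.createdAt > createdAfter)
  );
}
```

A caller would write something like `filterSkills(await loadSkillsFromDir(env, dir), { nameContains: "deploy" })`, or skip the filter entirely to get all the skills.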
@cramforce I think a part of today's harness will move closer to inference to benefit from smarter caching strategies that are not possible with the current split.
But that just makes the API a slightly higher level abstraction.
2026 and @figma still whitelists who can access the MCP server.
Y'all really think making it harder for agents to access our data is the winning strategy?
Is Opus 4.7 dumber and worse at calling tools?
Nope, but it interprets prompts much more literally, and if there's a flaw, it will punish you.
One could consider the lack of common sense a regression, but it's by design and can be steered.
For every release: Read the notes!
@HansruediWidmer "Must contain a special character" is simply easier to explain / verify than "strength (= length * cardinality of the alphabet) > X"
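To make the comparison concrete, here is a rough sketch of both checks. It assumes the common entropy estimate of length * log2(alphabet size) in bits, which is not the exact wording of the post, and it guesses the alphabet from the printable ASCII character classes that appear in the password. The function names are hypothetical.

```typescript
// The simple rule: at least one character outside a-z, A-Z, 0-9.
function hasSpecialChar(password: string): boolean {
  return /[^a-zA-Z0-9]/.test(password);
}

// Rough alphabet size, assuming printable ASCII only (95 characters total).
function alphabetSize(password: string): number {
  let size = 0;
  if (/[a-z]/.test(password)) size += 26;
  if (/[A-Z]/.test(password)) size += 26;
  if (/[0-9]/.test(password)) size += 10;
  if (/[^a-zA-Z0-9]/.test(password)) size += 33; // symbols and space
  return size;
}

// Assumed strength estimate in bits: length * log2(alphabet size).
function strengthBits(password: string): number {
  const size = alphabetSize(password);
  return size === 0 ? 0 : password.length * Math.log2(size);
}
```

Under this estimate a long lowercase passphrase scores higher than a short password with a symbol in it, yet only the short one passes the special-character rule. The rule is still the one that fits in a single sentence on a signup form.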
@morefishoil @aliceisplaying A story as old as time.
Feature doesn't make sense anymore, but you don't want to break anyone, so you add a workaround. Time passes. Workaround is now a regression.
Lesson: Rip the bandaid off immediately.
@morefishoil @aliceisplaying Yeah, sounds like an attempt to build something compatible, either because of clients that can't be updated, or because if they remove it they get dozens of angry tweets ('they're force downgrading the model!!')
@aliceisplaying @morefishoil Probably just tech/product debt at this point? I think they added ultrathink before there even were reasoning models.
Sounds like an agent thing to set it to high cause 'that's the highest', then they add xhigh and now it's suddenly a downgrade.
@pontusab If you do this recursively, how many rounds are possible before the model gets confused? callMCPTool to callMCPTool to callMCPTool to call the actual tool?
@RhysSullivan @dazhengzhang And it's not that they don't understand the problems when you point them out, but they are just unable to think on multiple levels at the same time on their own.
Like an incredibly skilled jr programmer that never learns the next step.