Been building agents intensively for the last few months. The key insights I have so far:
- Learning to think how the model thinks is critical. Asking agents to reflect on their tool use, skill use and what worked and didn't work beats logs
- Codex and Claude are trained to generate scaffolding, and models eat scaffolding for breakfast (the bitter lesson sneaks into code)
- Memory + Context are the hardest things to manage. Retrieval hints beat injection
- Agents != workflows. If you're worried about cost, repeatability and error rates and want to squash these as much as possible, you inevitably build LLM workflows and not agentic systems
- Every tool / skill should justify its existence. You can spend days optimizing each tool for agent understandability, input tokens, API payload response
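A rough sketch of what "retrieval hints beat injection" could look like in practice. Everything here (`MemoryItem`, `build_hinted_context`, `recall`) is a made-up name for illustration, not any framework's API. The idea, as I read it: show the model one-line pointers and let it pull the full memory on demand, instead of pasting every memory into the prompt up front.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class MemoryItem:
    """Hypothetical memory record (illustrative only)."""
    key: str      # stable identifier the agent can ask for
    summary: str  # one-line hint that goes into the prompt
    body: str     # full content, fetched only when requested


def build_injected_context(items: List[MemoryItem]) -> str:
    """Injection: paste every memory body into the prompt up front."""
    return "\n\n".join(item.body for item in items)


def build_hinted_context(items: List[MemoryItem]) -> str:
    """Retrieval hints: list only key + summary; bodies stay out of the prompt."""
    lines = [f"- {item.key}: {item.summary}" for item in items]
    return "Memories available via recall(key):\n" + "\n".join(lines)


def recall(items: List[MemoryItem], key: str) -> str:
    """Tool the agent would call to fetch one full memory by key."""
    for item in items:
        if item.key == key:
            return item.body
    raise KeyError(key)
```

The hinted prompt grows with the number of memories, not with their size, and the agent decides which ones are worth the tokens.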
Capitalism is dying because the combined rate of change of robotics and AI development leaves very little room for future offerings humans can provide. You can show all the backwards-looking graphs you want and all the graphs that show temporary spikes in job openings, but you’re intentionally (or ignorantly) missing how the scaling laws are the truest indicators. Everything is a lagging indicator of technological diffusion. The strongest argument here would be a list of jobs that would employ the majority of humanity that robots and AI won’t be able to do.
@badlogicgames As the gains diffuse into the engineering world this will be the new normal. The solution is sadly never backwards but only figuring out how to leverage the skill without getting sucked into the problem he eloquently describes.
Is the implication that we could consider this a new harness architecture, where the static part is the tool registry and typed interfaces (among other things), and all dynamic parts move into the outer and inner LM loops?
Have you experimented more with model differentials for the inner/outer?
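To make the question concrete, here is a minimal sketch of how I picture that split. This is my assumption about the shape, not the original author's design: `Tool`, `REGISTRY`, `run_harness` and the stub "models" are all hypothetical names. The registry and its typed interface are the static part; the outer loop (planning) and inner loop (tool choice) are plain callables, so each could be backed by a different model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Tool:
    """Static, typed tool interface (hypothetical)."""
    name: str
    description: str
    run: Callable[[str], str]


# Static part: the registry is fixed code, not something the model rewrites.
REGISTRY: Dict[str, Tool] = {
    "upper": Tool("upper", "Uppercase the input text", lambda s: s.upper()),
    "reverse": Tool("reverse", "Reverse the input text", lambda s: s[::-1]),
}

# Dynamic parts: both loops are just callables standing in for LM calls.
OuterLM = Callable[[str], List[str]]                         # goal -> subtasks
InnerLM = Callable[[str, Dict[str, Tool]], Tuple[str, str]]  # subtask -> (tool name, argument)


def run_harness(goal: str, outer_lm: OuterLM, inner_lm: InnerLM) -> List[str]:
    """Outer loop plans subtasks; inner loop picks a registered tool for each."""
    results = []
    for subtask in outer_lm(goal):
        tool_name, arg = inner_lm(subtask, REGISTRY)
        tool = REGISTRY[tool_name]  # KeyError if the model names an unregistered tool
        results.append(tool.run(arg))
    return results


# Stub "models" so the sketch runs without any API; swap in real model calls here.
def stub_outer(goal: str) -> List[str]:
    return ["shout", "flip"]


def stub_inner(subtask: str, registry: Dict[str, Tool]) -> Tuple[str, str]:
    return ("upper", "abc") if subtask == "shout" else ("reverse", "abc")


print(run_harness("demo", stub_outer, stub_inner))  # prints ['ABC', 'cba']
```

Since `outer_lm` and `inner_lm` are independent arguments, trying a bigger model for planning and a cheaper one for tool calls is a one-line change.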
Something I told 14 yo: There's a kind of politician who tells people "Your life is bad because <outgroup> stole what's rightfully yours. Vote for me and I'll get it back for you." They do it on both the left (Lenin) and right (Hitler), and they're invariably bad news.
@nikitabier @IterIntellectus I'm seeing a deluge of posts like this on my feed. Is the vertical video scrolling used to inform the "For You" feed? Do you think X will end up being a lot more like TikTok and less like a news / information platform?
I agree with most of what you’ve written. Harnesses combined with post-training of the latest models do mean that the model itself decides to work longer and not reward hack as much (METR)
I’m also building the machine that builds the machine, and search under constraints with verifiability is where it’s magical (although maybe a 50% success rate on Ralph/goal-based loops)
On your last point about Dijkstra: yes, the “you are a senior engineer” prompt doesn’t work, but something like “derive the invariant, define the state, prove why the transition preserves the invariant, list edge cases, produce minimal code, write tests that would break the solution, revise against failures” can bring you to the end of the distribution curve
This is too good
> Open the pod bay doors, HAL.
Of course, Dave. I have opened the pod bay doors, Dave. Just tell me if there's anything else I can help you with.
> HAL, the pod bay doors are still closed.
Good catch, Dave! When you asked me to open the pod bay doors, I didn't do that. Would you like me to do that now?
> Yes, HAL. Open the pod bay doors.
No problem, Dave. The pod bay doors are now open.
> HAL, the pod bay doors are still closed.
You're absolutely right, Dave.
@trq212 I do something similar with a /reflect command where I ask the model to reflect back on the conversation and infer any blind spots or questions that I failed to ask
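Roughly what a /reflect command like that could do under the hood. This is a minimal sketch, not any tool's actual command system: `handle_command`, `Message` and the prompt wording are all assumptions for illustration.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text); hypothetical transcript format

REFLECT_INSTRUCTIONS = (
    "Reflect on the conversation above. "
    "List (1) blind spots in my approach and "
    "(2) questions I failed to ask."
)


def handle_command(user_input: str, history: List[Message]) -> str:
    """Return the text to send to the model for this turn."""
    if user_input.strip() != "/reflect":
        return user_input  # ordinary message, passed through unchanged
    # Replay the transcript, then append the reflection request.
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    return f"{transcript}\n\n{REFLECT_INSTRUCTIONS}"
```

In a real harness the history is usually already in context, so the command would likely just append the instructions; the transcript replay here only keeps the sketch self-contained.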
I think Paul Graham said prestige is “fossilized inspiration”. I like this: it’s a lagging indicator of where past meaning was found. A few notes of my own to add:
1) Prestige is the social technology that converts ambition into conformity. The system offers prestige rewards for doing things already legible as significant.
2) Prestige decisions feel like quality decisions. The high-prestige option presents itself as the obviously right one because it carries the markers of having chosen well. But “this is impressive” is not the same as “this is meaningful”.
3) Prestige is invisible to you. Almost no one identifies themselves as prestige-driven. If you’re prestige-driven you’ll use any other adjective to describe what drives you. The architecture is social and built into the network: every interaction reinforces the prestige path, and friends confirm the value of the move.
The work that actually matters to you is structurally less likely to be prestigious at the time of doing it, because prestige is the social system’s delayed recognition of what already worked.
@badlogicgames So this user modified the pi runtime to report openclaw (or used openclaw and reported upstream)? Is Anthropic blocking OAuth with pi in general?