Claude 4 just refactored my entire codebase in one call.
25 tool invocations. 3,000+ new lines. 12 brand new files.
It modularized everything. Broke up monoliths. Cleaned up spaghetti.
None of it worked.
But boy was it beautiful.
a pattern i use with claude code/codex a lot:
have it write a one-off script to diagnose, reproduce, and fix an error/bug
and only then apply the fix to the actual codebase
it can iterate in isolation which reduces the cognitive load for the agent
i wonder how many codex features are not implemented out of fear of copycatting claude code too much
like not being able to use C-n/C-p to move up/down a list...
why do i have to use my arrow keys???
depending on a hyperscale cloud provider (aws, azure, gcp) to host services is unnecessarily complex for the majority of use cases
just rent a server and host it yourself
@rickasaurus good point and i think it's context dependent
with adapters for instance, removing the right degrees of freedom and defining the protocol that each adapter needs to adhere to feels crucial to scale that to any reasonable number
@DSPyOSS but why would the data be too underspecified for this but not for prompt optimization? is it because the problem representation space is so much bigger?
working with @DSPyOSS shifts the focus from prompt engineering to problem representation
LLMs are semantic machines. even changing a field name can materially affect results because you're changing the semantic relationship the model sees
@DSPyOSS yeah so it's a bit of a stretch
it feels akin to differentiable neural network architectures
in the is_spam example, the ultimate output is is_spam, but there are many paths to get there
@DSPyOSS exactly. i guess you'd almost want a meta-optimizer that searches across different problem representations and post-processing configurations to find what optimizes for your end goal
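a rough sketch of what that search could look like, in plain python rather than dspy. everything here is made up for illustration: the two toy pipelines (`direct_binary`, `via_categories`), the keyword rules standing in for model calls, and the tiny dev set. it only scores each candidate representation on the end metric and keeps the best one, which is far short of a real optimizer.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, bool]          # (email text, is it spam?)
Pipeline = Callable[[str], bool]    # representation + post-processing, end to end


def direct_binary(text: str) -> bool:
    """Hypothetical representation 1: predict the boolean directly."""
    return "free" in text.lower()


def via_categories(text: str) -> bool:
    """Hypothetical representation 2: pick a category, then map it to a boolean."""
    lowered = text.lower()
    if "password" in lowered:
        category = "phishing"
    elif "free" in lowered or "discount" in lowered:
        category = "promotional"
    else:
        category = "personal"
    return category in {"promotional", "phishing"}


def accuracy(pipeline: Pipeline, examples: List[Example]) -> float:
    """Fraction of examples where the pipeline matches the label."""
    correct = sum(pipeline(text) == label for text, label in examples)
    return correct / len(examples)


def pick_best(candidates: Dict[str, Pipeline], examples: List[Example]) -> Tuple[str, float]:
    """Score every candidate on the end goal and return the winner."""
    scores = {name: accuracy(fn, examples) for name, fn in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Invented dev set, only here to make the sketch runnable.
DEV_SET: List[Example] = [
    ("free gift card inside", True),
    ("reset your password here", True),
    ("are you free for lunch?", False),
    ("20% discount this week", True),
    ("notes from today's meeting", False),
]

CANDIDATES: Dict[str, Pipeline] = {
    "direct_binary": direct_binary,
    "via_categories": via_categories,
}

if __name__ == "__main__":
    print(pick_best(CANDIDATES, DEV_SET))
```

the point is that the candidates differ in how the problem is represented, while the score is always computed on the final boolean you actually care about.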
@DSPyOSS for instance, this pattern has made a big impact for me:
instead of is_spam: bool
classify as promotional/phishing/personal/automated
then return category in [promotional, phishing]
gives the model richer semantic structure even though you only need the binary result
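a minimal sketch of that post-processing step in plain python. `toy_classify` is a hypothetical keyword-based stand-in for the model call (in practice this would be whatever classifier or dspy module you use), and the names `CATEGORIES`, `SPAM_CATEGORIES`, and `is_spam` are mine, not from any library.

```python
from typing import Callable

# The richer label set the model is asked to choose from.
CATEGORIES = ("promotional", "phishing", "personal", "automated")

# The subset of categories that collapses to is_spam == True.
SPAM_CATEGORIES = frozenset({"promotional", "phishing"})


def is_spam(email: str, classify: Callable[[str], str]) -> bool:
    """Classify into a category first, then derive the binary answer."""
    category = classify(email)
    if category not in CATEGORIES:
        raise ValueError(f"unexpected category: {category!r}")
    return category in SPAM_CATEGORIES


def toy_classify(email: str) -> str:
    """Hypothetical stand-in for the model call; keyword rules only."""
    text = email.lower()
    if "verify your account" in text:
        return "phishing"
    if "% off" in text or "sale" in text:
        return "promotional"
    if "no-reply" in text or "receipt" in text:
        return "automated"
    return "personal"


if __name__ == "__main__":
    for message in ("Verify your account now", "lunch tomorrow?"):
        print(message, "->", is_spam(message, toy_classify))
```

the caller still only sees a bool, but the model gets to reason over four named categories instead of one yes/no field.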
@DSPyOSS this tripped me up initially. dspy resembles classical ml approaches, so i assumed labeled data was enough.
but problem representation still matters enormously