If LLMs really made workers more productive in a general way, as is claimed, and this had been going on for a couple of years, as is claimed, wouldn’t we expect to see a boost to the economy and an uptick in consumer sentiment, rather than like, historic lows?
Fork your dependencies, trim them to only your use case, never update unless it breaks for your users. I’ve been vocal about this for 10+ years. I’ve always said that updating is way riskier than latent bugs (which can be tracked and CVEs monitored).
If you are updating a dependency, it’s on you to analyze every single commit in the full transitive set of dependencies. If you dont see anything compelling, dont update!
I remember at HashiCorp once in awhile an engineer would try to update a dep or replace a DIY lib with an external one and id always ask “show me the commit we need.” Dont update for the sake of it.
Feeling pretty swell about this mentality with all the supply chain attacks happening.
I strongly believe there are entire companies right now under heavy AI psychosis and its impossible to have rational conversations about it with them. I can't name any specific people because they include personal friends I deeply respect, but I worry about how this plays out.
I lived through the great MTBF vs MTTR (mean-time-between-failure vs. mean-time-to-recovery) reckoning of infrastructure during the transition to cloud and cloud automation. All those arguments are rearing their ugly heads again but now its... the whole software development industry (maybe the whole world, really).
It's frightening, because the psychosis folks operate under an almost absolute "MTTR is all you need" mentality: "its fine to ship bugs because the agents will fix them so quickly and at a scale humans can't do!" We learned in infrastructure that MTTR is great but you can't yeet resilient systems entirely.
The main issue is I don't even know how to bring this up to people I know personally, because bringing this topic up leads to immediately dismissals like "no no, it has full test coverage" or "bug reports are going down" or something, which just don't paint the whole picture.
We already learned this lesson once in infrastructure: you can automate yourself into a very resilient catastrophe machine. Systems can appear healthy by local metrics while globally becoming incomprehensible. Bug reports can go down while latent risk explodes. Test coverage can rise while semantic understanding falls. Changes happens so fast that nobody notices the underlying architecture decaying.
I worry.
If you don't understand this, you will not understand why LLM-based agents are irreparably failing for a general-purpose problem solving.
An agent (by the way it was the topic of my PhD 20 years ago) to be useful, must be rational. Being rational means to always prefer an outcome that results in the maximal expected utility to its master/user.
Let’s say an agent has two actions they can execute in an environment: a_1 and a_2.
If the agent can predict that a_1 gives its user an expected utility of 10, and a_2 gives an expected utility of -100, then a rational agent must choose a_1 even if choosing a_2 seems like a better option when explained in words. The numbers 10 and -100 can be obtained by summing the products of all possible outcomes for each action and their likelihoods.
Now here is the problem with LLM-based agents.
The LLM is not optimizing expected utility in the environment. It is optimizing the next token, conditioned on a prompt, a context window, and a training distribution full of examples of what helpful answers are supposed to look like.
Those are not the same objective.
So when we wrap an LLM in a loop and call it an “agent,” we have not created a rational decision-maker. We have created a text generator that can imitate the surface form of deliberation.
It may say things like:
“I should compare the expected outcomes.”
“The best action is probably a_1.”
“I will now execute the optimal plan.”
But the internal mechanism is not selecting actions by maximizing the user’s expected utility. It is generating a continuation that is statistically appropriate given the prompt and prior context.
This distinction matters enormously.
For narrow tasks, the imitation can be good enough. If the environment is constrained, the actions are simple, and the success criteria are close to patterns seen in training, the system can appear agentic.
But for general-purpose problem solving, the gap becomes fatal.
A rational agent needs stable preferences, calibrated beliefs, causal models of the world, the ability to evaluate consequences, and the discipline to choose the action with maximal expected utility even when that action is boring, non-linguistic, or unlike the examples in its training data.
An LLM-based agent has none of that by default.
It has fluency. It has pattern completion. It has a remarkable ability to compress and recombine human text. But fluency is not rationality, and a plausible plan is not an expected-utility calculation.
This is why these systems so often fail in strange, brittle, and irreparable ways when given open-ended responsibility.
They are not failing because the prompts are insufficiently clever.
They are failing because we are asking a simulator of rational agency to be a rational agent.
🚨 Someone reverse-engineered the design systems of Apple, Spotify, Airbnb, and 30+ billion-dollar companies.
Packed each one into a single file. Free.
It's called Awesome Design MD.
Drop one file into your project. Your AI agent builds UI that looks like Spotify. Or Apple. Or Airbnb. Instantly.
Not screenshots. Not Figma links. A single DESIGN .md file that captures every color, font, spacing value, button style, and layout pattern from a real website. In a format AI agents read and reproduce.
Here's the difference:
Tell Claude Code "build me a landing page" and it gives you generic UI.
Tell Claude Code "build me a landing page" with Spotify's DESIGN .md in your project and it gives you Spotify.
Here's what's inside:
→ Apple. Premium white space, SF Pro typography, cinematic imagery.
→ Spotify. Vibrant green on dark, bold type, album-art-driven layout.
→ Airbnb. Warm coral accent, photography-driven, rounded UI.
→ Linear. Ultra-minimal, precise spacing, purple accent.
→ SpaceX. Stark black and white, full-bleed imagery, futuristic.
→ BMW. Dark premium surfaces, precise German engineering aesthetic.
→ NVIDIA. Green-black energy, technical power aesthetic.
→ Uber. Bold black and white, tight type, urban energy.
→ Sentry, PostHog, Raycast, Cursor, ElevenLabs, and 20+ more.
Here's how to use it:
→ Pick a design system from the collection
→ Copy the DESIGN .md file into your project root
→ Tell your AI agent to use it
→ Get UI that matches the design language of a billion-dollar company
That's it. One file. Your AI agent now has the design taste of a $200/hour design consultant.
Designers charge $5,000+ for a custom design system. Companies spend $50,000+ building one from scratch.
This is free. 31 design systems. Copy. Paste. Ship beautiful UI.
Works with Claude Code, Cursor, Codex, and any AI coding agent that reads project files.
100% Open Source. MIT License.
Dear algo, plz show more content like this
I don’t mind hitting the translate button to understand, or even don’t need it…
The design speaks for itself
🤯BREAKING: Alibaba just proved that AI Coding isn't taking your job, it's just writing the legacy code that will keep you employed fixing it for the next decade. 🤣
Passing a coding test once is easy. Maintaining that code for 8 months without it exploding? Apparently, it’s nearly impossible for AI.
Alibaba tested 18 AI agents on 100 real codebases over 233-day cycles. They didn't just look for "quick fixes"—they looked for long-term survival.
The results were a bloodbath:
75% of models broke previously working code during maintenance.
Only Claude Opus 4.5/4.6 maintained a >50% zero-regression rate.
Every other model accumulated technical debt that compounded until the codebase collapsed.
We’ve been using "snapshot" benchmarks like HumanEval that only ask "Does it work right now?"
The new SWE-CI benchmark asks: "Does it still work after 8 months of evolution?"
Most AI agents are "Quick-Fix Artists." They write brittle code that passes tests today but becomes a maintenance nightmare tomorrow. They aren't building software; they're building a house of cards.
The narrative just got honest: Most models can write code. Almost none can maintain it.
In the 90s, a TV reporter tells David Bowie the internet is hugely exaggerated.
Bowie replies back:
"I don't think we've even seen the tip of the iceberg."
An essential bit of gamedev math - atan2!
We often know a target's position and want to face it. But how do we get the angle?
angle = atan2(y / x)
Or in 3D...
angle = atan2(x / z)
(We can also see below, what happens if we use atan instead of atan2)
#godot#gamedev
prediction cone/safe triangle — this is something we take so much for granted in modern day native UIs.
but it's not the same for most web-based dropdown menus. it took me a while to implement this here.
Amazon, macOS, Windows all implement some version of this.