PICARD: Data, shields up
DATA: Brilliant! Shields can reduce damage we sustain. Not immunity. Not hubris. Just prudence. It's not precaution—it's strategy.
[camera shakes]
WORF: HULL BREACHES ON NINE DECKS
DATA: Here's what happened: you told me to raise shields, and I didn't
Antigravity CLI 1.0.1 is out.
Key updates:
- Fixed OAuth not persisting in some environments.
- Enhanced the visual experience on Windows.
- Added the new "proceed in sandbox" permission control.
Restart agy to auto update or run “agy update".
See the full changelog for details:
https://t.co/zJNgkJck3Z
I'm thrilled to release CodeAlta - one of the first efficient AI coding-agent TUIs built entirely in C#/.NET 🚀
I've been developing and using it daily for the past 3 months, and I hope you enjoy it as much as I do! 🤗
Retweets are highly appreciated! 🙏
CodeAlta brings you a beautiful, colorful timeline interface, multiple threads in the same workspace, a real prompt editor experience, quick file viewing/editing with syntax highlighting, in-app model provider configuration, a multi-agent-ready environment, and much more! ✨
If you don't understand this, you will not understand why LLM-based agents are irreparably failing for a general-purpose problem solving.
An agent (by the way it was the topic of my PhD 20 years ago) to be useful, must be rational. Being rational means to always prefer an outcome that results in the maximal expected utility to its master/user.
Let’s say an agent has two actions they can execute in an environment: a_1 and a_2.
If the agent can predict that a_1 gives its user an expected utility of 10, and a_2 gives an expected utility of -100, then a rational agent must choose a_1 even if choosing a_2 seems like a better option when explained in words. The numbers 10 and -100 can be obtained by summing the products of all possible outcomes for each action and their likelihoods.
Now here is the problem with LLM-based agents.
The LLM is not optimizing expected utility in the environment. It is optimizing the next token, conditioned on a prompt, a context window, and a training distribution full of examples of what helpful answers are supposed to look like.
Those are not the same objective.
So when we wrap an LLM in a loop and call it an “agent,” we have not created a rational decision-maker. We have created a text generator that can imitate the surface form of deliberation.
It may say things like:
“I should compare the expected outcomes.”
“The best action is probably a_1.”
“I will now execute the optimal plan.”
But the internal mechanism is not selecting actions by maximizing the user’s expected utility. It is generating a continuation that is statistically appropriate given the prompt and prior context.
This distinction matters enormously.
For narrow tasks, the imitation can be good enough. If the environment is constrained, the actions are simple, and the success criteria are close to patterns seen in training, the system can appear agentic.
But for general-purpose problem solving, the gap becomes fatal.
A rational agent needs stable preferences, calibrated beliefs, causal models of the world, the ability to evaluate consequences, and the discipline to choose the action with maximal expected utility even when that action is boring, non-linguistic, or unlike the examples in its training data.
An LLM-based agent has none of that by default.
It has fluency. It has pattern completion. It has a remarkable ability to compress and recombine human text. But fluency is not rationality, and a plausible plan is not an expected-utility calculation.
This is why these systems so often fail in strange, brittle, and irreparable ways when given open-ended responsibility.
They are not failing because the prompts are insufficiently clever.
They are failing because we are asking a simulator of rational agency to be a rational agent.
Introducing Claude Opus 4.7, our most capable Opus model yet.
It handles long-running tasks with more rigor, follows instructions more precisely, and verifies its own outputs before reporting back.
You can hand off your hardest work with less supervision.
SLS vs. Saturn V, a real time comparison.
Saw a comparison yesterday using my video on the left at 50% speed. Here's a proper comparison with both rockets launching at their real speeds.
You'll notice SLS is significantly faster off the pad because of the dual massive SRBs!
I've built a full LLM inference engine in C#/.NET 10. From scratch. Not a wrapper - native GGUF loading, BPE tokenizer, attention, KV-cache, SIMD-vectorized CPU kernels, CUDA GPU backend, OpenAI-compatible API. Solo dev, ~2 months, AI-assisted (not vibe-coded!). First preview is out.
Check it out for mode details at https://t.co/Bl5wAYalYY and https://t.co/rQWhKN0iVA
LINQPad now works with your @copilot account! Use Copilot as the backend for all of LINQPad's AI features: coding agent, chat, code completion, and SQL-to-LINQ conversion. No extra account needed, fully integrated, all native LINQPad prompts.
https://t.co/DBhJBmUqIY