Coding agents are good at static stuff that can be verified via syntax check, but very bad at understanding complex runtime semantics (data or system components). I don't see this problem being solved by LLMs themselves, but a good harness should be able to help.
Microsoft is releasing a coding model:
MAI-Code-1-Flash is an inference-efficient agentic coding model. This model is tailor-made for and deeply integrated into GitHub Copilot, VS Code and the Microsoft stack, and, with 5 billion parameters, is comparable to Haiku but cheaper.
(https://t.co/S8xF5l6wea)
Why should we be bound to a single harness?
I think we need a coordinator that runs multiple harnesses with auto-configurable backend models; let's call it harness-of-harness engineering, or harnessness.
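A minimal sketch of what such a coordinator might look like, assuming every harness can be wrapped behind a common `run(task, model)` call. All names here (`Harness`, `Coordinator`, `fake_runner`, the harness and model names) are hypothetical and only illustrate the idea; none of them is an existing tool's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Harness:
    """Hypothetical wrapper around one coding harness."""
    name: str
    # Task kinds this harness is assumed to handle well.
    strengths: List[str]
    # Runs a task with a given backend model and returns the result as text.
    run: Callable[[str, str], str]


@dataclass
class Coordinator:
    """Routes each task to a harness and picks its backend model from config."""
    harnesses: List[Harness]
    # Maps harness name -> backend model name, so models can be swapped via config.
    model_config: Dict[str, str] = field(default_factory=dict)
    default_model: str = "default-model"

    def pick(self, kind: str) -> Harness:
        # The first harness listing this task kind wins; otherwise use the first harness.
        for harness in self.harnesses:
            if kind in harness.strengths:
                return harness
        return self.harnesses[0]

    def dispatch(self, kind: str, task: str) -> str:
        harness = self.pick(kind)
        model = self.model_config.get(harness.name, self.default_model)
        return harness.run(task, model)


def fake_runner(label: str) -> Callable[[str, str], str]:
    # Stand-in for a real harness invocation (CLI call, API request, ...).
    def run(task: str, model: str) -> str:
        return f"[{label}/{model}] {task}"
    return run


coordinator = Coordinator(
    harnesses=[
        Harness("harness-a", ["refactor", "review"], fake_runner("harness-a")),
        Harness("harness-b", ["tests"], fake_runner("harness-b")),
    ],
    model_config={"harness-b": "small-fast-model"},
)

print(coordinator.dispatch("tests", "add unit tests for parser"))
print(coordinator.dispatch("refactor", "split the config module"))
```

Keeping the model mapping in plain config rather than in the harness wrappers is one way to make the backend "auto-configurable": the routing logic stays the same when a model is swapped.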
This has quietly been a miracle month in medicine.
In the last 5 weeks we’ve got news on:
- retatrutide, the triple agonist GLP-1 from Lilly, basically melting fat and body-wide inflammation at record levels
- RevMed’s new pancreatic cancer drug showing unprecedented abilities to extend life
- small trial of a one-and-done PCSK9 gene editing therapy for slashing LDL cholesterol
- Mayo’s AI-assisted radiology showing vastly improved cancer detection
- this new therapy for metastatic solid tumors
This stuff is at varying levels of evidence. Retatrutide is ~100% on its way, other stuff needs more clinical trial data. But put it together and we’re maybe on the verge of majorly reducing the mortality of heart disease and cancer, the two leading causes of death in America.
For most simple application-layer programs, I don't see any need for any other languages at all now. TypeScript will be almost the only one used very soon.
Never saw this one coming. For the backend, C -> C++ -> Java -> Ruby -> Node(JS) -> Java -> Go -> Python -> now back to Node(TS), lol.
I can now probably say this:
Two months ago, inside Anthropic someone suggested building a token leaderboard.
A heated internal debate followed and the decision was made to *never* ever do it… because several people inside Anthropic simply thought ahead about the consequences
LOL, when you ask Claude which model it is through the API, its answer is "Qwen" when the question is in Chinese and "Claude" when the question is in English.
LLMs are definitely Bayesian; anyone saying they are intelligent is just wrong.
Opus 4.8 has been pretty impressive, solving quite a few tasks that 4.7 and GPT-5.5 kept running in circles on.
But apparently it can't find its own bugs. Claude Code has been quite buggy recently, but this one is especially annoying since it is not recoverable, meaning the tokens used before the bug are wasted.
Anyone who has worked on agent loops should easily see where the bugs are coming from. Agent-assisted coding still has a long way to go for sure.
Very glad everyone is safe.
An extremely bad night for everyone @blueorigin, and those counting on them. My heart goes out to all.
Hopefully the cause of the explosion can be found swiftly, and the launchpad rebuilt on an accelerated timeline.
We are investigating unauthorized access to GitHub’s internal repositories. While we currently have no evidence of impact to customer information stored outside of GitHub’s internal repositories (such as our customers’ enterprises, organizations, and repositories), we are closely monitoring our infrastructure for follow-on activity.
I am not sure why anyone would want to limit themselves to a single coding agent atm. The coding tool war is still very early; I bet Claude Code will have some promotions very soon.
Also, Copilot and Gemini are getting better, judging from my occasional use of them. They don't have any big promotions because they are not good enough right now.
codex is the best AI coding product and we want to make it easy to try.
for the next 30 days, we are giving companies that want to try switching over two months of free codex usage.
GitHub has its problems. But we have used other options, including the self-hosted GitLab enterprise version and Bitbucket in the early days. GitHub is actually pretty impressive, especially considering that it is dealing with this kind of unexpected usage growth.
The actual problem is that this kind of freemium business model needs real adjustment in the agent era of software and usage.
It's honestly impressive that GitHub kept the service up at all, given this kind of growth.
I predicted this years ago: Free services will become untenable with the advent of human-level bots.
Worth exploring micro-payments: Even cents per git push might be enough to reduce spam and make this sustainable. Maybe powered by Bitcoin to keep this open and accessible (as opposed to KYCing users).
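A rough back-of-the-envelope sketch of why even a tiny per-push price could matter. Every number here (1 cent per push, 20 pushes a day for a developer, 100,000 pushes a day for a spam bot) is an illustrative assumption, not a measurement, and `daily_cost_dollars` is a hypothetical helper.

```python
# All numbers below are illustrative assumptions, not measurements.
PRICE_PER_PUSH_CENTS = 1


def daily_cost_dollars(pushes_per_day: int, price_cents: int = PRICE_PER_PUSH_CENTS) -> float:
    """Daily cost in dollars for a given push volume at a flat per-push price."""
    return pushes_per_day * price_cents / 100


developer = daily_cost_dollars(20)        # assumed active human developer
spam_bot = daily_cost_dollars(100_000)    # assumed high-volume spam account

print(f"developer: ${developer:.2f}/day, spam bot: ${spam_bot:.2f}/day")
# prints "developer: $0.20/day, spam bot: $1000.00/day"
```

Under these assumptions the fee is negligible for a person but scales linearly with volume, which is the property the micro-payment idea relies on.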
It is really hard to pick which parts of the code to write manually these days (trending toward zero though). I guess it is even harder to resist the temptation to ask the agent to write the corporate docs.
I bet there are skills out there to list all the common AI patterns to restrict them from appearing, like a cat-and-mouse game.
Actually, anyone with some reasonable ops experience knows that it SHOULD be impossible for any human (let alone an agent) to destroy the production DB, its WAL (or CDC), and the periodic backups all together. It is basically ops 101 to prevent bad actors (used to be mostly human though) from doing this kind of damage.
It is not the failure of the agents (they are bound to fail sometimes), it is the failure of the ops. Agents just need good rails.
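One way to picture such rails is a separation-of-duties check: no single role, human or agent, should hold delete rights over the database, the WAL, and the backups at once. The sketch below is a minimal illustration under that assumption; the permission strings, role names, and `roles_that_can_destroy_everything` are hypothetical, not any real IAM system's API.

```python
from typing import Dict, List, Set

# The three recovery-critical assets; the permission strings are hypothetical.
CRITICAL_DELETES = {"delete:prod_db", "delete:wal", "delete:backups"}


def roles_that_can_destroy_everything(roles: Dict[str, Set[str]]) -> List[str]:
    """Return the roles whose permissions cover every critical delete at once."""
    return sorted(name for name, perms in roles.items() if CRITICAL_DELETES <= perms)


# Example role table (made up for illustration).
roles = {
    "agent": {"read:prod_db", "write:prod_db"},
    "dba": {"delete:prod_db", "write:prod_db"},
    "backup-admin": {"delete:backups"},
    "legacy-root": {"delete:prod_db", "delete:wal", "delete:backups"},
}

offenders = roles_that_can_destroy_everything(roles)
print(offenders)  # prints ['legacy-root']
```

A check like this could run in CI against the access config, so a role that can wipe all three assets is flagged before any agent or person ever gets to use it.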
Always wanted to do this, and finally got some time to get it done with one command:
Find a VM with 4GB RAM and 2vCPU on GCP and deploy the @examples/web_server/ to it for the domain name https://t.co/sbfKHf1JPR registered on GoDaddy with self-made SSL certificates.