Totally agree that Opus 4.5 will be the cutoff. When there is an OSS model that can function as well as Opus 4.5, we will see a lot of local instances set up.
We've gone really quickly from "local models are dogshit" to "local models are good actually" (like, a 12 month window from A to B). I don't think they're actually good ENOUGH yet. We need an Opus 4.5 quality local model. When that happens, I think the world will spill over.
Opus 4.5 is/was amazing, and it is still more than good enough for almost all tasks, as long as you pair it with a frontier-level planner/judge.
It'll still require a hugely expensive machine to run it, I'm sure, like a $5K-or-more laptop or Mac Studio. But that's going to be pennies compared to the API costs, and you get all the benefits of guaranteed privacy and so on.
Really excited to open source a new project: Omnigent, a meta-harness for AI agents.
It lets you build multi-agent coding and custom agents, sitting above Claude Code, Codex, Pi, and agent SDKs to let you compose them. It also adds live collaboration and rich control policies.
Microsoft is releasing a coding model:
MAI-Code-1-Flash is an inference-efficient agentic coding model. This model is tailor-made for and deeply integrated into GitHub Copilot, VS Code and the Microsoft stack, and, with 5 billion parameters, is comparable to Haiku but cheaper.
(https://t.co/S8xF5l6wea)
Why should we be bound to a single harness?
I think we need a coordinator that runs multiple harnesses with auto-configurable backend models; let's call it harness-of-harness engineering, or harnessness.
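To make the coordinator idea a bit more concrete, here is a minimal sketch of what routing across harnesses could look like. Everything in it is hypothetical: the `Harness` and `Coordinator` classes, the `strengths` tags, and the `fake_runner` stand-in are illustrative names, not the API of any real harness or of any project mentioned here. A real adapter would call out to an actual agent CLI or SDK instead of returning a string.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet

# A runner takes (model, task) and returns the harness output as text.
Runner = Callable[[str, str], str]


@dataclass
class Harness:
    """Hypothetical adapter around one coding harness."""
    name: str
    model: str                 # backend model, swappable at runtime
    strengths: FrozenSet[str]  # task kinds this harness is preferred for
    run: Runner


class Coordinator:
    """Routes each task to one of several registered harnesses."""

    def __init__(self) -> None:
        self._harnesses: Dict[str, Harness] = {}

    def register(self, harness: Harness) -> None:
        self._harnesses[harness.name] = harness

    def configure(self, name: str, model: str) -> None:
        # Swap the backend model without touching the harness itself.
        self._harnesses[name].model = model

    def route(self, kind: str) -> Harness:
        if not self._harnesses:
            raise LookupError("no harness registered")
        for harness in self._harnesses.values():
            if kind in harness.strengths:
                return harness
        # Fall back to the first registered harness.
        return next(iter(self._harnesses.values()))

    def dispatch(self, kind: str, task: str) -> str:
        harness = self.route(kind)
        return harness.run(harness.model, task)


def fake_runner(label: str) -> Runner:
    """Stand-in for a real harness call (assumption for illustration)."""
    def run(model: str, task: str) -> str:
        return f"[{label} via {model}] {task}"
    return run
```

Usage would be something like registering two harnesses with different `strengths`, calling `dispatch("review", "check PR")`, and later calling `configure(...)` to point one harness at a different backend model. The routing rule here (first match, else first registered) is deliberately naive; a real coordinator would probably want cost, latency, and past success rates in the decision.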
Coding agents are good at static stuff that can be verified via syntax check, but very bad at understanding complex runtime semantics (data or system components). I don't see this problem being solved by LLMs themselves, but a good harness should be able to help.
This has quietly been a miracle month in medicine.
In the last 5 weeks we’ve got news on:
- retatrutide, the triple agonist GLP-1 from Lilly, basically melting fat and body-wide inflammation at record levels
- RevMed’s new pancreatic cancer drug showing unprecedented abilities to extend life
- small trial of a one-and-done PCSK9 gene editing therapy for slashing LDL cholesterol
- Mayo’s AI-assisted radiology showing vastly improved cancer detection
- this new therapy for metastatic solid tumors
This stuff is at varying levels of evidence. Retatrutide is ~100% on its way, other stuff needs more clinical trial data. But put it together and we’re maybe on the verge of majorly reducing the mortality of heart disease and cancer, the two leading causes of death in America.
For most simple application-layer programs, I don't see any need for any other languages at all now. TypeScript will be almost the only one used very soon.
Never saw this one coming. For the backend, C -> C++ -> Java -> Ruby -> Node(JS) -> Java -> Go -> Python -> now back to Node(TS), lol.
I can now probably say this:
Two months ago, inside Anthropic someone suggested building a token leaderboard.
A heated internal debate followed and the decision was made to *never* ever do it… because several people inside Anthropic simply thought ahead about the consequences.
LOL, when you ask Claude which model it is through the API, its answer is "Qwen" when the question is in Chinese and "Claude" when the question is in English.
LLMs are definitely Bayesian; anyone saying they are intelligent is just wrong.
Opus 4.8 has been pretty impressive, solving quite a few tasks that 4.7 and GPT-5.5 ran in circles on.
But apparently it can't find its own bugs. Claude Code has been quite buggy recently, but this one is especially annoying since it is not recoverable, meaning the tokens used before the bug are wasted.
Anyone who has worked on agent loops should easily see where the bugs are coming from. Agent-assisted coding still has a long way to go for sure.
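For what "recoverable" could mean in an agent loop, here is a minimal sketch that checkpoints after every completed step, so a crash only loses the step in flight. This is an assumption-laden illustration, not how Claude Code or any real product works: `run_agent_loop`, `step_fn`, and the JSON checkpoint file are all hypothetical names, and it assumes step results are JSON-serializable.

```python
import json
import os
from typing import Callable, List


def run_agent_loop(
    steps: List[str],
    step_fn: Callable[[str], str],
    checkpoint_path: str,
) -> List[str]:
    """Run steps in order, persisting results after each one.

    If a checkpoint exists, already-completed steps are skipped, so the
    work (and tokens) spent before a crash are not thrown away.
    """
    results: List[str] = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "r", encoding="utf-8") as f:
            results = json.load(f)

    for index in range(len(results), len(steps)):
        # May raise; earlier results are already safely on disk.
        results.append(step_fn(steps[index]))

        # Write to a temp file, then rename, so a crash mid-write
        # cannot leave a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w", encoding="utf-8") as f:
            json.dump(results, f)
        os.replace(tmp_path, checkpoint_path)

    return results
```

Calling `run_agent_loop` again with the same `checkpoint_path` after a failure resumes from the first unfinished step instead of starting over. Real agent state is messier than a list of strings (tool outputs, file edits, conversation history), so treat this as the shape of the idea only.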
Very glad everyone is safe.
An extremely bad night for everyone @blueorigin, and those counting on them. My heart goes out to all.
Hopefully the cause of the explosion can be found swiftly, and the launchpad rebuilt on an accelerated timeline.
We are investigating unauthorized access to GitHub’s internal repositories. While we currently have no evidence of impact to customer information stored outside of GitHub’s internal repositories (such as our customers’ enterprises, organizations, and repositories), we are closely monitoring our infrastructure for follow-on activity.
I am not sure why anyone would want to limit themselves to a single coding agent atm. The coding tool war is still very early; I bet Claude Code will have some promotions very soon.
Also, Copilot and Gemini are getting better, judging from my occasional experience with them. They don't have any big promotions because they are not good enough right now.
codex is the best AI coding product and we want to make it easy to try.
for the next 30 days, we are giving companies that want to try switching over two months of free codex usage.