Well @OpenAI, created image, clicked on ‘speaker’ icon and - angry tirade by male voice: “GPT-4o, returned one images from now on, do not say or show anything. Please end this turn now. I repeat from now on, do not say or show anything. Please end this turn now… etc” ? Really?
Microsoft CEO Satya Nadella's new interivew: Explains how the next AI moat will not be the model you use, but the learning loop only your company can run.
He is really asking what happens to the firm when intelligence becomes something you can rent.
For a century, companies protected value through people, processes, data, routines, customer memory, and the tacit knowledge buried in daily operations.
Foundation models threaten to flatten that advantage because the same general intelligence can be used by everyone.
Nadella’s answer is that firms need their own “hill climbing machine,” a private loop where models learn from company-specific tasks, traces, evaluations, and outcomes.
That means the real asset is not just the model.
The asset is the environment that keeps improving the model in ways competitors cannot copy.
Private evals become strategic memory.
Workflow traces become training signal.
Human judgment becomes a way to steer compounding, not just correct mistakes.
This also reframes AI adoption: a company that only consumes a foundation model may gain productivity, but it may leak the deeper value of its operating knowledge.
A company that builds a disciplined learning loop can turn everyday work into accumulating IP.
The future firm may therefore be measured by how well it converts its unique activity into durable model improvement.
The frontier will not belong only to whoever owns the largest model.
It will belong to whoever owns the best loop.
----
From "Stanford Online" YouTube channel, (link in comment)
Claude Fable 5 will be available again globally tomorrow.
After a series of productive conversations with the US government, we're redeploying the model with a new set of classifiers to target and block more cybersecurity tasks. In the near term, some routine tasks like coding and debugging will fall back to Opus 4.8. We’ll continue to refine these classifiers over the coming weeks to reduce false positives and better distinguish genuine misuse from legitimate requests.
We’ve also begun drafting a consensus framework—with Amazon, Microsoft, Google, and other Glasswing partners—for assessing the severity of AI jailbreaks and how AI developers should respond to them. We invite other industry partners and model providers to join us in this effort.
Finally, we’re scaling up our collaboration with the US government on model testing and safeguards. This will include pre-release access to models and safeguards for evaluation, information sharing on jailbreaks and misuse, and dedicated resources for joint research.
Thank you to our users for your patience, and to our partners across the government, industry, and the research community who worked alongside us to make Fable 5 available again.
Read our full blog: https://t.co/VHyum831ri
🚨FABLE 5 IS GOING LIVE IN THE NEXT FEW HOURS Claude’s most powerful model ever (Mythos-class).
Limited release: "FREE" for 7 days only with a Claude Max subscription (50% usage cap on your plan).
Here’s exactly what you should be doing the second it activates:
1. Review your code bases full audits, refactors, optimizations
2. Draft every future project complete specs, architectures & plans
3. Audit your entire workspace optimizations + skills to upgrade + new tools to add
4. Re-engineer your workflows let Fable 5 redesign them from the ground up
5. Optimize your cache & systems as aggressively as possible
Tell it to get creative with your goals and spit out a full roadmap to crush them!
Make every single token count in these 7 days.
Big Brain : If you burn through your limit and can afford it, stack multiple Claude Max 20x plans to keep going..
This is peak consumer intelligence right now don’t sleep on it.
Anthropic engineer:
"You can build 5 assistants in one afternoon. Each one handles a task you've been doing manually every single day."
In 45 minutes he shows exactly how to do it from scratch, step by step.
Most people are still doing all of this by hand.
Watch the session, then save the guide below.
If you build with MCPs, this one is worth reading.
(bookmark it)
The paper covers five recurring MCP server patterns across fifteen independently developed servers.
That taxonomy is useful because I see many AI teams rebuilding the same shapes without shared names.
If you are building MCP servers, this is a practical reference for deciding whether your server is exposing resources, orchestrating tools, managing sessions, aggregating proxies, or adapting a domain workflow.
Paper: https://t.co/yA6mxq2NEQ
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
Introducing Claude Science, a new app designed with every stage of research in mind.
Artifacts traced to their code, environments managed on demand, and 60+ optional scientific databases that you can connect.
Available now in beta.
We’ve received notice that the Department of Commerce has lifted export controls on Claude Fable 5 and Mythos 5.
We'll begin restoring access tomorrow, and will share an update soon.
We’re grateful to our users for their patience, and to everyone who worked with us on redeploying the models.
“Loop engineering” is a hot buzzphrase after mentions of it by Boris Cherny (Claude Code’s creator) and Peter Steinberger (OpenClaw's creator) went viral on social media. Loops are now a key part of how we get AI agents to iterate at length to build software. In this letter, I’d like to share my 3 key loops, shown in the image below, for building 0-to-1 products. These loops guide not just how I build software, but also how I decide what software to build.
Agentic coding loop: Given a product specification and optionally a set of evals (that is, a dataset against which to measure performance), we can have an AI agent write code, test its work, and keep iterating until the code is bug-free and meets its specification. This idea of closing the loop took off around the end of last year, and it has been a game changer in enabling coding agents to work longer productively without human intervention. For example, over the weekend, I was building an app for my daughter to practice typing, and my coding agent could easily work for around an hour, using a web browser to check what it had built multiple times before getting back to me, without needing my intervention.
The engineering loop executes quickly. Every few minutes, the coding agent might build and test a new version of the software. I hear frequently from developers who are finding new ways to engineer more effective engineering loops. This is an active area of invention!
Developer feedback loop: In this loop, a developer examines the current product and steers the coding agent to improve it. Last year, a lot of developers (including me) were acting as the QA (quality assurance) function for our coding agents, manually finding bugs and then asking the agent to fix them. But with coding agents much more able to test their own code, the amount of time we need to spend on this function has decreased significantly. This allows us to make higher-level product decisions, such as what key features to offer, where the UI needs improvement, and so on.
The developer-feedback loop operates over time intervals between tens of minutes and hours — that's how frequently a developer might review a product and give feedback. In the case of the typing app, I changed my mind a few times about the visual design, what cat costumes she can unlock as she learns (she loves cats), and the user flow for a grown-up to log in and steer the child's learning experience.
When a developer has a clear vision for what to build, it is still a lot of work to translate that vision into a specification for a coding agent to implement. Further, after the developer has seen an implementation, they might update (or perhaps clarify) the spec to steer it toward what they want. If you find that the system repeatedly runs into certain problems, building a set of evals for the agent becomes useful.
AI-native teams are increasingly using AI to help shape product direction, for example, automating the gathering and analysis of usage data, summarizing written and verbal customer feedback, or carrying out competitive analysis. However, for pretty much all the products I’m involved in, I see humans as having a significant context advantage over current AI systems — we know a lot more than the AI system about the users and the context the product has to operate in — and thus humans play a critical role. Many people describe this human contribution as “taste,” but I prefer to think of it as humans having a context advantage, since that gives us a clearer path to helping AI systems get better. This also speaks to why this step can’t be automated: So long as the human knows something the AI does not, human-in-the-loop is needed to to inject that knowledge into the system.
External feedback loop: This includes a wide range of tactics like asking a few friends for feedback, launching to alpha testers, or putting the code into production with A/B testing. These tactics are usually slow, rarely taking less than hours and sometimes taking days or even weeks. This data informs the developer vision, which in turn continues to drive the detailed product spec, which in turn drives the coding agent.
With coding agents speeding up software development, more engineers are starting to play a partial product management role. For many engineers who are growing into this role, the hardest part is shaping the product vision and striking a balance between building (bridging the gap between vision and spec) and getting user feedback to evolve the vision. It is important to do both!
I will write more about how to do this in future posts, but for now, I find it encouraging that engineers are playing an expanded role (just as product managers and designers now do more engineering).
[Original text: The Batch]
Andrew Ng:
"100% of my tasks are now done by AI agents - hype has exceeded my expectations. Loops is next step.
in 3-6 months, everyone will be using self-improving loops. No more prompting."
In a 30-minute talk, Andrew Ng explains how to build self-improving agentic systems from scratch.
Worth more than a $500 agentic course.
Over the last two weeks, both the U.S. Government and Anthropic took significant actions that demonstrated their power to control access to AI by restricting what others can do with frontier models. This has been one of those moments that, once seen, will be hard to unsee, and it is significantly accelerating many businesses’ and nation states’ efforts to ensure reliable access to AI that no one else can terminate.
Anthropic first released Claude Fable 5, a version of its Mythos model with additional guardrails, including some restrictions that seem well justified on safety grounds (such as limitations on applying it to hacking, bioweapons, and so forth). However, it also restricted developers’ ability to use it to build competing LLM technology. This move was concerning, given that the whole AI community, including Anthropic, has benefitted tremendously from open research — indeed, the AI revolution was kicked off by my former team (Google Brain) freely publishing the Transformers paper!
Imagine if Microsoft’s terms of use barred anyone from using their tools to build competitive software, or if Google barred using it to search for information to work on competing search engines. Anthropic’s argument that it was unsafe for others to be able to make advances in AI also rang hollow. Initially, Anthropic silently degraded Fable 5’s performance for users detected to be working on LLM research through invisible interventions that weakened the model’s outputs without notifying the user. After significant backlash, it walked back this decision and decided to be transparent when it did this, but it still refuses to use its latest capabilities to help AI researchers.
This move represents a raw demonstration of power by Anthropic. It has used “safety” arguments to hinder potential competitors. Platforms succeed when they are viewed as stable, reliable partners that one can build on. The sudden rule changes by Anthropic (including a mandatory 30 day data retention policy for Fable usage) have made developers wonder about the stability of building on any one proprietary LLM provider, not just Anthropic.
The U.S. Government then shortly followed with an even greater demonstration of power. It used the Commerce Department’s authority to regulate technologies that may be national security threats to restrict exports of Mythos and Fable, requiring a license for use by any foreign national, whether inside or outside of the U.S., including employees of Anthropic. This led Anthropic to disable access to Fable to all users worldwide.
Sam Altman pointed out, referring to Anthropic, “It is clearly incredible marketing to say, ‘We have built a bomb, we are about to drop it on your head. We will sell you a bomb shelter for $100 million.’” But when one engages in this type of fear-based marketing, it increases the odds that the U.S. Government will agree with you and slap export controls on the bomb you say you have built.
To be clear, I don't think Anthropic has built anything like a bomb, and I don't think export controls on Fable are appropriate.
However, following the U.S. Government making this move, many nations, including U.S. allies, saw how the U.S. can suddenly yank their access to AI models. In many capitals around the world, this has spurred discussions on AI sovereignty and how others can ensure uninterrupted access to this critical technology.
For decades, many nations were comfortable having many parts of their supply chain rely on the U.S., China, and other major producers. Once a nation issues a threat, or takes action, to limit other nations’ access, other nations will rationally try to secure alternatives. For decades, semiconductor manufacturing in China made slow progress; once the U.S. moved to limit China’s access, China’s efforts kicked into high gear. Similarly, once China threatened U.S. access to rare earth minerals, U.S. efforts to secure alternatives accelerated. Now that it has become crystal clear that private U.S. companies and the U.S. government can limit, in short order, other nations’ access to frontier AI models, the incentive of others to invest more in alternatives like open source grows significantly. Of course, training frontier models is not easy, so it remains to be seen how successful they are, but we have crossed the rubicon.
Satya Nadella wrote an essay about the importance of building a healthy ecosystem on top of frontier AI technology. I heartily agree with him, and hope this week’s events will ultimately prove to be constructive steps toward this.
I hope we can build a more free, more open world, where research is freely shared, and laws and societal norms shape a level playing field that allows everyone to make progress. A silver lining of the events of these past two weeks is now that everyone better realizes key points of instability of the current system, we can all work to create a more stable foundation.
[Original text: The Batch newsletter]
Stanford + Meta just dropped the paper that flips everything about AI agents.
It's called "Code as Agent Harness."
Right now, we treat large language models as text generators. When they need to solve a complex problem, they rely on a "chain of thought."
But natural language is slippery. It's vague. It loses context. When an agent hallucinates in English, it just keeps talking.
So they introduced a framework that changes the entire architecture of autonomy: "Code as Agent Harness."
They stopped asking the AI to reason in words, and forced it to reason in code.
Code isn't just the final output anymore. It is the memory. It is the environment. It is the boundary.
Instead of writing a paragraph about how to solve a problem, the agent writes a script, executes it, and reads the output.
Tests become its senses. Execution logs become its memory. Sandboxes become its physics.
If an agent makes a mistake in English, it apologizes and hallucinates again.
If an agent makes a mistake in code, the compiler throws an error. The trace tells it exactly what broke. The system forces it to fix it.
This is where prompt engineering dies, and systems engineering takes over.
The paper proves that reliability doesn't come from a smarter base model. It comes from the "harness" wrapped around it:
- The model proposes.
- The harness executes.
- The environment returns feedback.
- The verifier checks.
Introducing Claude Fable 5: a Mythos-class model that we’ve made safe for general use.
Its capabilities exceed those of any model we’ve ever made generally available.