My new train is leaving the station! 🚂 After four years covering AI through one of the most consequential — and chaotic — periods in tech history, most recently at Fortune, I'm launching an independent media platform called Ground Level AI. 🥳
Ground Level AI is focused on one big question: What happens when AI meets the real world? The massive infrastructure buildout, the enterprise reality, the security stakes, the policy decisions happening right now, the societal consequences, the geopolitical fault lines: these are the stories that will define how this technology actually lands.
Ground Level AI launches as a newsletter and podcast on Substack, featuring original, often on-the-ground reporting, exclusive interviews, and analysis of the fast-paced AI news cycle 2-3 times per week. I'll be following the story wherever it leads—from research labs and industry conferences to corporate boardrooms, data center communities, and the DC policy debates shaping AI's future.
My goal is to connect the dots, and I've also been around long enough to be wary of hype. Sometimes, though, I just like to laugh at the craziness of it all! You'll find that here, too.
If you are a practitioner building AI, an executive deploying it, or work in AI policy, security or infrastructure, you're my people. But if you're looking for signals or you’re just curious about where AI is going, you'll feel right at home as well. This is a leap, and I'd be so grateful to have you along for the ride at https://t.co/YoXtonhM6E
I feel like I've been building toward Ground Level AI my whole career. I've spent more than 25 years in journalism in almost every role imaginable — including staff writer, editor, reporter, blogger and even fact-checker. Before covering AI at Fortune and VentureBeat, I spent more than a decade building a successful independent freelance business. Throughout it all, I've always been drawn to creating things from scratch: new projects, new communities, and new conversations.
It was a dream come true when AI editor @jeremyakahn brought me onto the small but mighty tech team at @FortuneMagazine, and I'm so grateful to have reported on one of the biggest stories of our time for such an iconic publication. I'm planning to continue contributing to Fortune, but I'm also ready for this new chapter.
I'm hoping you'll subscribe, share, comment, and help me get Ground Level AI, well, literally off the ground: https://t.co/YoXtonhM6E.
I'm also planning to bring some of these conversations offline through intimate events and discussions with the people building and grappling with AI firsthand. Interested in collaborating, sponsoring, or having me speak? Reach out at [email protected]. I'm all ears.
For news tips (not PR pitches), you can also reach me securely on Signal at sharongoldman.43 (we can start off the record and take it from there).
The world is going to be desperately short on compute for the rest of time. No matter how many chips we make, demand will outrun supply. Scaling models in parameter count, context length, and thinking tokens continues to yield increasingly better models with consistently slower, more costly responses.
This points to a future with giant frontier models doing some things, but the vast majority of tokens will run through thousands of small, fine-tuned models. Some may even operating with their own domain-specific language. There’s no technical limitation preventing a fine-tuned Gemma 12B model from outperforming Opus 4.8 at a specific task, like linting. But it would run at less than 1% of the cost.
Everyone keeps telling young founders, "AI will do everything, there's no purpose in life." Wrong. Domain-specific models and domain-specific languages will be a goldmine in the next few years.
Practical AI Security by @HarrietHacks comes with 30+ working notebooks so you can learn by doing.
Repo is free on her GitHub. Pairs best with the book.
https://t.co/LAzUUqKGJX
I launched 3 more videos in my post-training course!
1. Lecture 5: The rise of reasoning models
2. Lecture 6: DPO derivation, intuitions, and practice
3. A Q&A from readers on lectures 1-4
rlhfbook dot com slash course
More soon!
Damn!! Another video showing the violent shaking from the M7.8 earthquake that hit the Philippines a few days ago 👀👀
📍 Glan, Sarangani, Philippines
📹 Adrian Macaculop
This is wonderful. This page turns New York into a massive, zoomable, SimCity-style pixel art map. You can explore the city block by block: streets, skyscrapers, parks, waterfronts, bridges and neighbourhoods. A beautiful rabbit hole for map nerds: https://t.co/aFybaAIvce
I don't care what kind of hardware you have, you should be running local models
Governments are now banning models. They’re determining what technology you can and can’t use
With local models, you are free and nobody can control you
Even if you're on the cheapest Mac Mini you can be doing this
Here's a complete guide:
1. Download LMStudio
2. Go to your OpenClaw/Hermes and say what kind of hardware you have (computer and memory and storage)
3. Ask what's the best local model you can run on there (probably will be Gemma 4 or Qwen. if you have a big computer, it will be GLM)
4. Ask 'based on what you know about me, what workflows could this open model replace?'
5. Have OpenClaw walk you through downloading the model in LM Studio and setting up the API
6. Ask OpenClaw to start using the new API
Boom you're good to go.
You just saved money by using local models, have an AI model that is COMPLETELY private and secure on your own device, did something advanced that 99% of people have never done, and have entered the future.
If you are on smaller hardware you probably are not going to replace all your AI calls with this, but you could replace smaller workflows which will still save you good money
Own your intelligence.
In addition to my long-form Weather West update on El Niño, I've also written a shorter piece for Inigo Insurance as part of their Catastrophe Research series. I discuss different aspects of this year's strong-to-historic event, inc. tropical cyclone risk. https://t.co/F7d9W1ujSa
HTML plans are so good. Thank you @trq212 for putting me on. Mythos is so good at them.
Provided nothing but the request to explain monetization plans in HTML, and got this incredible breakdown with a diagram in the middle of it.
A new study has reconstructed 1,000 years of earthquake activity along Southern California’s San Andreas and San Jacinto fault systems and found that tectonic stress in the region has reached levels not seen in the past millennium.
Using a physics-based 4D earthquake cycle model, researchers combined paleoseismic data (from geological evidence, radiocarbon dating, tree-ring anomalies, and historical records) with simulations of stress accumulation and release. The results show that current stress on key fault segments, particularly around the Cajon Pass junction northeast of Los Angeles, equals or exceeds the highest values observed over the entire 1,000-year period.
Cajon Pass functions as an “earthquake gate”, a critical junction where ruptures on one fault can sometimes propagate to the other under the right stress conditions, potentially triggering much larger, multi-fault events. Historical patterns indicate that joint ruptures across both fault systems have occurred when stress levels on the adjacent segments (such as the Mojave South section of the San Andreas and the San Jacinto Bernardino section) become similarly high and aligned. Today, those segments are modeled at approximately 2.8 MPa and 3.6 MPa, respectively, placing the system in a configuration historically associated with through-going ruptures.
This has significant implications for the densely populated greater Los Angeles region, including San Bernardino, Riverside, and critical infrastructure corridors through Cajon Pass. However, the study does not predict the timing of the next major earthquake—such events cannot currently be forecasted precisely. Instead, it provides a physics-based assessment of accumulated stress and the range of plausible rupture scenarios that could occur.
[Burkhard, L. M. L., Smith-Konter, B. R., Scharer, K. M., & Sandwell, D. T. (2026). Cajon Pass and the Southern San Andreas Fault System: Earthquake Cycle Stress Accumulation and Present-Day Loading. Journal of Geophysical Research: Solid Earth, 131(6), e2025JB033213. DOI: 10.1029/2025JB033213]
What actually turns a chatbot into an AI agent? The “harness” around the AI model (the large language model, or LLM).
In this video I break down what a harness is: a large language model at the core, plus memory, tools, and the engineering systems that make it all work at scale.
One piece I didn't get to in the video: the thing that ties it all together is the loop. An agent doesn't just answer once and stop. It runs in a cycle; the model decides what to do, takes an action (call a tool, check memory), looks at what came back, and picks its next move. Again and again, working several steps toward a goal. The loop is foundational to what makes agents work.
A key part of that loop is knowing when to stop - recognizing the task is actually done instead of spinning forever or quitting too early. "Knowing when to terminate" is its own small piece of engineering.
Model = the brain. Harness = everything that lets it actually get work done.
Questions? Please leave them below, and follow along - more on agents (and open-source models like DeepSeek) coming.
This @TEDTalks, How to Build a Career You Actually Love, by @bgurley is awesome both in content and delivery. The big takeaway: it's all about continuous and obsessive learning, which is generally the product of fascination. https://t.co/GmOkfctebY
BREAKING:
Anthropic just dropped Claude Fable 5—this is Mythos, made safe for public release. It is the best coding model in the world.
We've been testing it internally @every for the last week or so across coding, writing, marketing, editing, and more—here's our vibe check:
- It broke our benchmarks. Fable scored a 91/100 on our Senior Engineer benchmark—this is human senior engineer level. The previous high score was Opus 4.8 at 63. GPT-5.5 is a 62.
- It's a one-shot wonder. You can set it and forget for hours or overnight on huge coding tasks, and come back to completed work. It cleared entire production bug backlogs, built a playable 3D, and even made a 2-minute animated film—all one-shot.
- Taste and attention to detail. In coding and knowledge work tasks, it has much better taste and attention to detail than we've ever seen. It gets subtle things right, adds little features you might not have thought of, and generally understands the assignment in ways that surprised us.
- Great use of context. We set it loose analyzing customer feedback surveys and our website data and it came back with a crisp, clean report that identified a. our biggest problem and b. a concrete testable solution—and then we sent it off to build that.
- It's best for power users. If you're already used to orchestrating multiple agents in your work, this model can do things that you've never seen before. If you're a knowledge worker or vibe coder with a more basic setup, you're not going to notice a huge difference—in fact, it probably isn't the right model for you.
- It's very slow, token-hungry. Using this thing for regular knowledge work is like squashing an ant with a rocket launcher. It also routinely uses 500k to 1M tokens on tasks. That's why it's best for your heaviest jobs—but not as good for tasks like collaborative writing.
- It's expensive. It's about twice as expensive as Opus, and it's also incredibly token hungry—so expect it to be something you'll use sparingly unless your company pays for it.
Overall, I think of it like a warp drive for coding: It can get you across the galaxy in a few hours, when it used to take months or years. But it's not appropriate for getting around town—you need something faster, cheaper, and more maneuverable.
The ceiling is extraordinarily high on this model though. Even our most advanced testers like @kieranklaassen felt like they were only scratching the surface of it.
Want our full vibe check with all of our testing and benchmarks? Read it on @every: https://t.co/MgJLZszJUB
Whatever AI sceptics say, LLMs really can reason. They're not just doing an imitation that looks like reasoning, it's the real deal.
But even though they are able to reason, sometimes they won't! If you ask an LLM a question it can't answer, sometimes it will just try to imitate reasoning without doing it.
The chain of thought looks basically indistinguishable from actual reasoning. But under the hood something very different is going on.
@TrentonBricken talked with me about what work on circuits inside LLMs has revealed:
This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The benchmarks are great and it's SOTA on everything by a margin but I'll add that *qualitatively* also, this is a major-version-bump-deserving step change forward (imo of the same order as Claude 4.5 was in November), peaking especially for long problem-solving sessions on very difficult problems. You can give it a lot more ambitious tasks than what you're used to, the model "gets it" and it will just go, and it's never felt this tempting to stop looking at the code at all (but don't do this in prod!). The model still has quirks that people will run into and the safeguards are configured to be a little too trigger happy for launch, which can hopefully be tuned over time.
I feel a lot of things changing as working software increasingly comes out on a tap. The Jevon's paradox kicks in and I feel my own demand for software growing substantially. You can ask for anything - explainers, visualizers, dashboards, bespoke single-use apps (e.g. a full wandb that is hyper-specific just for your project), you can 10X your test suite, auto-optimize code, run giant research projects with custom HTML for the results, anything! "Free your mind" (Matrix ref). Really looking forward to all the things people build!