🚨 BREAKING: New research shows that AI agents don’t actually “reason” the way we assume. They rely on different planning strategies, and those strategies directly affect whether they succeed or fail.
Instead of treating AI agents as black boxes, this paper breaks down how they plan tasks step by step, and why those plans often break in real-world use.
The paper, “AI Planning Framework for LLM-Based Web Agents,” introduces a structured way to understand agent behavior by modeling them as planning systems.
It identifies three main approaches:
Step-by-step agents (sequential reasoning)
Tree-search agents (exploring multiple paths)
Full-plan agents (planning everything in advance)
Each approach comes with trade-offs.
Some are faster but less reliable.
Others are more thorough but computationally expensive.
This directly explains one of the biggest problems in AI today: agents can perform well in controlled demos, but struggle when tasks become complex or unpredictable.
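To make the three strategies concrete, here is a toy sketch. It is not the paper's framework or code: the tiny "website" graph, the `LOOKS_CLOSE` heuristic, and the function names are all hypothetical, chosen only to show the trade-offs. The greedy step-by-step agent walks into a promising-looking dead end. The tree-search agent recovers by exploring several paths, at the cost of expanding more pages. The full-plan agent succeeds only if the site still matches the plan it made up front.

```python
from collections import deque

# Hypothetical toy "website": each page lists the pages it links to.
LINKS = {
    "home": ["deals", "search"],
    "deals": ["home"],  # looks promising but leads nowhere new
    "search": ["results"],
    "results": ["checkout"],
    "checkout": [],
}

# Hypothetical heuristic: how close each page *looks* to the goal (higher = closer).
LOOKS_CLOSE = {"home": 0, "deals": 3, "search": 1, "results": 2, "checkout": 4}


def step_by_step(start, goal, links, max_steps=10):
    """Sequential agent: greedily take the best-looking unvisited link, never backtrack."""
    path = [start]
    while path[-1] != goal and len(path) <= max_steps:
        options = [page for page in links[path[-1]] if page not in path]
        if not options:
            return None  # stuck in a dead end
        path.append(max(options, key=lambda page: LOOKS_CLOSE.get(page, 0)))
    return path if path[-1] == goal else None


def tree_search(start, goal, links):
    """Tree-search agent: breadth-first over many paths; also returns pages expanded."""
    frontier = deque([[start]])
    seen = {start}
    expanded = 0
    while frontier:
        path = frontier.popleft()
        expanded += 1
        if path[-1] == goal:
            return path, expanded
        for page in links[path[-1]]:
            if page not in seen:
                seen.add(page)
                frontier.append(path + [page])
    return None, expanded


def full_plan(start, goal, planned_links, live_links):
    """Full-plan agent: plan the whole route up front, then execute it without re-planning."""
    plan, _ = tree_search(start, goal, planned_links)
    if plan is None:
        return None
    for here, nxt in zip(plan, plan[1:]):
        if nxt not in live_links[here]:
            return None  # the live site no longer matches the plan
    return plan


# The site changes after planning: "results" now routes through a login page.
LIVE = dict(LINKS, results=["login"], login=["checkout"])

print(step_by_step("home", "checkout", LINKS))       # None (dead end at "deals")
print(tree_search("home", "checkout", LINKS))        # path found, 5 pages expanded
print(full_plan("home", "checkout", LINKS, LINKS))   # plan works on an unchanged site
print(full_plan("home", "checkout", LINKS, LIVE))    # None (plan broken by the change)
```

Greedy stepping is cheap but brittle, search is more robust but does more work, and a fixed plan is efficient only while the environment holds still.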
The research also introduces new ways to evaluate AI systems, focusing not just on whether the final answer is correct, but on how the agent arrives at that answer.
This is a major shift from how AI is measured today. Most benchmarks focus on outcomes, while this work shows that the reasoning process itself is equally important.
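A minimal sketch of why process-aware evaluation can separate agents that outcome-only scoring treats as identical. The `efficiency` metric here (steps in a reference solution divided by steps actually taken) is a hypothetical stand-in, not one of the metrics defined in the paper, and the page names are made up.

```python
def evaluate(trajectory, goal, reference):
    """Score a run on its outcome and on a hypothetical process metric."""
    success = bool(trajectory) and trajectory[-1] == goal
    outcome = 1.0 if success else 0.0
    # Hypothetical process metric: reference steps / steps taken, capped at 1.0.
    steps_taken = max(len(trajectory) - 1, 1) if trajectory else 1
    efficiency = min(1.0, (len(reference) - 1) / steps_taken) if success else 0.0
    return {"outcome": outcome, "efficiency": efficiency}


reference = ["home", "search", "results", "checkout"]
direct = ["home", "search", "results", "checkout"]
detour = ["home", "deals", "home", "search", "results", "checkout"]

print(evaluate(direct, "checkout", reference))  # {'outcome': 1.0, 'efficiency': 1.0}
print(evaluate(detour, "checkout", reference))  # {'outcome': 1.0, 'efficiency': 0.6}
```

Both runs reach the goal, so an outcome-only benchmark scores them the same; only the process view shows that one agent wandered.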
What this highlights is a deeper limitation: current AI systems are not just limited by knowledge, but by how they plan and execute decisions over time.
The bigger implication isn’t just about capability; it’s about reliability.
As AI agents move into real-world workflows, the key challenge is no longer just producing correct answers but building systems that can plan, adapt, and complete tasks consistently.
This points toward a shift in AI development:
From generating outputs
to designing better decision-making systems.
Quick update to GPT-5.5 / Spud:
AI labs are shifting back toward making models smarter during pretraining rather than leaning on reasoning (test-time compute) to boost performance. OpenAI's Spud and Anthropic's Mythos both appear to reflect this trend, getting better answers with fewer tokens and less reliance on chain-of-thought reasoning.
Spud is smarter "out of the box" from pretraining, so it shouldn't need to burn through long chains of reasoning tokens to reach good answers. Fewer tokens means faster responses and lower cost per query.
Even more excited for Spud today.
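The "fewer tokens, lower cost" point is simple arithmetic, since per-query cost scales with the number of tokens a model generates. Here is a back-of-the-envelope sketch with made-up numbers: the $10-per-million-token price and the token counts are hypothetical, not published figures for Spud or any other model, and billing reasoning and answer tokens at the same rate is an assumption.

```python
# Hypothetical price: dollars per one million generated tokens (not a real published rate).
PRICE_PER_MILLION = 10.0


def query_cost(reasoning_tokens, answer_tokens, price_per_million=PRICE_PER_MILLION):
    """Cost of one query, assuming reasoning and answer tokens are billed at the same rate."""
    return (reasoning_tokens + answer_tokens) * price_per_million / 1_000_000


# Hypothetical token counts for the same question.
heavy_reasoning = query_cost(reasoning_tokens=8_000, answer_tokens=500)
strong_pretrained = query_cost(reasoning_tokens=1_000, answer_tokens=500)

print(f"reasoning-heavy: ${heavy_reasoning:.3f} per query")
print(f"pretraining-heavy: ${strong_pretrained:.3f} per query")
print(f"about {heavy_reasoning / strong_pretrained:.1f}x cheaper")
```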
Meta tracking employee keystrokes + mouse movements to train AI… and we’re supposed to think that stops there?
No doubt the future is heading toward a kind of “surveillance layer” over everyday life, where behavior, clicks, habits, even hesitation become training data. Not just what we say, but how we act.
Today it’s internal tools. Tomorrow it could be everything.
The goal? Build AI that understands humans perfectly.
The cost? Privacy shrinking to near zero.
We’re not just using technology anymore; we’re becoming the dataset.
Sam Altman explains how AI got 1000x cheaper at solving hard problems in just 16 months:
He compares the cost of running their first reasoning model to their latest one:
"Our first reasoning model was called o1, came out like 16 months ago. And our latest model where we've now integrated reasoning is 5.4. To get the same answer to a hard problem from that first model to 5.4 has been a reduction in cost of about a 1000x."
He admits he may be slightly off on the timeline, noting "maybe it's a little bit longer," but the magnitude stands.
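Taking Altman's rough figures at face value (about 1000x over about 16 months), a quick worked calculation shows what that implies as an average monthly rate. This treats the decline as smooth and compounding, which is a simplifying assumption; real cost drops arrive in steps.

$$
r = 1000^{1/16} = e^{\ln(1000)/16} \approx e^{0.432} \approx 1.54
$$

$$
t_{1/2} = 16 \cdot \frac{\ln 2}{\ln 1000} \approx 16 \cdot 0.100 \approx 1.6 \text{ months}
$$

In other words, cost per answer would have fallen by roughly 1.54x each month, halving about every 1.6 months. If the real window is "a little bit longer," as he concedes, the implied monthly rate is correspondingly lower.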
For @sama, this drop points to two things.
The first is how early we still are in this paradigm:
"We are still so early in this paradigm and we have so much more to gain about our understanding of how to develop these models and train them and run them efficiently... we are doing things in dumb ways and will get better and better."
The second is about human ingenuity, not just model improvements:
"Human ingenuity and the ability to operate in constraints and to find ways to solve problems almost always surprises you on the upside. It's not just that the models have gotten better, it's that kernel engineers came to help figure out how to write more efficient kernels and power engineers and the people that design data centers found more efficient ways to do that."
The gains didn't come from one breakthrough. Every layer of the stack attacked inefficiency at once:
"People are answering the call well beyond just the model side."
Most people think AI is making life easier, and it is, but it also comes with consequences.
This week’s tech news tells a different story:
It might be making us slower thinkers and easier targets, and it’s forcing big shifts at the top.
🧵 Here’s what you probably missed
1) AI chatbots are changing the way we think.
A recent BBC article suggests that relying too much on AI could reduce critical thinking over time.
When answers are instant, we stop questioning and move on to the next step.
Convenience is killing curiosity.
2) And while we trust AI more…
Scammers are getting smarter too.
A Guardian report highlights how AI-powered job scams are on the rise: harder to detect and more convincing than ever.
The more we rely on tech, the more we expose ourselves to it.
3) Meanwhile, big tech is shifting behind the scenes.
Tim Cook is stepping down as CEO of Apple, with John Ternus expected to take over.
This isn’t random.
It signals a new phase for Apple in the AI era.
4) Put it together:
• AI is shaping how we think
• It’s changing how we get exploited
• And it’s reshaping leadership at the top
This isn’t just a trend.
It’s a shift.