Interesting benchmark:
Claude Opus 4.8 failed a basic reasoning task.
Q: "How many days of the week contain the letter 'd'?"
Opus 4.8 answered: 2 (Sunday and Monday) ❌
Even after being challenged, it insisted none of the other days contained "d" ❌
Meanwhile, Claude Sonnet 4.6 eventually recognized that all 7 days contain the letter 'd' because they all end in "day" ✅
This wasn't a knowledge question. It was a simple string-inspection task.
A reminder that larger, more expensive models don't always outperform smaller ones on basic reasoning and attention checks.
AI can write code, analyze contracts, and solve complex problems—yet sometimes stumbles on questions a child could answer.
Trust, but verify.
#Claude #Anthropic #LLM #AI #GenAI #Reasoning #AIEvaluation
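For reference, the string-inspection check from this post takes only a few lines of Python. This is an illustrative sketch: the `DAYS` list and the `days_containing` helper are hypothetical names, and the case-insensitive match is an assumption.

```python
# Illustrative check: which English day names contain a given letter?
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

def days_containing(letter: str) -> list[str]:
    """Return the day names that contain `letter`, ignoring case."""
    return [day for day in DAYS if letter.lower() in day.lower()]

matches = days_containing("d")
print(len(matches), matches)  # prints 7 and all seven day names
```

Every name ends in "day", so the answer is 7 by construction.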
@karpathy Gods don't retire. they switch lanes.
karpathy → anthropic.
r&d. frontier. education.
the builders of this era aren't done —
they're just getting started on what's next.
early days? nah. these are the defining days.
#AI #FutureOfWork #TechLeadership
Claude Code just shipped /goal.
Not a prompt.
Not a task.
But a completion condition — Claude loops until a fresh model confirms it's done.
You set the bar. AI clears it.
This is the agentic shift, not just assisted coding.
#AI #Engineering #AIMLShift #FutureOfWork #claudecode
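The /goal post describes a completion-condition loop: work continues until a separate checker confirms the goal is met. Below is a minimal, hypothetical Python sketch of that general pattern, not Claude Code's actual implementation. `run_until_done`, `work_step`, `is_done`, and the `max_iterations` cap are all assumed names; the checker is a plain callable standing in for the "fresh model".

```python
from typing import Callable

def run_until_done(
    goal: str,
    work_step: Callable[[str, str], str],
    is_done: Callable[[str, str], bool],
    max_iterations: int = 10,
) -> tuple[str, bool]:
    """Hypothetical loop: apply work_step until is_done(goal, state) is True.

    is_done stands in for an independent checker; it sees only the goal and
    the current state, not the worker's history.
    Returns the final state and whether completion was confirmed.
    """
    state = ""
    for _ in range(max_iterations):
        state = work_step(goal, state)
        if is_done(goal, state):
            return state, True
    return state, False  # budget exhausted without confirmation

# Toy stand-ins: the worker appends one character per step,
# and the checker confirms once the state equals the goal.
state, done = run_until_done("xxx", lambda g, s: s + "x", lambda g, s: s == g)
print(state, done)  # xxx True
```

The iteration cap is a design choice in this sketch: without it, a checker that never confirms would keep the loop running forever.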
@sama 2 months is enough time to get addicted.
Not enough time to build dependency safely.
Smart engineering leaders will run it with guardrails — not just hype.
Anthropic in Talks to Acquire Stainless — $300M+ Deal
Anthropic is in advanced talks to acquire Stainless, a four-year-old developer tools startup, for at least $300 million. Stainless sells software that helps developers and non-technical people build with AI models — and its current customers include Anthropic, OpenAI, and Google.
Via The Information: https://t.co/50cRKPDNeZ
@AndrewYNg The 1:1 ratio is fine.
What breaks is when that one person still thinks like a specialist.
AI didn't change the tools. It changed who gets to own the outcome.
Your org chart isn’t evolving — it’s being rewritten.
Humans set intent.
AI agents execute.
Managers become orchestrators.
Teams shift from headcount → outcomes.
This is the new engineering model.
🎥 Watch: https://t.co/h5sKsNJsuI
— Sumit Kalra | @AIMLShift
@McKinsey The new battleground in tech services isn't talent.
Not price. Not methodology.
But who can build, orchestrate, and govern agents at enterprise scale.
The services layer isn't disappearing. It's just unrecognizable from what it was.
#AgenticAI #FutureOfWork #Engineering
The leadership skill set is changing in the AI world.
From making decisions
→ to designing decision systems
From knowing answers
→ to asking better questions
From managing people
→ to orchestrating humans + AI
The leaders who adapt won’t just keep up—they’ll define the future.
#AILeadership #FutureOfWork #Leadership #AIMindset #HumanAI #Innovation #TechLeadership
The new manager isn't a supervisor.
They're a conductor — Of people. Of agents. Of robots.
Different skills. Different metrics. Different feedback loops.
The job description has already changed. Hiring has moved on. Have you?
h/t McKinsey — https://t.co/MLqJK60tmg
#AI #FutureOfWork #Leadership
The chart everyone will quote: 57% automatable.
The number from the actual report that matters more: ~90% of companies invested in AI, fewer than 40% report measurable gains.
The gap isn't the tech. It's that we keep bolting AI onto workflows designed for a pre-AI world. Task-level automation ≠ transformation.
The unlock is workflow redesign — and most leaders haven't started.
@mattpocockuk skills are the real unlock tbh. a good /to-prd or /domain-model beats whatever model upgrade dropped this week. this is where claude code starts feeling like a studio not an assistant.
@thdxr respect for saying it out loud. most people dress this up as "v2" and pretend the debt was a plan all along. sometimes the foundations just fight you harder than the product does.
@GergelyOrosz felt this too. mid agent loop the last thing i need is the model lecturing me on my choice. just need a toggle to shut off the opinions when i already know what i want.
@emollick honestly we maintain this internally already. every model drop quietly breaks 3-4 prompts in our agents. would save so much time if labs just shipped a diff.