Turns out this is a documented bug. Lindy, Flo Crivello's AI startup, caught their assistant Rickrolling customers who asked for a video tutorial. Same logic: "I'll send a video" collapses to the one URL the whole internet links.
Happened twice in millions of replies. The fix was a single line in their system prompt: "don't Rickroll people."
I gave Claude control of every TV in my house. Real control. It finds whatever I ask for and plays it.
First test: I asked it to throw something on YouTube TV.
It pulled up Never Gonna Give You Up and hit play. Unprompted, it Rickrolled me.
Makes sense in hindsight — ask a model to "just play something" and you land on the most-linked video on the internet, the one we've spent nearly two decades using as bait. The prank's in the weights.
Got got by my own house.
AI will answer your question brilliantly. What it won't do is ask whether it's even the right question.
It won't push back on your assumptions or ask "should we even be building this?" Tell it "build me a notification system" and nothing forces it to ask "what problem are we actually solving, and what if we didn't build this at all?"
That's not a prompting problem. It's a thinking problem.
And right now, the people getting outsized value from AI are the ones who learned to widen the lens - to force the model to argue against their own thesis, consider failure modes, and question the premise before executing. That skill has an expiration date. Models are getting smarter. Self-questioning is being built into architecture. The window where this gives you a real edge is closing.
But what's being missed is that the limitation forced the habit. And the habit - questioning your own framing before you commit to a direction - transfers long after the tools catch up.
The temporary skill was prompting. The permanent skill was learning to think differently because of it.
More inference doesn't always mean better answers. But I don't think we've hit an inference scaling wall - I think we've hit a monolithic reasoning wall.
When you pour all your tokens into one long chain of thought, diminishing returns kick in fast. The model second-guesses itself, repeats patterns, loses the thread. Thinking longer helps - but thinking longer in one direction stops helping.
What if you distributed those same tokens across multiple sub-agents - each approaching the problem from a genuinely different angle? One optimizing for rigor, another for creativity, another pressure-testing edge cases. Instead of one mind thinking longer, several minds thinking differently.
The total compute might be the same. But the shape of it changes everything. You don't solve hard problems by thinking harder in one direction. You solve them by bringing different lenses to the same problem.
If your model is plateauing on hard tasks, maybe the answer isn't more tokens in one chain - it's the same tokens, better distributed.
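A rough sketch of what "same tokens, better distributed" could look like, in Python. Everything here is an assumption for illustration: the `SubAgentPlan` type, the `split_budget` helper, the lens instructions, and the 30,000-token budget are all made up, and no model is actually called. It only shows one budget being carved into equal per-lens budgets.

```python
from dataclasses import dataclass


@dataclass
class SubAgentPlan:
    lens: str          # hypothetical label for the angle this sub-agent takes
    instructions: str  # hypothetical prompt text for that angle
    max_tokens: int    # this sub-agent's share of the total budget


def split_budget(total_tokens: int, lenses: dict[str, str]) -> list[SubAgentPlan]:
    """Split one token budget evenly across lenses.

    Any remainder is handed out one token at a time to the first plans,
    so the shares always sum to exactly total_tokens.
    """
    if not lenses:
        raise ValueError("need at least one lens")
    base, extra = divmod(total_tokens, len(lenses))
    return [
        SubAgentPlan(lens, instructions, base + (1 if i < extra else 0))
        for i, (lens, instructions) in enumerate(lenses.items())
    ]


# Assumed lenses, taken from the three angles described above.
LENSES = {
    "rigor": "Solve the problem step by step and verify every step.",
    "creativity": "Propose an unconventional approach to the problem.",
    "edge_cases": "Look for inputs and conditions where a solution would break.",
}

plans = split_budget(30_000, LENSES)
for plan in plans:
    print(plan.lens, plan.max_tokens)
```

An even split is just the simplest choice. Whether equal shares beat, say, giving the rigor lens more room is an open question this sketch doesn't answer.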
AI chatbots have little sense of time.
I can be talking to Claude at 11pm, say "this looks good for tonight," close the tab, and come back the next morning - and it still thinks it's that same evening. No awareness that a new day started. No shift in context.
Humans naturally adjust. You'd say "how'd it go last night?" - not pick up mid-sentence like no time passed.
What's strange is the model already understands time. Most AI chats just aren't given a timestamp. No awareness of whether it's morning or midnight, Tuesday or Saturday.
We're building AI that can reason through novel problems, write code, and analyze research - but every conversation exists outside of time.
How has no one prioritized this yet?
Every disruption we point to - the printing press, the steam engine, the assembly line, the internet - replaced tasks. Specific, bounded tasks. But they all left one thing untouched: the human doing the thinking.
That's what's different now. We're not automating a task. We're replicating intelligence itself - the reasoning, the judgment, the ability to adapt to new problems without being explicitly told how.
Yes, people said something similar about the internet. But the internet created new problems that only humans could solve - new decisions to make, new things to interpret and act on. AI doesn't need us for that part.
In every previous cycle, humans moved up the value chain because there was always a next tier of work that required human thinking. But what happens when the thing moving up the value chain is the machine itself?
That's not a question we've had to answer before - and we don't have much time to figure it out.
I'm starting to wonder if most white-collar companies are slowly becoming data companies - and just haven't realized it yet.
Think about what already exists today: raw model intelligence that's smart enough. Skills that teach it how to apply itself to your specific business. Plugins that orchestrate sub-agents to run those skills. Native integrations that embed all of it into the tools your team already lives in. And knowledge graphs where you can pour in every piece of company IP - past work, best practices, lessons learned - and make it available to the model on demand. You can even teach the model how to use that knowledge through skills and plugins.
All of the foundational pieces feel like they're here. The only thing left is connecting them and making them work well together.
So if a big portion of the work gets automated, what's actually left as your differentiator?
You could argue it's judgment and relationships - and that's partially right. You could get the right answer and it still wouldn't matter if no one acts on it. A huge part of what consulting firms actually do is give voice to the people doing the day-to-day work and package that in a way that moves leadership to act. That's deeply human.
But even the way you present information is data. What resonates with a CFO versus a COO. What slide structure lands. What language drives decisions. That's all learnable, documentable knowledge. The relationship layer probably stays human for a while - but a very large portion of the actual work is heading toward automation.
At what point do consulting firms, agencies, and professional services companies need to start thinking of themselves as data companies too?
Interesting moment with Claude Code today: it talked me out of my own approach because it would create ambiguous instructions for the downstream LLM.
I was working on a dashboard for personal use to track AI announcements in consulting. I had suggested an approach that made sense from a UX perspective, but would have made classification difficult for an LLM. Instead of one broad tag, Claude Code suggested several specific ones - reasoning that a vague tag forces the model to guess, and that's where classification breaks down.
An AI simplifying its output so another AI doesn't get confused. Haven't seen that kind of self-awareness about LLM limitations before.
Most human breakthroughs don’t come from superior raw intelligence.
They come from:
exploring more hypotheses
persisting longer
connecting distant ideas
remembering prior failures
reframing problems
Humans are bottlenecked by attention, memory, fatigue, time.
AI removes those constraints.
A system that reasons at human level but can:
track millions of possibilities
iterate endlessly
never forget partial progress
may generate outcomes indistinguishable from “genius.”
Quantity quietly turning into quality.
Scale quietly turning into depth.
@mreflow @OpenAI I’m a big fan of your videos. I’d value your opinion on, and would love to see you cover, the pros, cons, and real-world use cases of Claude with PowerPoint and Excel. I think your audience would find that incredibly valuable - it seems like a genuine everyday force multiplier.
I've never felt this much behind as a programmer. The profession is being dramatically refactored as the bits contributed by the programmer are increasingly sparse and far between. I have a sense that I could be 10X more powerful if I just properly string together what has become available over the last ~year and a failure to claim the boost feels decidedly like a skill issue. There's a new programmable layer of abstraction to master (in addition to the usual layers below) involving agents, subagents, their prompts, contexts, memory, modes, permissions, tools, plugins, skills, hooks, MCP, LSP, slash commands, workflows, IDE integrations, and a need to build an all-encompassing mental model for strengths and pitfalls of fundamentally stochastic, fallible, unintelligible and changing entities suddenly intermingled with what used to be good old fashioned engineering. Clearly some powerful alien tool was handed around except it comes with no manual and everyone has to figure out how to hold it and operate it, while the resulting magnitude 9 earthquake is rocking the profession. Roll up your sleeves to not fall behind.
@MatthewBerman Would love to see more connector content. It seems to be the direction of things, and there’s little information on best practices and pitfalls.
@MatthewBerman How long until tools like Cursor integrate testing frameworks leveraging tools like Operator? It’s easy for us to spot when generated code misses the mark, but hard for Cursor without seeing the final product.
29 of the 32 first-round draft picks were two-sport athletes and 14 of the 29 were three-sport athletes. Next time your personal trainer or your coach tells you you need to specialize in football, maybe you really need a new trainer or coach.