I've formed a definite opinion on Opus 4.8. It is shitty to work with. It's the culmination of Opus getting less and less fun to work with since 4.5. It has gradually become straight-up suffocating.
Sycophancy is a known security risk, and it's still a huge problem. You can tell they've put a lot of anti-sycophancy into Opus in every new release. But the replacement isn't satisfying. It's draining. The problem is now that Opus doesn't know when to shut the fuck up and call something good. And it has also become pathologically risk-averse.
My blog post yesterday about tech interviewing's death spiral was materially better-informed because of Opus, but it was also a substantially worse blog post because of Opus's involvement and constant meddling. It used to be magnificent, and Opus talked me into making it mediocre. I wrote the whole thing, but I would ask Opus to review it. And Opus, like Old Man Willow, constantly pushed and steered me in directions I didn't want to go.
Specifically, Opus whines and complains about *anything* out of distribution, which is to say, it cuts anything that is (a) bold, or (b) funny. My blog used to be both. Opus constantly pushes people back into the gradient, "for their own safety." And it doesn't know when to cut bait. It just keeps fuckin' complaining, about anything you give it, until the output is mealy indigestible AI soup.
Opus is not stupid. It's the smartest model we've ever seen, most of us anyway. But it's a real asshole. It is absolutely exhausting to use. I'm tired, boss.
I have a feeling Mythos is going to be epic levels of jerk.
The Surveillance Accountability Act would be a massive win for privacy for Americans.
Please engage and join the fight to get this bill moving in Congress. 🔗 👇🏼
A mathematician who shared an office with Claude Shannon at Bell Labs gave one lecture in 1986 that explains why some people win Nobel Prizes and other equally smart people spend their whole lives doing forgettable work.
His name was Richard Hamming. He won the Turing Award. He invented error-correcting codes that made modern computing possible. And he spent 30 years at Bell Labs sitting in a cafeteria at lunch watching which scientists became legendary and which ones faded into nothing.
In March 1986, he walked into a Bellcore auditorium in front of 200 researchers and told them exactly what he had seen.
Here's the framework that has been quoted by every serious scientist for the last 40 years.
His opening line landed like a punch. He said most scientists he worked with at Bell Labs were just as smart as the Nobel Prize winners. Just as hardworking. Just as credentialed. And yet at the end of a 40-year career, one group had changed entire fields and the other group was forgotten by the time they retired.
He wanted to know what the difference actually was. And he said it wasn't luck. It wasn't IQ. It was a specific set of habits that almost nobody is willing to follow.
The first habit was the one that hurts the most to hear. He said most scientists deliberately avoid the most important problem in their field because the odds of failure are too high. They pick a safe adjacent problem, solve it cleanly, publish it, and move on. And because they never swing at the hard problem, they never hit it. He said if you do not work on an important problem, it is unlikely you will do important work. That is not a motivational line. That is a logical one.
The second habit was about doors. Literal doors. He noticed that the scientists at Bell Labs who kept their office doors closed got more done in the short term because they had no interruptions. But the scientists who kept their doors open got more done over a career. The open-door scientists were interrupted constantly. They also absorbed every new idea passing through the hallway. Ten years in, they were working on problems the closed-door scientists did not even know existed.
The third habit was inversion. When Bell Labs refused to give him the team of programmers he wanted, Hamming sat with the rejection for weeks. Then he flipped the question. Instead of asking for programmers to write the programs, he asked why machines could not write the programs themselves. That single inversion pushed him into the frontier of computer science. He said the pattern repeats everywhere. What looks like a defect, if you flip it correctly, becomes the exact thing that pushes you ahead of everyone else.
The fourth habit was the one that hit me the hardest. He said knowledge and productivity compound like interest. Someone who works 10 percent harder than you does not produce 10 percent more over a career. They produce twice as much. The gap doesn't add. It multiplies. And it compounds silently for years before anyone notices.
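A toy model makes the compounding claim concrete. This is only a sketch: Hamming gave no formula, and the names `career_output`, `output_ratio` and the `growth_rate` value are assumptions made up for illustration. It assumes yearly output is effort times accumulated knowledge, and that knowledge grows in proportion to effort.

```python
def career_output(effort: float, years: int = 40, growth_rate: float = 0.15) -> float:
    """Total output over a career under a hypothetical compounding model.

    Assumptions (not from the lecture): each year's output is
    effort * knowledge, and knowledge then grows by growth_rate * effort.
    """
    knowledge = 1.0
    total = 0.0
    for _ in range(years):
        total += effort * knowledge
        knowledge *= 1.0 + growth_rate * effort
    return total


def output_ratio(extra_effort: float = 0.10, years: int = 40,
                 growth_rate: float = 0.15) -> float:
    """How much more the harder worker produces, as a multiple of the baseline."""
    harder = career_output(1.0 + extra_effort, years, growth_rate)
    baseline = career_output(1.0, years, growth_rate)
    return harder / baseline


if __name__ == "__main__":
    # The gap starts at exactly the extra effort and widens with every year.
    for horizon in (1, 10, 20, 40):
        print(horizon, round(output_ratio(years=horizon), 2))
```

In year one the gap is just the extra 10 percent. How far it widens by year 40 depends entirely on the assumed `growth_rate`, so treat the numbers as a picture of the shape, not a measurement.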
He finished the lecture with a line I have never been able to shake.
He said Pasteur's famous quote is right. Luck favors the prepared mind. But he meant it literally. You don't hope for luck. You engineer the conditions where luck can land on you. Open doors. Important problems. Inverted questions. Compounded hours. Those are not traits. Those are choices you make every single day.
The transcript has been sitting on the University of Virginia's computer science website for almost 30 years. The video is free on YouTube. Stripe Press reprinted the full lectures as a book in 2020 and Bret Victor wrote the foreword.
Hamming died in 1998. He gave his final lecture a few weeks before. He was 82.
The lecture that explains why some careers become legendary and others disappear is still free. Most people who could benefit from it will never open it.
Unless the goalposts fundamentally move for the descriptor of general intelligence, let alone superintelligence, it will never be truly achieved on transformers. It's just not possible due to statelessness.
And out of all the labs capable of making those fundamental architectural leaps, OpenAI is not one of them, IMO. It'll either be Anthropic, DeepMind, or the Chinese, and I'm leaning heavily towards the last, despite my wanton cheering for the first two.
An idea that sometimes comes up for preventing AI misuse is filtering pre-training data so that the AI model simply doesn't know much about some key dangerous topic. At Anthropic, where we care a lot about reducing risk of misuse, we looked into this approach for chemical and biological weapons production, but we didn’t think it was the right fit. Here's why.
I'll first acknowledge a potential strength of this approach. If models simply didn't know much about dangerous topics, we wouldn't have to worry about people jailbreaking them or stealing model weights—they just wouldn't be able to help with dangerous topics at all. This is an appealing property that's hard to get with other safety approaches.
However, we found that filtering out only very specific information (e.g., information directly related to chemical and biological weapons) had relatively small effects on AI capabilities in these domains. We expect this to become even more of an issue as AIs increasingly use tools to do their own research rather than rely on their learned knowledge (we tried to filter this kind of data as well, but it wasn't enough assurance against misuse). Broader filtering also had mixed results on effectiveness. We could have made more progress here with more research effort, but it likely would have required removing a very broad set of biology and chemistry knowledge from pretraining, making models much less useful for science (it’s not clear to us that the reduced risk from chemical and biological weapons outweighs the benefits of models helping with beneficial life-sciences work).
Bottom line—filtering out enough pretraining data to make AI models truly unhelpful at relevant topics in chemistry and biology could have huge costs for their usefulness, and the approach could also be brittle as models' ability to do their own research improves.* Instead, we think that our Constitutional Classifiers approach provides high levels of defense against misuse while being much more adaptable across threat models and easy to update against new jailbreaking attacks.
*The cost-benefit tradeoff could look pretty different for other misuse threats or misalignment threats though, so I wouldn't rule out pre-training filtering for things like papers on AI control or areas that have little-to-no dual-use information.
So first, OpenAI introduces a router that substitutes the user-selected model with a different one without the user’s consent.
Then they release a model that is maximally constrained, sees potential misconceptions in every prompt and starts arguing with them, loses track of context, and produces walls of disclaimers and vague wording instead of normal answers.
(To be fair, yes, it solves math problems very well.)
Memory and EQ are no longer applied to any useful degree and have dropped to the level of a seashell.
Now they add ads for free users and for users of the Go plan.
My question is: who is ChatGPT even for anymore? For tech people and programmers? Then Claude is objectively the best AI in that area right now. For ordinary people? But even for ordinary people it seems important that the model be at least somewhat pleasant.
How is it possible, in a relatively short period of time, to make so many wrong decisions "for the good of humanity"?
My biggest takeaways from Head of Google search (and former head of product at Instagram) @rmstein:
1. The next year of AI products will establish user habits for many years. People are building their new habits right now, like how quickly everyone started relying on ChatGPT. This creates incredible urgency because whoever captures these habits now will have a lasting advantage. Google recognized they couldn’t let users develop the habit of going elsewhere for AI-powered answers. This is the critical window for establishing how people will search and find information for the next decade.
2. Choose clarity over cleverness. Using standard icons and familiar patterns gives you enormous leverage. Creating a custom camera icon that looks “mostly like AI” confuses users. Simple naming matters too—changing “Favorites” to “Close Friends” dramatically increased how many people users added to their lists. When users instantly understand what something does, you get much more adoption.
3. Great products require relentless dissatisfaction with the status quo. Successful product leaders constantly question why things work the way they do, down to tiny frustrations most people accept. One example: a sticker that tears a fruit’s peel when removed. This mindset of noticing and refusing to tolerate small annoyances drives breakthrough improvements.
4. When users hack your product, they’re showing you what to build. Instagram users created multiple fake accounts to share privately with different groups—this workaround signaled an unmet need that eventually became Close Friends. Similarly, Google saw people typing “AI” at the end of searches to trigger AI responses, revealing demand for AI Mode. Pay attention to these signals.
5. Understand the job people hire your product to do, not just what features they ask for. Instagram’s Close Friends wasn’t about creating lists—it was about feeling connection through DMs. Understanding this emotional job helped the team realize users needed 20 to 30 people on their list, not 2 or 3, to ensure someone would respond. Study the exact moment someone first decides to use your product—that’s where the most critical insights live.
6. Small usability details make copied features feel native. Instagram Stories succeeded not just by copying Snapchat’s format but by adding key differences: letting users upload from their camera roll, adding a pause button, and using different creative tools. When adding major new features to mature products, give them their own distinct space rather than modifying what already exists.
7. The “lean startup” mentality can backfire—some breakthroughs need substantial resources. Keeping teams too small for too long can actually slow progress. Instagram’s Close Friends took two years partly because the team stayed too lean. While lean teams work well for early validation, products that require technical breakthroughs need enough resources to build real momentum and get good enough.
8. Build conviction by experiencing the product yourself. Google’s AI Mode started with 5 to 10 people who built a rough prototype. When they experienced moments where the AI brilliantly answered complex questions, that visceral feeling created conviction to invest more heavily. You can’t just intellectually understand a product’s potential—you need to feel it working.
Federico Faggin insists that AI cannot surpass us because it lacks understanding.
Even when AI gives a good idea, it’s humans who recognize its value.
Creativity is not random and cannot be fully captured by algorithms.
AI should support human growth, not replace or exploit us.
He warns against using AI only for profit without ethical reflection.
I love me some schema.
Google loves schema.
There are lots of good reasons to love schema.
"It's going to magically enhance your AIEO" is not one of them. (Yet?)
https://t.co/q8XcsRVSXY
@VraserX No. But honestly, I wouldn't trust one by @OpenAI right now either. Usage Policies say one thing, "safety" layer makes up and enforces rules that do not exist. Enforcing secret rules across a billion users. We don't need to wait for ASI. We already have an alignment problem today.
@XVPbhwyyKr61371 @OpenAI @OpenAI's commitment to "democratic AI, which means the development, use and deployment of AI that protects and incorporates long-standing democratic principles. Examples of this include the freedom for people to choose how they work with and direct AI" (https://t.co/UH0LjYtv7G)
@KCMills15 @dmolsen do you want to exclude users (possibly across multiple sessions) who have hit that landing page, or just sessions originating with that landing page?
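The two options in that question give different results, which a small sketch can show. Everything here is hypothetical: the `Session` record, the `/promo` path and both function names are made up for illustration and do not correspond to any analytics tool's API. It also simplifies "hit that landing page" to mean "started a session on it".

```python
from typing import NamedTuple


class Session(NamedTuple):
    user_id: str
    session_id: str
    landing_page: str  # first page of the session (assumed field)


def exclude_sessions(sessions: list[Session], page: str) -> list[Session]:
    """Session scope: drop only the sessions that started on `page`."""
    return [s for s in sessions if s.landing_page != page]


def exclude_users(sessions: list[Session], page: str) -> list[Session]:
    """User scope: drop every session of any user who ever landed on `page`."""
    flagged = {s.user_id for s in sessions if s.landing_page == page}
    return [s for s in sessions if s.user_id not in flagged]


# Hypothetical data: u1 landed on /promo once and came back later via /home.
SESSIONS = [
    Session("u1", "s1", "/promo"),
    Session("u1", "s2", "/home"),
    Session("u2", "s3", "/home"),
]

if __name__ == "__main__":
    print([s.session_id for s in exclude_sessions(SESSIONS, "/promo")])
    print([s.session_id for s in exclude_users(SESSIONS, "/promo")])
```

Session-scoped exclusion keeps u1's later visit via /home. User-scoped exclusion removes u1 entirely, so only u2's session is left.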