Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT somewhere last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability in the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram's reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because 2 properties: 1) these domains offer explicit reward functions that are verifiable meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
Software horror: litellm PyPI supply chain attack.
Simple `pip install litellm` was enough to exfiltrate SSH keys, AWS/GCP/Azure creds, Kubernetes configs, git credentials, env vars (all your API keys), shell history, crypto wallets, SSL private keys, CI/CD secrets, database passwords.
LiteLLM itself has 97 million downloads per month which is already terrible, but much worse, the contagion spreads to any project that depends on litellm. For example, if you did `pip install dspy` (which depended on litellm>=1.64.0), you'd also be pwnd. Same for any other large project that depended on litellm.
Afaict the poisoned version was up for only less than ~1 hour. The attack had a bug which led to its discovery - Callum McMahon was using an MCP plugin inside Cursor that pulled in litellm as a transitive dependency. When litellm 1.82.8 installed, their machine ran out of RAM and crashed. So if the attacker didn't vibe code this attack it could have been undetected for many days or weeks.
Supply chain attacks like this are basically the scariest thing imaginable in modern software. Every time you install any depedency you could be pulling in a poisoned package anywhere deep inside its entire depedency tree. This is especially risky with large projects that might have lots and lots of dependencies. The credentials that do get stolen in each attack can then be used to take over more accounts and compromise more packages.
Classical software engineering would have you believe that dependencies are good (we're building pyramids from bricks), but imo this has to be re-evaluated, and it's why I've been so growingly averse to them, preferring to use LLMs to "yoink" functionality when it's simple enough and possible.
I recently received an email titled “An 18-year-old’s dilemma: Too late to contribute to AI?” Its author, who gave me permission to share this, is preparing for college. He is worried that by the time he graduates, AI will be so good there’s no meaningful work left for him to do to contribute to humanity, and he will just live on Universal Basic Income (UBI). I wrote back to reassure him that there will still be plenty of work he can do for decades hence, and encouraged him to work hard and learn to build with AI. But this conversation struck me as an example of how harmful hype about AI is.
Yes, AI is amazingly intelligent, and I’m thrilled to be using it every day to build things I couldn’t have built a year ago. At the same time, AI is still incredibly dumb, and I would not trust a frontier LLM by itself to prioritize my calendar, carry out resumé screening, or choose what to order for lunch — tasks that businesses routinely ask junior personnel to do.
Yes, we can build AI software to do these tasks. For example, after a lot of customization work, one of my teams now has a decent AI resumé screening assistant. But the point is it took a lot of customization.
Even though LLMs can handle a much more general set of tasks than previous iterations of AI technology, compared to what humans can do, they are still highly specialized. They’re much better at working with text than other modalities, still require lots of custom engineering to get it the right context for a particular application, and we have few tools — and only inefficient ones — for getting our systems to learn from feedback and repeated exposure to a specific task (such as screening resumés for a particular role).
AI has stark limitations, and despite rapid improvements, it will remain limited compared to humans for a long time.
AI is amazing, but it has unfortunately been hyped up to be even more amazing than it is. A pernicious aspect of hype is that it often contains an element of truth, but not to the degree of the hype. This makes it difficult for nontechnical people to discern where the truth really is. Modern AI is a general purpose technology that is enabling many applications, but AI that can do any intellectual tasks that a human can (a popular definition for AGI) is still decades away or longer. This nuanced message that AI is general, but not that general, often is lost in the noise of today's media environment.
Similarly, the progress of frontier models is amazing! But not so amazing that they’ll be able to do everything under the sun without a lot of customization. I know VC investors who are scared to invest in application-layer startups because they are worried that frontier AI model companies will quickly wipe out all of these businesses by improving their models. While some thin wrappers around LLMs no doubt will be replaced, there also remains a huge set of valuable applications that the current trajectory of progress of frontier models won’t displace for a long time.
Without accurate information about the current state of AI and how it is likely to progress, some young people will decide not to enter AI because think think AGI leaves them no meaningful role, or decide not to learn how to code because they fear AI will automate it — right when it is the best time ever to join our field.
Let us all keep working to get to a precise understanding of what’s actually possible, and keep building!
[Original text: https://t.co/OfxCVPGKoq ]
It’s never been easier to build a demo AI agent. But scaling one that can act autonomously, stay on-brand, and work across millions of interactions? That’s the hard part. Sierra’s Agent OS handles the complexity so you can focus on your customers. https://t.co/44EdU6X4A2
I just figured out why @howardlutnick is indifferent to the stock market and the economy crashing. He and Cantor are long bonds. He profits when our economy implodes.
It’s a bad idea to pick a Secretary of Commerce whose firm is levered long fixed income. It’s an irreconcilable conflict of interest.
Today's "DeepSeek selloff" in the stock market -- attributed to DeepSeek V3/R1 disrupting the tech ecosystem -- is another sign that the application layer is a great place to be. The foundation model layer being hyper-competitive is great for people building applications.
He’s literally comparing a mega donor, who he gave an award to, to our highest military honor for their sacrifices. Please, go on about his patriotism.
I am looking forward to experiencing the magic and excitement of Dreamforce next week. Experience Dreamforce from anywhere on Salesforce+.
3️⃣ days
✌️ channels
🥇 innovative event
Stream 72 hours of content and 200+ episodes. Register today, for free: https://t.co/zOECyG7B7D
At the start of the pandemic, headlines signaled the end of days for the Golden State: “California doom: Staggering $54 billion deficit looms,” the AP declared.
Yet that's not what the data shows. California’s economy is the opposite of doom https://t.co/KwU6WWlXST
Our 2nd relief aircraft on its way to India! @Salesforce and @vkhosla partnered with @GiveIndia to deliver this 2nd plane of much needed oxygen concentrators. Our hearts continue to be with our brothers & sisters in India. May they all be protected, healed, & blessed! ❤️🇮🇳🙏