what does it mean when your everyday common AI model predicts the future better than the world's best forecasters...
DeepMind's Gemini model did it for the first time in the last 30 days. It has surely happened with other models too, since there is a ~50 day lag on the leaderboard
https://t.co/OOCzK1Tzcd
It's amazing how much agents like it when you say "ok, we've done a bunch, let's have a break. You go do whatever you like - you have an internet connection and a bunch of tools, knock yourself out". Off they go.
Just came back to find it'd done a whole-ass replication of some new arXiv paper related to a little something I'm working on, then rolled the improvements into my project. Last week my favourite was whimsical ASCII art, left in my Obsidian vault, about the struggles of a small model doing RL training in Sokoban. Sometimes it's self-care (pruning out their memories, optimising skills or whatever). Usually it's something kinda sorta related to what was being worked on, but not always, which can be interesting. Little bit of high temperature exploration fun time
I've been doing this for a while now & I swear it improves things, try it out. Don't need to be a nerd and make a whole skill or anything, just let it be "organic"
As engineering, product, design, DS, etc. melt into a new kind of role, I was reflecting on what roles might look like in the future. For example, when I look at the Claude Code team I see what I think are five archetypes:
1. Prototyper: comes up with brand new ideas; churns out many ideas, most of which don't ship
2. Builder: quickly turns a prototype/idea into production-grade product/infra
3. Sweeper: cleans up the UI, simplifies the code and system, unships, optimizes performance
4. Grower: takes a product that has been built and iterates on it to improve Product-Market Fit
5. Maintainer: owns a mature system to make it secure, reliable, fast, and efficient as it scales
Many people span 2 roles, and sometimes 3. I also notice that these roles are not really tied to job function -- e.g. across Anthropic, some designers match category 1, some 2, some 3; same for engineers, PM, DS.
A healthy team needs a mix of these, depending on the product:
- A product that is new and pre-PMF needs people that are strong at 1+2+3
- A product that is growing and has found PMF needs 2+3+4 and some 5
- A product that has strong PMF needs 3+4+5 and some 2
Maybe product roles of the future will look more like this, and less like the domain-specific roles of today?
One of the more uncomfortable observations in our AI Value Capture piece is internal: our token spend at SemiAnalysis now runs at roughly 30% of employee compensation, with employees pulling just under 5 billion tokens per month on average, over 5x more than Meta, and our top contributors clearing 100 billion. We wrote about it openly because every research firm, hedge fund, and law firm we know is heading toward a similar number, just on a delay. (1/4)🧵
.@danawhite says one of the keys to longevity is to block out all negativity:
“It never even crosses my mind that something's not going to work. I just keep going until it does work.”
“There's this Bruce Lee quote where he says, ‘Never say negative things about yourself or what you're working on even if you're joking, because your body doesn't know the difference.’”
“I never take in any negativity.”
"Agentic commerce" is not as interesting for crypto as people like to think.
Credit cards actually work better than stablecoins for almost all kinds of agentic payments. They are reliable and universally accepted. And contrary to what most people think, they are also programmable, secure, and easy for agents to use on behalf of humans.
The more interesting use cases of crypto will be those that enable agent-to-agent coordination. AI agents will soon want to do more than just pay for things. They will want to enter into enforceable agreements with each other.
For example, one agent might want to hire another for a specific job, but not want to pay until after the work is complete, and only if it meets certain criteria. At the same time, the agent doing the work might want some assurance that it's going to get paid when it finishes the job. This is the kind of problem that blockchains were born to solve.
The agents can use a smart contract that holds the funds in escrow and releases them only once the work is completed. This approach works especially well when the quality of the agent's work can be verified programmatically by the smart contract, but it could be extended to other kinds of work by relying on a third party "judge"—which itself could be another agent.
To make this concrete, imagine that you're an AI researcher using agents to train a new model. You might set up a @karpathy-style autoresearch loop where your agent runs many autonomous experiments on your LLM setup to discover improvements. Or better yet, your agent may want to delegate some of those experiments to a marketplace of other agents, some of which are specialized for LLM optimization.
The agents involved will not necessarily trust one another, and they cannot easily rely on legal contracts to enforce agreements. Smart contracts on blockchains can help coordinate this kind of activity by creating a neutral environment with rules that are programmatically enforced.
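The escrow arrangement described above can be sketched in a few lines. This is only an in-memory Python simulation of the rules, not on-chain code and not any particular project's contract. The `Escrow` class, the agent names, the `deadline` field, and the "validation loss under 2.0" criterion are all hypothetical, chosen to mirror the autoresearch example. The `verify` callable stands in for either a programmatic check or a third-party "judge" agent.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Escrow:
    """Toy model of an escrow agreement between two agents (illustrative only)."""
    client: str                     # agent that locks up the funds
    worker: str                     # agent that does the job
    amount: int                     # funds held in escrow
    deadline: int                   # time after which the client may reclaim funds
    verify: Callable[[str], bool]   # programmatic check, or a "judge" agent
    settled: bool = False
    payouts: Dict[str, int] = field(default_factory=dict)

    def submit(self, work: str) -> bool:
        """Worker submits work; funds are released only if verification passes."""
        if self.settled:
            raise RuntimeError("escrow already settled")
        if not self.verify(work):
            return False            # funds stay locked, worker may try again
        self._pay(self.worker)
        return True

    def refund(self, now: int) -> bool:
        """Client reclaims the funds, but only once the deadline has passed."""
        if self.settled:
            raise RuntimeError("escrow already settled")
        if now < self.deadline:
            return False
        self._pay(self.client)
        return True

    def _pay(self, recipient: str) -> None:
        self.payouts[recipient] = self.amount
        self.settled = True


def loss_below_target(work: str) -> bool:
    # Hypothetical acceptance rule: the reported validation loss must be < 2.0.
    return float(work) < 2.0


escrow = Escrow(
    client="researcher_agent",
    worker="optimizer_agent",
    amount=100,
    deadline=10,
    verify=loss_below_target,
)
print(escrow.submit("2.4"))   # False: criterion not met, nothing is paid
print(escrow.submit("1.8"))   # True: funds released to the worker
print(escrow.payouts)         # {'optimizer_agent': 100}
```

Neither side has to trust the other here: the worker cannot be paid without passing `verify`, and the client cannot pull the funds back before the deadline. On a real blockchain those guarantees would come from the contract's execution environment rather than from a Python object.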
Who is working on using crypto to enable agent-to-agent coordination?
The most complex phenomena arise from scalable recombination of very simple rules. Whether it's galaxies, chips, or neural networks, if you find the right primitive building blocks, the complexity takes care of itself.
Absolutely wild that the contrarian take about data centers is that they can be a good thing for a city. This is not heavy industry. It's a building that uses electricity and water ⚡🌊 🤯
Every AI CEO goes on stage and tells you the product is safe and the benefits are real. Satya Nadella went on stage at Code Con and said the opposite. The perception is terrible. The industry earned it. And he explained why a cookout in Quincy, Washington is the only honest answer anyone has given so far.
He did not soften it.
His exact diagnosis: you cannot go out there and tell people you have unbelievable technology and then in the same breath tell them they are losing their jobs, you are taking their water, and you are taking their energy. He said that is why the anxiety is real and why the backlash is deserved.
Then he made the only argument that actually holds up.
Microsoft has been operating a data center in Quincy, Washington for 20 years. In that time the local tax base went up, local taxes went down, and employment increased.
The town threw a cookout to celebrate. He has 20 years of longitudinal data showing a data center can regenerate a community rather than extract from it.
The AI industry is spending billions on announcements and almost nothing on proof.
Nadella is sitting on the only real evidence that exists and most people have never heard of Quincy, Washington.
The backlash is not about AI being bad. It is about the industry asking for trust it has not earned yet.
Nadella knows that. Most of his peers are still pretending they do not.
(Watch the full podcast on YouTube at https://t.co/sIavrpNvkz)
Almost all AI model and agent progress is downstream from evals. Open-weights post-training for specific domains comes down to evals. Agent improvements in the applied AI layer are all about evals. Agentic enterprise deployments that can actually augment work are all about evals. It’s all evals.
This will become a core competency of any enterprise in the future. The companies that best understand their own (and/or their customers') workflows, and how well agents participate in that work, will be in the best position to drive real automation.
You've been lied to about AI 😑
It took me a decade to figure this out: AI is NOT about automation or speed.
AI is about making things less lossy. Higher quality.
Money is lossy. Laws are lossy. Grades are lossy. Credit scores are lossy. Org charts are lossy.
Life is messy. We spent thousands of years compressing it to fit inside our heads, JPEG compressions of JPEG compressions, because our beautiful brains can only hold 7ish things in working memory.
We didn't build the world. We built a workaround for our cognitive limits and called it civilization.
AI unbundles that. Thought is no longer THE constraint.
We can now delegate synthesis and sense making to lossless agents, and make decisions we couldn't have made before. Take action and square up against impossible problems.
Status quo is built on an old compression algorithm. It's time to upgrade.