It's here: We just hit superhuman performance on AI kernel optimization!
Real customer models & production settings. Not toy problems (which is what I typically see).
This is the year that Claude writes its own kernels, and Codex its own kernels, for every new GPU it wants to run on -- work that takes months of porting between GPU generations today.
This has a massive impact on scaling intelligence. More compute means getting the next frontier model sooner.
@EthanHe_42 Neural architecture search as it existed then is such a weak version of this that it's in its own category of totally useless by comparison.
This is an *actual* LLM writing arbitrary code, learning from previous experiments, with access to the internet. It's not even close.
Why do coding agents work so well and what would it take to replicate their success in other domains? One important and under-appreciated reason is that agentic coding is a type of neurosymbolic AI.
The main weakness of LLMs is that they are statistical machines and struggle at tasks involving long chains of logic / symbol manipulation. Of course, traditional code is the opposite. The magic of agentic coding is that it fuses the two — there is a lot of code *execution* during code generation. This is a subtle point so let me spell it out.
* Most obviously, agents run the generated code itself, run tests, etc. This makes coding a verifiable domain. It is well known that in verifiable domains, inference scaling is highly effective as agents can fix their own mistakes. It also allows reinforcement learning to be highly effective.
* Next, code generation often takes advantage of existing symbolic tools like compilers that have been optimized and perfected over decades. Imagine if LLMs had to directly output binary code instead. (They sometimes can, and it's a cool trick, but it's no way to do software engineering.)
* IMO the biggest neurosymbolic unlock is the shell, which dramatically expands capabilities by letting agents use existing tools to do complex editing tasks. Many of us remember the feeling of wizardry when we gained shell fluency. LLMs are able to pick up shell knowledge and best practices through pre-training because it is extensively documented on sites like Stack Overflow.
* Finally, more complex agentic coding tasks often involve LLMs writing code that in turn invokes LLMs. In principle you can have an arbitrary depth of recursion between statistical and symbolic systems.
Neurosymbolic AI is a touchy topic and many people have their own favored conception of what it should look like. And admittedly agentic coding uses really crude patterns, with LLMs and code being loosely coupled. But the point is — it works! LLMs are able to use the giant warehouse of tools that humans have built over the decades to reach ever-increasing levels of abstraction and complexity.
To build agentic systems in other domains, here’s what we need. First, it must be a verifiable domain. Math is and writing isn’t. There’s no getting around that. Provided we’re in a friendly domain, it all comes down to whether we can build a symbolic toolbox, and how well LLMs can be trained to use that toolbox. IMO this is where the alpha will be, more so than in LLM capabilities themselves.
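The "verifiable domain" point is easiest to see as a loop: the statistical side proposes code, the symbolic side executes it against tests, and the error message goes back into the next attempt. Below is a minimal sketch under stated assumptions. `fake_llm` is a hypothetical stub standing in for a real model, and the factorial task is chosen only for illustration.

```python
from typing import Callable, Optional

def run_tests(candidate: Callable[[int], int]) -> Optional[str]:
    """Symbolic side: execute the candidate on known cases.
    Returns an error message, or None if every case passes."""
    cases = {0: 1, 1: 1, 5: 120}  # factorial test cases (illustrative)
    for arg, expected in cases.items():
        try:
            got = candidate(arg)
        except Exception as exc:
            return f"f({arg}) raised {exc!r}"
        if got != expected:
            return f"f({arg}) returned {got}, expected {expected}"
    return None

def fake_llm(attempt: int, feedback: Optional[str]) -> str:
    """Hypothetical stand-in for the statistical side: returns source code.
    The first draft has an off-by-one bug; the retry 'uses' the feedback."""
    if attempt == 0:
        return "def f(n):\n    out = 1\n    for i in range(1, n):\n        out *= i\n    return out\n"
    return "def f(n):\n    out = 1\n    for i in range(1, n + 1):\n        out *= i\n    return out\n"

def solve(max_attempts: int = 3) -> tuple[Optional[str], list[str]]:
    """Generate, execute, verify, repeat until the tests pass or attempts run out."""
    feedback: Optional[str] = None
    log: list[str] = []
    for attempt in range(max_attempts):
        source = fake_llm(attempt, feedback)
        namespace: dict = {}
        exec(source, namespace)  # runs generated code; sandbox this in any real system
        feedback = run_tests(namespace["f"])
        log.append(feedback or "all tests passed")
        if feedback is None:
            return source, log
    return None, log

if __name__ == "__main__":
    _, log = solve()
    for line in log:
        print(line)
```

The first attempt fails on `f(5)`, and that concrete error is exactly the kind of signal a model can act on. In a domain with no equivalent of `run_tests`, the loop has nothing to iterate against.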
Consolidated thoughts on Venezuela:
Venezuela matters to the oil market, but not because “big reserves = easy barrels”
Venezuela is not Saudi Arabia. You don’t stick a toothpick in the ground and get 1m bpd
Much of Venezuela’s crude is ultra-heavy and extremely viscous, requiring diluent and complex processing just to move
Extracting it (Maracaibo especially) is technically brutal and environmentally disastrous
Oil under the ground is irrelevant. Only oil that reaches a refinery matters
Today Venezuela pumps ~1m bpd and exports most of it because the domestic economy is broken
Historically, Venezuela produced over 3m bpd, with exports peaking in that range decades ago
Even in the most extreme bearish fantasy - Venezuela fully opened, perfectly run, “51st state” scenario - this is slow
It would still take 3-4 years just to restore fields enough to pump 3-4m bpd
A functioning economy of that size would also consume ~2m bpd domestically
So exports go from ~1m today (1 pumped - ~0 consumed)
To maybe ~2m in the future (4 pumped - 2 consumed)
Net change: exports increase by ~1m bpd
Global oil demand is ~105m bpd
So even in the most aggressive case, this is <1% of global consumption
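A minimal sketch of the export arithmetic above, using only the thread's rough estimates (in millions of barrels per day). Exports are modeled simply as production minus domestic consumption.

```python
# Figures are the thread's rough estimates, in millions of barrels per day (m bpd).
GLOBAL_DEMAND = 105.0

def exports(pumped: float, consumed: float) -> float:
    """Exports are whatever is pumped but not consumed domestically."""
    return pumped - consumed

exports_today = exports(pumped=1.0, consumed=0.0)   # ~1m pumped, ~0 consumed
exports_future = exports(pumped=4.0, consumed=2.0)  # aggressive case: 4 pumped, 2 consumed
delta = exports_future - exports_today              # net change in exports
share = delta / GLOBAL_DEMAND                       # fraction of global demand

print(f"Net change: +{delta:.1f}m bpd, {share:.2%} of global demand")
```

This prints `Net change: +1.0m bpd, 0.95% of global demand`.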
Yes, oil demand growth is slowing, so +1m bpd is marginally bearish
It knocks a couple dollars off flat price - not a collapse
And none of this happens quickly
Big oil projects are long-cycle: engineering, remediation, contractors, housing, schools, logistics
These things take years, not months (look at Guyana)
This is not “another million barrels tomorrow”
It’s maybe +1m, max +2m, in ~3 years
Which means this isn’t really about future supply at all
This is about the existing ~1m bpd Venezuela already produces today
Right now, a lot of those barrels flow to China at steep discounts
Venezuela also cut strategic deals with Russia and Iran - weapons, influence, Western Hemisphere presence
Removing sanctions changes that dynamic
China loses access to heavily discounted Venezuelan crude
Venezuela can sell on the open market at market prices
Chinese oil gets more expensive
More importantly: if China invades Taiwan, it loses a guaranteed Venezuelan supply backstop
Those barrels would have to be replaced on the open market
And under US sanctions, they won’t reach China
Russia doesn’t benefit on the crude side - it’s already maxed out on buyers
If anything, China becomes more dependent on Russian oil, improving Russia’s pricing power
What Russia does lose is refining arbitrage
Venezuela’s refining system collapsed under Maduro
Crude was sent abroad, refined, and shipped back as gasoline, diesel, and jet fuel
Those flows unwind
Russian refiners lose a customer
Product flows reroute modestly
Net-net: marginal oil impact, meaningful geopolitical leverage
The real objective is leverage over China
And denying Russia and China the ability to embed military systems on America’s doorstep
IMO this is along the lines of some of my earlier posts: talking to an LLM via text is like typing into a DOS terminal, and "the GUI hasn't been invented yet."
The GUI is an intelligent canvas.
i think LLMs are obviously not conscious because there was no selection pressure for them to be, only to mimic the byproducts of consciousness
humans are conscious because it was evolutionarily useful for us to be
@theallinpod And what are people going to do with a trained model .pt file? (i.e., the "algo") How exactly will they interpret it? Sounds like you're imagining it as an old-fashioned deterministic, structured program.
OpenAI released a new paper:
"Why language models hallucinate"
Simple answer: LLMs hallucinate because training and evaluation reward guessing instead of admitting uncertainty.
The paper puts this on a statistical footing with simple, test-like incentives that reward confident wrong answers over honest “I don’t know” responses.
The fix is to grade differently: give credit for appropriate uncertainty and penalize confident errors more than abstentions, so models stop being optimized for blind guessing.
OpenAI shows that a model with 52% abstention gives substantially fewer wrong answers than one with 1% abstention, evidence that letting a model admit uncertainty reduces hallucinations even if accuracy looks lower.
Abstention means the model refuses to answer when it is unsure and simply says something like “I don’t know” instead of making up a guess.
Hallucinations drop because most wrong answers come from bad guesses. If the model abstains instead of guessing, it produces fewer false answers.
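A minimal sketch of why the grading rule changes the incentive. It assumes a simple hypothetical scoring rule: +1 for a correct answer, 0 for "I don't know", and a penalty `k` for a wrong answer. This illustrates the idea and is not necessarily the paper's exact scheme.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score of answering, given the model's chance of being right.

    Assumed rule: +1 if correct, -wrong_penalty if wrong. Abstaining scores 0.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty

def should_answer(p_correct: float, wrong_penalty: float) -> bool:
    """Answer only if it beats abstaining in expectation."""
    return expected_score(p_correct, wrong_penalty) > 0.0

def answer_threshold(wrong_penalty: float) -> float:
    """Confidence above which answering pays off: p - (1 - p) * k > 0  <=>  p > k / (1 + k)."""
    return wrong_penalty / (1.0 + wrong_penalty)

for k in (0.0, 3.0):  # 0.0 = accuracy-only grading; 3.0 = a wrong answer costs 3 points
    for p in (0.3, 0.9):
        choice = "answer" if should_answer(p, k) else "abstain"
        print(f"penalty={k}, confidence={p}: {choice}")
```

With accuracy-only grading (penalty 0), guessing never scores worse than abstaining, so a model optimized for that metric learns to guess. Any positive penalty creates a confidence threshold (0.75 when `k = 3`) below which "I don't know" is the better move.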
🧵 Read on 👇