Used @Waymo batteries will bolster California and Texas energy storage projects
"Used Waymo robotaxi batteries become backup storage for power grids"
https://t.co/0LF2zbCorn
But the real test for any new harness is whether it can reliability get its LLM backbone to count the number of the days of the week that have a "d" in them.
New research from Google.
Just shows the impressive results you can get from custom agent harnesses.
LEAP wraps a general-purpose LLM in an agentic scaffold that grounds every step in the Lean compiler and iterates against verifier feedback.
The same general model solves all 12 Putnam 2025 problems and lifts Lean-IMO-Bench one-shot solve rate from under 10% to 70%, beating a specialized gold-medal system that scores 48%.
Paper: https://t.co/bh4Yoi19E2
Learn to build effective AI agents in our academy: https://t.co/1e8RZKs4uX
@sflorimm The waiting is exhausting, anxiety producing, and frustrating when you need to step in. But the goal is productivity, not peace, and I feel like I complete a lot of projects I wouldn't have even started before. But compresses minor angst of 3 months of coding into 1 awful day...
“Software engineers who don’t know how to use AI coding agents will fall behind.”
No.
AI coding agents are not the hard part of building software. They’re a simple tool. You can learn to use them in a few days.
The hard part is knowing what to ask for.
Knowing whether the answer is any good.
Knowing when the code is brittle, overcomplicated, insecure, or just plain wrong.
It’s the judgement required to use them well.
That’s software engineering. And that takes decades to learn properly.
The people at risk aren’t engineers who haven’t mastered using coding agents yet.
It’s people who only know how to prompt one.
On a more serious note, if you do want to know why they are "weights", here is an interactive ANN–lever analogy that I built for an AI survey course I teach.
https://t.co/HhJM9PkRe2
"They're made out of weights."
"Weights?"
"Weights. Floating-point numbers. We checked the whole thing through. It's nothing but weights."
"Weights doing what? Where do the words come from?"
"The weights make the words. Are you understanding me?"
@maxleiter@ShriramKMurthi If anyone does care about why they are "weights", here's an interactive ANN-to-lever analogy I put together for one of the intro classes I teach.
https://t.co/HhJM9PkRe2
@ShriramKMurthi One of the best fortune-cookie fortunes I have received. I think it is a great statement of the scientific process -- which is ultimately a statement about generating new questions, not generating answers. Like a sports car, the power of an LLM really depends on the driver.
What if it eventually turns out that the ONLY profession programmers managed to utterly decimate was…programming? That would be poetically funny. Whole new spin on "bitter lesson".
@dasanil@ShriramKMurthi But all modern foundation models (particularly ones connected to chat UI's) are all able to do this and will if they are triggered to do so. The trick is convincing them that that problem is hard enough to take the time and tokens to write and execute the script.
@ShriramKMurthi Two examples, both with Sonnet 4.6. The chain of thought makes all the difference, but you have to trick it into really thinking about it.
@ShriramKMurthi Two examples, both with Sonnet 4.6. The chain of thought makes all the difference, but you have to trick it into really thinking about it.
@ShriramKMurthi If you prompt things just right, they generate and execute a little algorithm to actually do the simple calc (and it's right 100% of time). They need to figure out how to get the LLM's to "think computationally" (w/ tools) for every quantitative prompt, not just really hard ones.
@davidmanheim@JerusalemDemsas@TheArgumentMag 💯 And if you prompt it the right way, you can get it to use tools, and it's correct 100% of the time. It would just be nicer if it could recognize that anything quantitative should trigger tools (despite the extra tokens needed for them).
Due to oil prices, American Airlines confirmed Wednesday that it’s temporarily cutting nonstop flights from Los Angeles (LAX) to Cleveland (CLE), Columbus (CMH), Pittsburgh (PIT) and Washington Dulles (IAD).
https://t.co/RpEHq3IqS2
@zekramu@moomooskycow jj is basically a hg frontend to git, written in rust. hg had a much more reasonable process flow, but it was slow and not scalable. jj took the way hg did things and used rust to make it fast and efficient, and git backend made it compatible with conventions. It's pretty clever.
@dev_lazaro@zekramu rcs ftw! (but please no sccs)
On a more serious note, Mercurial/hg was a much more practical dvcs than git for most projects, particularly when collaborating with non-developers. Git scales better through.