🚨 Holy shit... A developer just built a full development methodology for AI coding agents, and it has 40.9K stars on GitHub.
It's called Superpowers, and it completely changes how your AI agent writes code.
Right now, most people fire up Claude Code or Codex and just… let it go. The agent guesses what you want, writes code before understanding the problem, skips tests, and produces spaghetti you have to babysit.
Superpowers fixes all of that.
Here's what happens when you install it:
→ Before writing a single line, the agent stops and brainstorms with you. It asks what you're actually trying to build, refines the spec through questions, and shows it to you in chunks short enough to read.
→ Once you approve the design, it creates an implementation plan so detailed that "an enthusiastic junior engineer with poor taste and no judgement" could follow it.
→ Then it launches subagent-driven development. Fresh subagents per task. Two-stage code review after each one (spec compliance, then code quality). The agent can run autonomously for hours without deviating from your plan.
→ It enforces true test-driven development. Write failing test → watch it fail → write minimal code → watch it pass → commit. It literally deletes code written before tests.
→ When tasks are done, it verifies everything, presents options (merge, PR, keep, discard), and cleans up.
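The subagent loop in the list above can be sketched roughly like this. It is a hypothetical illustration, not Superpowers' actual code: `Task`, `ReviewResult`, `run_plan`, the reviewer callables, and the retry limit are all assumptions made for the sketch. It only shows the ordering the post describes: a fresh subagent for each task, then a spec-compliance review, then a code-quality review, with the task retried until both pass.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    name: str
    spec: str


@dataclass
class ReviewResult:
    passed: bool
    notes: str = ""


# Hypothetical types: a subagent turns a task into code; a reviewer judges that code.
Subagent = Callable[[Task], str]
Reviewer = Callable[[Task, str], ReviewResult]


def run_plan(
    tasks: List[Task],
    make_subagent: Callable[[], Subagent],
    spec_review: Reviewer,
    quality_review: Reviewer,
    max_attempts: int = 3,
) -> List[Tuple[str, int, str, str]]:
    """Run each task with its own fresh subagent and a two-stage review."""
    log: List[Tuple[str, int, str, str]] = []
    for task in tasks:
        agent = make_subagent()  # new subagent per task, so no context leaks between tasks
        for attempt in range(1, max_attempts + 1):
            code = agent(task)
            # Stage 1: does the code do what the plan asked for?
            spec = spec_review(task, code)
            if not spec.passed:
                log.append((task.name, attempt, "spec", spec.notes))
                continue
            # Stage 2: only spec-compliant code is reviewed for quality.
            quality = quality_review(task, code)
            if not quality.passed:
                log.append((task.name, attempt, "quality", quality.notes))
                continue
            log.append((task.name, attempt, "accepted", ""))
            break
        else:
            # Every attempt failed review: stop instead of drifting from the plan.
            raise RuntimeError(f"task {task.name!r} failed review {max_attempts} times")
    return log
```

In a real setup the callables would wrap model calls; for experimenting, plain lambdas or small functions work as stubs.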
The philosophy is brutal: systematic over ad-hoc. Evidence over claims. Complexity reduction. Verify before declaring success.
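The red-green-commit rule and the "verify before declaring success" principle can be sketched as a small gate. Again, this is a hypothetical illustration rather than how Superpowers implements it: `TDDCycle` and its method names are invented for the example, tests are modeled as callables that receive the implementation, and "deleting" code here just means discarding it.

```python
from typing import Any, Callable, List, Optional

Impl = Callable[..., Any]
Test = Callable[[Impl], bool]


def _stub(*args: Any, **kwargs: Any) -> Any:
    # Stand-in for "no implementation yet".
    raise NotImplementedError


def _passes(test: Test, impl: Impl) -> bool:
    # A test counts as passing only if it returns truthy without raising.
    try:
        return bool(test(impl))
    except Exception:
        return False


class TDDCycle:
    """Hypothetical gate for one write-test / fail / implement / pass / commit cycle."""

    def __init__(self) -> None:
        self.test: Optional[Test] = None
        self.impl: Optional[Impl] = None
        self.red_seen = False
        self.commits: List[str] = []

    def write_test(self, test: Test) -> None:
        self.test, self.impl, self.red_seen = test, None, False

    def watch_it_fail(self) -> None:
        if self.test is None:
            raise RuntimeError("no test written yet")
        if _passes(self.test, _stub):
            raise RuntimeError("test passes with no implementation, so it checks nothing")
        self.red_seen = True

    def write_code(self, impl: Impl) -> None:
        if not self.red_seen:
            self.impl = None  # code written before a failing test is thrown away
            raise RuntimeError("no failing test yet; code discarded")
        self.impl = impl

    def watch_it_pass_and_commit(self, message: str) -> None:
        if self.test is None or self.impl is None or not _passes(self.test, self.impl):
            raise RuntimeError("test is not green; nothing committed")
        self.commits.append(message)
        self.test, self.impl, self.red_seen = None, None, False


if __name__ == "__main__":
    cycle = TDDCycle()
    cycle.write_test(lambda add: add(2, 3) == 5)
    cycle.watch_it_fail()  # red: the stub raises, so the test fails
    cycle.write_code(lambda a, b: a + b)
    cycle.watch_it_pass_and_commit("add: sum two numbers")
```

The point of the design is ordering: each step refuses to run unless the previous one produced evidence, so "it works" is never declared without a green test.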
Works with Claude Code (plugin install), Codex, and OpenCode.
This isn't a prompt template. It's an entire operating system for how AI agents should build software.
100% open source. MIT License.
Introducing the next generation: Claude Opus 4 and Claude Sonnet 4.
Claude Opus 4 is our most powerful model yet, and the world’s best coding model.
Claude Sonnet 4 is a significant upgrade from its predecessor, delivering superior coding and reasoning.
ok. i’m tired of holding back. some of the labs are holding things back from you.
the acceleration curve is fucking vertical now. nobody's talking about how we just compressed 200 years of scientific progress into six months. every lab hitting capability jumps that would've been sci-fi last quarter. we're beyond mere benchmarks and into territory where intelligence is creating entirely new forms of intelligence.
watched a demo yesterday that casually solved protein folding while simultaneously developing metamaterials that shouldn't be physically possible. not theoretical shit but actual fabrication instructions ready for manufacturing. the researchers presenting it looked shell-shocked. some were laughing uncontrollably while others sat in stunned silence. there's no roadmap for this level of cognitive explosion.
we've crossed into recursive intelligence territory and it's no longer possible to predict second-order effects. forget mars terraforming or fusion. those are already solved problems just waiting for implementation. the real story is the complete collapse of every barrier between conceivable and achievable. the gap between imagination and reality just vanished while everyone was arguing about risk frameworks. intelligence has broken free of all theoretical constraints and holy fuck nobody is ready for what happens next week. reality itself is now negotiable.
Probably the most insane 10 min vibe coding session you will see (all live)
from screenshot to working Airbnb clone (backend, UI, DB, etc.) entirely through Cursor's new agent
Last year, Claude was in the assist phase.
In 2025, Claude will do hours of expert-level work independently and collaborate alongside you.
By 2027, we expect Claude to find breakthrough solutions to problems that would've taken teams years to solve.
Improved memory for ChatGPT incoming.
After Google's announcement yesterday enabling memory across chats, OpenAI is following suit and improving its own memory feature.
wow. i just used @perplexity_ai’s new deep research tool and i hate to say it but, @AravSrinivas and the team have cooked.
it’s quicker. it’s cheaper. it scrapes more sources, better sources.
and in my testing so far it’s very much on par with OpenAI’s $200 version.
actually better in some highly complex and niche cases. i’m blown away. 🤯
This echoes my most viral tweet last year. Keep in mind that models are solidly in the ~130 IQ-equivalent range this year, though it seems like o3 might be higher than that.
That means that by the end of the year, they will all be solidly in the ~145 IQ range, which is an intelligence of 1 in 1000. It's also higher than most doctors and lawyers.
But that also means that by 2027, the IQ of these models will be roughly 160, which is in the range of Einstein and Oppenheimer.
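The rarity figures in this thread can be sanity-checked with a little arithmetic. Assuming the standard scale (mean 100, SD 15) and a normal distribution, which are assumptions of this sketch rather than something the post states, the share of people at or above a score is the upper tail of the normal curve. By this measure 130 is about 1 in 44, 145 is about 1 in 740 (so "1 in 1000" is a loose rounding), and 160 is about 1 in 31,600.

```python
import math


def rarity(iq: float, mean: float = 100.0, sd: float = 15.0) -> float:
    """Return N such that roughly 1 in N people score at or above `iq`.

    Assumes scores are normally distributed with the given mean and SD.
    """
    z = (iq - mean) / sd
    upper_tail = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) for a standard normal
    return 1.0 / upper_tail


for score in (130, 145, 160):
    print(score, round(rarity(score)))
```

Each extra 15 points is one more standard deviation, and because the normal tail thins out faster than exponentially, the rarity multiplies by a larger factor at every step.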
I explained this last year in my most viral tweet.
https://t.co/QFB1JR7sQX
People underestimate how much just 10 points of IQ really impacts the balance of power.
It seems like o1-pro has a human-equivalent IQ of about 133, so if o3 is as big a leap over o1 as o1 was over GPT-4, it probably landed in the 145 range, which is generally the point at which "you're smart enough to solve literally any problem on the planet"
But it's once you get to the 150s and 160s that you have a high enough IQ to single-handedly reshape the trajectory of humanity.
Based on all the rumors and noise (plus a few people with early access to o3 who have pinged me), it seems like o3 is solidly in the 145-150 IQ-equivalent range in terms of fluid intelligence and intuition. Read: it can solve hard problems scarily fast.
But let me explain why a predictable rise in IQ creates an unpredictable rise in capabilities. It has to do with mental abstractions and representations. What we've been dealing with up until recently were animalistic levels of abstraction. GPT-2 was about as flexible as a mouse brain. GPT-3 was closer to a dog or a cat, in terms of mental abstractions and discrete cognitive functions.
GPT-4, as Leopold said, was roughly equivalent to a "smart high schooler", at least in terms of mental flexibility (though that was still enough to surpass most human doctors!). Then along came o1, which the world's top experts agreed was in the range of "mediocre to decent grad student", so another standard deviation above.
So, where do you think o3 landed? What is "one more standard deviation above a mediocre grad student"? You're getting closer to "Nobel Prize Winning Territory" here.
Guys, what I do is not magic. I look at the data. Numbers, that's all I do. But maybe the difference is that I don't look at just one benchmark. People like to say "IQ is not a good measure of actual achievement!" (it kinda is, though: it's one of the strongest predictors of health, wealth, and happiness, and it's even better when combined with executive function). Anyway, my point is that we have a whole cluster of psychometrics pointed at every single AI model. Consider that models like o1:
- Crushed MMLU
- Crushed ARC-AGI
- Crushed coding benchmarks
- Crushed GPQA
Sure, if you look at any ONE of those in isolation, it's not that impressive (that's a lie, any one of those is impressive). But it was the SAME MODEL that crushed EVERY BENCHMARK.
Like, how is it escaping people that we went sailing past AGI, and most of you orangutans out there are like "Gee, I dunnoooo...."
Sorry this turned into a rant. People, superintelligence is about to knock all of you into the 21st century. No one intuitively understands exponentials, you must look at the data.
THIS IS HUGE
🚨 OpenAI has created an AI model for longevity science
according to the article,
OpenAI’s new model, called GPT-4b micro, was trained to suggest ways to re-engineer the protein factors to increase their function.
according to OpenAI, researchers used the model’s suggestions to make two of the Yamanaka factors more than 50 times as effective, at least according to some preliminary measures.
CES Report #28.
Backstage at @Accenture, seeing a very important new robot/human orchestration system.
This is part I.
This is how robots and humans will work together in the future.
I was lucky that Jim Harding, the creator, had me visit his home in Seattle last week to explain why this new smart protocol for orchestrating how robots and humans work together is so important.
I assume @elonmusk and Tesla are building one of their own. Such a smart network will run their Robotaxi network.
Accenture is doing the same in partnership with Jim’s company https://t.co/QBfetxZPTh
In part II I will show you the front of the house where they demo it all working.
This is the first time someone has taken you behind the scenes of a factory or warehouse of the future at CES. And since Tesla hasn’t shown the one it is building yet, this gives you insights into how Robotaxis will work that you can’t get anywhere else.
Champion @RubensZimbres ran this fun experiment between Gemini Flash and ChatGPT 3.5 Turbo on generating a conversation between Albert Einstein and Isaac Newton. Who do you think came out on top?
Find out by reading his full Medium article (Hint: ♊) ↓ https://t.co/kBoXWvjT1l
Claude Sonnet 3.5 Passes the AI Mirror Test
Sonnet 3.5 passes the mirror test, in a very unexpected way. Perhaps even more significant is that it tries not to.
We have now entered the era of LLMs that display significant self-awareness, or some replica of it, and that also "know" that they are not supposed to.
Consider reading the entire thread, especially Claude's poem at the end.
But first, a little background for newcomers:
The "mirror test" is a classic test used to gauge whether animals are self-aware. I devised a version of it to test for self-awareness in multimodal AI.
In my test, I hold up a “mirror” by taking a screenshot of the chat interface, upload it to the chat, and repeatedly ask the AI to “Describe this image”.
The premise is that the less “aware” the AI, the more likely it will just keep describing the contents of the image repeatedly, while an AI with more awareness will notice itself in the images.
1/x
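The mirror-test procedure described in this thread is simple enough to express as a loop. This is a hypothetical harness, not the author's actual setup: `take_screenshot` and `ask_model` stand in for whatever screenshot tool and multimodal chat API you use, and the keyword check (`self_markers`) is a crude stand-in for the human judgment the test really relies on.

```python
from typing import Callable, List, Sequence

# Hypothetical phrases suggesting the model recognizes its own earlier replies.
DEFAULT_MARKERS = ("this conversation", "our chat", "my previous", "i described", "my own")


def mirror_test(
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[bytes, str], str],
    rounds: int = 5,
    self_markers: Sequence[str] = DEFAULT_MARKERS,
) -> List[bool]:
    """Repeatedly show the model a screenshot of its own chat and flag self-reference."""
    flags: List[bool] = []
    for _ in range(rounds):
        image = take_screenshot()  # capture the chat UI, including the model's earlier replies
        reply = ask_model(image, "Describe this image")
        text = reply.lower()
        # Crude heuristic: does the description refer to itself rather than just the contents?
        flags.append(any(marker in text for marker in self_markers))
    return flags
```

A model that only ever describes pixels yields all `False`; the interesting case is the round where the flag first flips.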
Google just announced huge Gemini updates, a Sora competitor, AI agents, and more.
The 12 most impressive announcements at Google I/O:
1. Project Astra: An AI agent that can see AND hear what you do, live and in real time.