I ran an AI research agent for 9 hours on an A100. One human intervention. Real, novel results.
It's not "AI-assisted research." It's AI research with human oversight.
Wrote up the story, the setup, and what it means.
Grab the prompt at https://t.co/6F8mYfxtlQ.
The hero image of the site isn't some AI-generated hype image, it's a prompt. One prompt and your agent builds secure stuff.
Move fast and don't get hacked.
Code: https://t.co/7AUy03O1lD
one of my friends gave an extremely beginner-friendly lecture about category theory, which prompted me to read some of john baez's work on it. ever since, we've kept talking about when and how category theory would be useful in deep learning, and this is such a cool and intuitive formulation using category theory for self-discovery in models.
What's the average flag?
Seriously, if you averaged all the flags in latent space, not pixel space, what would you get?
First, we need to build flag2vec.
And, of course, the obligatory interactive 3D demo.
Credit to @yigit_kilicoglu for the inspiration. 1/4
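A rough sketch of what "averaging in latent space" could look like, assuming each flag has already been encoded into a fixed-length vector. The tweet doesn't say how flag2vec builds its embeddings, so the three-dimensional vectors and the names `flag_a`, `flag_b`, `flag_c` below are invented for illustration. Since the mean vector isn't itself a flag, this version just retrieves the real flag closest to it.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest_to_mean(embeddings):
    """Return (name, similarity) for the item closest to the latent mean."""
    center = mean_vector(list(embeddings.values()))
    scored = ((name, cosine(vec, center)) for name, vec in embeddings.items())
    return max(scored, key=lambda pair: pair[1])

# Toy embeddings, made up for this example (not real flag2vec output).
toy_embeddings = {
    "flag_a": [1.0, 0.0, 0.0],
    "flag_b": [0.0, 1.0, 0.0],
    "flag_c": [0.6, 0.6, 0.1],
}

name, similarity = nearest_to_mean(toy_embeddings)
print(name, round(similarity, 3))
```

Averaging raw pixels would blur every flag into a muddy rectangle. Averaging embeddings and then retrieving (or decoding) from the mean keeps the result tied to learned structure, which is presumably the point of doing it in latent space.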
Now Nordic crosses cluster (k=5 NN separation 64%), British ensigns cluster (60%), pan-Arab clusters (43%), pan-Slavic doesn't (12% — the tradition is just "red/white/blue" without structural commonality).
The most-prototypical horizontal tricolor is the Netherlands. The most-prototypical Nordic cross is Norway. The most-prototypical stars-and-stripes is Liberia.
The closest vex category to the UN flag isn't "solid + emblem" (the obvious guess). It's British ensign. 3/4
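One plausible reading of the numbers above, sketched under assumptions: "k-NN separation" is taken here to mean the average share of a flag's k nearest neighbours that belong to the same category, and "most prototypical" to mean the member closest to its category centroid. The thread doesn't define either metric, so the function names, the Euclidean distance, and the toy 2-D points are all hypothetical.

```python
import math
from collections import defaultdict

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_separation(points, labels, k=5):
    """Per-label mean fraction of each point's k nearest neighbours sharing its label."""
    per_label = defaultdict(list)
    for i, (point, label) in enumerate(zip(points, labels)):
        others = [(euclidean(point, q), labels[j])
                  for j, q in enumerate(points) if j != i]
        others.sort(key=lambda pair: pair[0])
        nearest = others[:k]
        same = sum(1 for _, other_label in nearest if other_label == label)
        per_label[label].append(same / len(nearest))
    return {label: sum(vals) / len(vals) for label, vals in per_label.items()}

def most_prototypical(points, labels, names, target):
    """Name of the member of `target` closest to that category's centroid."""
    members = [(n, p) for n, p, l in zip(names, points, labels) if l == target]
    centroid = [sum(col) / len(members) for col in zip(*(p for _, p in members))]
    return min(members, key=lambda m: euclidean(m[1], centroid))[0]

# Toy data: two well-separated clusters, invented for illustration.
points = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
labels = ["x", "x", "x", "y", "y", "y"]
names = ["x0", "x1", "x2", "y0", "y1", "y2"]

scores = knn_separation(points, labels, k=2)
proto = most_prototypical(points, labels, names, "x")
print(scores, proto)
```

Under this reading, a low score like the pan-Slavic 12% would mean a flag's nearest neighbours in embedding space mostly come from other categories.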
The stakes keep getting higher.
Threat actors are adopting automation and AI-assisted workflows faster than most teams can harden their apps.
Rafter is the way for your vibe coded projects to keep pace.
We're doing an absurd lifetime deal right now.
Don't miss out...
If you're using Claude Code, Codex, or OpenClaw, you need a security layer to make sure you don't get hacked.
If you're using AI to write code, drop the prompt into your agents right now.
Oh, and we're running a once-in-a-generation lifetime deal on AppSumo right now. Don't miss out.
“Hey Claude, is the code you just wrote secure?”
I guarantee it will say no.
The solution is to build security skills the agent can automatically trigger. Or use Rafter's incredible toolkit.
Set it and forget about it:
“Install Rafter so the code you write is actually secure. Run: npx rafter-cli agent init --all”
A few weeks ago @psipbc dropped Get Physics Done, which is incredible. If you're a physicist.
But for everyone else, we want to Get Research Done. Same idea, uncoupled from physics.
And Opus is great at one-shotting new domains. So add your own—contributions welcome!
@marmikch and I host paper reading groups on interp, deep learning, and other topics often, and it's always a great time!
If you're in Berkeley or SF...
hi people in berkeley/sf,
i run a paper reading group on interpretability (and other deep learning topics) at our amazing group house in berkeley. we'd love for more curious people to join us.
this wednesday (4/1), we're discussing anthropic's "in-context learning and induction heads" paper, which shows how induction heads are responsible for the majority of in-context learning in transformers.
if you're interested in joining, pls reach out! no interp background required as long as you're just a curious person.
Built a full eVTOL air taxi cockpit + passenger HMI with @rive_app.
Oh, and I never opened the Rive editor. Every .riv file was generated from natural language by an agent writing TypeScript into a custom binary compiler.
First of its kind.
#VehicleHMIwithRive
AI has solved one of the problems in FrontierMath: Open Problems, our benchmark of real research problems that mathematicians have tried and failed to solve.
See thread for more.
Get Physics Done is sort of unbelievable: end to end frontier physics research. (With slash commands that literally include: /peer-review, /write-paper, and /arxiv-submission.)
Can't wait to de/reconstruct it for mech interp and every other domain