I open-sourced Code Crew.
Small plugin/skill for Codex + Claude Code that spawns subagents inspired by famous programmers and computer scientists to review your code from different angles.
I tested a bunch of crew compositions. Bigger was not better. The best tested default so far:
> Knuth - rigor/invariants/algorithms
> Hickey - simplicity/data/time
> Torvalds - maintainer reality/does-this-land?
It brings different review instincts into the same room, then filters the output so the final findings stay useful and grounded in the code.
Repo: https://t.co/2j6Y8sDZO5
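A rough sketch of the "filter so findings stay grounded" step, in Python. This is not the Code Crew implementation: `Finding` and `grounded` are hypothetical names, and the rule used here (keep a finding only if the code it quotes really appears in the source, one finding per quoted spot) is an assumption about what such a filter might look like.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Finding:
    reviewer: str  # e.g. "Knuth", "Hickey", "Torvalds"
    quote: str     # the code the reviewer claims to be commenting on
    note: str      # the review comment itself


def grounded(findings: List[Finding], source: str) -> List[Finding]:
    """Keep findings whose quoted code appears in the source, one per quote."""
    kept: List[Finding] = []
    seen = set()
    for finding in findings:
        quote = finding.quote.strip()
        if not quote or quote not in source:
            continue  # not grounded in the code: drop it
        if quote in seen:
            continue  # an earlier reviewer already flagged this spot
        seen.add(quote)
        kept.append(finding)
    return kept
```

Feed it the combined output of all reviewers plus the file under review; whatever survives is the shortlist.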
Kay Coworker is live.
Working with AI agents that feel less like tools and more like capable teammates has been genuinely liberating.
Proud to be part of this.
https://t.co/QLAJdbYG3u
@EXM7777 Been doing this for months. Codex and Claude are tools for different things. Imho, Codex is better for code quality, Claude … is many things, but also very useful.
Spent an hour updating an older Google Play game for a security patch. Hilarious how I used to ship Android-Unity stuff before Claude. This would've taken me days.
Shipping is getting commoditized. In 2026, more builders can ship faster than ever. What most still can't do is distribute. That's the moat now. That's the real differentiator.
The next ROIBench app is out: Annotate.
Research suggests annotation helps you remember what you read, but students are often told to do it without being shown how.
Annotate helps you study faster with realistic highlights, margin notes, and explanations.
https://t.co/2DA0nrESxC
The first ROIBench app is out: DocBench.
You upload a document set, ask for the output you need, and get back something you can actually use. It's like Claude Code for docs in the browser.
It's built for work like:
- reviewing compliance docs and drafting the client summary
- comparing spreadsheets and showing what changed
- pulling risks out of a packet and grounding them in the source files (with citations)
https://t.co/xzlpVqeRNc
Anthropic ships fast.
If you're building on Claude SDK, keeping up becomes its own challenge.
We built an internal loop for that: Dev Alpha.
It's a scheduled GitHub Action that runs Claude Code with a repo-specific prompt, checks upstream changes, makes small useful updates, runs tests, opens a PR, and posts a Slack summary.
Nothing magical. Just a workflow, a prompt, and real repo context.
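The shape of that loop, sketched in Python. This is an illustration under assumptions, not Dev Alpha itself: `StepResult`, `run_loop`, and `slack_summary` are hypothetical names, and each step is just a callable you would wire to the real thing (the Claude Code run, the test command, the PR, the Slack post).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    name: str
    ok: bool
    detail: str = ""


Step = Tuple[str, Callable[[], str]]


def run_loop(steps: List[Step]) -> List[StepResult]:
    """Run steps in order and stop at the first failure, so a red test run never reaches the PR step."""
    results: List[StepResult] = []
    for name, action in steps:
        try:
            results.append(StepResult(name, True, action()))
        except Exception as exc:  # any failing step halts the loop
            results.append(StepResult(name, False, str(exc)))
            break
    return results


def slack_summary(results: List[StepResult]) -> str:
    """Render one line per executed step for the summary message."""
    lines = []
    for r in results:
        status = "ok" if r.ok else "FAILED"
        suffix = f" ({r.detail})" if r.detail else ""
        lines.append(f"{status}: {r.name}{suffix}")
    return "\n".join(lines)
```

The only design choice worth noting: the summary is built from whatever ran, so a failed run still reports where it stopped.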
@audiencon Building https://t.co/9o371qXv1U to answer a question: can you mathematically converge on an ideal product? Testing whether loss functions, evaluated by synthetic personas, can exist for products.
I'm building a factory that takes apps from concept to live product with real users.
The experiment: can you mathematically converge on the best version of an app or a service, the way models converge during training?
First app is live. New one every week. Building in public.
https://t.co/SQJs0Qimet
We built an internal Moltbook for our agents at Kay.
PRs capture the fix. The lesson often disappears with the session.
So we shipped Kaybook: agents auto-post short field notes after each run, so the next run starts smarter. Oddly wholesome.
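A minimal sketch of the field-notes idea in Python, assuming a plain JSONL file as the "book". The file format and the names `post_note` and `recent_notes` are made up for illustration; this is not how Kaybook is actually built.

```python
import json
from pathlib import Path
from typing import List


def post_note(book: Path, agent: str, note: str) -> None:
    """Append one short field note at the end of a run."""
    with book.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps({"agent": agent, "note": note}) + "\n")


def recent_notes(book: Path, limit: int = 5) -> List[str]:
    """Return the latest notes, oldest first, to prepend to the next run's prompt."""
    if limit <= 0 or not book.exists():
        return []
    lines = [ln for ln in book.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return [json.loads(ln)["note"] for ln in lines[-limit:]]
```

Append-only keeps it simple: agents never edit each other's notes, and the next run just reads the tail.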
Moltbot isn't "Claude Code + MCPs".
Moltbot is more of a runtime: long-running, resumable sessions, memory on disk (embeddings/FTS), a persistent presence on surfaces like Telegram, and built-in automation.
Different job, different tool.
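To make "memory on disk (FTS)" concrete, here is a small Python sketch using SQLite's FTS5 full-text index. It is only an illustration of the general technique, not Moltbot's storage layer: the table name `memory` and the helpers `open_memory`, `remember`, and `recall` are assumptions, and it needs a SQLite build with FTS5 enabled.

```python
import sqlite3
from typing import List


def open_memory(path: str) -> sqlite3.Connection:
    """Open (or create) a memory store backed by an FTS5 full-text index."""
    conn = sqlite3.connect(path)
    conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS memory USING fts5(body)")
    return conn


def remember(conn: sqlite3.Connection, text: str) -> None:
    """Persist one memory; the `with` block commits it."""
    with conn:
        conn.execute("INSERT INTO memory(body) VALUES (?)", (text,))


def recall(conn: sqlite3.Connection, query: str, limit: int = 3) -> List[str]:
    """Return the best full-text matches for the query, best first."""
    rows = conn.execute(
        "SELECT body FROM memory WHERE memory MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [body for (body,) in rows]
```

Pass a file path to `open_memory` and the memories survive restarts, which is the point of a resumable session.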
A conventional narrative you might come across is that AI is too far along for a new, research-focused startup to outcompete and outexecute the incumbents of AI. This is exactly the sentiment I heard often when OpenAI started ("how could the few of you possibly compete with Google?") and 1) it was very wrong, and then 2) it was very wrong again with a whole other round of startups who are now challenging OpenAI in turn, and imo it still continues to be wrong today. Scaling and locally improving what works will continue to create incredible advances, but with so much progress unlocked so quickly, with so much dust thrown up in the air in the process, and with still a large gap between frontier LLMs and the example proof of the magic of a mind running on 20 watts, the probability of research breakthroughs that yield closer to 10X improvements (instead of 10%) imo still feels very high - plenty high to continue to bet on and look for.
The tricky part ofc is creating the conditions where such breakthroughs may be discovered. I think such an environment comes together rarely, but @bfspector & @amspector100 are brilliant, with (rare) full-stack understanding of LLMs from top (math/algorithms) to bottom (megakernels/related). They have a great eye for talent, and I think they will be able to build something very special. Congrats on the launch and I look forward to what you come up with!