I disagree.
Companies need agents that can do things and the way to do things is having access to data.
It doesn't matter what skill library you have without a way to actually utilize it.
AI tool you should know: /grill-me
1) Go to github[.]com/mattpocock
2) Click skills
3) Scroll down and click grill-me
4) Click the download raw file button on the right
5) Head to Claude or Codex
6) Upload the .md file you downloaded
7) Start any task with /grill-me to get interrogated about any plan or decision in a way that leaves no stone unturned and pulls your best ideas & thoughts out of you.
I love using it for my daily to-do list.
I run "/grill-me about my todo list for the day" and it forces ruthless prioritization as well as clarity of thought around how I spend my time. Some questions it asked me:
Prioritizing questions:
- What's on your list for today?
- Or should I pull it from one of your tools instead?
- How much real focus time do you actually have today (outside meetings)?
- If the day blows up at noon and you only ship one of these four, which one is it?
Clarifying questions:
- What's the actual deliverable for this task today? And what makes it "done" today?
- What's your best estimate of how long this will take? And how confident are you in that timeline?
I tried out dynamic workflows and didn't get much out of them at first. I spent today trying to see what I was doing wrong and learned a bunch.
Mainly, I found it really useful for adversarial agent tasks. I just used it to review a 7-PR stack. Here's how I approached it:
Claude really likes his own work, and that makes it tricky to get quality reviews, even with other standalone agents.
However, splitting the review into many focused segments with ultracode is killing it for me.
Here's my process:
1 - Define the task, often reviewing a plan or code diff, and the main things to care about. Think of this as the PR description.
2 - Define the swimlanes for the reviewers. For me this comes in a few flavors, but it's often correctness, code duplication, safety, maintainability, etc. This is where you should put the most brain power into developing your own. (I like to add a philosophy section in my skill to get the agent in the right headspace.)
3 - Turn on ultracode (I hope you already did this).
4 - Tell the model to review what it has first (the diff, the plan, the doc, etc.), then create a workflow to cut up the reviews as thinly as possible to achieve what you want, and to have them be incredibly adversarial. I often like to say "have them be super mean."
5 - Then (my favorite) include that each agent must come back with a way to verify their findings.
6 - Finally, end with the main Claude verifying all the findings once the list comes back, then prioritizing and presenting them to you.
My advice would be to copy this tweet, give it to Claude, and add your own flavor and goals for what you're actually reviewing. So far I have seen this catch bugs, produce cleaner code, and stop me from having 28 date formatters across my codebase.
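If it helps to see the shape of it, here is a rough Python sketch of the loop above. It is only an illustration under assumptions: the `Finding` fields, the 1-to-3 severity scale, the two stand-in lanes, and the `confirm` callback are all hypothetical names, and the real reviewers would be agents rather than plain functions.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Finding:
    lane: str      # which swimlane raised it, e.g. "correctness"
    summary: str   # what the reviewer claims is wrong
    verify: str    # how to check the claim (step 5); empty means none given
    severity: int  # hypothetical scale: 1 (nit) to 3 (blocker)


# A reviewer takes the task description and the diff and returns findings.
Reviewer = Callable[[str, str], list[Finding]]


def run_review(
    task: str,
    diff: str,
    lanes: dict[str, Reviewer],
    confirm: Callable[[Finding], bool],
) -> list[Finding]:
    """Fan the review out by swimlane, then verify and prioritize."""
    findings: list[Finding] = []
    for reviewer in lanes.values():
        for finding in reviewer(task, diff):
            # Step 5: drop anything that comes back without a way to verify it.
            if finding.verify.strip():
                findings.append(finding)
    # Step 6: the main agent re-checks each finding, then sorts by severity.
    confirmed = [f for f in findings if confirm(f)]
    return sorted(confirmed, key=lambda f: f.severity, reverse=True)


def duplication_lane(task: str, diff: str) -> list[Finding]:
    """Stand-in reviewer: flags more than one date formatter in the diff."""
    count = diff.count("def format_date")
    if count > 1:
        return [
            Finding(
                lane="duplication",
                summary=f"{count} date formatters defined",
                verify="search the diff for 'def format_date'",
                severity=2,
            )
        ]
    return []


def vague_lane(task: str, diff: str) -> list[Finding]:
    """Stand-in reviewer that returns an unverifiable complaint."""
    return [Finding("maintainability", "this feels messy", "", 1)]


example_diff = "def format_date(d): ...\ndef format_date_v2(d): ..."
results = run_review(
    "Review the date handling PR",
    example_diff,
    {"duplication": duplication_lane, "vague": vague_lane},
    confirm=lambda f: True,  # placeholder for the main agent's own check
)
for f in results:
    print(f"[{f.lane}] {f.summary} (verify: {f.verify})")
```

The part worth copying is the filter in the middle: a finding with no verification step never reaches the final list, no matter how confident the reviewer sounded.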
We just launched Sites into Codex!
Software creation was always about more than writing code. Sites in Codex fundamentally gives the power of end-to-end software creation to every user, no matter their technical fluency.
These Sites are fully deployed to a URL, private to workspaces, come with authentication, can have static files, and can store dynamic data in databases.
It is in preview for business and enterprise teams and will be rolling out to all workspaces over the next day. Give it a try by typing @ Sites into Codex and ask it to build anything!
This project took a massive amount of effort across hundreds of people at OpenAI - proud that we were able to get this out and excited to see what you all build with it!
Most enterprises step on rakes as they try to implement AI.
And it's mostly not their fault.
The technology is moving nauseatingly fast, burning everything down and rebuilding isn't always practical or productive, and making sure you're bringing your people along for the ride is mandatory, but far from simple.
When execs ask me, "What should I do?" my response typically starts with understanding what good looks like by studying the exceptional few that are doing AI transformation the right way.
One company worth studying is @SharkNinja.
This $16.2 billion, 4,200-person consumer products behemoth has figured out how to be nimble like a startup, while having the resources & global footprint of a goliath.
A tactical way they've been able to pull this off is through JailBreak Live, a four-day company-wide AI hack, where employees across all ranks & functions learn & build with AI to reimagine this 32-year-old company.
Why is it so damn good?
Because it's the perfect example of how leadership can truly be a partner to its people in a post-AI world.
Step 1: Leadership gives everyone the time, space, and tools to understand how AI works & then reimagine core products & processes in a psychologically safe, supported way. Part of the support comes by bringing in AI swat teams (like our team @tenex_labs) to provide individualized/group applied AI guidance to SharkNinja employees.
Goal: Show (vs. tell) that the c-suite is authentically investing in its employees to be supercharged (vs. replaced) in a post-AI world through resources, tools, time, and expert support.
Step 2: JailBreak results in hundreds of incredible AI use cases/product ideas getting bubbled up by talented people, who sit closest to the work and the customer.
Goal: Open-source the creative & ideation process, so that opportunities for the company aren't limited to what's discussed in an ELT meeting.
Step 3: Leadership bookends this process by prioritizing & resourcing into the use cases/ideas that have opportunity to be productionized and scaled across functions and the company.
Goal: Further invest resources meritocratically in the ideas that could have the most outsized impact on the company's growth. This means prioritizing and focusing the organization on key AI initiatives, investing $ (through people and technology) in executing those initiatives successfully, and then further supporting employees so that they are enabled by the output of these initiatives.
This process & SharkNinja's JailBreak Live are going to become the gold standard for how companies run & execute empathetic and effective AI transformation for years to come.
P.S. I want to give a massive shoutout to our team at @tenex_labs, who helped make this AI hackweek initiative wildly impactful. It was a full team effort and the results were profound.
I may be crazy, but I built a 20-level Excel game to find a Finance savant to join our company.
The game is called "Bug Hunt," and any Excel junkie interested in becoming the leader of our Finance function at @tenex_labs can play.
If you complete all 20 levels, you are accelerated to a final round interview with my cofounder & me.
Here's how it works:
1) Open the model. It's a live workbook in your browser with the "finished" financials of a fictitious SaaS company.
2) Mark every bug. Click any cell, write one line of reasoning. Submit when you're sure. There are 20 total.
3) Climb the tiers. Each correct catch unlocks the next. The final three are veteran CFO-level.
4) Hit level 20 & auto-move to a final round interview.
Play the game: https://t.co/75eDEEjTIt
P.S. You can still apply to be our Senior Director of Strategic Finance the normal way (application below); it's just a little less fun & you don't get an auto-invite to the final round.
@thsottiaux Diffs with repos one directory down. Would love to have a parent folder with three repos in it and see the diff in each. Think Cursor/VSCode git extension
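For anyone who wants this today, a rough workaround sketch in Python: walk one directory down, treat every child folder that contains a `.git` entry as a repo, and run `git diff --stat` in each. The function names are hypothetical, it assumes `git` is on your PATH, and it only shows unstaged changes.

```python
import subprocess
from pathlib import Path


def find_repos(parent: Path) -> list[Path]:
    """Return immediate child directories that look like Git repositories."""
    return sorted(
        child
        for child in parent.iterdir()
        # `.git` is a directory in a normal clone and a file in a worktree.
        if child.is_dir() and (child / ".git").exists()
    )


def diff_stat(repo: Path) -> str:
    """Run `git diff --stat` inside one repository and return its output."""
    result = subprocess.run(
        ["git", "-C", str(repo), "diff", "--stat"],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout.strip()


def diffs_one_level_down(parent: Path) -> dict[str, str]:
    """Map each child repo's folder name to its unstaged diff summary."""
    return {repo.name: diff_stat(repo) for repo in find_repos(parent)}


# Example usage (the parent folder path is an assumption):
# for name, stat in diffs_one_level_down(Path("~/code").expanduser()).items():
#     print(f"== {name} ==")
#     print(stat or "(no unstaged changes)")
```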
Anthropic released Claude Design TODAY and it's now accessible at https://t.co/BrFOt3Fqjz
I spent the last hour giving it a first look, and shared my thoughts and results in the video below.
This is a BIG drop. This is a new design surface from Anthropic, and it changes what "AI design" means.
Short version: Claude can now design. Not "describe a design." Not "generate an image of a design."
Actual production work — prototypes, wireframes, high-fidelity mocks, slide decks, landing pages — editable, on-brand, and ready to hand off.
Here's what stood out on first look:
→ Real design surfaces
Prototypes, wireframes, hi-fi, and slide decks — each with templates and proper structure, not just pretty screenshots.
→ Comment-based edits
Leave a comment on any element and Claude revises it. This is the Figma-style review loop, with the designer replaced by a model that works at 3am.
→ Brand design systems
You can feed it your system — colors, type, components — and it actually respects it. On-brand output, not generic AI slop.
→ Export anywhere
PDF, PowerPoint, Canva, standalone HTML. Plus a built-in handoff straight to Claude Code for engineers to implement.
→ Import from real tools
Figma, GitHub, and captured web elements come in as inputs. Your existing work is the starting line, not the discard pile.
→ Collaboration
Share links for view / comment / edit — the exact tier system teams already expect.
What I tested on Opus 4.7:
• A 5-slide deck generated from a single screenshot. Claude asked clarifying questions BEFORE generating and shipped speaker notes by default.
• A landing page build. Solid first pass, real components, real layout logic.
• Multiple chats running concurrently. You can parallelize design work across threads like a small team.
Why this matters:
PMs, founders, marketers, and other non-engineers can now create designs that engineers can actually ship, with production-ready output and a Claude Code handoff built in.
The gap between "I have an idea" and "here's a working prototype with my brand applied" just collapsed to minutes.
Full walkthrough, live demos, exports, and honest takes on where it breaks below.
P.S.
• This is an Anthropic Labs product — NOT GA yet.
• Claude Design is currently webapp only (no API), and does not yet support the Analytics API, Compliance API, or cost/usage reporting.
• Availability:
– Default ON for Pro / Max / Team
– Default OFF for Enterprise
Enterprise admins can toggle it on via RBAC in console
(comes with a ~$20/user initial credit).