Weekly AI news recap: Google I/O is a wrap, Anthropic acquired Stainless, and OpenAI solved an 80-year-old math problem.
1) Stainless and Andrej Karpathy join Anthropic
Anthropic acquired Stainless, the company behind the official SDKs of many major APIs. Stainless turns API specs into production SDKs, CLIs, and MCP servers. The vision is clear: AI is shifting from models that answer to agents that act, and agents are only as capable as the systems they can reach.
Andrej Karpathy, OpenAI co-founder and former Tesla AI Director, joins Anthropic.
2) The Erdős Breakthrough
An unnamed OpenAI model disproved a long-standing assumption about the planar unit distance problem, posed by Paul Erdős in 1946. The problem asks for the maximum number of pairs among n points in a plane that can be exactly one unit apart. For 80 years, mathematicians believed optimal arrangements resembled square grids. The model proved them wrong with a proof humans had missed for eight decades.
3) Welcome to the agentic Gemini era
Google shipped over 100 updates at I/O. Gemini 3.5 Flash outperforms the previous Pro at 4x the speed and half the cost. It powers Spark and AI Search and is the engine behind Google's agent push.
Gemini Omni creates video from any input, supports natural-language editing, and generates avatars: digital versions of you.
Gemini Spark is a personal agent on 3.5 Flash that connects to Gmail, Docs, and Slides and works after you close your laptop.
AI Search surpassed 1 billion monthly users. Google added background agents that monitor topics 24/7 and a generative UI that builds mini-apps inside results.
This week, all three labs left the chat window. Anthropic bought AI tooling infrastructure. OpenAI aimed a model at unsolved science. Google turned everything into an always-on agent. Same direction, different approaches.
Weekly AI news recap: Anthropic launched Claude for Small Business. OpenAI started a $4B deployment company. Google I/O starts Monday.
1) Anthropic shipped for small businesses and developers
Claude for Small Business launched as a plugin with connectors to QuickBooks, PayPal, HubSpot, Canva, and DocuSign. It included 15 workflows and 15 skills. Agent View shipped for Claude Code: one screen to manage all coding agents. Claude Platform arrived on AWS with full API parity, IAM auth, and a single AWS invoice.
The catch: Small Business is a plugin. It only works if those tools are in your stack. Agent View is a research preview.
2) OpenAI launched deployment, cybersecurity, and mobile Codex
The Deployment Company launched with $4B from 19 firms and acquired Tomoro's 150 engineers. Forward Deployed Engineers will embed in organizations. Daybreak launched as a cybersecurity product. Codex arrived in the ChatGPT mobile app.
The catch: the Deployment Company is just starting. Codex mobile and Daybreak are previews.
3) Google is saving its news for Monday
Google demoed AI Pointer, a Gemini-powered cursor that reads on-screen context. But the main event is Google I/O, starting May 19. After last week's Gemini 3.2 Flash leak, expect official model announcements across the Gemini lineup.
Three OpenAI launches mirror three Anthropic predecessors: the Deployment Company follows Anthropic's AI services venture from last week, Daybreak follows Claude Security, and Codex mobile follows Dispatch. The model race is becoming an implementation race, and Anthropic is a few steps ahead.
Google gave the mouse pointer AI. Point at anything on the screen, and Gemini knows what you're looking at.
Google DeepMind built an AI-powered cursor using Gemini. Instead of dragging content into a chat window, you point at something and talk. It reads visual and semantic context, not just pixel coordinates.
In Chrome, you can point at any webpage element and ask Gemini about it. Select products to compare, point where to place furniture, mark code to fix.
What that looks like in practice: in the demo, someone edits a Google Doc by pointing and talking. The text rewrites in place. "Merge the columns." "Make this more human."
It's experimental. The Chrome integration is still rolling out, and the AI Studio demos are early. The concept is strong, but the polish isn't there yet.
You can try the demos in Google AI Studio right now. Worth five minutes.
Weekly AI news recap: Anthropic taught agents to learn between sessions, OpenAI halved hallucinations, and Google accidentally revealed its new model.
1) Claude expanded on two fronts
Claude agents now learn between sessions through a feature called dreaming. They review past tasks and consolidate what worked. You define success, and agents evaluate their work against it. Multiple agents can split work across a task.
Claude for Excel, PowerPoint, and Word are now generally available, and Claude for Outlook is in public beta. Your conversation history is shared across all four apps.
Anthropic also launched a $1.5B AI services company with Blackstone, Goldman Sachs, and H&F.
The catch: dreaming is a research preview. Office add-ins need a paid plan.
2) OpenAI shipped voice, accuracy, and Codex
GPT-5.5 Instant replaced GPT-5.3, with 52.5% fewer hallucinated claims on accuracy-critical questions. Now, ChatGPT searches your conversations, files, and Gmail to personalize answers.
GPT-Realtime-2 brings GPT-5-level reasoning to voice agents.
Codex now works directly in Chrome on macOS and Windows. It runs across tabs in the background without taking over your browser.
The catch: Realtime-2 is API only. Personalization requires Plus or Pro, and works on web only.
3) Google leaked and launched
Gemini 3.2 Flash appeared in the iOS app and AI Studio without an announcement. Coding performance is near 3.1 Pro at Flash pricing.
Nano Banana 2 now handles prototyping, from idea to visual mockup inside Gemini. It combines Pro-level image quality with Flash speed.
The catch: 3.2 Flash isn't officially launched. Benchmarks could shift before Google I/O on May 19.
No single AI lab dominated the news this week. Claude agents got smarter, ChatGPT got more accurate, and the next Gemini model is almost here.
Weekly AI news recap: chat is out, finished files are in.
1) Gemini turns prompts into files
From one prompt, Gemini now outputs PDFs, Word docs, Excel sheets, Slides, and CSVs, ready to download. No template, no copy-paste. Free for all users. The same week, Veo 3.1 Lite set a new floor for API video at $0.05 per second at 720p.
The catch: no direct PowerPoint export. Veo clips cap at 8 seconds. Complex formatting still needs a human pass.
2) Codex for daily work
A new "Codex for Work" page targets non-developers. Pick your role, connect your apps, and generate docs, sheets, and decks from a single conversation. Draft to finished file in one thread.
The catch: no net-new capability. Better onboarding and prompt examples get you to output faster.
3) Claude for creative work
Nine MCP connectors, including Adobe Creative Cloud, Blender, Autodesk Fusion, SketchUp, Ableton, Splice, Affinity by Canva, and Resolume. Blender's connector is open source, so any LLM can use it.
The catch: connectors give access. The model still determines how far you can push a creative workflow.
The shift from chat to finished output is no longer one lab's bet. All three shipped it in the same week.
The Claude desktop app can do a lot. Knowing what to use and when shouldn't be the hard part.
Here's a plain-English guide to the modes and features, plus the fundamentals they run on.
1) Chat: you ask, Claude responds.
- Prompt: one-off instructions
- Project: grouped files, instructions, and chats
- Artifact: files Claude creates in the conversation
Also in Chat:
- Skill: instruction packs that teach Claude how to execute tasks
- Connector: bridges to Gmail, Slack, Drive, and 200+ apps
- Memory: context Claude remembers across chats
2) Cowork: describe what you need, Cowork does the work.
- Project: persistent local workspace
- Scheduled tasks: prompts that run on autopilot
- Dispatch: assign tasks from your phone
3) Code: describe what you want, Code builds it.
- Session: coding tied to a local folder
- Routine: cloud automation that runs when your laptop is closed
- Dispatch: assign code tasks from your phone
Every mode runs on the same foundation: prompts, projects, skills, and connectors.
The difference is what Claude does with them: Chat answers, Cowork delivers, Code builds.
View the full guide here:
https://t.co/t9hyuHCUdr
Stop using ChatGPT and Claude in the browser. Instead, download the desktop apps.
They're not wrappers. They're standalone apps with features the web can't offer. Each lab has its own version.
3) Codex
OpenAI's answer to Cowork and Code:
- Skills for recurring tasks and scheduled automations.
- Background computer use that doesn't take over your screen.
- Autonomous agents working independently across projects.
- 90+ plugins for tools like Jira, Slack, and Gmail.
4) Gemini
Launched last week. Not all features are there yet: automation lives in Workspace Studio, coding in Antigravity. But it'll likely become a super app too.
The browser still works, but the real capabilities are in the desktop apps.
Have you switched over yet, or are you still using the browser?
Your AI content sounds generic because it lacks context, not because it needs better prompts.
The fix: a three-part writing system that gives AI your audience, message, and voice.
Here's how to build it:
1) Create personas
Profile your target audience. Identify their needs, frustrations, and decision drivers. Use a thinking model with deep research to ground them in real data.
2) Build value propositions
Connect benefits to each persona's interests to keep content focused on driving action.
3) Write a writing guide
Capture your tone, structure rules, and language preferences in four sections: about, tone of voice, guidelines, and writing rules.
Apply all three to your AI tool. Use projects or skills to give AI permanent access. Every new conversation starts with full context.
I built this system for my own content and cut my editing time in half.
AI starts from context, not assumptions.
Weekly AI news recap: OpenAI shipped GPT-5.5 and background agents. Anthropic gave its agents memory. Google turned Deep Research into an autonomous pipeline.
1) OpenAI shipped its smartest model
GPT-5.5 posts 82.7% on agentic coding benchmarks, has a 1M-token context window, and runs at GPT-5.4-like latency.
Workspace agents in ChatGPT now connect to Slack and keep working after you log off. Close your laptop and wake up to a resolved thread.
Images 2.0 handles text rendering cleanly for production use.
The catch: GPT-5.5 costs double GPT-5.4 at $5/$30 per million tokens. Workspace agents are free until May 6, then credit-based.
2) Anthropic gave Claude agents memory and 200+ connectors
Memory for Managed Agents is in public beta. Agents learn across sessions and write memories to exportable files. Rakuten reports 97% fewer first-pass errors.
The connector directory now lists 200+ apps, including Instacart, Spotify, and Uber.
The catch: memory lives in the developer API, not as a chat-app toggle. New consumer connectors are US-only.
3) Google launched Deep Research Max
Deep Research Max, powered by Gemini 3.1 Pro, runs multi-step background research, connects to internal data sources, and drops charts straight into the write-up.
The catch: it's a developer API, not a Gemini app feature.
Three labs, same bet. Agents that keep working between sessions. The race isn't about the smartest model anymore. It's about what the agent does when you're not watching.
ChatGPT Images 2.0 turns image generation into a production tool. It creates readable text and usable layouts.
The model reasons through layout before generating. It searches the web, analyzes your uploads, and produces up to 8 coherent images at once.
The big change is text rendering. This one renders readable text consistently, which previous models couldn't. Headlines, infographic labels, barcodes, and multilingual typography all come out professionally aligned.
You can now prototype slides, marketing materials, and visual assets without a designer. The API supports aspect ratios from 3:1 to 1:3 and resolutions up to 2K.
The catch: paid users get better outputs. You still need a human eye before publishing, but for mockups and internal materials, the quality gap has closed.
What is going to be your first test?
"Prompting is dead" is everywhere right now. It isn't.
The prompt is still the foundation. Only the label changed. Skills, project instructions, scheduled tasks: each still has a prompt underneath.
Stop copying prompts from the internet. Build them with your own context and purpose.
Add my prompt as instructions to a Prompt Engineer project in Claude or ChatGPT, or as instructions for a Prompt Engineer Gem in Gemini.
Describe your goal, answer its questions, refine together.
Save your finished prompts in one document and add it as knowledge. The more prompts you add, the better it builds the next one.
DM me for the full prompt.
Besides Opus 4.7, Anthropic launched Claude Design on Friday. It builds prototypes, mockups, presentations, and marketing assets through conversation.
Turn static mockups into interactive prototypes for user testing. Sketch feature flows and hand them to Claude Code. Explore diverse design directions quickly.
Go from a rough outline to an on-brand pitch deck, then export to PPTX or Canva. Create landing pages, social assets, and campaign visuals. Or build code-powered prototypes with voice, video, 3D, and built-in AI.
Figma lost 6.89% of its market value and is down nearly 50% since the start of the year. The reaction seems premature, but I haven't tested the tool yet.
The question I keep coming back to is: does this make better design faster, or just make similar design easier?
If you design for a living, what's your first reaction?
Weekly AI news recap: three labs, one theme. AI stopped being a chat window in the browser.
1) Anthropic released Claude Opus 4.7
Their most capable Opus yet, Opus 4.7, handles long-running tasks rigorously, follows instructions precisely, and verifies outputs before reporting. It stays on task longer, checks its work, and flags uncertainties. It is pitched at complex engineering, deep research, and agentic workflows where a model needs to finish, not just respond.
The catch: self-verification is a capability claim, not a guarantee. It lifts the reliability floor but doesn't remove review.
2) Google put Gemini on your Mac
The Gemini app is now on Mac. You can access it from any screen with Option + Space, and share your window so it answers based on your documents, code, or data.
Gemini also received an update: Notebooks. Multimodal context, persistent memory, project organization, and sync with NotebookLM.
The catch: new surface, not a new model. The gains come from access and context, not a smarter Gemini.
3) OpenAI made Codex do (almost) everything
Codex now uses Mac apps, connects to more tools, creates images, learns from previous actions, remembers work preferences, and handles ongoing tasks. Plus GPT-Rosalind for biology and drug discovery, and GPT-5.4-Cyber for verified security pros.
The catch: Codex is one app absorbing functions from separate tools. If adoption sticks, it's the biggest assistant surface of the year.
Chat windows were training wheels. Next is AI that sits beside your work, keeps context, and finishes tasks without you.
"AI is making us dumber." Maybe. More often, it's making us skip the hard part.
Most people type a request, get a response, move on. Instant answers erase the hard part that builds analytical thinking.
The fix isn't "use AI less." It's changing AI's role from oracle to interviewer.
A wizard prompt interviews you before producing anything:
- Step by step, no jumping ahead
- Max 4 questions per step
- Propose 2-4 directions when info is missing
- You react, refine, choose
AI asks, you think. The hard part stays with you, and your analytical thinking sharpens.
We've used wizard prompts for problem statements, value propositions, desk research, and business models while building Elah.
DM me for the full prompt.
Anthropic shipped a better interface and automations that run independently. OpenAI and Google went opposite directions from each other.
1) Anthropic shipped two updates to the Claude Code desktop app
The app got a full redesign around parallel sessions with a sidebar, integrated terminal, file editor, and drag-and-drop layout.
On top of that, routines let you schedule automations, trigger them from an API, or connect them to GitHub events. They run on Anthropic's servers, not your laptop.
The catch: routines draw from your subscription limits, and the redesigned desktop app is only for Mac and Linux.
2) OpenAI released GPT-5.4-Cyber
A model with fewer refusals for verified security professionals, including binary reverse engineering. Last week, Anthropic gave Claude Mythos to a closed coalition of 40+ partners. OpenAI's counter is to open it up to thousands of verified individuals.
The catch: the superapp merging ChatGPT, Codex, and Atlas was leaked weeks ago, but there is still no release date. The enterprise pivot is loud, the product is not.
3) Google launched Skills in Chrome
Save any Gemini prompt as a reusable Skill and run it on any page with one click. This is a smart move to keep Chrome dominant and make AI a habit for general use.
The catch: English-US only for now.
Anthropic is the only one shipping work features. Google went broad. OpenAI went niche. Are they giving up on the consumer market?