Gemma 4 looks at a parking lot. Decides what to ask. Calls SAM 3.1.
"Segment all vehicles." 64 found.
"Now just the white ones." 23 found.
One model reasoning and orchestrating. One model executing.
Both running locally on a MacBook. MLX. No cloud. No API.
Gemini is free on Workshop for Max subscribers from May 12–20!
Use Gemini for apps, workflows, internal tools, dashboards, agents, image tools, and more.
Available for both existing and new Max subs.
It's weird that the US still doesn’t have a truly competitive open-source model lab.
It’s clearly not a money problem. Several neolabs have raised billions.
It’s not a compute problem. US labs have easier access to B200s/B300s than Chinese labs.
So what is the issue?
Want to win an election? Change your name.
Live data from London's #electiontresult2026 shows candidates higher in the ballot get more votes than their party colleagues in the same ward 72% of the time.
17 candidates so far that missed out on a seat:
https://t.co/UpgUuSqdmf
Prompt-to-app is table stakes.
The hard part is what comes next: backend, data, APIs, auth, services, jobs, infrastructure.
Workshop is for building real software.
@googlegemma ❤️
you can also download and use Gemma 4 models for subagents in @WorkshopAI.
A favorite setup of mine is to use Gemini 3.1 Pro for the main agent and Gemma 4 31b for subagents on my M4 Max.
Want to build with AI for free?
With local models in Workshop, you can build websites, dashboards, internal tools, workflows, prototypes, and more.
That means:
- Zero API costs
- Offline access
- Full privacy
Try local models today in Workshop Desktop.
You don’t have to use one model (or one provider!) for everything.
With Workshop, you can combine frontier and local models in the same workflow.
For example: Opus can be the main agent, and delegate specific tasks to Gemma 4 via subagents.
Better quality where it matters. Better privacy, speed, and cost where it counts.
One workflow, best model for each task.
A real bottleneck in shipping projects is getting assets that feel polished enough to share.
@WorkshopAI can generate them as part of your building process, in the same conversation.
Here, I asked for a site for a baseball field next to the Charles River in Boston. Workshop planned the site, decided what images each section needed, wrote and ran the image generation prompts (using Nano Banana 2), and assembled the whole thing in one go.
Prompt was one line.
Also: this field would be pretty awesome.
Most on-device AI isn't useful for running agents beyond a demo.
But the models are no longer the bottleneck. Most people just aren't using them the right way.
We've found that a simple paradigm change takes on-device models from demo to actually useful:
Oversee on-device models with a more powerful AI model.
It's an overlooked approach, because all of the major agent harnesses are optimized to keep you spending money with their cloud inference.
With yet another update to our agent harness this week, @WorkshopAI lets you use frontier and "open frontier" models for the main agent, which can itself delegate tasks to smaller, on-device models.
It eliminates >90% of the failure modes of using on-device models for agentic tasks, and it turns on-device AI usage from a demo to a legitimate tool for daily use.
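A rough sketch of what "overseeing" an on-device model could look like, assuming a simple review-and-retry loop. Every name here (`supervised_run`, the toy worker, reviewer, and overseer) is hypothetical and stands in for real model calls; this is not Workshop's actual harness.

```python
import json
from typing import Callable, Optional, Tuple

Worker = Callable[[str], str]
# Reviewer returns feedback text, or None when the draft is accepted.
Reviewer = Callable[[str, str], Optional[str]]


def supervised_run(
    task: str,
    worker: Worker,
    reviewer: Reviewer,
    overseer_solve: Worker,
    max_attempts: int = 2,
) -> Tuple[str, str]:
    """Let the small model try; the overseer reviews, then takes over if needed."""
    prompt = task
    for _ in range(max_attempts):
        draft = worker(prompt)
        feedback = reviewer(task, draft)
        if feedback is None:
            return "on-device", draft
        # Feed the overseer's critique back into the next attempt.
        prompt = f"{task}\n\nReviewer feedback: {feedback}"
    # Escalate: the stronger model does the task itself.
    return "overseer", overseer_solve(task)


# Toy stand-ins (assumptions, not real models).
def toy_worker(prompt: str) -> str:
    # Pretend the small model only emits valid JSON after being corrected.
    if "Reviewer feedback" in prompt:
        return '{"status": "ok"}'
    return "status: ok"


def toy_reviewer(task: str, draft: str) -> Optional[str]:
    try:
        json.loads(draft)
        return None
    except json.JSONDecodeError:
        return "Output must be valid JSON."


def toy_overseer(task: str) -> str:
    return '{"status": "ok", "by": "overseer"}'


result = supervised_run("Return status as JSON", toy_worker, toy_reviewer, toy_overseer)
```

In this toy run the small model fails once, gets corrected, and succeeds on the second attempt, so the overseer only pays for a short review rather than the whole task.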
Workshop keeps getting more flexible.
We just shipped a new set of updates in Workshop, including subagents and expanded model support.
That means more flexibility in how you build, more control over cost/performance, and better ways to tackle complex workflows.
More below:
With this update, Workshop’s harness is now uniquely capable of supporting both multi-provider and multi-environment orchestration.
Want to use a frontier model as the main agent and delegate specific tasks to a local model? We’ve got you.
https://t.co/LkWtZdzDzP
Judging by my tl there is a growing gap in understanding of AI capability.
The first issue I think is around recency and tier of use. I think a lot of people tried the free tier of ChatGPT sometime last year and allowed it to inform their views on AI a little too much. This is a group of reactions laughing at various quirks of the models, hallucinations, etc. Yes I also saw the viral videos of OpenAI's Advanced Voice mode fumbling simple queries like "should I drive or walk to the carwash". The thing is that these free and old/deprecated models don't reflect the capability of the latest round of state of the art agentic models of this year, especially OpenAI Codex and Claude Code.
But that brings me to the second issue. Even if people paid $200/month to use the state of the art models, a lot of the capabilities are relatively "peaky" in highly technical areas. Typical queries around search, writing, advice, etc. are *not* the domain that has made the most noticeable and dramatic strides in capability. Partly, this is due to the technical details of reinforcement learning and its use of verifiable rewards. But partly, it's also because these use cases are not sufficiently prioritized by the companies in their hillclimbing because they don't lead to as much $$$ value. The goldmines are elsewhere, and the focus comes along.
So that brings me to the second group of people, who *both* 1) pay for and use the state of the art frontier agentic models (OpenAI Codex / Claude Code) and 2) do so professionally in technical domains like programming, math and research. This group of people is subject to the highest amount of "AI Psychosis" because the recent improvements in these domains as of this year have been nothing short of staggering. When you hand a computer terminal to one of these models, you can now watch them melt programming problems that you'd normally expect to take days/weeks of work. It's this second group of people that assigns a much greater gravity to the capabilities, their slope, and various cyber-related repercussions.
TLDR the people in these two groups are speaking past each other. It really is simultaneously the case that OpenAI's free and I think slightly orphaned (?) "Advanced Voice Mode" will fumble the dumbest questions in your Instagram reels and *at the same time*, OpenAI's highest-tier and paid Codex model will go off for 1 hour to coherently restructure an entire code base, or find and exploit vulnerabilities in computer systems. This part really works and has made dramatic strides because of 2 properties: 1) these domains offer explicit reward functions that are verifiable, meaning they are easily amenable to reinforcement learning training (e.g. unit tests passed yes or no, in contrast to writing, which is much harder to explicitly judge), but also 2) they are a lot more valuable in b2b settings, meaning that the biggest fraction of the team is focused on improving them. So here we are.
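To make the "unit tests passed yes or no" point concrete, here is a minimal sketch of a verifiable reward, assuming a binary pass/fail signal over a small test list. The function name and test format are made up for illustration and do not describe any lab's actual training setup.

```python
from typing import Callable, List, Tuple

# Each test case is (arguments, expected result).
TestCase = Tuple[tuple, object]


def verifiable_reward(candidate: Callable, tests: List[TestCase]) -> float:
    """Return 1.0 only if the candidate passes every test, else 0.0."""
    for args, expected in tests:
        try:
            if candidate(*args) != expected:
                return 0.0
        except Exception:
            # A crash counts as a failure, same as a wrong answer.
            return 0.0
    return 1.0


tests = [((2, 3), 5), ((-1, 1), 0)]


def good(a, b):
    return a + b


def bad(a, b):
    return a * b


good_reward = verifiable_reward(good, tests)
bad_reward = verifiable_reward(bad, tests)
```

There is no comparably cheap check for "is this essay good", which is the contrast the paragraph above is drawing.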
most agent harnesses are insanely inefficient with spend.
But there's low-hanging fruit for dramatically improving costs without sacrificing quality:
Step one: offload as much "gruntwork" as possible to open models, while keeping the frontier models overseeing the work.
This update to the Workshop harness lets you do just that.
Select the AI model for your main agent, whether it be OpenAI, Anthropic, Gemini, or an Open Source model.
But the real game changer is that the main agent can delegate to subagents powered by whichever model you ask for.
So you can tell it things like "Use small open source models for subagents when you can, but use GPT 5.4 for code reviews"
Or you can even set up on-device models and tell the agent "use on-device models for subagents and fall back to Claude when you need to"
Or "design with gemini subagents, build with claude, and review with OpenAI"
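Instructions like the ones above boil down to a routing policy: a default model order for subagents, per-task overrides, and a fallback when a model fails. A minimal sketch follows, assuming a plain lookup table. The model names, task kinds, and `run_model` callback are placeholders, not Workshop configuration or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class RoutingPolicy:
    # Ordered list of models to try for any task without an override.
    default: List[str]
    # Task kind -> ordered list of models to try instead of the default.
    overrides: Dict[str, List[str]] = field(default_factory=dict)

    def candidates(self, task_kind: str) -> List[str]:
        return self.overrides.get(task_kind, self.default)


def delegate(
    task_kind: str,
    prompt: str,
    policy: RoutingPolicy,
    run_model: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Try each candidate model in order; fall back when one raises."""
    errors = []
    for model in policy.candidates(task_kind):
        try:
            return model, run_model(model, prompt)
        except RuntimeError as exc:
            errors.append(f"{model}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def fake_run_model(model: str, prompt: str) -> str:
    # Stand-in for real inference: the on-device model "fails" on long prompts.
    if model == "on-device-small" and len(prompt) > 40:
        raise RuntimeError("context too long")
    return f"[{model}] done"


policy = RoutingPolicy(
    default=["on-device-small", "frontier-fallback"],
    overrides={"code_review": ["frontier-reviewer"]},
)

short_job = delegate("summarize", "short prompt", policy, fake_run_model)
long_job = delegate("summarize", "x" * 50, policy, fake_run_model)
review_job = delegate("code_review", "review this diff", policy, fake_run_model)
```

Keeping the policy as data rather than hard-coding it is one way to let a plain-language instruction change routing without touching the delegation logic.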