I've been thinking about art (and "not art") for a long, long time. I finally started figuring out how to put some of those thoughts into words this year. Here's a first stab at it: https://t.co/nMTTgZw9So
@AnthropicAI Big shoutout to the @Zapier team that built this:
Robin Salimans, Lukas Bergstrom, Jake Talgard, Michael Haarala, Daniel Shepard, and everyone else who helped make this happen.
AutomationBench tests how models perform on the trickiest, stickiest real-world workflows we know customers are actually trying to automate. 600 tasks, 6 domains, deterministic scoring.
And today our scores are featured on @AnthropicAI's official launch scorecard.
We built an AI benchmark that measures real work.
Today we're releasing it to everyone.
AI evals tell you whether a model can do complex reasoning or generate code. Useful, but usually not the question our customers ask. They want to know: can this model find the right CRM record, send the right follow-up, and not break anything along the way?
We went looking for a benchmark that tested that. Nobody had built one, so we did.
@Zapier’s AutomationBench drops AI models into realistic business environments across six domains (Sales, Marketing, Ops, Support, Finance, HR) and checks whether the work actually got done.
The tasks include live CRM data, inbox threads with ambiguous context, and multi-step tool chains where one wrong call cascades.
Scoring is deterministic: either the right records were updated and the right messages were sent, or they weren't.
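A rough sketch of what pass/fail scoring like this could look like. This is not AutomationBench's actual code: `EndState`, `score_task`, and the record and message shapes are all made up for illustration, and the rule that any extra message counts as a failure is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class EndState:
    """Hypothetical snapshot of an environment after the model finishes a task."""
    # record id -> {field name -> value}
    records: dict[str, dict[str, str]] = field(default_factory=dict)
    # (recipient, message kind) pairs that were sent
    messages: set[tuple[str, str]] = field(default_factory=set)


def score_task(expected: EndState, actual: EndState) -> bool:
    """Return True only if every expected field value is present and the
    set of sent messages matches exactly. No partial credit."""
    records_ok = all(
        actual.records.get(record_id, {}).get(field_name) == value
        for record_id, expected_fields in expected.records.items()
        for field_name, value in expected_fields.items()
    )
    # Assumption: an unexpected extra message is a failure too.
    messages_ok = actual.messages == expected.messages
    return records_ok and messages_ok


expected = EndState(
    records={"lead-1": {"status": "contacted"}},
    messages={("ana@example.com", "follow-up")},
)
actual = EndState(
    records={"lead-1": {"status": "contacted", "owner": "sam"}},
    messages={("ana@example.com", "follow-up")},
)
print(score_task(expected, actual))
```

The same inputs always give the same verdict, which is the point of a deterministic check: there is no judge model and no rubric to interpret.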
It’s useful enough that we're releasing it publicly today. Open task set, open methodology, open leaderboard. Everyone should have access to this.
No model has cracked 10%. Yet.
Try it here: https://t.co/V7qHAGX7Ql
@lugg Thank you. Your drivers Jessiah and Ricco in SF yesterday (black pickup truck) came by to move some large items. It’s possible they picked it up by accident and still have it in their truck? I am willing to pay a high reward $$$ for the safe return of the sentimental items inside
SOS… my husband’s backpack was stolen while Lugg was moving an item from my house yesterday. The backpack contains incredibly sentimental items including my child’s first baby teeth. It’s possible the luggers stole it? I need someone to reach out to me ASAP. Will pay $$$
@lugg
@The_Coolector … any chance you know how to buy one of these anymore? My husband’s backpack was stolen yesterday and it had his favorite jacket in it… this one 😭😭
I’ve been looking on eBay and Poshmark, but no luck.
https://t.co/Z103umRmLn
Been a hot second, but I wrote a post this morning about a fun little framework I've been developing at the intersection of workplace productivity and personal growth:
Using ‘The Four Floors of Feeling’ to Give Feedback at Work https://t.co/IrUQzC2r4i
next Thursday in SF!
Come hear from @dballona, @TweetAnnaMarie, and me about how the way people use MCP and tools is changing. Night and day different than 5 months ago…
oh also the 'many hours/day chatgpt session lengths' might seem odd now, but give it a bit. the companion part will be big here, but the music+video+hardware over the next few years should be able to complete it, things take time
All eyes are on OpenAI DevDay.
Agent Builder was just announced: a new way to design AI-powered workflows right inside OpenAI. But it ships with only a few native integrations, and most businesses run on hundreds of tools.
That’s where Zapier MCP comes in. It instantly connects Agent Builder to Zapier's ecosystem of 8,000+ apps and 30,000+ actions.
Imagine this: an OpenAI Agent analyzes campaign performance data and, through Zapier MCP, updates budgets in Google Ads and syncs new leads to HubSpot.
Together, OpenAI’s Agent Builder and Zapier’s automation layer unlock the next wave of AI-native operations: intelligent logic meets real-world connectivity.
What you get:
- Production-ready connectors maintained by Zapier
- Secure, auditable calls from your agent to the tools your customers already use
- Faster time-to-value and fewer integration backlogs
Try Agent Builder with Zapier MCP today: https://t.co/yHNPiQr5AQ
I had the chance to catch @rafalwilinski & @vitorbal’s eval talk and it was 🔥🔥
Love to see all the eval goodness y’all have been cookin’ turn into great content for other agent builders to learn from.
Congrats to @aiDotEngineer 2025 Best Speakers!
MCP: @zeeg
Tiny Teams: @alxai_
LLM Recsys: @devanshtandon_
GraphRAG: @danielchalef
Fortune 500 Day 1: @hwchase17
Architects Day 1: @denyslinkov
Infra: @dylan522p
Voice: @bnicholehopkins
Product Management: @bbalfour
Agent Reliability: @itamar_mar
SWE Agents: @bcherny
Reasoning: @natolambert
Evals: @rafalwilinski & @vitorbal
Retrieval+Search: @WilliamBryk
Fortune 500 Day 2: @ritakozlov
Architects Day 2: @pk_iv
Security: @renebrandel
Design Engineering: @JohnPhamous
Generative Media: @sharifshameem
Autonomy+Robotics: @nikhilabm
Online Track: @MrAhmadAwais
Overall Best Speaker: @simonw!
We actually have a photo plaque printed for each track's best speaker, with your speaker photo on it. Come collect it from me this weekend if you are still in town!
thanks to ALL our keynote, breakout, expo, workshop, attendee and more speakers for generously sharing their knowledge and working on the best AIE talks we've ever had! We appreciate you and are working to get them all edited and up online ASAP.