“The future of high-stakes work is not AI replacing judgment. It is AI making judgment scalable, auditable, and continuously improvable”
Better, faster core compliance workflows. From the great @dorvonlevi and team making the whole industry safer.
Building an AI-native @Coinbase means rebuilding everything, especially the hardest parts. We've put a lot of time into redefining compliance, where the stakes are incredibly high, and we have to be extremely thoughtful about implementation.
We have invested heavily in rebuilding our compliance ops around AI with that reality as our starting constraint, not an afterthought. Here is an overview of what we've learned and what we built.
Most people assume compliance work is mostly checking whether a name appears on a sanctions list. That is the easy 5%. The other 95% is interpretive judgment under uncertainty: a customer claims their wealth came from real estate. Do the property records actually support it? Does the timeline hold? Is the documentation legitimate, or does it feel too polished? You need compliance staff and investigators who understand what “suspicious” actually looks like in context.
That's part of why compliance is so hard to automate—and so expensive.
The first obvious AI approach is to hand the model the existing procedures and ask it to run them faster. That approach misunderstands what procedures are for. Good procedures are not bad investigations; they are deliberately incomplete investigations. Their job is to create consistency, auditability, and a minimum standard across thousands of cases. They excel at saying what must happen. They are far worse at capturing everything a strong analyst actually notices: which sources they trust, when they widen the search, when a document feels off, when an explanation technically fits but still does not feel earned.
Procedures also carry the shape of the old operating model: fragmented systems, time pressure, queue pressure, and the hard limit of how much one human analyst can read, cross-reference, and hold in working memory at once. That is not a flaw in the procedure. It is how you design a process for humans.
AI changes the constraint set. Reading, searching, comparing documents, and tracing inconsistencies no longer have to be treated as scarce analyst time. Done carefully, with proper controls and human review, models can explore more context, test more hypotheses, and surface more inconsistencies than any single analyst could reasonably do case by case.
So if you simply automate the procedure exactly as written, you may gain efficiency. You will not unlock the full value of AI. You will just make the old bottleneck run faster.
The better question is not “Can AI follow the analyst playbook?”
It is: once the cost of reading, cross-referencing, and testing hypotheses collapses, what should the investigation become?
A second tempting approach: feed it historical Suspicious Activity Reports (SARs) and let it learn from outcomes. This breaks down too. You rarely have the full state of what the analyst actually saw during the investigation. A case that looks straightforward today might only look that way because information surfaced later: a fraud indictment that didn't exist when the original analyst made the call, or news articles that hadn't been published yet. Hindsight can contaminate your training data. Also, regulators themselves acknowledge that SAR decisions can be subjective.
The architecture has four layers. The first is data: continuously enhancing the coverage, quality, and architecture of the signals the system depends on. The second is classical machine learning models that cluster and classify alerts to determine what type of investigation needs to run. The third is the investigation agent itself: a multi-agent system that orchestrates specialized agents to execute the investigation end to end. The fourth is a safety filter that runs independently of typology, ensuring no risk vector is missed regardless of how the alert is classified. Each layer is independently auditable and learns from the feedback provided by human reviewers.
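To make the layering concrete, here is a minimal sketch of the one structural point the description stresses: the safety filter runs on every alert, independent of how the classifier labels it. The function names, the "structuring" and "general" typologies, and the alert fields are all hypothetical placeholders, and the rule-based classifier stands in for the real ML models.

```python
def classify_alert(alert: dict) -> str:
    """Stand-in for the ML layer: pick an investigation type (hypothetical rule)."""
    return "structuring" if alert.get("many_small_deposits") else "general"


def safety_filter(alert: dict) -> list[str]:
    """Runs on every alert and never looks at the typology (hypothetical checks)."""
    flags = []
    if alert.get("sanctions_hit"):
        flags.append("sanctions_hit")
    return flags


def process(alert: dict) -> dict:
    # The two calls are independent: a misclassified alert still gets
    # the same safety checks as a correctly classified one.
    return {
        "typology": classify_alert(alert),
        "safety_flags": safety_filter(alert),
    }


result = process({"sanctions_hit": True})
```

Keeping the filter ignorant of the typology is what lets it catch a risk even when the classification is wrong.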
Inside the investigation agent, specialized sub-agents run across the full case surface: alert context, customer and identity signals, access patterns, risk indicators, transaction behavior, source-of-funds, onchain activity, and public adverse media. Each writes its findings into a shared case memory. A coordinator agent reconciles and challenges them. When sub-agents disagree, such as when source-of-funds marks activity as “explained” while adverse media surfaces a recent indictment, the coordinator attempts to resolve these disagreements knowing the common patterns. The narrative agent prepares the final report with all collected evidence and suggested resolution. The last self-validation agent acts as a guardrail: if the system cannot support its conclusion with sufficient confidence or data quality, the case is routed to manual investigation instead of being surfaced as an automated result.
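A rough sketch of the shared case memory and the final guardrail might look like the following. Everything here is an assumption for illustration: the `Finding` fields, the verdict labels, the 0.8 threshold, and especially the rule that an unresolved disagreement goes to manual review, which is a placeholder for the coordinator's actual reconciliation logic.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    agent: str         # which sub-agent wrote the finding
    verdict: str       # hypothetical labels: "explained" or "concern"
    confidence: float  # hypothetical score between 0.0 and 1.0


@dataclass
class CaseMemory:
    findings: list[Finding] = field(default_factory=list)

    def write(self, finding: Finding) -> None:
        self.findings.append(finding)


def has_disagreement(memory: CaseMemory) -> bool:
    """True when sub-agents reached different verdicts on the same case."""
    return len({f.verdict for f in memory.findings}) > 1


def route(memory: CaseMemory, min_confidence: float = 0.8) -> str:
    """Return "automated" only for consistent, confident cases; else "manual"."""
    if not memory.findings or has_disagreement(memory):
        return "manual"
    if min(f.confidence for f in memory.findings) < min_confidence:
        return "manual"
    return "automated"


memory = CaseMemory()
memory.write(Finding("source_of_funds", "explained", 0.90))
memory.write(Finding("adverse_media", "concern", 0.95))
decision = route(memory)  # the conflicting verdicts send this case to manual review
```

The point of the sketch is the default: when the system cannot back its conclusion, the fallback is a human, not an automated result.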
Before any of this touched a real customer case, we built what we call a “Golden Set” - historical cases with known right answers. "Known right answers" in compliance is harder than it sounds. It meant re-investigating old cases, getting multiple senior analysts to independently agree on what the right call would have been, then debating the disagreements until consensus. Months of work before we could even start measuring.
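As an illustration of the consensus step, the sketch below separates cases where independent analyst labels already agree from those that still need debate, then scores predictions against the agreed set. The label values ("file", "close"), the case IDs, and the unanimity rule are assumptions for the example, not a description of the real process.

```python
def split_by_consensus(labels_by_case: dict[str, list[str]]):
    """Return (agreed, disputed): unanimous cases and cases needing debate."""
    agreed, disputed = {}, []
    for case_id, labels in labels_by_case.items():
        if len(set(labels)) == 1:
            agreed[case_id] = labels[0]
        else:
            disputed.append(case_id)
    return agreed, disputed


def accuracy(golden: dict[str, str], predictions: dict[str, str]) -> float:
    """Share of golden-set cases where the prediction matches the agreed call."""
    scored = [case_id for case_id in golden if case_id in predictions]
    if not scored:
        return 0.0
    hits = sum(golden[case_id] == predictions[case_id] for case_id in scored)
    return hits / len(scored)


agreed, disputed = split_by_consensus({
    "case-1": ["file", "file", "file"],
    "case-2": ["file", "close", "file"],
})
```

Disputed cases would only enter the golden set after the analysts debate them to consensus, which is the slow part described above.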
Here's an important part (for now) - cases currently get BOTH the AI's full investigation AND a senior human review. We didn't reduce scrutiny; in fact, we added more, and we'll keep it until it no longer proves valuable. Cases resolve significantly faster AND get more eyes than they ever did before. Every human correction feeds back into the model as a training signal. It gets better because it's wrong in front of people who know how to fix it.
None of this would have shipped without clearing structural blockers most financial institutions are still stuck on. Security and privacy sign-off to send customer data to LLMs at all. Senior compliance officer alignment on AI-assisted human decision making. Model Governance team embedded since December - they observed the entire Golden-Set Evaluation process and are running a formal validation review with our Internal Audit team now.
Today this handles roughly 55% of our US fraud case volume with significantly less analyst time per case. Time freed goes to the harder cases AI can't yet handle - and to teaching it.
Our internal compliance and quality teams are the ones who are building this system with the engineers, training it, validating it, and continuing to shape how it improves. In the process, they've developed skills that are incredibly valuable: how to design evals, how to think about model bias, how to think about human bias, and how to architect human-in-the-loop systems. These skills are becoming among the most valuable at any company.
This entire project started ~6 months ago with a whiteboarding session between @galpa42 and me, and was built by an AI-pilled cross-functional pod. It’s just the first pod - there's a multi-month roadmap for rebuilding compliance from the ground up with AI. Huge thanks to everyone involved and congratulations to @galpa42 for shipping two babies to production this month :)
The future of high-stakes work is not AI replacing judgment. It is AI making judgment scalable, auditable, and continuously improvable.
There are definitely areas we'll improve here.
Our spot exchange lives in a single zone (see link) to optimize for low latency. We can typically fail over faster to a warm standby in another zone, and data is stored durably for DR.
This outage was particularly bad though, and we saw managed service failures impact multiple zones. We're resilient to that, but not automatically available. Those recoveries take us longer.
We posted more details earlier today, but will share a full RCA after we've had more time to investigate.
Happy to walk you through what happened if you want to talk live. Big fan of @Pragmatic_Eng!!
https://t.co/HMnxKz1HHB
Yesterday @coinbase experienced a multi-hour service disruption affecting trading, exchange access, and balance updates. Here's our initial read from Coinbase engineering on what happened, how we recovered, and what we're addressing.
At approximately 23:50 UTC on 2026-05-07, our monitoring detected cascading quote failures from internal services that triggered multiple Sev1 incidents that engineering immediately began investigating. Customer-facing impacts included spot trading, Prime, International and derivative exchanges.
Root cause: a thermal event (cooling system failure) inside a subset of racks within a single building in AWS us-east-1. We run a primary replica of our exchange infrastructure in a single zone, consistent with industry standards to reduce latency. To prepare for failures like this, we maintain a distributed standby, but during this incident, failures in the primary zone that were designed to be isolated were not, extending the duration of our outage.
The failure cascaded down two paths:
1. Multiple hardware components beneath our exchange’s matching engine failed, requiring recovery and failover
2. Distributed Kafka clusters that manage messaging across Coinbase systems failed to remain available, also requiring partition failovers to new hardware brokers with many TiBs of data
After isolating the incident, automated tooling drained ~10 Kubernetes clusters' worth of related workloads out of the affected zone to stabilize internal services. Most services were back to normal within ~30 minutes of diagnosis. The two things we couldn't automatically drain: the exchange (dedicated hardware and storage) and Kafka (managed service that was designed to be resilient to this, with unique problems).
The exchange matching engine is the core system responsible for processing orders and maintaining order books. It is a distributed cluster and requires quorum to safely elect a leader and continue processing trading activity. During the incident, infrastructure-level constraints in the affected datacenter left only a subset of nodes healthy, preventing the cluster from reaching quorum. As a result, trading across Retail, Advanced, and Institutional exchanges was blocked.
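For readers unfamiliar with quorum, here is a minimal sketch of the arithmetic, assuming a simple-majority rule as used in Raft-style consensus. The post does not say which protocol or cluster size the matching engine uses, so the 5-node figure is purely illustrative.

```python
def quorum_size(cluster_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return cluster_size // 2 + 1


def can_elect_leader(cluster_size: int, healthy_nodes: int) -> bool:
    """A leader can be elected safely only if a majority of nodes is healthy."""
    return healthy_nodes >= quorum_size(cluster_size)


# Illustrative only: a 5-node cluster needs 3 healthy nodes.
blocked = not can_elect_leader(cluster_size=5, healthy_nodes=2)
```

Under this rule a cluster with only a minority of healthy nodes refuses to proceed rather than risk two leaders, which is why losing quorum halts trading instead of degrading it.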
Recovery required our oncall and engineering teams to execute our disaster recovery plan, restore quorum safely, and validate system health under constrained infrastructure conditions. The team built, tested, deployed, and validated the fix while continuing to manage the broader incident.
Kafka recovery was a much larger scale operation. Our primary managed Kafka partitions process many terabytes of data daily and are designed with resiliency guarantees for uninterrupted operation during a datacenter failure just like this. In this case, those guarantees failed and required manual recovery.
We again relied on disaster recovery procedures to recover stuck partitions onto new hardware (brokers) that enabled us to safely bring x-service messaging back online across Coinbase. During the lag, customers saw delayed balance streams which resolved automatically once replication caught up. No data lost.
Once the engine came back up as part of our standard runbooks, we re-opened markets carefully: we moved all products to cancel-only mode first, audited product states, then moved all markets to auction mode before restoring trading on Coinbase Exchange.
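The staged re-open can be pictured as a small state machine. This is only a sketch of the sequence described above: the `HALTED` starting state, the mode names, and the `audit_passed` gate are hypothetical, not Coinbase's actual runbook or API.

```python
from enum import Enum


class MarketMode(Enum):
    HALTED = "halted"            # assumed starting state during the outage
    CANCEL_ONLY = "cancel_only"  # customers may cancel resting orders only
    AUCTION = "auction"          # orders are collected before matching resumes
    TRADING = "trading"          # normal continuous trading


REOPEN_ORDER = [
    MarketMode.HALTED,
    MarketMode.CANCEL_ONLY,
    MarketMode.AUCTION,
    MarketMode.TRADING,
]


def advance(current: MarketMode, audit_passed: bool = False) -> MarketMode:
    """Move one step toward full trading, holding in cancel-only until audited."""
    if current is MarketMode.TRADING:
        return current
    if current is MarketMode.CANCEL_ONLY and not audit_passed:
        return current  # product states have not been audited yet
    return REOPEN_ORDER[REOPEN_ORDER.index(current) + 1]


mode = advance(MarketMode.HALTED)        # moves to cancel-only
mode = advance(mode)                     # stays in cancel-only, audit not passed
mode = advance(mode, audit_passed=True)  # moves to auction
mode = advance(mode)                     # moves to trading
```

Each step can only move forward by one stage, so no market jumps straight from halted to live trading.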
What went right: the team. Incident response across the company came together within minutes, followed well-rehearsed playbooks and used secure automation tooling to recover all services. We have a strong, senior team at Coinbase that worked through rare failure modes to recover all services.
To our customers: losing access to your account, even temporarily, is unacceptable. We know that. We're sorry, and we’ll publish a full root cause analysis in the coming weeks 🙏
Can confirm NASA grade QA standards.
I spent 5 years working at @NASAJPL before becoming responsible for infra, security and now eng at Coinbase 12 years ago.
Many of our eng + security + quality standards are modeled after, or better than, what I grew up on there.
https://t.co/IIsuZTTdfx
@seslly @coinbase We’ll share more in our full RCA, but we had an appropriate RF to survive a zone outage. The way this hardware failed triggered a bug in the managed cluster that still took the cluster down, which we had to work around to recover with the vendor.
We are entering a golden age of cybersecurity.
Even small blue teams are starting to drive the cost per exploit up exponentially.
We’re not there yet, but the end state is clear and good for the good guys.
My AI coach gave me a B- this week.
Every week, agents check in on my digital life. They send back 4 things:
1. What I'm missing
2. What's changing
3. What's going well
4. What's not
All of this data is sitting there:
- iMessageDB
- ScreentimeDB
- EightSleep API (eightctl)
- Oura MCP
- Google Workspace MCP
- and more
Everyone will have a world class executive coach (agent) this year and it's going to change your life.
In the last 12 months, we’ve seen a 27x increase in non-engineers using dev tools like Claude, OpenCode and Cursor to build & automate how we work.
The goal is to turn everyone into a builder, and safely reduce the distance between idea → execution to near zero.
Trust is our most important asset at @coinbase, so this is fueled by a massive effort in quality, guardrails and simplification.
These will blow you away because Fred & Balaji are wired into Coinbase: github, drive, linear, slack & more.
Fred for the expert strategy, Balaji for challenging assumptions. A 10x team.
Wait until you see all the subagents + new capabilities we're wiring in now. #BestPlaceToBuild
Coinbase is testing AI agents that show up in slack/email at work, just like any human teammate. To start we're shipping two which are modeled after legendary former Coinbase employees, @FEhrsam and @balajis. (Who brutally frame mogged who in this matchup?)
Soon, it will be easy for any employee to spin up a new agent for themselves or their team. I suspect we will have more agents than human employees at some point soon.
The last autobiography I read had a style + verbosity that were awful, but had an important first-person view I wanted to understand.
So I had an agent rewrite and repackage the book for my kindle without the fluff and with more relevant historical context.
Just finished the book and loved it. 2026 is awesome.