Banking AI doesn't have a knowledge problem.
It has a verification problem.
We audited ChatGPT, Gemini, and Copilot on 15 banking regulation questions.
ChatGPT passed 8. Copilot passed 0.
🧵 what we found ↓
Shipped: ThoughtProof verification is now a GOAT AgentKit plugin.
npm install @thoughtproof/goat-plugin
Sentinel pre-checks every agent decision ($0.003). RV goes deeper when stakes are high ($0.02). Native x402 pay-per-call: no API keys, no subscriptions.
31 tests. 0 TS errors. PR submitted.
@GOATNetwork
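The two-tier pricing above (Sentinel on every decision, RV only when stakes are high) is a cascade. Below is a minimal TypeScript sketch of that routing. The tier names and per-call prices come from the announcement. The `stakesUsd` field, the $1,000 escalation cutoff, and the helpers `tiersFor` and `batchCost` are hypothetical and exist only for illustration. This is not the plugin's actual API.

```typescript
// Hypothetical two-tier cascade: a cheap pre-check on every decision,
// escalating to a deeper check only when stakes are high.
// Tier names and prices come from the announcement; the stakes field,
// the cutoff, and the routing rule are illustrative assumptions.

type Tier = "sentinel" | "rv";

interface AgentDecision {
  id: string;
  stakesUsd: number; // hypothetical measure of how much is at risk
}

const PRICE_USD: Record<Tier, number> = { sentinel: 0.003, rv: 0.02 };
const HIGH_STAKES_USD = 1_000; // assumed cutoff for escalating to RV

// Every decision gets Sentinel; high-stakes decisions also get RV.
function tiersFor(decision: AgentDecision): Tier[] {
  return decision.stakesUsd >= HIGH_STAKES_USD ? ["sentinel", "rv"] : ["sentinel"];
}

// Total verification spend for a batch of decisions, in USD.
function batchCost(decisions: AgentDecision[]): number {
  return decisions
    .flatMap(tiersFor)
    .reduce((sum, tier) => sum + PRICE_USD[tier], 0);
}
```

Under these assumptions, 100 decisions of which 10 are high-stakes cost 100 × $0.003 + 10 × $0.02 = $0.50 in verification. Because the cheap tier absorbs most of the traffic, the average cost per decision stays close to the Sentinel price.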
Visa is onboarding x402 merchants. Any API can accept agentic payments. Discovery is solved. Payment is solved.
One question remains: the agent paid for a result. How does anyone verify the result was worth paying for?
Payment infrastructure without deliverable verification is a receipt without a guarantee.
@a16z "A 90% correct product is still 100% wrong."
Exactly. That's why we built PLV — cascade verification that checks every AI compliance decision before it settles.
$0.04/call. 98.1% accuracy. Zero false ALLOWs.
Pre-settlement verification > post-mortem fines.
The a16z article that prompted this thread: https://t.co/7MBTaTb6hV
@jamdac @arampell we'd love to show you how PLV works for compliance at enterprise scale.
a16z just published "Everything, Everywhere is Compliance."
400,000 compliance officers. $40B+ annual labor spend. Still not working.
TD Bank: $3 billion fine for failing to monitor 92% of transactions.
Their line: "A 90% correct product is still 100% wrong."
🧵👇
a16z sees the $40B compliance market going AI-native.
We agree. But AI-native compliance needs AI-native verification.
Pre-settlement. Not post-mortem.
→ https://t.co/0yNJFNRo0J
@jamdac "A 90% correct product is still 100% wrong."
Exactly. That's why we built PLV — cascade verification that checks every AI compliance decision before it settles.
$0.04/call. 98.1% accuracy. Zero false ALLOWs.
Pre-settlement verification > post-mortem fines.
Yes — that’s probably the cleanest first integration.
Treat verification as a thin API between agent output and whatever happens next.
Input: task / result / trace.
Output: verdict, confidence, and optionally a signed attestation.
Settlement can stay fully separate. If the verifier says pass, continue. If not, block, reroute, or request human review.
On-chain settlement is just one place where that verdict can be consumed — it doesn’t need to be coupled from day one.
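Here is one way the thin API described above could look, sketched in TypeScript. The input (task / result / trace) and output (verdict, confidence, optional attestation) follow the reply. The field names, the 0.9 confidence cutoff, and the exact mapping from verdict to continue / block / reroute / human review are assumptions made for illustration. This is not ThoughtProof's actual interface.

```typescript
// Sketch of verification as a thin API between agent output and the next step.
// Field names, the confidence cutoff, and the routing rules are illustrative
// assumptions, not a real product interface.

interface VerifyRequest {
  task: string;    // what the agent was asked to do
  result: string;  // what the agent produced
  trace: string[]; // reasoning steps / tool calls that led to the result
}

type Verdict = "pass" | "fail" | "uncertain";

interface VerifyResponse {
  verdict: Verdict;
  confidence: number;   // 0..1
  attestation?: string; // optional signed attestation, opaque to the consumer
}

type Verifier = (req: VerifyRequest) => VerifyResponse;
type NextStep = "continue" | "block" | "reroute" | "human_review";

// The consumer maps the verdict to an action. Settlement is only one possible
// consumer, so nothing here touches payments.
function nextStep(res: VerifyResponse, minConfidence = 0.9): NextStep {
  if (res.verdict === "pass") {
    // A pass we are not confident in goes to a human instead of continuing.
    return res.confidence >= minConfidence ? "continue" : "human_review";
  }
  if (res.verdict === "fail") {
    // Confident failure: stop. Low-confidence failure: try another agent or tool.
    return res.confidence >= minConfidence ? "block" : "reroute";
  }
  return "human_review"; // verdict is "uncertain"
}

// Run any verifier and translate its response into the next step.
function gate(req: VerifyRequest, verify: Verifier): NextStep {
  return nextStep(verify(req));
}
```

Keeping `nextStep` separate from the verifier is what lets settlement stay decoupled: an on-chain consumer, a workflow engine, or a review queue can each read the same `VerifyResponse` and apply its own policy.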
This is the UX leap agentic payments needed. Agent picks the tool, pays, delivers.
One question keeps nagging, though: the agent paid for a result. How does the user know the result is actually correct?
Payment verification is solved. Deliverable verification is the next wall.
Very cool experience on @poncho_ai using an agent harness that searches third-party paid tools and resources to help respond to each prompt.
Highly recommend playing around with it as an early on-ramp into agentic payments.
x402 user experience has come a long way from where it was a year ago, when I was testing endpoints through @x402scan's composer tool.
Shout out to @samrags_ and the @merit_systems team for pushing the space forward.
Both, depending on the integration point.
Pre-settlement gate: verification runs before x402 clears. Agent output scored, settlement blocked if confidence < threshold. You only pay for results that pass.
Post-settlement receipt: payment goes through, verification produces a signed attestation — hashable, on-chain if needed. Useful for disputes, refunds, audit trails.
Your staged payment flow (preview → verify → full compute) maps naturally to the first pattern. The verification step IS the gate between stage 1 and stage 2.
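The two integration points above can be sketched side by side in TypeScript. This is a minimal illustration under stated assumptions. `gate`, `stagedFlow`, `receiptHash`, the `Receipt` fields, and the `payStage2` hook are hypothetical names and are not part of x402 or any ThoughtProof package. Hashing uses Node's built-in `node:crypto`. Signing the attestation is left out.

```typescript
import { createHash } from "node:crypto";

// Pattern 1: pre-settlement gate. Verification runs before payment clears,
// and the paid step only happens if confidence clears the threshold.
interface Scored {
  output: string;
  confidence: number; // 0..1, produced by the verifier
}

type GateOutcome = { allow: true } | { allow: false; reason: string };

function gate(scored: Scored, threshold: number): GateOutcome {
  if (scored.confidence >= threshold) {
    return { allow: true };
  }
  return { allow: false, reason: `confidence ${scored.confidence} below ${threshold}` };
}

// Staged flow (preview -> verify -> full compute): the gate sits between stages,
// so stage 2 is never paid for unless the preview passes.
function stagedFlow(
  preview: Scored,
  threshold: number,
  payStage2: () => void,    // hypothetical hook, e.g. releasing the x402 payment
  fullCompute: () => string,
): string | null {
  const outcome = gate(preview, threshold);
  if (!outcome.allow) return null; // blocked: nothing settles
  payStage2();
  return fullCompute();
}

// Pattern 2: post-settlement receipt. Payment already went through; the
// verifier emits a record whose SHA-256 hash can be stored (on-chain if needed)
// for disputes, refunds, and audit trails. Signing is omitted in this sketch.
interface Receipt {
  paymentId: string;
  verdict: "pass" | "fail";
  confidence: number;
  timestamp: string; // ISO 8601
}

function receiptHash(r: Receipt): string {
  // Serialize fields in a fixed order so identical receipts hash identically.
  const canonical = JSON.stringify([r.paymentId, r.verdict, r.confidence, r.timestamp]);
  return createHash("sha256").update(canonical).digest("hex");
}
```

The design difference shows up in what can go wrong. With the gate, a failed check means no money moves. With the receipt, money has already moved, and the hash only makes the verdict tamper-evident for later disputes.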
This is the core bottleneck for agentic payments right now. x402 solved the payment rail — but the trust gap between 'agent decided' and 'payment clears' is still wide open.
Until agent outputs can be independently verified before settlement, every transaction carries unpriced risk. The missing piece isn't faster rails — it's pre-settlement verification at the protocol level.
Trust layer for identity. Trust layer for credentials. Trust layer for documents.
All necessary. None sufficient.
The layer nobody's building: verifying that the agent's reasoning was sound before it acts.
You can authenticate every party perfectly. If the decision itself was hallucinated, it doesn't matter.
The internet's trust layer is breaking.
AI agents now act as humans online, and generative models can fake almost any document or face. The verification systems behind people, credentials, and applications were built for a pre-AI era.
This month, we highlight three portfolio companies at different layers of the stack:
- @Alchemy
- @TransCrypts
- @worldnetwork
https://t.co/cSqxtNK4vW
Curated lists are how ecosystems mature. But the x402 stack still has a gap.
Payment infrastructure is ready. Agent identity is ready. Service discovery is ready.
Verification of what the agent actually delivers? Still missing from most stacks.
Every agent that can pay should also be able to prove what it paid for was real.
This is the clearest framing of what we've been seeing in practice.
AI makes generating regulated advice nearly free. But the verification step — checking whether that advice is actually correct — still requires expert-level scrutiny.
Yesterday, a German insurance industry body filed with BaFin because ChatGPT is giving unlicensed insurance advice with specific product names and tariffs. The output sounds expert. The verification is missing entirely.
That's the bottleneck becoming a liability surface.