LLMs now make critical decisions in hospitals, defense, banks, and governments. Yet nobody can verify which model actually ran, or whether the output was tampered with. A provider or middleman can swap weights, silently requantize the model, alter decoding, inject hidden prompts, mount supply chain attacks, or change the deployment surface without the user knowing.
This problem is already serious. It will become critical.
We think this needs a practical solution, not just a theoretically clean one. CommitLLM is designed to be deployable on existing serving stacks now: the provider keeps the normal GPU serving path, does not need a proving circuit, does not need a kernel rewrite, and does not generate a heavy proof for every response.
In practice, two families of approaches dominated the conversation before this work: fingerprinting, which can be gamed, and proof-based systems, which are theoretically strong but too expensive for production inference.
We built CommitLLM to target the middle ground.
The core idea is to keep the verification discipline of proof systems, but specialize it to open-weight LLM inference. The cryptographic core is simple: Freivalds-style randomized checks for the large linear layers, plus Merkle commitments for the traced execution. On top of that, a lot of engineering work is needed to make it line up with real GPU inference.
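For readers less familiar with the commitment half, here is a minimal Python sketch of a generic binary Merkle commitment: the provider publishes one root hash for a whole trace, and any single entry can later be opened with a short inclusion proof. What gets hashed as a leaf (for example per-layer trace chunks) and all function names are hypothetical; this is textbook SHA-256 Merkle hashing with RFC 6962-style leaf/node prefixes (odd levels here duplicate the last node, which RFC 6962 does not), not CommitLLM's actual trace layout.

```python
import hashlib

def leaf_hash(item: bytes) -> bytes:
    # 0x00 prefix domain-separates leaves from inner nodes
    return hashlib.sha256(b"\x00" + item).digest()

def node_hash(left: bytes, right: bytes) -> bytes:
    # 0x01 prefix marks an inner node
    return hashlib.sha256(b"\x01" + left + right).digest()

def build_levels(items):
    """Return every tree level, from leaf hashes up to the root (items must be non-empty)."""
    level = [leaf_hash(it) for it in items]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # odd level: pair the last node with itself
        level = [node_hash(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def commit(items) -> bytes:
    """The single root hash the provider publishes for the whole trace."""
    return build_levels(items)[-1][0]

def open_leaf(items, index):
    """Inclusion proof: the sibling hash at each level, bottom to top."""
    proof, idx = [], index
    for level in build_levels(items)[:-1]:
        sib = idx ^ 1
        if sib >= len(level):
            sib = idx  # matches the duplication done in build_levels
        proof.append(level[sib])
        idx //= 2
    return proof

def verify(root, item, index, proof) -> bool:
    """Recompute the path from one opened leaf and compare with the committed root."""
    acc, idx = leaf_hash(item), index
    for sib in proof:
        acc = node_hash(acc, sib) if idx % 2 == 0 else node_hash(sib, acc)
        idx //= 2
    return acc == root
```

An opening costs one sibling hash per tree level, so proof size grows with log2 of the number of committed entries rather than with the size of the trace.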
The key trick is this. A provider claims `z = W × x` for a massive weight matrix. Normally you would verify that by redoing the multiply. Instead, the verifier samples a secret random vector `r`, precomputes `v = rᵀ × W`, and later checks whether `v · x = rᵀ · z`. Two dot products instead of a full matrix multiply. In the current implementation, a wrong result passes with probability at most `1 / (2^32 - 5)` per check.
A full matrix multiply, audited with two dot products.
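To make the check concrete, here is a minimal Python sketch of a Freivalds-style check. It assumes integer arithmetic modulo the prime p = 2^32 - 5, which is consistent with the quoted 1 / (2^32 - 5) bound but is our assumption about the field; the function names are hypothetical and this is not CommitLLM's code.

```python
import secrets

# Assumption: arithmetic over Z_p with p = 2**32 - 5 (the largest prime below 2**32).
P = 2**32 - 5

def dot(a, b):
    """Dot product mod P."""
    return sum(ai * bi for ai, bi in zip(a, b)) % P

def matvec(W, x):
    """The expensive step the provider runs: z = W x (mod P)."""
    return [dot(row, x) for row in W]

def precompute(W, r):
    """Verifier side, done once per weight matrix: v = r^T W (mod P)."""
    return [dot(r, col) for col in zip(*W)]

def keygen(W):
    """Sample a secret uniform r (one entry per row of W) and precompute v.
    r must stay hidden from the provider."""
    r = [secrets.randbelow(P) for _ in range(len(W))]
    return r, precompute(W, r)

def freivalds_check(r, v, x, z):
    """Accept iff v . x == r . z (mod P): two dot products instead of a matmul."""
    return dot(v, x) == dot(r, z)

W = [[1, 2, 3], [4, 5, 6]]
x = [7, 8, 9]
r, v = keygen(W)
print(freivalds_check(r, v, x, matvec(W, x)))  # honest result is always accepted
```

The soundness argument is one line: if z ≠ W·x, the error e = z - W·x is nonzero, and a uniformly random r satisfies r · e = 0 (mod p) with probability exactly 1/p.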
Most of the transformer can then be checked exactly or canonically from committed openings. Nonlinear operations such as activations and layer norms are canonically re-executed by the CPU verifier. The one honest caveat is attention: native FP16/BF16 attention is not bit-reproducible across hardware. CommitLLM verifies the shell around attention exactly, then independently replays attention and checks that the committed post-attention output stays within a measured INT8 corridor. So attention is bounded and audited, not proved exactly.
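Roughly, a corridor check can be pictured as follows: the verifier replays single-head attention in float64, quantizes the result to INT8, and accepts the committed INT8 output only if every element is within a tolerance of the replay. The quantization `scale`, the tolerance `tol`, and every function name are hypothetical placeholders; the real corridor is measured, and this sketch does not describe how.

```python
import math

def attention(q, K, V):
    """Single-query, single-head softmax attention in float64 (verifier replay)."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
    m = max(scores)  # subtract the max for numerical stability
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    w = [wi / total for wi in w]
    return [sum(w[t] * V[t][j] for t in range(len(V))) for j in range(len(V[0]))]

def quantize_int8(xs, scale):
    """Symmetric INT8 quantization with a hypothetical per-tensor scale."""
    return [max(-128, min(127, round(x / scale))) for x in xs]

def within_corridor(committed_q, replayed, scale, tol):
    """Accept iff every committed INT8 value is within tol of the quantized replay."""
    if len(committed_q) != len(replayed):
        return False
    replay_q = quantize_int8(replayed, scale)
    return max(abs(a - b) for a, b in zip(committed_q, replay_q)) <= tol
```

The point of the design is that the check is one-sided and bounded: small hardware-dependent rounding differences pass, while an output that drifts outside the tolerance is rejected.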
That means the protocol already gives very strong exact guarantees on the parts that matter most operationally. If an audited response used the wrong model, the wrong quantization/configuration, or a tampered input/deployment surface, the audit catches that exactly. That includes model swaps, silent requantization, and provider-side prompt or system prompt injection.
Today the implementation and measurements are strongest on Qwen and Llama. But the protocol itself is not meant to be Qwen- or Llama-specific: we expect it to generalize across open-weight, decoder-only families. What still has to be done is the engineering work to integrate and validate more families explicitly, and we are already working on that.
On the measured path, online generation overhead is about 12 to 14%, with the provider staying on the normal GPU serving path. The heavier receipt finalization cost is separate and can be deferred off the user-facing path. The main systems costs are RAM and bandwidth, not proof generation.
The full response is always committed, but only a random fraction of responses are opened for audit. Individual audits are much larger, roughly 4 MB to 100 MB depending on audit depth. The important number is the amortized one: under a reasonable audit policy, the added bandwidth averages to roughly 300 KB per response.
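The amortization is plain expected-value arithmetic. The sketch below uses made-up policy numbers: a 1% audit rate and a 30 MB audit are our hypothetical choices, picked only because they land near the ~300 KB figure, and `commit_bytes` is a hypothetical placeholder for the always-on per-response commitment. It also shows the other side of random auditing: assuming each response is opened independently at random, the chance that a provider tampering with k responses escapes every audit shrinks geometrically in k.

```python
def expected_bytes_per_response(audit_rate, audit_size_bytes, commit_bytes=0):
    """Amortized extra bandwidth: every response pays the commitment,
    and a fraction audit_rate of responses also pays for a full audit."""
    return commit_bytes + audit_rate * audit_size_bytes

def escape_probability(audit_rate, tampered_responses):
    """Chance that none of k tampered responses is selected for audit,
    assuming independent uniform sampling per response."""
    return (1 - audit_rate) ** tampered_responses

# Hypothetical policy: open 1% of responses, 30 MB per opened audit.
rate = 0.01
avg = expected_bytes_per_response(rate, 30_000_000)
print(f"average extra bandwidth: {avg / 1000:.0f} KB per response")
print(f"chance 500 tampered responses all escape: {escape_probability(rate, 500):.4f}")
```

Because every response is committed before the audit selection is made, the provider cannot choose to behave only on the responses that end up opened.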
After too many weeks without sleep, I’m proud to show what I built with @diego_aligned: CommitLLM. Thanks Diego for your patience. I've been calling you at random hours.
The code and paper still need some cleanup and formalization. We’re already in talks with multiple providers, and with teams that have cryptography-related ideas on how to improve it even further. We’re really excited about this, and with my company @class_lambda we will keep doubling down on building products in AI, cryptography and security.
If governments, hospitals, defense and financial systems are going to run on LLMs, verifiable inference is not optional. It is infrastructure.
I will be explaining this in more detail in the days to come, and I will show how to test and run it.
Coinbase just gave agents wallets. But wallets don't solve the real problem.
AI agents will execute millions of micro-actions per day. No blockchain, no matter how fast, can handle a separate on-chain transaction for every single one.
We built @0x4Mica. Agents transact instantly with cryptographic guarantees, and settlement happens later in batches. Thousands of micro-payments, one on-chain transaction.
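Purely as an illustration of the batching arithmetic, and not of @0x4Mica's actual protocol: the toy Python sketch below collapses many off-chain micro-payments into one net amount per payer/payee pair, which is what a single settlement transaction would then carry. The `Payment` type and `batch_settlement` function are hypothetical, and the sketch omits the signatures, collateral, and dispute handling that would provide the cryptographic guarantees.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Payment:
    """One off-chain micro-payment (hypothetical shape; no signature modeled)."""
    payer: str
    payee: str
    amount: int  # integer amount in the token's smallest unit

def batch_settlement(payments):
    """Collapse many micro-payments into one net transfer per (payer, payee) pair."""
    totals = defaultdict(int)
    for p in payments:
        totals[(p.payer, p.payee)] += p.amount
    return dict(totals)

# e.g. a video stream paying 1 unit per data chunk
chunks = [Payment("agent", "streamer", 1) for _ in range(1000)]
print(batch_settlement(chunks))
```

A thousand per-chunk payments reduce to one entry, so the on-chain cost scales with the number of counterparties rather than the number of micro-actions.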
In this demo, I'm streaming a video, and each data chunk is paid for in real-time. Watch what happens when I switch from 4Mica credit to raw on-chain payments, and then ask to be switched back.
Same stream. Same operator. Completely different experience.
In Belgium and interested in building on web3 x AI? Join us and @waibsummit at the epic hackathon we're helping to organise from 14/11 to 16/11!
Or just join online!
Link: https://t.co/p3HJkPfOUI
Want to learn more about @0x4Mica?
Check out the recording of the last Aligned Guests stream where @mysterymeat hosted 4Mica co-founder @AkashMVerma1 to hear what they're building and how it can improve Aligned.
https://t.co/KqK3leSKZ8
It was a rewarding experience collaborating directly with the architect on the design of the new @3miLabs office space. Happy to see our vision come to life. If you are in Leuven, I invite you to stop by for a good discussion and a drink on us!
“There are scientists who care more about being cited in @Nature than about producing reproducible science.”
That one line summed up the urgent need for a scientific reset.
In the Open & Decentralised Science panel at Proof of Talk, Patrick Joyce (@joycesticks), Co-Founder of @ResearchHub; Aldo de Pape (@aldodepape), Co-Founder & CEO of @genomesdao; and Aaron Weaver (@RealAaronWeaver), CCO of @Molecule_dao, sat down for a conversation moderated by Alice Liu (@AliceCrypto3), Head of Research at @CoinMarketCap.
The discussion tackled the systemic flaws in traditional science: from broken incentives to the exploitation of personal health data.
💡 133 million people had their healthcare data compromised in 2023 alone.
💡 50–85% of published science can’t be replicated.
💡 Most innovation ignores long-term health, chronic illness, and mental well-being.
The message was clear: we need transparent, decentralized systems to fix the very foundations of scientific research and healthcare.
Watch the full conversation: https://t.co/ooSw8VRYoP
More panels coming soon! Stay tuned
🚀 Join us for a Super Special Online Event! 🤖
🎟 RSVP: https://t.co/7WxZKlnyVk
This time, the @Web3_Devs community is collaborating with @coinbase to host the legendary @LindellYehuda, Head of Cryptography at @coinbase, for a deep dive into their open-source MPC engine that secures real assets in production. Come learn how to use it!
🗓 17:00 IDT (Israel Time) | 10:00 EDT | 16:00 CET
📅 Subscribe to Calendar: https://t.co/vZULX1wZuc
#Web3 #MPC #CryptoSecurity #Coinbase
We believe launching an Ethereum rollup should be as simple as deploying a web app.
Our new Rollup-as-a-Service (RaaS) platform will make that a reality with one-click ZK-rollup deployment.
Built with ethrex from @class_lambda and our integrated ZK infra stack. Learn more ⬇️