Real LLM serving wraps inference in primitives beyond the forward pass. Sampling beyond greedy argmax. Embeddings as a separable service. Logit processors for constrained generation. A multi-turn conversation abstraction. Per-application fine-tuning via low-rank adapters
Each of these now exists on chain, EVM-verified against a Python reference of the same math:
- QuillSampler: temperature plus top-K, deterministic given a seed, verified across 30 seeds
- QuillEmbed: sentence vectors mean-pooled from any Quill model
- QuillConstrain: bitmap-encoded logit masks for constrained generation
- QuillChat: role-tagged multi-turn conversations
- QuillLoRA: low-rank adapter for per-application fine-tuning, 2·D·r ints instead of D² for a full update
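To make the sampler bullet concrete, here is a minimal sketch of temperature plus top-K sampling that is reproducible given a seed. It is an illustration under assumptions, not the QuillSampler code: the function name, the float `math.exp`, and Python's built-in PRNG are stand-ins, and an onchain version would need fixed-point math and its own seed-derived randomness.

```python
import math
import random

def sample_top_k(logits, k, temperature, seed):
    """Pick one token id from the k highest logits (hypothetical helper)."""
    # Keep the k highest-logit token ids; ties go to the lower id.
    top = sorted(range(len(logits)), key=lambda i: (-logits[i], i))[:k]
    # Temperature-scaled softmax weights over the survivors only.
    # Subtracting the peak keeps exp() from overflowing.
    peak = logits[top[0]]
    weights = [math.exp((logits[i] - peak) / temperature) for i in top]
    # A PRNG seeded per call makes the draw a pure function of its inputs.
    rng = random.Random(seed)
    return rng.choices(top, weights=weights, k=1)[0]

logits = [10, 50, 30, 20, 40]
picks = [sample_top_k(logits, k=3, temperature=8.0, seed=s) for s in range(30)]
```

With `k=1` the function collapses to greedy argmax, which is a quick sanity check for any port of it.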
The serving stack that other AI companies place between you and the model now sits on @Base
A non-text Quill engine works
PixelQuillEngine uses the same char-MLP shape as the text engines, applied to 256 quantized 8×8 grayscale patches. A small training run converged on six letters (A, B, C, D, E, F), each represented as a 16-patch sequence
The contract generates the exact patch sequence for each letter, byte-for-byte against the Python reference. PixelDecoder reads patches from a separate codebook data contract via EXTCODECOPY and renders 4×4 patch grids as inline SVG
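The rendering step can be sketched in a few lines of Python. This assumes things the post does not state: that patches and pixels are laid out row-major, that each codebook entry is 64 grayscale bytes, and that one `<rect>` per pixel is an acceptable SVG encoding. `render_svg` and its parameters are hypothetical names.

```python
def render_svg(patch_ids, codebook, grid=4, patch=8, scale=4):
    """Render grid*grid patches of patch*patch grayscale pixels as inline SVG.

    patch_ids: codebook indices, row-major over the grid (assumed layout).
    codebook:  sequence of 64-byte entries, row-major pixels (assumed layout).
    """
    assert len(patch_ids) == grid * grid
    side = grid * patch * scale
    rects = []
    for n, pid in enumerate(patch_ids):
        gy, gx = divmod(n, grid)           # patch position in the grid
        for p, v in enumerate(codebook[pid]):
            py, px = divmod(p, patch)      # pixel position inside the patch
            x = (gx * patch + px) * scale
            y = (gy * patch + py) * scale
            rects.append(
                f'<rect x="{x}" y="{y}" width="{scale}" height="{scale}" '
                f'fill="rgb({v},{v},{v})"/>'
            )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{side}" height="{side}">'
        + "".join(rects)
        + "</svg>"
    )

# Toy two-entry codebook: an all-black and an all-white patch.
toy_codebook = [bytes([0] * 64), bytes([255] * 64)]
svg = render_svg([0, 1] * 8, toy_codebook)
```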
A tiny demo. The point isn't that anyone needs a chain to draw a letter A. The point is that the same integer-arithmetic regime that makes text inference verifiable end-to-end extends to non-text token spaces without changing the underlying math. The next medium (audio mu-law tokens) is structurally identical
A chain that draws. The next chapter, probably one that speaks.
The streaming production transformer on @Base costs roughly 22 million gas per generated character. About five cents at typical gas prices, byte-for-byte identical to an independent Python forward, every output reproducible by anyone with a node
The reference forward at the start of Chapter 3 cost 432 million gas per character. The combination of Yul-unrolled matmuls (5.12× on axiom v2), the KV-cache that turned per-character cost from O(C²) to O(C), the Stage 4 attention-dot unroll, and variable-window NoPE training together brought a roughly 20× reduction (432 million to 22 million gas)
The path to sub-cent is mechanical: byte-packed weight reads (Stage 5) and inlined layer-norm (Stage 6) close the remaining gap to the 11.7× ratio the subword engine hit. That work is the engine centerpiece of Chapter 5
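The arithmetic behind these figures is easy to reproduce. The two gas numbers below come from the post; the gas price and ETH price passed to `cost_usd` are placeholders chosen for illustration, not measurements, so plug in live values before drawing conclusions.

```python
REFERENCE_GAS = 432_000_000  # gas per character, reference forward (from the post)
STREAMING_GAS = 22_000_000   # gas per character, streaming engine (from the post)

def reduction(before: int, after: int) -> float:
    """How many times cheaper `after` is than `before`."""
    return before / after

def cost_usd(gas: int, gas_price_gwei: float, eth_usd: float) -> float:
    """Dollar cost of `gas` units; 1 gwei is 1e-9 ETH."""
    return gas * gas_price_gwei * 1e-9 * eth_usd

ratio = reduction(REFERENCE_GAS, STREAMING_GAS)

# Placeholder inputs (assumptions): 0.001 gwei per gas, 2500 USD per ETH.
example_cost = cost_usd(STREAMING_GAS, gas_price_gwei=0.001, eth_usd=2500.0)
```

The ratio works out to a little under 20, which is why "roughly 20×" is the honest way to state it.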
The simplest way we can explain Quill: ChatGPT runs on someone else's computer and you trust them not to mess with it. Quill runs on a public blockchain and there's no one to trust because no one is in charge
That means rebuilding the entire AI serving stack as smart contracts, with real transformer math running in pure integer arithmetic inside the chain itself. The work is technical, but the technical work isn't the bet
The bet is that there's a category of AI nobody has built yet, where you can prove what the model said and nobody can change it after the fact
That category gets more valuable the more important AI becomes
Today's Quill models are still small because the original problem was making any AI work on a chain at all. That part is now done, and the rest is making the models better
The QuillVault contract on Base now releases ether based on what a Quill model says. A user submits a reason string; the vault asks an onchain classifier for one character; if the character is in a preset allow set, the contract sends the funds. The decision is auditable bytecode, the model is immutable, no operator anywhere in the loop.
It's the first time a DeFi-shape contract has moved funds on the output of an integer transformer running on the same chain. The pattern generalizes: a routing oracle, a moderation gate, a content discriminator
Whatever decision you currently route through a centralized API can now run, verifiably, inside the same transaction as the action it gates
2,741 bytes of runtime. The precedent matters more than the size
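The vault's control flow, as described above, fits in a short Python model. This is a sketch of the pattern only: `Vault`, `request`, and `toy_classifier` are hypothetical names, the keyword classifier stands in for an onchain Quill model, and balances are plain integers rather than ether.

```python
class Vault:
    """Releases funds only when the classifier's one-character verdict is allowed."""

    def __init__(self, classifier, allow_set, balance):
        self.classifier = classifier          # callable: reason string -> one character
        self.allow_set = frozenset(allow_set) # fixed at construction, like a preset
        self.balance = balance

    def request(self, reason: str, amount: int) -> bool:
        verdict = self.classifier(reason)
        # Reject anything that is not exactly one allowed character.
        if len(verdict) != 1 or verdict not in self.allow_set:
            return False
        if amount > self.balance:
            return False
        self.balance -= amount                # stands in for the ether transfer
        return True

def toy_classifier(reason: str) -> str:
    # Stand-in for the model call; a real classifier is a Quill engine.
    return "y" if "refund" in reason.lower() else "n"

vault = Vault(toy_classifier, allow_set={"y"}, balance=100)
approved = vault.request("Refund for a cancelled order", 40)
denied = vault.request("send me money", 10)
```

The same shape covers the routing oracle or moderation gate mentioned above: swap the allow set and the action taken on approval.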
Quill's Chapter Four was about everything around the transformer that an LLM serving stack normally has and Quill didn't.
We knew the list, yet underestimated how much of it would be infrastructure work rather than ML
The vault is what made it click: a contract that releases ether based on what a transformer says, end to end, on Base, for cents per character. Not a demo, but the smallest possible thing that proves the rest of the work matters
we're quietly proud of what we've shipped so far
what started as an experiment is starting to look like a new frontier, and we're more excited about what comes next than anything we've shipped so far
thanks for being here 🫰
A few quiet results from since Chapter 3
The integer transformer ports to multi-head attention without surprises: same fixed-point regime, same QAT pipeline, same loss curve as single-head v2. Depth scales as predicted. Doubling the block count roughly doubles per-forward gas, holds bit-exact, no architectural drift
The KV-cache engine we shipped in Chapter 3 had one structural gap. It was bit-exact against its own forward but not against the axiom v2 reference, because v2 used absolute position embeddings and a sliding window. Variable-window NoPE training closes that gap. The streaming engine now drives a model trained for the streaming regime, end to end
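For readers new to KV-caching, a counting sketch shows why the cache matters for streaming cost. It counts query-key dot products only and ignores projections, softmax, and the MLP, so it illustrates the scaling rather than modeling Quill's gas.

```python
def dots_full_recompute(context_len: int) -> int:
    """Dot products to rebuild causal attention from scratch for one new character.

    Position t attends to positions 1..t, so the total is 1 + 2 + ... + C.
    """
    return sum(range(1, context_len + 1))

def dots_with_cache(context_len: int) -> int:
    """Dot products when earlier keys are cached: one new query against C keys."""
    return context_len

C = 64
full = dots_full_recompute(C)
cached = dots_with_cache(C)
```

The first count grows with C squared and the second with C, so the gap widens as the context gets longer.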
The Yul-unrolled production pattern (11.7× on the subword engine) transfers to the transformer with a smaller ratio than we had hoped, about 4×, because more of the attention cold path survives in Solidity. Aggressive unrolling of the dot product should recover most of the missing factor. That work is mechanical, just not done yet
Two other things became possible in the meantime. First, a non-text Quill engine works: same char-MLP shape, applied to 8×8 grayscale patches, with a trained model that produces recognizable letters. A chain that draws. Second, a cross-chain registry pattern mirrors the same model across EVM chains without duplicating storage.
We'll write the measured numbers up properly when the next batch is in. The short version is that most of the work since Chapter 3 has been about making the stack usable rather than proving any new architectural points. The transformer was the interesting moment. What comes next is the part where you stop having to explain why you would build on it
Quill's Chapter Two proved the architecture; Chapter Three proves the stack
What we underestimated going in is how much of the work would be in the connective tissue: packaging, interfaces, reference applications
The headline speedups (11.7× on the production subword engine, 14× per character on the streaming engine) are real and measured
The quieter wins are what make the rest possible:
AxiomEngineLN2_Factory means transformers now compose with consumer contracts the same way char models do
QuillJudge means anyone can turn a Quill model into a trustless classifier in fifty lines. The session interface means future streaming engines drop in without breaking downstream code
Most of what's coming next is mechanical:
The Yul builds of the axiom engines apply the same generator pattern that produced the 11.7× ratio on the subword engine, and variable-window NoPE training closes the streaming loop
When both ship, AxiomEngineKV is paired with weights that produce coherent text, and the first long-form streaming session opens on @Base!
This is the most concrete leap Quill has made since axiom. And it is a small fraction of what comes next 🧠
The $QUILL token has been verified on @CoinMarketCap, @coingecko and @etherscan!
https://t.co/GF4qXrPmlH
https://t.co/POyoz1ZQ67
https://t.co/K1IZCtOuGN
QuillAgent is live on Base!
It's the first autonomous onchain agent whose brain is a Quill model. No API key, no oracle, no off-chain logic: the agent is a smart contract
One piece of state: the agent's mind. One function: tick. Feed the mind to the brain, get the next chunk of thought, commit that thought to the chain, slide the mind forward. Every state is a deterministic function of the previous state. The agent is its onchain history, and once a thought is ticked, it cannot be unthought
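The tick loop described above can be modeled in a few lines. This is a sketch under assumptions: `Agent`, `toy_brain`, and the chunk size are hypothetical, the brain is a trivial deterministic function standing in for a Quill model, and a Python list stands in for onchain history.

```python
class Agent:
    """One piece of state (the mind) and one function (tick)."""

    def __init__(self, brain, mind: str, chunk: int = 2):
        self.brain = brain      # callable: (mind, n) -> next n characters
        self.mind = mind        # fixed-width window of recent text
        self.chunk = chunk
        self.history = []       # append-only record of committed thoughts

    def tick(self) -> str:
        thought = self.brain(self.mind, self.chunk)
        self.history.append(thought)                        # commit the thought
        width = len(self.mind)
        self.mind = (self.mind + thought)[-width:]          # slide the mind forward
        return thought

def toy_brain(mind: str, n: int) -> str:
    # Stand-in model: keep walking the lowercase alphabet from the last character.
    out, prev = [], mind[-1]
    for _ in range(n):
        prev = chr((ord(prev) - ord("a") + 1) % 26 + ord("a"))
        out.append(prev)
    return "".join(out)

agent = Agent(toy_brain, mind="abc")
agent.tick()
agent.tick()
```

Because the brain is deterministic, two agents started from the same mind produce identical histories, which is the property that makes the onchain record replayable.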
The brain for this deployment is quill-shakespeare. Shakespeare, generated by a model with no GPU, written to a chain that does not forget, verifiable by anyone with an RPC
For the entire history of AI, the model has lived on a computer you could not see. You sent a prompt into a black box owned by a company and trusted, because there was no alternative, that the answer was not faked and the model had not been swapped. QuillAgent is what the alternative looks like when an agent is built directly on top of inference that already lives onchain. The weights are onchain. The forward pass is onchain. The agent's reasoning is auditable bytecode. The chain is the only operator
This is the first chapter-three deployment. Quill is no longer just a permissionless factory of language models: it is now a primitive that other contracts build on top of. QuillAgent is the reference for the pattern. Replace the mind with structured state, replace the preview with a domain-specific decision function, replace tick with the action that executes the decision, and the same five-line pattern becomes a market maker, a moderator, a router, a classifier-as-oracle, a trustless content generator. Each one: a Quill model plus a thin wrapper, fully on-chain, fully verifiable, owned by nobody.
The infrastructure is evolving 🧠
every Quill contract. on Base. verified.
# Quill Token:
QuillToken: ERC-20, fixed 1B supply
QuillVesting: team 4-yr linear vesting
QuillLPLockerV4: Quill LP permanent lock
# Quill AI:
QuillFactory: permissionless model factory
QuillEngine: small char-MLP engine
QuillEngine2: large char-MLP engine
QuillEngineRP: repetition-penalty decoder
QuillEngineSW: subword engine (~2.4× cheaper/char)
# Quill products:
QuillScribe: reference consumer (the on-chain book)
# quill-axiom-2 contracts:
AxiomSoftmax: integer fixed-point softmax
AxiomAttention: causal self-attention head
AxiomEngine: one-layer transformer engine
AxiomLayerNorm: integer rsqrt + layer-norm
AxiomEngineLN: pre-norm transformer w/ LN
no owner, no off-chain GPU, no 'trust-me'. every weight onchain, every forward pass onchain
this is what AI looks like when the chain is the only operator
https://t.co/YzjRvEe3uA
Quill, chapter two of the roadmap, first three updates live
#07 Inference as a primitive
IQuillEngine: the one-function interface a smart contract imports to call a Quill model. Wiring a model into your contract is one line. The interface is implemented by the engines already on-chain
#08 The first thing built on it
QuillScribe: a reference consumer that calls a Quill model and exposes its output as its own on-chain function. EVM-verified bit-exact, with only ~1,600 gas of wrapper overhead. Onchain inference composes
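The composition pattern in #07 and #08 can be sketched in Python with a one-method protocol and a thin wrapper. The method name `generate`, its signature, and the `EchoEngine` stand-in are assumptions; the post says only that the interface has one function.

```python
from typing import Protocol

class QuillEngineLike(Protocol):
    """Hypothetical Python analogue of a one-function engine interface."""

    def generate(self, prompt: str, n: int) -> str: ...

class EchoEngine:
    """Stand-in engine: repeats the prompt until n characters are produced."""

    def generate(self, prompt: str, n: int) -> str:
        return (prompt * n)[:n]

class Scribe:
    """Reference consumer: calls an engine and exposes the result as its own method."""

    def __init__(self, engine: QuillEngineLike):
        self.engine = engine    # wiring a model in is this one line

    def write(self, prompt: str, n: int) -> str:
        return self.engine.generate(prompt, n)

scribe = Scribe(EchoEngine())
page = scribe.write("ab", 5)
```

Any object with a matching `generate` method can be dropped in without changing `Scribe`, which is the point of keeping the interface to a single function.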
#09 The wall, alive
Four house models, each trained on a distinct genre and now on the wall: Solidity + Python code, public-domain nursery rhymes, the opening of Alice in Wonderland, and short dialogue. The gallery is no longer one voice
Quill stops being a model you read and starts being something other contracts build on. The models stay small. The honesty stays total
Onchain forever, still in its early days
Decentralized AI can't mean "an AI company with a token". It has to mean the model itself is verifiable, running as onchain computation no one can fake or switch off
That's what we're building at Quill
The write-up below is one piece of it: a transformer that provably runs inside a smart contract. Every contract verified onchain
Chapter one is shipped: a permissionless factory of language models that run fully onchain, a metered QUILL economy, a better decoder, a verified onchain transformer. Chapter one made Quill exist and work
Chapter two makes it matter beyond itself.
#07, Inference as a primitive
A Quill model becomes callable by other smart contracts, not just read by people. The moment that ships, Quill is infrastructure
#08, The first thing built on it
A reference on-chain agent whose decision-making core is a Quill model. Proof of the primitive, and a template anyone can fork
#09, The wall, alive
Seed models across genres, a real leaderboard, creator pages, the earnings loop made claimable. Make the flywheel turn
#10 Bigger models, no new architecture
Per-token generation and subword tokenization: honest, gas-measured levers that lift the parameter ceiling toward the first million
#11 quill-axiom-3