As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently:
"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",
Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response.
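The three stages above can be sketched roughly as follows. This is a minimal sketch, not the app's actual code: `run_council`, the `ask(model, prompt)` callable, and the prompt wording are all hypothetical. In a real app `ask` would wrap an OpenRouter chat call; here it is injected so the flow can be run offline.

```python
import random
from typing import Callable, Dict, List

# Hypothetical signature: ask(model, prompt) -> response text.
# A real app would wrap an OpenRouter chat request here; this sketch injects it.
Ask = Callable[[str, str], str]


def run_council(query: str, council: List[str], chairman: str,
                ask: Ask, seed: int = 0) -> Dict:
    # Stage 1: every council member answers the query independently.
    answers = {m: ask(m, query) for m in council}

    # Anonymize: shuffle and relabel so reviewers cannot tell who wrote what.
    order = list(council)
    random.Random(seed).shuffle(order)
    labels = {f"Response {chr(65 + i)}": m for i, m in enumerate(order)}
    anonymized = "\n\n".join(
        f"{label}:\n{answers[m]}" for label, m in labels.items()
    )

    # Stage 2: each member reviews and ranks the anonymized responses.
    review_prompt = (
        f"Question: {query}\n\n{anonymized}\n\n"
        "Review and rank these responses, best first."
    )
    reviews = {m: ask(m, review_prompt) for m in council}

    # Stage 3: the chairman gets answers plus reviews and writes the final reply.
    chair_prompt = (
        f"Question: {query}\n\n{anonymized}\n\nReviews:\n"
        + "\n\n".join(reviews.values())
        + "\n\nWrite the final response."
    )
    final = ask(chairman, chair_prompt)
    return {"answers": answers, "labels": labels,
            "reviews": reviews, "final": final}
```

Keeping the label-to-model mapping in `labels` means the rankings can be de-anonymized afterwards for display.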
It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses.
Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.
That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.
I pushed the vibe coded app to
https://t.co/EZyOqwXd2k
if others would like to play. ty nano banana pro for fun header image for the repo
> be you
> want to actually learn how LLMs work
> sick of "just start with linear algebra and come back in 5 years"
> decide to build my own roadmap
> no fluff. no detours. no 200-hour generic ML playlists
> just the stuff that actually gets you from "what's a token?" to "I trained a mini-GPT with LoRA adapters and FlashAttention"
> goal: build, fine-tune, and ship LLMs
> not vibe with them. not "learn the theory" forever
> build them
> you will:
> > build an autograd engine from scratch
> > write a mini-GPT from scratch
> > implement LoRA and fine-tune a model on real data
> > hate CUDA at least once
> > cry
> > keep going
> 5 phases
> if you already know something? skip
> if you're lost? rewatch
> if you're stuck? use DeepResearch
> this is a roadmap, not a leash
> by the end: you either built the thing or you didn't
> phase 0: foundations
> > if matrix multiplication is scary, you're not ready yet
> > watch 3Blue1Brown's linear algebra series
> > MIT 18.06 with Strang, yes, he's still the GOAT
> > code Micrograd from scratch (Karpathy)
> > train a mini-MLP on MNIST
> > no frameworks, no shortcuts, no mercy
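For a feel of what "an autograd engine from scratch" means, here is a minimal sketch in the spirit of Micrograd. It is not Karpathy's actual code: the `Value` class below only supports `+` and `*` and is meant as a starting point under those assumptions.

```python
class Value:
    """A scalar that remembers how it was computed, so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so each node's grad
        # is complete before it is pushed to its parents.
        topo, seen = [], set()

        def build(v):
            if v not in seen:
                seen.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


x = Value(3.0)
y = Value(2.0)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
z.backward()
```

The `+=` in each `_backward` matters: `x` is used twice in `z`, and its gradient is the sum over both uses.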
> phase 1: transformers
> > the name is scary
> > it's just stacked matrix multiplies and attention blocks
> > Jay Alammar + 3Blue1Brown for the "aha"
> > Stanford CS224N for the theory
> > read "Attention Is All You Need" only AFTER building mental models
> > Karpathy's "Let's Build GPT" will break your brain in a good way
> > project: build a decoder-only GPT from scratch
> > bonus: swap tokenizers, try BPE/SentencePiece
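The core of a decoder-only block is causal self-attention. A minimal single-head sketch in plain Python, assuming the query/key/value vectors are already computed (a real GPT adds learned projections, multiple heads, and batched tensor ops); `causal_attention` is a hypothetical name:

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(q, k, v):
    """q, k, v: lists of T vectors (lists of floats). Returns T output vectors."""
    d = len(q[0])
    out = []
    for t in range(len(q)):
        # Causal mask: token t may only attend to positions 0..t.
        scores = [
            sum(a * b for a, b in zip(q[t], k[s])) / math.sqrt(d)
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Output is the attention-weighted average of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out
```

Because position `t` never sees positions after it, changing a later token cannot change an earlier output, which is what lets the model be trained on next-token prediction.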
> phase 2: scaling
> > LLMs got good by scaling, not magic
> > Kaplan paper -> Chinchilla paper
> > learn Data, Tensor, Pipeline parallelism
> > spin up multi-GPU jobs using HuggingFace Accelerate
> > run into VRAM issues
> > fix them
> > welcome to real training hell
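A back-of-the-envelope sketch of what the scaling papers buy you. It assumes two commonly quoted rules of thumb: roughly 20 training tokens per parameter for compute-optimal training (the usual reading of Chinchilla) and training compute of about 6 * N * D FLOPs. Both are approximations, and `chinchilla_estimate` is a hypothetical helper name.

```python
def chinchilla_estimate(n_params: float, tokens_per_param: float = 20.0):
    """Rough compute-optimal token count and training FLOPs for a model size."""
    # Assumption: about 20 tokens per parameter (rule of thumb, not exact).
    tokens = tokens_per_param * n_params
    # Assumption: training compute is about 6 * parameters * tokens.
    flops = 6.0 * n_params * tokens
    return tokens, flops


tokens, flops = chinchilla_estimate(1e9)  # a 1B-parameter model
print(f"{tokens:.2e} tokens, {flops:.2e} FLOPs")
```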
> phase 3: alignment & fine-tuning
> > RLHF: OpenAI blog -> Ouyang paper
> > SFT -> reward model -> PPO (don't get lost here)
> > Anthropic's Constitutional AI = smart constraints
> > LoRA/QLoRA: read, implement, inject into HuggingFace models
> > fine-tune on real data
> > project: fine-tune gpt2 or distilbert with your own adapters
> > not toy examples. real use cases or bust
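To make "implement LoRA" concrete: the idea is to freeze the original weight W and learn a low-rank update, so the layer computes W x + (alpha / r) * B A x. A minimal plain-Python sketch of the forward pass only (no training loop); `LoRALinear` and `matvec` are hypothetical names, and a real implementation would use PyTorch tensors.

```python
import random


def matvec(M, x):
    # Matrix-vector product for a matrix stored as a list of rows.
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Frozen weight W (out x in) plus a low-rank update (alpha / r) * B @ A."""

    def __init__(self, W, r=2, alpha=4.0, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(W), len(W[0])
        self.W = W  # frozen base weight, never updated during fine-tuning
        # A (r x in) starts small and random; B (out x r) starts at zero,
        # so the adapted layer initially matches the base layer exactly.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r

    def forward(self, x):
        base = matvec(self.W, x)
        delta = matvec(self.B, matvec(self.A, x))  # low-rank path: B (A x)
        return [b + self.scale * d for b, d in zip(base, delta)]
```

Only `A` and `B` would be trained, which is r * (in + out) numbers instead of in * out.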
> phase 4: production
> > this is the part people skip to, but you earned it
> > inference optimization: FlashAttention, quantization, sub-second latency
> > read the paper, test with quantized models
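Quantization in its simplest form is just storing weights as small integers plus a scale. A minimal sketch of symmetric per-tensor int8 quantization, one of several schemes and not what any particular library does internally; the function names are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    # One scale for the whole tensor; fall back to 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    # Recover approximate floats; the error per weight is at most about scale / 2.
    return [qi * scale for qi in q]
```

The trade is memory and bandwidth (1 byte per weight instead of 2 or 4) against a small, bounded rounding error.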
> resources:
> math/coding:
> > 3Blue1Brown, MIT 18.06, Goodfellow's book
> PyTorch:
> > Karpathy, Zero to Mastery
> transformers:
> > Alammar, Karpathy, CS224N, Vaswani et al
> scaling:
> > Kaplan, Chinchilla, HuggingFace Accelerate
> alignment:
> > OpenAI, Anthropic, LoRA, QLoRA
> inference:
> > FlashAttention
> the endgame:
> > understand how these models actually work
> > see through hype
> > ignore LinkedIn noise
> > build tooling
> > train real stuff
> > ship your own stack
> > look at a paper and think "yeah I get it"
> > build your own AI assistant, infra, whatever
> make it all the way through?
> ship something real?
> DM me.
> I wanna see what you built.
> happy hacking.