Beer makers spend part of the year brewing small quantities of high-octane beer with as much as 30% alcohol. It’s as scary to make as it is to drink. https://t.co/SdTx3cLubI via @WSJ
As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently:
"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",
Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response.
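A minimal sketch of the three stages above, not the app's actual code. It assumes each model is just a Python callable from prompt text to reply text; `run_council`, the `Model` type, the prompt wording, and the `Response A` style labels are all hypothetical names made up for illustration.

```python
import random
import string
from typing import Callable, Dict

# Hypothetical type: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]


def run_council(query: str, council: Dict[str, Model], chairman: Model, seed: int = 0) -> dict:
    # Stage 1: every council member answers the query independently.
    answers = {name: model(query) for name, model in council.items()}

    # Anonymize: shuffle the order and swap model names for neutral labels.
    # (Labels come from A-Z, so this sketch supports at most 26 members.)
    names = list(answers)
    random.Random(seed).shuffle(names)
    labels = {f"Response {string.ascii_uppercase[i]}": name for i, name in enumerate(names)}
    anonymized = "\n\n".join(f"{label}:\n{answers[name]}" for label, name in labels.items())

    # Stage 2: every member reviews and ranks the anonymized responses.
    review_prompt = (
        f"Question:\n{query}\n\n{anonymized}\n\n"
        "Review each response, then rank them from best to worst."
    )
    reviews = {name: model(review_prompt) for name, model in council.items()}

    # Stage 3: the chairman gets the answers plus all reviews as context.
    review_text = "\n\n".join(
        f"Reviewer {i}:\n{text}" for i, text in enumerate(reviews.values(), start=1)
    )
    chairman_prompt = (
        f"Question:\n{query}\n\n{anonymized}\n\n{review_text}\n\n"
        "Write the single best final answer."
    )
    return {
        "answers": answers,
        "labels": labels,  # label -> model name, kept so rankings can be de-anonymized
        "reviews": reviews,
        "final": chairman(chairman_prompt),
    }
```

In a real setup each callable would presumably wrap a request to OpenRouter for one of the model IDs listed above; plain functions are used here so the data flow is easy to read and test offline.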
It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses.
Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawling, and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.
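One way to turn the peer rankings into a simple leaderboard is to average each model's position across reviewers, and to count how often a reviewer puts its own answer first. This is a sketch under assumptions: `mean_rank` and `self_first_rate` are hypothetical helpers, the rankings are assumed to be already parsed and de-anonymized into best-first lists of model names, and the `model-a` style data below is invented purely to show the input shape.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def mean_rank(rankings: Dict[str, List[str]]) -> List[Tuple[str, float]]:
    """Average position per model (1 = best), sorted best first."""
    positions = defaultdict(list)
    for ordering in rankings.values():
        for place, candidate in enumerate(ordering, start=1):
            positions[candidate].append(place)
    averages = {name: sum(p) / len(p) for name, p in positions.items()}
    # Ties on average position are broken alphabetically by model name.
    return sorted(averages.items(), key=lambda item: (item[1], item[0]))


def self_first_rate(rankings: Dict[str, List[str]]) -> float:
    """Fraction of reviewers that ranked their own response first."""
    if not rankings:
        return 0.0
    self_votes = sum(1 for reviewer, ordering in rankings.items()
                     if ordering and ordering[0] == reviewer)
    return self_votes / len(rankings)


# Invented example data: reviewer -> ranking of all models, best first.
rankings = {
    "model-a": ["model-a", "model-b", "model-c"],
    "model-b": ["model-a", "model-b", "model-c"],
    "model-c": ["model-a", "model-c", "model-b"],
}
leaderboard = mean_rank(rankings)
self_rate = self_first_rate(rankings)
```

A low `self_first_rate` would match the observation that models often prefer another model's answer, though whether the resulting leaderboard tracks human judgment is a separate question.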
That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.
I pushed the vibe coded app to
https://t.co/EZyOqwXd2k
if others would like to play. ty nano banana pro for fun header image for the repo
@BrendanFoody @felicis @benchmark @generalcatalyst Well done mate... if you are looking to scale your inference then check out https://t.co/D4MJ4kxr2f -- an orthogonal approach to scaling frontier models.
@Electron_Cowboy Well done. Have you seen what we have done with 1/100 of the capital you have raised? We launched an Inference Cloud on ASIC-based technology that can run in existing 2018 data centres, supporting all the major open models -- check out https://t.co/2N0hqGbsMY