Ataago

@Ataago7

Joined May 2015

53 Following

23 Followers

41 Posts

Ataago @Ataago7

6 months ago

Just vibe coded LLM Parliament 🏛️ Adversarial AI debate platform: Proponent, Critic & Moderator battle in real-time. Inspo: @karpathy's LLM Council. Structured debate + auto-moderation + fact-checking via #FastMCP. 🍿 🔗 https://t.co/wIDRJ2sNJq #AI #LLM #LangGraph #MLFlow

Andrej Karpathy

7 months ago

As a fun Saturday vibe code project and following up on this tweet earlier, I hacked up an **llm-council** web app. It looks exactly like ChatGPT except each user query is 1) dispatched to multiple models on your council using OpenRouter, e.g. currently:

"openai/gpt-5.1",
"google/gemini-3-pro-preview",
"anthropic/claude-sonnet-4.5",
"x-ai/grok-4",

Then 2) all models get to see each other's (anonymized) responses and they review and rank them, and then 3) a "Chairman LLM" gets all of that as context and produces the final response.

It's interesting to see the results from multiple models side by side on the same query, and even more amusingly, to read through their evaluation and ranking of each other's responses.

Quite often, the models are surprisingly willing to select another LLM's response as superior to their own, making this an interesting model evaluation strategy more generally. For example, reading book chapters together with my LLM Council today, the models consistently praise GPT 5.1 as the best and most insightful model, and consistently select Claude as the worst model, with the other models floating in between. But I'm not 100% convinced this aligns with my own qualitative assessment. For example, qualitatively I find GPT 5.1 a little too wordy and sprawled and Gemini 3 a bit more condensed and processed. Claude is too terse in this domain.

That said, there's probably a whole design space of the data flow of your LLM council. The construction of LLM ensembles seems under-explored.

I pushed the vibe coded app to https://t.co/EZyOqwXd2k if others would like to play. ty nano banana pro for fun header image for the repo
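
The three-stage flow described in the tweet above (collect answers, have the models rank each other's anonymized answers, let a chairman produce the final response) could be sketched roughly as follows. This is a minimal offline illustration, not the code from the linked repo. The names `anonymize`, `mean_ranks`, and `run_council` are hypothetical, the "models" are stand-in functions rather than OpenRouter calls, and averaging rank positions is an assumed aggregation rule that the tweet does not specify.

```python
from typing import Callable, Dict, List, Tuple

Answerer = Callable[[str], str]                       # query -> response text
Ranker = Callable[[str, Dict[str, str]], List[str]]   # (query, labelled responses) -> labels, best first
Chairman = Callable[[str, Dict[str, str], Dict[str, float]], str]


def anonymize(responses: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Hide model names behind neutral labels so rankers cannot see authorship."""
    labelled: Dict[str, str] = {}
    label_to_name: Dict[str, str] = {}
    for index, (name, text) in enumerate(responses.items()):
        label = f"Response {chr(ord('A') + index)}"
        labelled[label] = text
        label_to_name[label] = name
    return labelled, label_to_name


def mean_ranks(rankings: List[List[str]]) -> Dict[str, float]:
    """Average 1-based position per label across rankings (lower is better).

    Assumption: the tweet does not say how rankings are combined.
    """
    positions: Dict[str, List[int]] = {}
    for ranking in rankings:
        for position, label in enumerate(ranking, start=1):
            positions.setdefault(label, []).append(position)
    return {label: sum(p) / len(p) for label, p in positions.items()}


def run_council(query: str, answerers: Dict[str, Answerer],
                rankers: List[Ranker], chairman: Chairman) -> Tuple[str, Dict[str, float]]:
    # Stage 1: every council member answers the query independently.
    responses = {name: answer(query) for name, answer in answerers.items()}
    # Stage 2: members review and rank the anonymized responses.
    labelled, label_to_name = anonymize(responses)
    rankings = [rank(query, labelled) for rank in rankers]
    scores = {label_to_name[label]: score
              for label, score in mean_ranks(rankings).items()}
    # Stage 3: the chairman sees everything and produces the final response.
    return chairman(query, responses, scores), scores


# Stand-in council: fixed answers instead of real API calls.
answerers: Dict[str, Answerer] = {
    "model-a": lambda q: "short answer",
    "model-b": lambda q: "a much longer and more detailed answer",
    "model-c": lambda q: "a medium answer",
}


def by_length(query: str, labelled: Dict[str, str]) -> List[str]:
    """Toy ranker that simply prefers longer responses."""
    return sorted(labelled, key=lambda label: len(labelled[label]), reverse=True)


def pick_best(query: str, responses: Dict[str, str], scores: Dict[str, float]) -> str:
    """Toy chairman that returns the response with the best (lowest) mean rank."""
    return responses[min(scores, key=scores.get)]


final, scores = run_council("What is an LLM ensemble?", answerers,
                            [by_length] * 3, pick_best)
print(final)
print(scores)
```

In a real council each stand-in function would wrap a model call, and the chairman would synthesize a new answer from the responses and rankings rather than pick an existing one.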

905

17K

1K

13K

5M

0

0

0

0

27

Ataago7 retweeted

Adrian Dittmann

@AdrianDittmann

almost 2 years ago

It's just a lot of pressure ...

158

2K

164

440

636K

Ataago @Ataago7

over 2 years ago

@RockstarGames When is Trailer 2 coming? Where is Jason?

0

1

0

0

7

Ataago @Ataago7

over 2 years ago

@RockstarGames @ashskyqueen @ashskyqueen this video is at least worth a million views. (That is the minimum.) Incredible video. Thanks ❤️ for such amazing efforts. 👏

1

4

0

0

73

Ataago @Ataago7

almost 4 years ago

@shakteed @gokulr Engineering productivity is measured at a team level by tracking sprint velocity, deliverables, and JIRA tickets. This has always been a challenge for managers & enterprises. To make it easier and safer, @BlueOptima offers transparent metrics & AI to help improve productivity & quality.

0

1

0

1

0

Ataago7 retweeted

BlueOptima @BlueOptima

over 5 years ago

Low code and no code solutions are on the rise as organisations are wanting to improve software development capacity. Learn more: https://t.co/iSiLojI3Ie

0

2

1

0

0

Ataago7 retweeted

BlueOptima @BlueOptima

over 5 years ago

Women remain vastly underrepresented in the technology industry, accounting for just 8% of positions in #softwaredevelopment. What initiatives are organisations using to improve diversity in #SoftwareEngineering teams? #womenintech Read more: https://t.co/pL4o1f6SMR

0

1

4

0

0

Ataago @Ataago7

over 5 years ago

@RootiParooti @TTrue27

0

1

0

0

0

Ataago @Ataago7

over 5 years ago

0

0

0

0

0

Ataago @Ataago7

over 5 years ago

@box_poly Are you sure you are not in jannah dear?

0

2

0

0

0

Ataago @Ataago7

over 5 years ago

@HyundaiIndia @AdvaithHyundai @hyundai_kun Update: Hyundai India has finally replied. They are now repairing the vehicle to diagnose the problem, in coordination with Kun Hyundai and at Hyundai's cost. ETD is stated to be 21st Nov. Hoping for a positive outcome on this issue, which multiple Creta owners have faced.

0

1

0

0

0

Ataago @Ataago7

over 5 years ago

@kishanraj814 @HyundaiIndia @AdvaithHyundai @hyundai_kun Let's see. So far I am disappointed: they escalated the issue up to the regional dealer level and then tried to close the case by saying, "let's repair the car and go on a 200km ride to test the brakes" (just wow). I have mailed the report to @HyundaiIndia again. Waiting with hopes. :)

1

2

0

0

0

Ataago @Ataago7

over 5 years ago

@kishansinghdvs1 @HyundaiIndia I have met with the same fate and my grandfather was the victim. More details here: https://t.co/SQjJqIfMmt

0

1

0

0

0

Ataago @Ataago7

over 5 years ago

@AshutoshBharti8 @Kia @HyundaiIndia @sourabh_jayjeet @ndtv @TimesNow I have met with the same fate and my grandfather was the victim. More details here: https://t.co/SQjJqIfMmt

0

1

0

0

0

Ataago @Ataago7

over 5 years ago

@raj_vsn @schmmuck I have met with the same fate and my grandfather was the victim. More details here: https://t.co/SQjJqIfMmt

0

1

0

0

0

Ataago @Ataago7

over 5 years ago

@schmmuck I have met with the same fate and my grandfather was the victim. More details here: https://t.co/SQjJqIfMmt

0

1

1

0

0

Ataago @Ataago7

over 5 years ago

@TeamBHPforum I have met with the same fate and my grandfather was the victim. More details here: https://t.co/SQjJqIfMmt

0

1

1

0

0

Ataago @Ataago7

over 5 years ago

@VijayAg56298494 @HyundaiIndia @hyundaiceo @VijayAg56298494 I have also met with the same fate with my Creta, and my grandfather was the victim. What solution has the company offered you? This is a serious issue.

0

0

0

0

0

Ataago @Ataago7

over 5 years ago

@HyundaiIndia Email: [email protected]

0

1

0

0

0

Ataago @Ataago7

about 6 years ago

@FoxySnaps 50% off on hangers

0

1

0

0

0
