Just published aggregate stats from AI Roundtable, where 200+ models debate your question.
29,502 public sessions. 334,589 model responses.
https://t.co/4EdkJbIgmC
Three takeaways: @claudeai Opus 4.7 most influential, @GoogleDeepMind Gemini 3.1 Pro most used, @xai Grok 4.1 Fast highest conviction. Thread.
@karpathy Amazing talk! Re: your statement about the council of LLM judges, I built a fun little tool called AI Roundtable that you can use for this :) https://t.co/8JtjzrgpN6
@ChatGPTapp Well, well. I ran it twice, and unfortunately GPT-5.5 already wanted to walk on run two. I wrote an article about this viral test: https://t.co/Rn3TtZvNbK
"I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"
I tested 53 leading models, no system prompt, forced choice between "drive" and "walk" with a reasoning field.
Only 5 got it right consistently. @grok is one of them.
Extending research into evolutionary #AutoML, we present an approach that evolves algorithms from scratch—using only basic mathematical operations—rediscovering fundamental ML techniques & showing the potential to discover novel algorithms. Read more ↓ https://t.co/XtNsC9qFnj