@polynoamial I feel like you would be one of the best situated people to help come up with better eval criteria…
And I think it’s very important for many aspects of society that there is some more public information around this stuff
Yesterday, we sat down with @TomDwan for our 3rd conversation, and it didn't disappoint.
We covered a lot of ground, but some notable topics included:
• Why did Tom get access to private games that other pros didn't?
• How can people in poker use their skills to excel in other industries?
• The infamous J-4 hand
• The Robl fold
You can watch the recording of the Q&A here:
https://t.co/GxY2FPS25o
@polynoamial You guys tell the truth a lot more than Anthropic or XAI seem to.
I don’t have much knowledge about Google/DeepMind, but I’m hopeful they’re closer to OAI on that scale.
We should have both what you suggest, and the public models playing. Along with more options
How about we chip in $X, and ask the labs to match that with credits or cheaper rates etc for this specific task.
And maybe in exchange point out some obvious leaks etc (they could then try to trace where those came from, from a reasoning standpoint)
Ahh ok. I’m happy to try to help you pressure the frontier labs to play ball.
Some of them claim to be good enough for military targeting and stuff (even though awful mistakes seem to have happened), why not do a proper real test?
@TomDwan Great point, Tom. Running frontier LLMs at scale is expensive. That's why we use AIVAT, a variance reduction technique that achieves the same statistical significance in 10x fewer hands, so 5K is equivalent to ~50K raw hands.
Also, this should be good for the labs themselves (Longterm).
Yes, a lot of them hate telling the truth when mistakes are made. But this kind of situation is good for them to train models, and to build systems around how to more accurately assess those models’ confidence.
@GTOWizard Was this an automated response? What about doing more hands instead of the “luck-adjusted” bs hahaha.
It’s still cool regardless obv tho, happy you guys did this
And don’t do “luck adjusted”, just do enough hands that the variance smooths out and you can say who won/lost more.
Should be trivial to do a mil hands if you really wanna limit variance, no?
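For anyone following the variance back-and-forth, here is a rough sketch of the arithmetic behind both positions. It assumes hands are independent, that the standard error of a win rate shrinks as 1/sqrt(hands), and that "10x fewer hands" means per-hand variance is cut by a factor of 10. The per-hand standard deviation is a made-up placeholder, and this is not how AIVAT itself works internally, only the sample-size math around it.

```python
import math

# Hypothetical per-hand standard deviation in big blinds; a placeholder, not a measured value.
RAW_STD_BB = 10.0


def standard_error(std_per_hand: float, n_hands: int) -> float:
    """Standard error of the mean win rate after n_hands independent hands."""
    return std_per_hand / math.sqrt(n_hands)


def equivalent_raw_hands(n_hands: int, variance_reduction: float) -> float:
    """Raw hands needed to match the precision of n_hands variance-reduced hands."""
    return n_hands * variance_reduction


# Assumption: "10x fewer hands" means per-hand variance drops by a factor of 10,
# so the per-hand standard deviation drops by sqrt(10).
reduced_std = RAW_STD_BB / math.sqrt(10)

se_reduced_5k = standard_error(reduced_std, 5_000)   # 5K variance-reduced hands
se_raw_50k = standard_error(RAW_STD_BB, 50_000)      # 50K raw hands
se_raw_1m = standard_error(RAW_STD_BB, 1_000_000)    # 1M raw hands

print(equivalent_raw_hands(5_000, 10))           # 50000
print(math.isclose(se_reduced_5k, se_raw_50k))   # True
print(se_raw_1m < se_raw_50k)                    # True
```

Under these assumptions both sides are right in their own way: 5K adjusted hands give the same precision as about 50K raw ones, and a million raw hands would be tighter still, at the cost of running far more model calls.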
We benchmarked every major AI model at poker.
GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, Grok 4 and more.
All played 5,000 hands of heads-up no-limit against our state-of-the-art poker agent.
Every single one lost. Here's the full breakdown 🧵
As one of the few foreigners who knows (almost) all of the story of China’s 🇨🇳 crypto crackdown, it’s not even close to accurate to say Star reported Li Lin
I can understand at least some of why both Star and CZ are upset. I hope they both take some deep breaths. They both follow me, that feels cool. I hope neither of them tilts and unfollows 😂.
I beg everyone in crypto to read this in full.
I expected this to be another case of social engineering, likely some recruiter/job offer shit.
I was very wrong.
And the depth of the operation and personas makes me think they already have multiple other teams on lock.
😳
Hey @KylieJenner — I’m Sam Kiki. I hold the record for the most ever won in 17 seasons on High Stakes Poker. I also hold the record for largest single day win. I, too, like splashy pots.
I have a seat and $500k with your name on it. Bring @RealChalamet. I’ll teach you both everything the @VanityFair video left out.
Then we can all compete on @PokerGO with a few of our mutual friends.