Eric Malmi @ericmalmi - Twitter Profile

Pinned Tweet

over 1 year ago

Language models can't play chess, right?♟️ Excited to share our latest experiment that lets you play chess against a Gemini model!



Eric Malmi @ericmalmi

about 1 month ago

@yoavgo agreed on the importance of teaching data! I've used a course project where the students are given a fine-tuning environment and need to produce the dataset (e.g. for GEC)
a few good data papers: https://t.co/A3YehnvwQa https://t.co/rGTIpRv9G8 https://t.co/kEjul1C5Ei


Eric Malmi @ericmalmi

11 months ago

come chat to Jakub Adamek, @anianruoss, and me at the poster


Eric Malmi @ericmalmi

11 months ago

if you're at #icml2025, come check out our spotlight poster on "Mastering Board Games by External and Internal Planning with Language Models" ♟️
📜: https://t.co/Ro4FhZFULh
⏲️: Wed 16 Jul 11 am - 1:30 pm PDT
📍: East Exhibition Hall A-B #E-2508
demo: https://t.co/jGISrpY8cq



ericmalmi retweeted

Petar Veličković

@PetarV_93

11 months ago

Poster Spotlight! 🔦

Mastering Board Games by External and Internal Planning with Language Models ♟️

https://t.co/46CYU2w64W

On Wednesday (Poster Session 3 East)

Presented by Jakub Adamek and @ericmalmi


ericmalmi retweeted

Arena.ai

@arena

about 1 year ago

🚨Breaking: New Gemini-2.5-Pro (06-05) takes the #1 spot across all Arenas again!

🥇 #1 in Text, Vision, WebDev
🥇 #1 in Hard, Coding, Math, Creative, Multi-turn, Instruction Following, and Long Queries categories

Huge congrats @GoogleDeepMind!


Eric Malmi @ericmalmi

about 1 year ago

thank you for the recognition @GaryMarcus! there's room for improvement, but I find it quite remarkable that an LLM learns to play creative sacrifices like this (best move according to Stockfish)

Gary Marcus

@GaryMarcus

about 1 year ago

@cfchabris @ericmalmi has kind of done that and it does pretty well except in weird positions - where it still sometimes makes illegal moves. Confirming your conjecture and mine, if I understand his results correctly. https://t.co/a7omX7bDZl


Eric Malmi @ericmalmi

about 1 year ago

@GaryMarcus @RepresenterTh you're welcome to test the MAV model (w/o MCTS) at: https://t.co/jGISrpXAmS

a few things to note:
* for now, comments come from a different model so they can be ungrounded
* MAV can play chess960, Hex, Connect4, but the Gem only supports chess


ericmalmi retweeted

Arena.ai

@arena

about 1 year ago

🚨Breaking: @GoogleDeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆

Highlights:
- #1 in all text arenas (Coding, Style Control, Creative Writing, etc)
- #1 on the Vision leaderboard with a ~70 pts lead!
- #1 on WebDev Arena, surpassing Claude for the first time

This is the first-ever sweep across text, vision, and WebDev by any model!🥇

Huge congrats to @GoogleDeepMind on this incredible breakthrough!


Eric Malmi @ericmalmi

over 1 year ago

@LaatoSamuli @anianruoss Thanks, Samuli!


Eric Malmi @ericmalmi

over 1 year ago

multiple long-time dreams coming true at once:
✅ give a talk at NeurIPS
♟️ play chess on a stage
🤡 make my international debut as a rapper

thanks to the audience for a lively discussion that went on for a good hour after the talk and to my amazing co-presenters @anianruoss @weballergy @MatejJusup!


ericmalmi retweeted

Google Gemini

@GeminiApp

over 1 year ago

Think you can outsmart Gemini? We challenge you to a chess match! Play Gemini in a game of chess with our newest Gem: Chess champ. Explore different openings as you banter back and forth with Gemini. Available in the Gemini web app. ♟️Can you beat it? → https://t.co/2M1GyJcRNL


Eric Malmi @ericmalmi

over 1 year ago

@Bayesprof Thanks for joining!


Eric Malmi @ericmalmi

over 1 year ago

try playing against our no-search model at: https://t.co/R0230hGa3g and check out the paper: https://t.co/cdeUA7o0He


Eric Malmi @ericmalmi

over 1 year ago

our work establishes new test-time scaling results for chess-playing LLMs ♟️📈 honestly, I think it's quite mind blowing that an LLM can learn to perform minimax tree search within a single model call and smoothly improve its Elo the more output tokens you give it 🤯



ericmalmi retweeted

Justin Zhao

@justinxzhao

over 1 year ago

LLMs can play chess!

In-context minimax search bootstrapped with values from Stockfish, implemented in Gemini.

Paper:
https://t.co/tiwrLXwC7z

Breadth 4, depth 2, you start running out of context window. Chess Elo improves with more test-time compute.

Really cool work from @ericmalmi @GoogleDeepMind


Eric Malmi @ericmalmi

over 1 year ago

Paper: https://t.co/cdeUA7o0He Demo: https://t.co/R0230hGa3g


Eric Malmi @ericmalmi

over 1 year ago

if you're at #NeurIPS2024, want to learn how to make LLMs really good at chess and see a live demo, come and visit the @GoogleDeepMind booth tomorrow at 9:30 am!



Eric Malmi @ericmalmi

over 1 year ago

@PreethiLahoti Haha, you know me :) This is actually a great example of Gemini's generalization capabilities (no, I did not produce training data for this use case 😁)!


Eric Malmi @ericmalmi

over 1 year ago

Language models can't play chess, right?♟️ Excited to share our latest experiment that lets you play chess against a Gemini model!


Eric Malmi @ericmalmi

over 1 year ago

You can try it at: https://t.co/R0230hFCdI (requires Gemini Advanced subscription but you can get 1 month for free). The experiment is powered by the MAV–small model from our new paper: https://t.co/cdeUA7nsRG

