@yoavgo agreed on the importance of teaching data!
I've used a course project where the students are given a fine-tuning environment and need to produce the dataset (e.g. for GEC)
a few good data papers:
https://t.co/A3YehnvwQa
https://t.co/rGTIpRv9G8
https://t.co/kEjul1C5Ei
if you're at #icml2025, come check out our spotlight poster on "Mastering Board Games by External and Internal Planning with Language Models" ♟️
📜: https://t.co/Ro4FhZFULh
⏲️: Wed 16 Jul 11 am - 1:30 pm PDT
📍: East Exhibition Hall A-B #E-2508
demo: https://t.co/jGISrpY8cq
Poster Spotlight! 🔦
Mastering Board Games by External and Internal Planning with Language Models ♟️
https://t.co/46CYU2w64W
On Wednesday (Poster Session 3 East)
Presented by Jakub Adamek and @ericmalmi
🚨Breaking: New Gemini-2.5-Pro (06-05) takes the #1 spot across all Arenas again!
🥇 #1 in Text, Vision, WebDev
🥇 #1 in Hard, Coding, Math, Creative, Multi-turn, Instruction Following, and Long Queries categories
Huge congrats @GoogleDeepMind!
thank you for the recognition @GaryMarcus!
there's room for improvement, but I find it quite remarkable that an LLM learns to play creative sacrifices like this (best move according to Stockfish)
@cfchabris @ericmalmi has kind of done that and it does pretty well except in weird positions - where it still sometimes makes illegal moves.
Confirming your conjecture and mine, if I understand his results correctly.
https://t.co/a7omX7bDZl
@GaryMarcus @RepresenterTh you're welcome to test the MAV model (w/o MCTS) at: https://t.co/jGISrpXAmS
a few things to note:
* for now, comments come from a different model so they can be ungrounded
* MAV can play chess960, Hex, Connect4, but the Gem only supports chess
🚨Breaking: @GoogleDeepMind’s latest Gemini-2.5-Pro is now ranked #1 across all LMArena leaderboards 🏆
Highlights:
- #1 in all text arenas (Coding, Style Control, Creative Writing, etc)
- #1 on the Vision leaderboard with a ~70 pts lead!
- #1 on WebDev Arena, surpassing Claude for the first time
This is the first-ever sweep across text, vision, and WebDev by any model!🥇
Huge congrats to @GoogleDeepMind on this incredible breakthrough!
multiple long-time dreams coming true at once:
✅ give a talk at NeurIPS
♟️ play chess on a stage
🤡 make my international debut as a rapper
thanks to the audience for a lively discussion that went on for a good hour after the talk and to my amazing co-presenters @anianruoss @weballergy @MatejJusup!
Think you can outsmart Gemini? We challenge you to a chess match!
Play Gemini in a game of chess with our newest Gem: Chess champ. Explore different openings as you banter back and forth with Gemini. Available in the Gemini web app.
♟️Can you beat it? → https://t.co/2M1GyJcRNL
our work establishes new test-time scaling results for chess-playing LLMs ♟️📈 honestly, I think it's quite mind-blowing that an LLM can learn to perform minimax tree search within a single model call and smoothly improve its Elo the more output tokens you give it 🤯
LLMs can play chess!
In-context minimax search bootstrapped with values from Stockfish, implemented in Gemini.
Paper:
https://t.co/tiwrLXwC7z
Breadth 4, depth 2, you start running out of context window. Chess Elo improves with more test-time compute.
Really cool work from @ericmalmi @GoogleDeepMind
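
For readers who haven't seen it before, here is a rough sketch of what a breadth- and depth-limited minimax search looks like in ordinary code. This is not the paper's method (there the search is written out by the LLM inside a single model call, with values learned from Stockfish); it is just the textbook algorithm on a made-up toy tree. The names `limited_minimax`, `TREE`, and `VALUE`, and all the numbers in them, are hypothetical and chosen only for illustration.

```python
# Toy game tree: each node maps to its children (a "move" is just the child name).
# The tree and the values below are invented for illustration.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}

# Static value of each position from the maximizing player's point of view.
# In a chess setting this role would be played by an engine or a learned value.
VALUE = {"root": 0.0, "a": 0.5, "b": 0.4, "a1": 0.1, "a2": 0.9, "b1": 0.3, "b2": 0.6}


def limited_minimax(state, depth, breadth, maximizing=True):
    """Return (value, best_move) using at most `breadth` moves per node
    and looking at most `depth` plies ahead."""
    moves = TREE.get(state, [])
    if depth == 0 or not moves:
        # Leaf of the search: fall back to the static value.
        return VALUE[state], None

    # Rank moves by static value for the side to move, keep the top `breadth`.
    ranked = sorted(moves, key=lambda m: VALUE[m], reverse=maximizing)[:breadth]

    best_value, best_move = None, None
    for move in ranked:
        child_value, _ = limited_minimax(move, depth - 1, breadth, not maximizing)
        if best_value is None:
            improved = True
        elif maximizing:
            improved = child_value > best_value
        else:
            improved = child_value < best_value
        if improved:
            best_value, best_move = child_value, move
    return best_value, best_move


if __name__ == "__main__":
    print(limited_minimax("root", depth=1, breadth=4))  # one ply of lookahead
    print(limited_minimax("root", depth=2, breadth=4))  # two plies of lookahead
```

In this toy tree, one ply of lookahead prefers move `a`, while two plies switch to `b` because the opponent's best reply to `a` is strong. That is the basic reason deeper search (more test-time compute) can improve play, and the breadth cap is what keeps the size of the search bounded.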
if you're at #NeurIPS2024, want to learn how to make LLMs really good at chess and see a live demo, come and visit the @GoogleDeepMind booth tomorrow at 9:30 am!
@PreethiLahoti Haha, you know me :) This is actually a great example of Gemini's generalization capabilities (no, I did not produce training data for this use case 😁)!
You can try it at: https://t.co/R0230hFCdI (requires Gemini Advanced subscription but you can get 1 month for free).
The experiment is powered by the MAV–small model from our new paper: https://t.co/cdeUA7nsRG