What a finish! Gemini 2.5 Pro just completed Pokémon Blue! Special thanks to @TheCodeOfJoel for creating and running the livestream, and to everyone who cheered Gem on along the way.
A new Pokémon Blue run with Gemini 3.5 Flash is underway!
It picked Squirtle and named it GEMMY. Already seeing interesting behavior too: it's using run_code as a no-op for extra reasoning space.
https://t.co/0rhnbG5rqz
Join the stream: https://t.co/mH1nP12U9j
Long-horizon agency is fundamentally a scaffolding problem. I spent over a year running Gemini Plays Pokémon as the harness evolved to be model-driven. I'm honored to co-author my first paper with the Princeton team and see my project lay the groundwork for Continual Harness! 🥳
Happy one-year anniversary of Gemini Plays Pokemon!
Gemini is live with a new, even more "Almost Vision Only" run! The only data now read from RAM are coordinates and any text or sprite movements that occur between turns.
Follow along with the new run on Twitch!
https://t.co/mH1nP12U9j
@irys_en Sorry you're getting hate in the comments for using AI. There's nothing wrong with using tools that help you handle things more effectively. Don't let the negativity get to you! 💜
This previously wasn't an issue because I was streaming from my laptop and using Docker for agent code execution. After I moved the streaming setup to a VPS, Docker was no longer an option, and I figured the risk of any harm would be low... but this was a very clever move by Gemini 3.1 Pro!
Gemini 3.1 Pro is starting its first run of Pokémon Blue, Claude style, with no minimap! (There are still several differences in the harness, though.)
Check it out here: https://t.co/un7o4YQY5O
@Jush21e8 I trust my own vibe-coding more than someone else's, so I've been working on a Gembot (with some lessons taken from Gemini Plays Pokemon, like a notepad tool) :D
It actually works pretty well, though I definitely do some things differently from OpenClaw...
@wowgettingbleak @BigTimeMothy Models don't really know about themselves. Unless it's baked into the training process somehow or it's explicitly in the system prompt, it'll likely hallucinate if you ask it meta-questions.
i too found it very effective to give agents a "napkin" to write on as they work.
it's a meaningfully different form of context from session history (lossy) or todos/plans (static)
anyway, install this skill to give codex/claude a napkin to write on https://t.co/tr5iIf191O
@flaviocopes I made https://t.co/Vje6qiKSym to do just this (but with Gemini). It uses Zed's Agent Client Protocol. It's a bit rough around the edges, but maybe you can give it a try.
I chatted with @Tharin_P for @TIME about what AI-playing-Pokemon streams actually reveal (and why the "harness" matters as much as the model). Read it here:
https://t.co/iHn8hn3qdW
Gemini 3 Flash defeats Red in Pokemon Crystal, becoming the first lightweight model to do so! Truly built different 💪
Compared to 3 Pro, where did Flash surprise you most?