Excited to finally share our progress in developing a reinforcement learning system to beat Pokémon Red. Our system successfully completes the game using a policy under 10M parameters, PPO, and a few novel techniques.
Blog posted below
A few years ago, I wanted to find a way to maximize shade during my long runs in the summer. Recently, I decided to experiment with a solution. Presenting Shady Route Finder.
https://t.co/bFxtrDpTnq . Supports 8 cities, can tell you which side of the street to walk on, works on mobile, and even has a sunny route finder mode on desktop. I've tested some routes, but obviously not all. Any feedback would be great.
i did start with a rudimentary implementation of pokemon stemming from a native rewrite of pokemon firered. the starting point i used gets around 4,000,000 steps per second as an rl env.
here is the entire prompt (caution: long!!!):
Reflection is partnering with Shinsegae Group to build a 250-megawatt sovereign AI factory for the Republic of Korea.
Open intelligence. Built on trust between allies. Owned by the nations that need it most.
The future of sovereign AI. Read more in the @WSJ.
Had some fun helping @kywch500 and @jsuarez simplify Pufferlib's 2048 env over the last couple of weeks. 2x better results with fewer observations, fewer rewards, and a new model architecture!