Make your RL environments and libraries play nice with PufferLib 0.4, out now with one-line wrappers, pain-free vectorization, and more!
Demo it on Colab: https://t.co/C9DeCO4TBt
New Features
- One-line wrappers for your Gym and PettingZoo environments
- Serial, Multiprocessing, and Ray vectorization backends
- PufferTank, a container preloaded with PufferLib and common environments
More importantly, we have rewritten the entire core for simplicity and extensibility. While this is not a flashy new feature, you will notice significantly fewer rough edges working with PufferLib. For example, your Gym environments are no longer converted to PettingZoo environments internally, and your discrete action spaces are no longer returned as MultiDiscrete: WYSIWYG.
Emulation: Previously, PufferLib required you to wrap your environment class in a binding, which then provided creation and additional utilities. Now, you pass in a Gym/PettingZoo environment and get back a Gym/PettingZoo environment. All of the benefits described in our 0.2 blog post are included.
Vectorization: Previously, PufferLib’s vectorization expected a binding object. Now, you pass it an environment creation function (as above) or a Gym/PettingZoo PufferEnv, if you prefer to subclass directly. Compared to 0.2, PufferLib now includes Serial and Multiprocessing backends in addition to Ray.
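To make the "pass in a creation function" idea concrete, here is a minimal sketch of a serial backend in plain Python. It is not PufferLib's API: `CoinFlipEnv`, `SerialVec`, and their signatures are hypothetical names chosen for illustration, and the toy environment only imitates a Gym-style `reset`/`step` interface.

```python
import random


class CoinFlipEnv:
    """Toy single-agent environment with a Gym-like reset/step interface."""

    def __init__(self, seed):
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return [0.0]  # observation

    def step(self, action):
        self.steps += 1
        reward = 1.0 if action == self.rng.randint(0, 1) else 0.0
        done = self.steps >= 5
        return [float(self.steps)], reward, done, {}


class SerialVec:
    """Hypothetical serial backend: steps each environment copy in a loop."""

    def __init__(self, env_creator, num_envs):
        # The creator is called once per copy, so the copies share no state.
        self.envs = [env_creator(seed=i) for i in range(num_envs)]

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        obs, rewards, dones, infos = [], [], [], []
        for env, action in zip(self.envs, actions):
            o, r, d, info = env.step(action)
            if d:
                o = env.reset()  # auto-reset finished episodes
            obs.append(o)
            rewards.append(r)
            dones.append(d)
            infos.append(info)
        return obs, rewards, dones, infos


vec = SerialVec(CoinFlipEnv, num_envs=4)
first_obs = vec.reset()
obs, rewards, dones, infos = vec.step([0, 1, 0, 1])
```

Because the backend only ever sees a callable, swapping the loop for worker processes changes where `env_creator` gets called, not what the caller writes.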
PufferTank: Many common RL environments are notoriously hard to set up and use. PufferTank provides containers with several such popular environments tested to work with PufferLib. These are preloaded onto base images so you can build the container over a coffee break.
Policies: Previously, PufferLib required you to subclass a PyTorch base class for your models. Now, you can use vanilla PyTorch policies. We still provide a base class as an option, which allows you to use another of our wrappers to handle recurrence for you. Pass your model to our wrappers and we will convert to framework-specific APIs for you.
Error Handling: Previously, PufferLib applied expensive runtime checks to all environments by default. These could be disabled by running with -O. This was inconvenient and easily forgotten. Now, these checks only run once at startup with negligible overhead. Thus far, we have observed no bugs with the new version that would have been caught by the previous checks.
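For context, Python's `-O` flag strips `assert` statements, which is one way runtime checks end up tied to that flag; whether the old checks were written as asserts is an assumption here. The sketch below shows the general "check once, then get out of the way" pattern. `ToyEnv`, `check_obs`, and `CheckedOnce` are hypothetical names, not PufferLib code.

```python
class ToyEnv:
    """Hypothetical environment that declares the observation length it emits."""

    obs_size = 3

    def reset(self):
        return [0.0, 0.0, 0.0]

    def step(self, action):
        return [1.0, 2.0, 3.0], 0.0, False, {}


def check_obs(obs, expected_size):
    # Explicit raises instead of assert, so the check does not depend on -O.
    if len(obs) != expected_size:
        raise ValueError(f"expected {expected_size} values, got {len(obs)}")
    if not all(isinstance(x, float) for x in obs):
        raise TypeError("observation entries must be floats")


class CheckedOnce:
    """Validates the first reset and the first step, then skips validation."""

    def __init__(self, env):
        self.env = env
        self.checks_run = 0
        self._reset_checked = False
        self._step_checked = False

    def reset(self):
        obs = self.env.reset()
        if not self._reset_checked:
            check_obs(obs, self.env.obs_size)
            self._reset_checked = True
            self.checks_run += 1
        return obs

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if not self._step_checked:
            check_obs(obs, self.env.obs_size)
            self._step_checked = True
            self.checks_run += 1
        return obs, reward, done, info


env = CheckedOnce(ToyEnv())
env.reset()
for _ in range(1000):
    env.step(0)
```

After the first reset and step, the per-call cost is a single boolean test, regardless of how many steps follow.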
Misc: We have added sane default installations, setup, and policies for several more environments. Check our home page for an updated list.
The new environment and policy changes mean that PufferLib no longer breaks serialization. This is useful for saving environment and model states.
We have written optimized flatten and unflatten functions for handling observations and actions. This was previously a bottleneck for environments with complex spaces. Expect a separate post on this, since it was an interesting case study for Python extension options.
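For readers unfamiliar with the operation, here is an unoptimized pure-Python sketch of what flattening and unflattening a nested observation involves. It is not the optimized implementation mentioned above; the function names and the example observation are assumptions made for illustration.

```python
def flatten(obs):
    """Flatten nested dicts/lists of numbers into a flat list plus a structure."""
    flat = []

    def walk(node):
        if isinstance(node, dict):
            # Sorted keys give a deterministic order for the flat list.
            return {key: walk(node[key]) for key in sorted(node)}
        if isinstance(node, (list, tuple)):
            return [walk(child) for child in node]
        flat.append(node)
        return None  # placeholder marking a leaf

    structure = walk(obs)
    return flat, structure


def unflatten(flat, structure):
    """Rebuild the nested observation by refilling leaves in the same order."""
    values = iter(flat)

    def build(node):
        if isinstance(node, dict):
            return {key: build(node[key]) for key in node}
        if isinstance(node, list):
            return [build(child) for child in node]
        return next(values)

    return build(structure)


obs = {"position": [1.0, 2.0], "inventory": {"gold": 3, "items": [0, 1]}}
flat, structure = flatten(obs)
restored = unflatten(flat, structure)
```

The recursion runs on every step for every agent, which hints at why this can become a bottleneck for complex spaces. Note that this sketch returns tuples as lists on the round trip.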
We have an experimental custom CleanRL derivative to correctly handle environments with variable numbers of agents, without training on padding. Doing this simply has been a longstanding challenge in RL. More on this once it is more stable.
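To illustrate why padding matters, the sketch below pads a per-agent loss to a fixed number of agent slots and compares a naive mean with a masked mean. Masking is a swapped-in technique used here only for illustration; it is not necessarily how the experimental CleanRL derivative works, and `pad_agents` and `masked_mean` are hypothetical helpers.

```python
def pad_agents(per_agent, max_agents, pad_value=0.0):
    """Pad a variable-length list to max_agents and return a validity mask."""
    num_pad = max_agents - len(per_agent)
    values = list(per_agent) + [pad_value] * num_pad
    mask = [True] * len(per_agent) + [False] * num_pad
    return values, mask


def masked_mean(values, mask):
    """Average only the entries that belong to real agents."""
    total = sum(v for v, m in zip(values, mask) if m)
    count = sum(1 for m in mask if m)
    return total / count if count else 0.0


losses = [0.5, 1.5]  # two live agents this step
values, mask = pad_agents(losses, max_agents=4)

naive = sum(values) / len(values)   # padding slots dilute the signal
masked = masked_mean(values, mask)  # only the two real agents count
```

With two of four slots empty, the naive mean is half the masked one, so the effective learning signal would shrink whenever agents drop out.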
Code examples and more at https://t.co/LNIUzcCqo7. Comment your RL pain points below and we might just fix them.
PufferLib 0.5 is out now! Make RL libs + envs play nice with:
- Native Python EnvPool implementation that works with ALL your environments, not just C++
- New bindings for Pokemon Red, Minigrid, and more
- New & improved training code built on @vwxyzjn's CleanRL
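The EnvPool idea, as commonly described, is to run more environments than the batch size and hand back whichever ones finish stepping first, so slow environments do not stall the batch. Below is a toy sketch of that pattern using threads and queues. It is not PufferLib's implementation: `SleepEnv`, `ToyPool`, and the `recv`/`send` method names are assumptions, and a real pool would likely use processes rather than threads.

```python
import queue
import threading
import time


class SleepEnv:
    """Toy environment whose step time differs per instance."""

    def __init__(self, delay):
        self.delay = delay
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        time.sleep(self.delay)
        self.t += 1
        return self.t


class ToyPool:
    """Returns observations from the first batch_size environments to finish."""

    def __init__(self, env_creators, batch_size):
        self.batch_size = batch_size
        self.ready = queue.Queue()  # (env_id, observation) pairs
        self.inboxes = [queue.Queue() for _ in env_creators]
        self.threads = []
        for env_id, creator in enumerate(env_creators):
            thread = threading.Thread(
                target=self._worker, args=(env_id, creator), daemon=True
            )
            thread.start()
            self.threads.append(thread)

    def _worker(self, env_id, creator):
        env = creator()
        self.ready.put((env_id, env.reset()))
        while True:
            action = self.inboxes[env_id].get()
            if action is None:  # shutdown sentinel
                return
            self.ready.put((env_id, env.step(action)))

    def recv(self):
        batch = [self.ready.get() for _ in range(self.batch_size)]
        env_ids = [env_id for env_id, _ in batch]
        obs = [o for _, o in batch]
        return env_ids, obs

    def send(self, env_ids, actions):
        for env_id, action in zip(env_ids, actions):
            self.inboxes[env_id].put(action)

    def close(self):
        for inbox in self.inboxes:
            inbox.put(None)
        for thread in self.threads:
            thread.join()


delays = [0.001, 0.001, 0.02, 0.02]
pool = ToyPool([lambda d=d: SleepEnv(d) for d in delays], batch_size=2)
seen = []
for _ in range(5):
    env_ids, obs = pool.recv()
    seen.append(env_ids)
    pool.send(env_ids, [0] * len(env_ids))
pool.close()
```

Each environment has at most one observation waiting in the queue at a time, so a batch never contains the same environment twice, and the fast environments can be stepped again while the slow ones are still running.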
Happy New Year!
PufferLib 0.6 is out now! RL from scratch + track online + upload model publicly + watch it play = 1 cpu core & 1 minute
Integrates with @wandb & @vwxyzjn's CleanRL
PufferLib 0.7 out now!
- 65% faster CleanRL training with just our vectorization
- 2-3x faster training w/ our async on-policy sampling
- Pokemon trains at 3000x real time on a single desktop
Links soon because algorithm
Should I tie the latest PufferLib CleanRL training script to WandB for slightly cleaner and more stable code, or build on TensorBoard and use WandB's sync integration?