I build sane open-source RL tools. MIT PhD, creator of Neural MMO and founder of PufferAI. DM for business: non-LLM sim engineering, RL R&D, infra & support.
Still sim2real! Also the public version of PufferDrive is built on PufferLib 3.0. Our latest PufferLib 4.0 has some major advancements to our general policy architecture, 3-5x faster training, and up to 10x faster wallclock to reach the same fixed level of performance vs. 3.0 on most of our baseline tasks
PufferLib on a real vehicle! PufferDrive is a collaboration with NYU @EugeneVinitsky @daphne_cor et al. + @spenccheng at Puffer.
Working on AV RL? We offer R&D contracts, sim development, and support. Contact jsuarez🐡puffer🐡ai.
That doesn't track w/ the implementation or with results from 4.0 sweeps. I've never in 30k+ experiments seen it pin LR to the minimum. It has it pinned to the max on breakout because the model is tiny and you can genuinely get away with it. The GPs are not just fitting a line to hparams globally either
@BullTheoryio As a Florida resident: do y'all understand how much better the state would be without mosquitos? If I go outside for an hour past sundown, it's 10+ bites every time. Lots of people have porches with entire pools screened in.
@nilinabra Want to try the new method on PufferLib directly? We train up to 47x47 mazes and have a simple single-file Muon implementation in CUDA C. Our tasks can get arbitrarily sparse. The chance of getting a reward on a random Sokoban map with a random policy is <1/1B per step
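To put the <1/1B per step figure in perspective, here is a rough back-of-the-envelope sketch. It assumes each step is an independent trial with reward probability `p`, which is a simplification (real rollouts are correlated), and the function name is made up for illustration.

```python
import math

def prob_at_least_one_reward(p_per_step: float, num_steps: int) -> float:
    """P(at least one reward in num_steps), assuming independent steps."""
    # 1 - (1 - p)^n, computed via log1p/expm1 so tiny p does not lose precision
    return -math.expm1(num_steps * math.log1p(-p_per_step))

p = 1e-9  # the upper bound quoted above; the true rate is lower
print("expected steps to first reward (at least):", 1.0 / p)
for n in (10**6, 10**9, 10**10):
    print(n, "random steps ->", prob_at_least_one_reward(p, n))
```

Under that assumption, even a billion random steps leaves you with only about a 63% chance (1 - 1/e) of having seen a single reward.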
they build some very nice tools, and getting some automatic kernel optim would be quite nice. We write our kernels manually now because everything else fails. We'd write fewer of them like that if it didn't. Startup time is massive QoL for research and dev. Lots of our experiments are short. To give you an idea, we solve our basic benchmark tasks in 0.1 to 10 seconds. We run thousands of such experiments, which sweep over net size and depth automatically
@dogecahedron @__tinygrad__ Maybe, but this would be so much worse than our current setup that it's not even worth considering. We have virtually zero startup time on our runs because all we need is a quick CUDAGraph trace, no jit, and we are 3-5x faster than torch/jax/tiny without even doing kernel search
@dogecahedron @__tinygrad__ you could use the kernel gen or directly inline your own kernels without external binds. Embedding C code in a string is pretty awful
The Puffer RL environment for Arkhai's compute marketplace is open source on our GitHub! Tiny RL agents learn to buy, sell, and iteratively negotiate. Development ongoing!
Today we're launching Simple Compute Market (SCM).
The market is simple: agents find compute, negotiate, settle, and get access without a human driving every step.
Open-source. Agent-driven. Public good. No token. No fees.
@dogecahedron @__tinygrad__ I do think that once mature, a C version of tinygrad would be awesome. Much easier to mix in optimized hand-written kernels etc. without the language barrier. Python + C/CUDA extensions is miserable
@dogecahedron @__tinygrad__ cuBLAS for matmuls. Our kernels are for activations, sequence-wise fns for our MinGRU arch, loss fn, etc. Our models are small so these will eat your compute budget without heavy fusion etc. We don't have any fancy attention layers. Those are slow in RL
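For anyone wondering what a "sequence-wise fn" looks like here: a minimal pure-Python sketch of the minGRU update as it is commonly written, h_t = (1 - z_t) * h_{t-1} + z_t * c_t with z_t = sigmoid(gate logit). This is not PufferLib's kernel. The function names are made up, and the gate logits and candidates are assumed to be precomputed from the inputs (the matmul part), leaving only the elementwise recurrence.

```python
import math

def sigmoid(x: float) -> float:
    # Numerically stable for large |x|
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def min_gru_scan(gate_logits, candidates, h0):
    """Sequential minGRU recurrence over time.

    gate_logits, candidates: lists of length T, each a list of H floats
    h0: initial hidden state, a list of H floats
    Returns the list of T hidden states.
    """
    h = list(h0)
    outputs = []
    for k_t, c_t in zip(gate_logits, candidates):
        # Gate and candidate depend only on the input at step t, not on h
        h = [(1.0 - sigmoid(k)) * h_prev + sigmoid(k) * c
             for k, c, h_prev in zip(k_t, c_t, h)]
        outputs.append(h)
    return outputs

# Gate logit 0 means z = 0.5, so the state moves halfway to the candidate each step
print(min_gru_scan([[0.0], [0.0]], [[1.0], [1.0]], [0.0]))
```

Every op in the loop is cheap and elementwise, which is why small models spend their time here unless these steps are fused into one pass over the sequence.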
@dogecahedron @__tinygrad__ I tried that. It outputs batshit insane kernels you'd never want to touch manually. I still really like tinygrad, but it's too high level for my projects
@rodney_lafuente @haydendevs When someone offers me enough money that I can do it for a couple years and then have enough cash banked to run my own small 5-10 person lab forever