Day 27-28
Finished makemore part 5: Building a WaveNet
Went from a flat MLP that squashes the whole context at once to a hierarchical model that fuses characters in pairs across layers, much like the dilated convolutions in DeepMind's WaveNet.
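Rough sketch of the idea in plain Python, to show how the shapes change. This is not the lecture's actual PyTorch code: the helper names (`fuse_consecutive`, `linear_tanh`, `random_weights`) and the sizes (8-character context, 4-dim embeddings, 6 hidden units) are made up for illustration, and the weights are random rather than trained.

```python
import math
import random

def fuse_consecutive(seq, n=2):
    """Concatenate every n adjacent vectors: T vectors of size C -> T//n vectors of size n*C."""
    assert len(seq) % n == 0, "sequence length must be divisible by n"
    return [sum(seq[i:i + n], []) for i in range(0, len(seq), n)]

def linear_tanh(seq, weights):
    """Apply the same weight matrix (fan_out rows of fan_in values) at every position, then tanh."""
    return [[math.tanh(sum(w * x for w, x in zip(row, vec))) for row in weights]
            for vec in seq]

def random_weights(fan_in, fan_out, rng):
    """Untrained Gaussian weights scaled by 1/sqrt(fan_in); illustrative only."""
    scale = fan_in ** -0.5
    return [[rng.gauss(0, 1) * scale for _ in range(fan_in)] for _ in range(fan_out)]

rng = random.Random(0)
block_size, n_embd, n_hidden = 8, 4, 6  # assumed sizes, not taken from the post

# Stand-in for 8 embedded context characters.
context = [[rng.gauss(0, 1) for _ in range(n_embd)] for _ in range(block_size)]

# Flat MLP: all 8 embeddings are concatenated and squashed in a single layer.
flat = fuse_consecutive(context, n=block_size)
flat_out = linear_tanh(flat, random_weights(block_size * n_embd, n_hidden, rng))

# Hierarchical: fuse neighbours two at a time, so 8 -> 4 -> 2 -> 1 positions.
h = context
shapes = [(len(h), len(h[0]))]
while len(h) > 1:
    h = fuse_consecutive(h, n=2)
    h = linear_tanh(h, random_weights(len(h[0]), n_hidden, rng))
    shapes.append((len(h), len(h[0])))

print(shapes)  # (positions, features) after each stage
```

The flat version sees all 8 characters in one step, while the hierarchical one only ever mixes two neighbours per layer, so information from the full context is combined gradually over three layers.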
The model got a lot better. Here is before vs. now 👇