Grokking on modular arithmetic is widely studied as an emergent ability that seems unique to neural networks.
However, we find that iteratively solving a kernel machine and estimating the Average Gradient Outer Product (AGOP) reproduces the same phenomenon:
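The loop described above alternates two steps: fit a kernel machine under a learned metric M, then replace M with the AGOP of the fitted predictor. Below is a minimal pure-Python sketch of that loop, assuming a Gaussian kernel with a fixed bandwidth, a scalar target, and ridge regularization. The names `rfm`, `kernel`, `solve`, `bw`, and `reg` are hypothetical, and this is not the authors' implementation or their modular-arithmetic setup. It only illustrates the kernel-fit / AGOP alternation on a toy regression problem.

```python
import math
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for r in range(c + 1, n):
            f = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= f * aug[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x

def kernel(x, z, M, bw):
    """Gaussian kernel exp(-(x-z)^T M (x-z) / (2 bw^2)) under metric M (assumed kernel)."""
    d = [a - b for a, b in zip(x, z)]
    q = sum(d[i] * M[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return math.exp(-q / (2 * bw * bw))

def rfm(X, y, iters=3, bw=1.0, reg=1e-3):
    """Hypothetical sketch: alternate kernel ridge regression with an AGOP update of M."""
    n, d = len(X), len(X[0])
    M = [[float(i == j) for j in range(d)] for i in range(d)]  # start from identity
    for _ in range(iters):
        # Step 1: solve the kernel machine (K_M + reg * I) alpha = y.
        K = [[kernel(X[i], X[j], M, bw) for j in range(n)] for i in range(n)]
        for i in range(n):
            K[i][i] += reg
        alpha = solve(K, y)
        # Step 2: AGOP = mean over training points of grad f(x) grad f(x)^T, where
        # f(x) = sum_i alpha_i K_M(x, x_i) and grad_x K_M(x, z) = -K_M(x, z) M (x - z) / bw^2.
        G = [[0.0] * d for _ in range(d)]
        for x in X:
            g = [0.0] * d
            for a, xi in zip(alpha, X):
                k = kernel(x, xi, M, bw)
                diff = [u - v for u, v in zip(x, xi)]
                Md = [sum(M[r][c] * diff[c] for c in range(d)) for r in range(d)]
                for r in range(d):
                    g[r] -= a * k * Md[r] / bw ** 2
            for r in range(d):
                for c in range(d):
                    G[r][c] += g[r] * g[c] / n
        M = G  # Step 3: the AGOP becomes the metric for the next iteration
    return M

# Toy usage: the target depends only on the first coordinate.
random.seed(0)
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(40)]
y = [x[0] for x in X]
M = rfm(X, y)
```

Because the target ignores the second coordinate, the learned metric should concentrate its weight on the first diagonal entry. That is the feature-learning effect the AGOP step is meant to provide.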
Toward a theory of optimization for over-parameterized systems of non-linear equations: the lessons of deep learning. (arXiv:2003.00307v1 [cs.LG]) https://t.co/Xq8xXulZSg