@LukeGessler @ZhiyingJ
Digging a bit deeper into the "GZIP beats BERT" paper, I think a large part of why it works is that it compares character n-grams between documents. You can exploit this to make the implementation O(n) instead of O(n^2).
Here's a write-up: https://t.co/UidLKICb8N
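A minimal Python sketch contrasting the two kinds of distance, under stated assumptions. `ncd` follows the normalized compression distance that the paper pairs with gzip and k-nearest-neighbours. `char_ngrams` and `ngram_distance` are hypothetical stand-ins (a weighted Jaccard distance over character 4-grams), not the method from the linked write-up. The sketch only shows where the work goes: NCD needs a fresh gzip call on every concatenated pair, while the n-gram bags are built once per document and then merely compared.

```python
import gzip
from collections import Counter


def compressed_len(text: str) -> int:
    """Length in bytes of the gzip-compressed UTF-8 encoding of text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance:
    (C(xy) - min(C(x), C(y))) / max(C(x), C(y)).

    Every (x, y) pair requires compressing the concatenation anew.
    """
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + " " + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def char_ngrams(text: str, n: int = 4) -> Counter:
    """Bag of overlapping character n-grams, computed once per document."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def ngram_distance(a: Counter, b: Counter) -> float:
    """Hypothetical stand-in for NCD: 1 - weighted Jaccard similarity.

    Counter '&' keeps per-key minimum counts and '|' keeps per-key maximums.
    """
    overlap = sum((a & b).values())
    union = sum((a | b).values())
    return 1.0 - overlap / union if union else 0.0


def knn_predict(query, train, labels, dist, k=3):
    """Majority label among the k training items nearest to query under dist."""
    ranked = sorted(range(len(train)), key=lambda i: dist(query, train[i]))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        "the cat sat on the mat",
        "a cat chased a mouse",
        "stocks fell as markets closed",
        "the market rallied on earnings",
    ]
    labels = ["animals", "animals", "finance", "finance"]
    query = "the cat slept on the mat"

    # gzip route: one compression per (query, train item) concatenation.
    print(knn_predict(query, train, labels, ncd, k=1))

    # n-gram route: featurize each document once, then compare bags.
    feats = [char_ngrams(t) for t in train]
    print(knn_predict(char_ngrams(query), feats, labels, ngram_distance, k=1))
```

On very short strings like these, gzip's fixed header overhead dominates the compressed lengths, so NCD is a noisy signal here; the toy data is only meant to show the call pattern.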