@no_stp_on_snek An update: we haven't found a benefit in KV workloads yet. The ones we tried all have a small normalized third moment, which suggests that a second RHT is not needed.
We did find promising improvements in approximate nearest neighbor search and will update the writeup when done.
@no_stp_on_snek Thanks for the update. Is there a test we can easily reproduce with your data?
We played a bit with synthetic data for now.
2 RHTs seem important on sparse or skewed data, which seems to agree with the theory (large normalized third moment).
@no_stp_on_snek Hey. The point is that in many cases 1 RHT is perfectly fine, but in others you need 2. We describe how to test that for a given input in linear time, to avoid running the second RHT when not needed.
The benefit also depends on the dimension; we're running tests now.
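A rough sketch of the idea, not the paper's actual procedure: the code below applies an RHT (random sign flips followed by a normalized Hadamard transform) and computes a normalized third moment in one linear pass. The definition used here, sum(|x_i|^3) / ||x||_2^3, the function names, and the `threshold` value are all assumptions made for illustration; the paper's exact statistic and cutoff may differ.

```python
import numpy as np


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    y = np.asarray(x, dtype=float).copy()
    n = y.size
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        # Butterfly step: combine the two halves of every block of size 2h.
        y = y.reshape(-1, 2, h)
        y = np.stack((y[:, 0, :] + y[:, 1, :], y[:, 0, :] - y[:, 1, :]), axis=1)
        h *= 2
    return y.reshape(n) / np.sqrt(n)


def rht(x, rng):
    """One randomized Hadamard transform: random +/-1 signs, then Hadamard."""
    x = np.asarray(x, dtype=float)
    signs = rng.choice([-1.0, 1.0], size=x.size)
    return fwht(signs * x)


def normalized_third_moment(x):
    """Assumed definition: sum(|x_i|^3) / ||x||_2^3, computed in O(d) time."""
    a = np.abs(np.asarray(x, dtype=float))
    norm = np.sqrt(np.sum(a * a))
    return float(np.sum(a ** 3) / norm ** 3)


def needs_second_rht(x, threshold=0.5):
    """Hypothetical decision rule; the threshold is a placeholder, not from the paper."""
    return normalized_third_moment(x) > threshold


rng = np.random.default_rng(0)
d = 1024

sparse = np.zeros(d)
sparse[0] = 1.0                        # one-hot: the most skewed input
dense = np.full(d, 1.0 / np.sqrt(d))   # perfectly flat input

moment_sparse = normalized_third_moment(sparse)
moment_dense = normalized_third_moment(dense)
moment_after = normalized_third_moment(rht(sparse, rng))
```

With this definition, a one-hot vector has the largest possible moment (1), while a perfectly flat vector in dimension d has moment 1/sqrt(d), which matches the intuition that sparse or skewed inputs are the ones where an extra RHT matters.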
@no_stp_on_snek This is a theory paper for now and we haven't run tests yet. The gain would necessarily be workload-dependent, as we know 1 RHT is often enough. I'll update when we have some results.
@no_stp_on_snek (While most of the paper discusses how 2 or 3 RHTs recover the guarantees of uniform random rotations, we also provide a linear-time method for determining how many RHTs are actually needed for a given input.)
UCL Computer Science is hiring a Lecturer / Associate Professor in Systems & Networks!
Strong teaching + research role in a great group.
Deadline Jan 18.
Details: https://t.co/SIvj4Pu086
Please share, and feel free to reach out!
The graph depicting the number of COVID-19 deaths in Israel that currently appears on the Our World in Data (OWID) website is mistaken.
We are working with the OWID team to correct this mistake >>>