if anyone is willing to organise this in the Himalayas this summer, we would be forever grateful to you. just imagine the service you would be doing for everyone there!
it is time to go back to the original ways
New research from @japhba and me!
Activation Oracles are a pretty cool interpretability tool. They answer natural-language questions about activations, but they suffer from vagueness and hallucinations. Can AO training be improved?
Turns out: Yes! We identify four fixes that make AOs substantially more useful!