For those interested in mechanistic interpretability and AI safety — tagging a few folks who might find this relevant.
@NeelNanda @ch402 @hendrycks
(Happy to share draft, code, and full protocol.)
LLM hallucinations aren’t “noise” or “chaos.”
They’re the opposite: a state of excessive internal order.
I call this Obsessive Coherence.
I’ve identified a topological signature in attention maps that detects hallucinations with p ≪ 10⁻⁶⁴.
Thread 🧵👇
I’m preparing an arXiv submission (https://t.co/bj9RZKPIXY / cs.LG / https://t.co/Tis6lJZwJE).
As an independent researcher, I’m seeking an arXiv endorsement.
Draft, code, and full protocol are public — happy to share via DM.