1/ New paper @Nature!
The discrepancy between human expectations of task difficulty and LLM errors harms reliability. In 2022, Ilya Sutskever @ilyasut predicted: "perhaps over time that discrepancy will diminish" (https://t.co/HADDUztzhu, min 61-64).
We show this is *not* the case!