@PhDemetri Read Bernstein's "Against the Gods". Although it doesn't directly answer this question, none of what the book describes is possible without statistics.
@karlrohe I wrote a slightly longer version of this at https://t.co/8Lzv87LTND (but in terms of unsupervised learning rather than multivariate analysis).
@karlrohe For a multivariate analysis, you don't have any notion of accuracy, but you do have some notion of description length. That opens the door to the same information-theoretic tools that have been so useful in analyzing the predictive setting.
@alex_peys For binary classification, looking at paired differences has a pretty high false discovery rate. Those issues may be present in LLM evaluations as well. See T.G. Dietterich, "Approximate statistical tests for comparing supervised classification learning algorithms".
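As a rough illustration of one alternative: when each classifier can only be run once, Dietterich's paper recommends McNemar's test, which looks only at the examples where the two classifiers disagree. The sketch below is a minimal exact (binomial) version. The function name `mcnemar_exact` and the toy labels are illustrative assumptions, not anything from the paper or the thread.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar p-value for two classifiers on the same test set."""
    # b: examples A gets right and B gets wrong; c: the reverse.
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return 1.0  # the classifiers never disagree, so there is no evidence either way
    # Under the null, each discordant pair favours A or B with probability 1/2.
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Toy data (made up): A is right on all 10 examples, B is wrong on 8 of them.
y_true = [1] * 10
pred_a = [1] * 10
pred_b = [0] * 8 + [1] * 2
p_value = mcnemar_exact(y_true, pred_a, pred_b)
```

Only the discordant pairs enter the statistic, so the test does not rely on the resampled, overlapping training sets that inflate the error rate of the resampled paired t-test.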
@codydroux IP = PSPACE, so if NP = IP, we live in a world where being able to ask polynomially many questions doesn't buy you much more than being able to ask one. That's hard to swallow.
@kareem_carr If a distribution has a finite second moment, the difference between the mean and the median is at most one standard deviation. So an upper bound can be derived from Chebyshev's inequality.
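A quick numerical sanity check of the mean-median bound, as a sketch rather than a proof. An empirical sample is itself a distribution with a finite second moment, so the inequality should hold for it exactly when the standard deviation is the population one (`ddof=0`, NumPy's default). The distributions, seed, and the helper name `mean_median_gap` are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A few skewed samples; the two-point one comes close to making the bound tight.
samples = {
    "exponential": rng.exponential(size=10_000),
    "lognormal": rng.lognormal(sigma=1.5, size=10_000),
    "two-point": np.array([0.0] * 51 + [1.0] * 49),
}

def mean_median_gap(x):
    """Return (|mean - median|, population standard deviation) of a sample."""
    x = np.asarray(x, dtype=float)
    return abs(x.mean() - np.median(x)), x.std()

results = {name: mean_median_gap(x) for name, x in samples.items()}
for name, (gap, sd) in results.items():
    print(f"{name}: |mean - median| = {gap:.4f}, std = {sd:.4f}")
```

In the two-point sample the mean is 0.49 and the median is 0, while the standard deviation is just under 0.5, so the gap can get very close to one standard deviation.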