Computational social scientist at UCLA focused on science and culture. Interested in Bayesian stuff, causal inference, and deep learning. AP at UChicago in 2024
Hi friends,
I wanted to share a TIME op-ed David Peterson and I wrote about the history of evaluation in AI.
https://t.co/6A9krRUtjF
Core argument: The inability to evaluate AI's potential precipitated the bubble and winter of the 1980s. Today, we face a similar problem.
Many creative and process-based tasks we now seek to automate can’t be benchmarked. There is no "correct" PowerPoint or scientific hypothesis.
As a result, new models are evaluated as much by "vibe tests" as concrete metrics. We're caught between two limited approaches:
old-style benchmarks that measure narrow capabilities precisely, and qualitative assessments that are holistic but can't produce clear evidence of progress. Without better evaluation methods that bridge these perspectives, it's hard to gauge the true potential and limits of LLMs.
Wow! Honored that our paper with Bernie Koch, @cephaloponderer, and Jacob Foster won a best paper award at the NeurIPS Datasets and Benchmarks track!
So pleased that a sociology of science paper won such an honor at NeurIPS. https://t.co/oYlJNTKsrO
@pablogerbas Thanks for sharing beyond my loyal 13 followers Pablo. :) I'd just add that this literature is exciting because it provides interesting directions not just for heterogeneous effect estimation, but also CI with text, graphs, and images!
And, to make things even better, the review paper is accompanied by a detailed tutorial in TensorFlow 2, so you can try it yourself!
https://t.co/U4HZEUgW0m
The amazing @bernardjkoch just posted "Deep Learning of Potential Outcomes" on arXiv. If you're familiar with causality but not with deep learning, or the other way around, this is a great place to start!
https://t.co/CQBRPQ9yqQ