New Paper: We unlock AI Evaluation with explanatory and predictive power through general ability scales!
-Explains what common benchmarks really measure
-Extracts explainable ability profiles of AI systems
-Predicts performance for new task instances, in & out-of-distribution
🧵
1/ New paper @Nature!
Discrepancy between human expectations of task difficulty and LLM errors harms reliability. In 2022, Ilya Sutskever @ilyasut predicted: "perhaps over time that discrepancy will diminish" (https://t.co/HADDUztzhu, min 61-64).
We show this is *not* the case!
New and shiny AI systems have superseded the ones we reference (it took a while to publish), but our perspectives and suggestions for evaluating them have only become more relevant. Go have a read! 👽👽
Is it time to rethink how we perform system evaluations in AI? In our new @ScienceMagazine paper, we show that over-reliance on aggregate metrics and a lack of transparency in reporting threatens public understanding and hinders progress in the field. 1/8 https://t.co/kZMNCEALbG
📐Our Evaluation Beyond Metrics workshop at IJCAI got accepted... so prepare your cool papers!
💻https://t.co/RRRSJvt1hR
With @LucyCheke, @DanajaRutar, @JohnJBurden, @DrRyanBurnell, @TomerUllman and twitterless Josh Tenenbaum, José Hernández-Orallo and Fernando Martínez-Plumed
Our paper "Training on the Test Set: Mapping the System-Problem Space in AI" (https://t.co/rF76aUKOPN) is the runner-up for the Blue Sky Awards at @RealAAAI 2022!
Artificial Intelligence is a fantastic opportunity for Europe.
And citizens deserve technologies they can trust.
Today we present new rules for trustworthy AI. They set high standards based on the different levels of risk.
🎉🎉🎉 I'm excited to introduce "a ggplot2 grammar guide". Here is part of the **visual table of contents** (viztoc). You can click through to get at-your-own-pace guidance from *flipbooks* showing code-output plot evolution! More in 🧵 1/
https://t.co/XCBwKTLfJo
Today we are happy to announce #DigitalECAI2020! A digital conference of the highest scientific level, which will offer the #AI community many opportunities to meet, debate and interact.
Read our statement at: https://t.co/JeOHZ6Ejp4
Join us at https://t.co/r71h9Q3zhK!
PRE-ENROLLMENT @MUIinfUPV 20-21: open until June 12!
@etsinfupv @upv @UPVCampusAlcoy
The MUIINF continues with the same enthusiasm, with companies involved, international students already admitted, and a blended (partly on-site) format that gives you great flexibility
https://t.co/VdoYxyxEMI
🚨 UPDATE: In light of the #COVID19 situation, and with the health and safety of the whole community as our top priority, #ECAI2020 has been rescheduled to August 29-September 2.
➡️ Complete statement is at: https://t.co/B22P00e3b9
We look forward to seeing you next August!
Seeking paper submissions for the 1st Evaluating Progress in AI workshop at ECAI20. Experimental and theoretical papers on developing benchmarks, indicators, measuring progress, forecasting societal impacts of AI advances. Submission deadline March 20th!
https://t.co/mFpGzART0K
@dmonett @_KarenHao Sorry for the delay! I haven't connected on Twitter much lately. The shinyApp was developed by Aiden (co-author), so I do not have the code. Send me an email ([email protected]) and I'll send you the data I collected (IJCAI, AAAI and AITopics).