👩‍🔬 AutoBench goes scientific! 🎉 It started 6 months ago almost as a game, then turned into a business, and now it's got a fancy arXiv paper to prove it's not just fun and money. Introducing the first scientific paper that validates our Collective-LLM-as-a-Judge method! 🤖📜
1/12
@athleticKoder Great post! We address many of these points in our latest paper https://t.co/ESJGqy90ux where we introduce UDCG, a new metric for evaluating retrieval for RAG, you might want to check it out!
LLMs aren't humans, so why evaluate RAG with human-centric metrics?
Traditional IR metrics fail to predict RAG performance because:
❌They assume sequential reading
❌They ignore that some docs actively distract LLMs
Meet UDCG: a new metric w/ 36% ⬆️ corr. to actual performance✅
Retrieval evaluation needs to evolve for the LLM era. UDCG is a step toward metrics that align with machine consumers, not human browsing patterns.
📄 Full paper: https://t.co/ESJGqy90ux
💻 Code & data: https://t.co/p2gW6dw7Ni
(8/n)
🚀Huge congratulations to Florin! 🏆 If reading the research paper isn't your thing, no worries—you can listen to the generated podcast instead, and it's actually pretty good. Quoting from it, "It's going to sound crazy!
https://t.co/OScA07oDrn
Well done to our 'Best Poster Award' winners! One of our winners is @FlorinCuconasu with "The Power of Noise: Redefining Retrieval for RAG Systems". Congrats!! #M2LSummerSchool
Research published by Google DeepMind reveals OpenAI Strawberry's approach.
Searches at inference through potential responses to reason better.
“test-time compute can be used to outperform a 14× larger model.”
Almost ready for the second keynote here at the IR-RAG workshop!
Finish up your lunch and come to the presidential ballroom for Dr. Zhang's presentation! @yuhaozhangx
"Seldom is a glance at the statistics enough to understand the meaning of the figures" 🤔
Fromm clearly had RAG in mind when uttering these words! ✍️
Read our paper to find out more about BaseVsInstruct for RAG! 🚩
📢 Check out the refined version of “The Power of Noise: Redefining Retrieval for RAG Systems”, accepted at @SIGIR2024!
A huge thank you to the reviewers and the research community for your invaluable feedback—it has significantly improved the paper. (1/n)