🧑‍🔬 I’m recruiting PhD students in Natural Language Processing @UniLeipzig Computer Science, together with @Sca_DS!
Topics include, but aren’t limited to:
🔎Linguistic Interpretability
🌍Multilingual Evaluation
📖Computational Typology
Please share!
#NLProc #NLP
We just released "German Commons", the largest openly-licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use. https://t.co/FWmDdd4p2j
Honored to win the ICTIR Best Paper Honorable Mention Award for "Axioms for Retrieval-Augmented Generation"!
Our new axioms are integrated with ir_axioms: https://t.co/T6i9S324Fh
Nice to see axiomatic IR gaining momentum.
Happy to share that our paper "The Viability of Crowdsourcing for RAG Evaluation" received the Best Paper Honourable Mention at #SIGIR2025! Very grateful to the community for recognizing our work on improving RAG evaluation.
📄 https://t.co/kG75fYl17H
Thank you @cadurosar for the shout-out to Lightning IR in the LSR tutorial at #SIGIR2025!
If you want to fine-tune your own LSR models, check out our framework at https://t.co/X3Mm6MhmVO
Do not forget to participate in the #TREC2025 Tip-of-the-Tongue (ToT) Track :)
The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API.
More details are available at: https://t.co/VKM5P4ELcT
Our paper on self-distillation for training bi-encoders got accepted at #ICTIR2025! By exploiting pretrained encoder capabilities, our approach eliminates expensive teacher models and batch sampling while maintaining the same effectiveness.
The fourth edition of ReNeuIR @ #SIGIR2025 is back!! Check https://t.co/cFaPXT7H7E to see what we have in mind this year! Paper submission deadline: May 20, 2025.
📢 Our paper "The Viability of Crowdsourcing for RAG Evaluation" has been accepted to #SIGIR2025 ! We compared how good humans and LLMs are at writing and judging RAG responses, assembling 1800+ responses in 3 styles and 47K+ pairwise judgments across 7 quality dimensions. 🧵➡️
@DeepLcom After the recent update, it is no longer possible to resize the window to the one-column layout. Why? This was the only viable way of working quickly on small screens...
Not leaving quite yet, but I’m finding my Bluesky feed less noisy and more enjoyable.
@martinpotthast created a starter pack with some members of the IR community, which we hope keeps growing!
https://t.co/l2VdMNQpHi
Evaluating generative retrieval systems that directly return generated texts with references is an interesting research problem for the coming years. Now we have Niklas Deckers at #SIGIR2024 presenting perspectives on "Evaluating Generative Ad Hoc IR". https://t.co/3sACw1Ui9s
On Thursday, we're running the #reneuir workshop at #sigir2024. The workshop will have two amazing keynote speakers (@ZhuyunDai and Qi Chen) as well as a number of original and invited talks, plus a shared poster session. Please come along!