Webis Group @webis_de - Twitter Profile

7 months ago

For full technical details and the compliance datasheet, see our preprint: https://t.co/rLPIMCNhqW
As for German-specific models trained on this data... stay tuned 👀

Webis Group @webis_de

7 months ago

We just released "German Commons", the largest openly-licensed German text dataset for LLM training: 154B tokens with clear usage rights for research and commercial use. https://t.co/FWmDdd4p2j

Webis Group @webis_de

7 months ago

The data spans 7 text domains:
🌐 Web: Wikis, GitHub, social media
💬 Political: Parliamentary proceedings, speeches
⚖️ Legal: Court decisions, law
📰 News: Newspaper archives
🏦 Economics: Public tenders
📚 Cultural: Heritage collections
🔬 Scientific: Papers, books, journals

Webis Group @webis_de

11 months ago · Padua

@H1iReimer @maik_froebe @martinpotthast @matthias_hagen @bennostein Congratulations to the authors @H1iReimer, @maik_froebe, @bennostein, @martinpotthast, @matthias_hagen from @UniJena, @bauhaus_uni, @uni_kassel, @Hessian_AI, @Sca_DS!

Webis Group @webis_de

11 months ago · Padua

Come join us at the poster session at ICTIR 2025 to discuss:
- Axioms for Retrieval-Augmented Generation https://t.co/eDmCHt07fc
- Learning Effective Representations for Retrieval Using Self-Distillation with Adaptive Relevance Margins https://t.co/8gt8EAUpBQ

Webis Group @webis_de

11 months ago · Padua

Honored to win the ICTIR Best Paper Honorable Mention Award for "Axioms for Retrieval-Augmented Generation"! Our new axioms are integrated with ir_axioms: https://t.co/T6i9S324Fh Nice to see axiomatic IR gaining momentum.

Webis Group @webis_de

11 months ago

Thrilled to announce that @MattiWiegmann has successfully defended his PhD! 🎉🧑‍🎓 Huge congratulations on this incredible achievement! #PhDDefense #AcademicMilestone

Webis Group @webis_de

11 months ago

Congrats to the authors @LukasGienapp, Tim Hagen, @maik_froebe, @matthias_hagen, @bennostein, @martinpotthast, and @hscells from @uni_kassel, @Hessian_AI, @Sca_DS, @uni_tue, @UniJena & @bauhaus_uni

Webis Group @webis_de

11 months ago

Happy to share that our paper "The Viability of Crowdsourcing for RAG Evaluation" received the Best Paper Honourable Mention at #SIGIR2025! Very grateful to the community for recognizing our work on improving RAG evaluation. 📄 https://t.co/kG75fYl17H

Webis Group @webis_de

11 months ago · Padua

@LukasGienapp presents "The Viability of Crowdsourcing for RAG Evaluation" at #SIGIR2025. The paper is available at: https://t.co/kG75fYl17H

webis_de retweeted

Maik Fröbe @maik_froebe

11 months ago

Do not forget to participate in the #TREC2025 Tip-of-the-Tongue (ToT) Track :) The corpus and baselines (with run files) are now available and easily accessible via the ir_datasets API and the HuggingFace Datasets API. More details are available at: https://t.co/VKM5P4ELcT

Webis Group @webis_de

12 months ago

Credit & thanks to the author team @LukasGienapp @DeckersNiklas @martinpotthast @hscells 📄 Preprint: https://t.co/lSyd3jzivz 💻 Code: https://t.co/GWnzyoglon

Webis Group @webis_de

12 months ago

Our paper on self-distillation for training bi-encoders got accepted at #ICTIR2025! By exploiting pretrained encoder capabilities, our approach eliminates expensive teacher models and batch sampling while matching the effectiveness of teacher distillation.

Webis Group @webis_de

12 months ago

Results on BEIR demonstrate that our method matches teacher distillation effectiveness, while using only 13.5% of the data and achieving 3-15x training speedup. This makes effective bi-encoder training more accessible, especially for low-resource settings.

webis_de retweeted

Ferdinand Schlatt @fschlatt1

about 1 year ago

@maik_froebe @hscells @ShengyaoZhuang @bevan_koopman @guidozuc @bennostein @martinpotthast @matthias_hagen
Short: Rank-DistiLLM: Closing the Effectiveness Gap Between Cross-Encoders and LLMs for Passage Re-ranking https://t.co/DkBWXYczJn
Full: Set-Encoder: Permutation-Invariant Inter-Passage Attention for Listwise Passage Re-Ranking with Cross-Encoders https://t.co/Zm8CpQ2ASR