We are at #LREC2026 presenting the HPLT v3 datasets: monolingual and parallel, massive and highly curated. For in-depth information and analysis, please:
Join us in room Menorca 1 at 16:20!!!
Also today, learn more about the impact of benchmark contamination by visiting the poster of our colleagues from the universities of Helsinki and Turku and the ELLIS Institute Finland.
Quite a nice "representation" of the OpenEuroLLM crowd will be at the International Conference on Learning Representations (ICLR) this week.
On Friday the 24th, come to the poster "OpenThoughts: Data Recipes for Reasoning Models", work partially supported by our project, and meet us!
Experimenting with model-based annotation for better data selection? A candidate to consider is propella-1, a fully open-source multi-property annotator partially funded by #OpenEuroLLM.
Code, annotations and paper available! https://t.co/oemVhuO8pR
We released propella-1, a small model for advanced pre-training data annotation.
Work led by @maxidahl within the @OpenEuroLLM project. Link to model + annotations for important pre-training datasets below
One year of OpenEuroLLM!
We're building Europe's next-gen open-source LLMs to boost digital sovereignty.
More about our achievements and next steps for infrastructure, data, models and evaluation at https://t.co/ldZBYE9oDA.
Year 2 = full speed ahead.
Go #OpenEuroLLM
First OpenEuroLLM Winter School in collaboration with the @CircleU_eu Alliance and the Nordic Language Processing Laboratory
Focus: Multilinguality in LLM Development and Evaluation, with speakers from world organisations, academia and industry.
https://t.co/uDQokvCivN
Strategic access to EuroHPC resources granted to OpenEuroLLM!!!
- the first AI project granted strategic access across multiple EuroHPC centres
- over 10 million GPU hours
Thanks @EUComission and @EuroHPC_JU!
Proud to present the @OpenEuroLLM project and its results so far, together with Sampo Pyysalo (@UniTurku), at the 1st Workshop on Open Source Sovereign LLMs in Berlin https://t.co/gMalOuRGm3 A great opportunity to talk to many open-source LLM developers! @CharlesUniPRG @hplt_eu
Future-proof AI in all EU languages isn't a dream, it's OpenEuroLLM
9 countries, the EU budget & STEP join forces to build transparent, AI Act-compliant tech for Europe's innovators.
Find out how we will turn ambition into action for 2028-2034: https://t.co/ZekDsBaz6M
The #HPLT crowd is at #EMNLP2025!!!
If you are around, please visit our booth to discuss:
- multilingual datasets
- dataset insights and stats
- dataset performance
- efficient MT models
- and the future of multilingual LLMs
We don't want to miss you!
OpenEuroLLM just completed two days of sharing progress and next steps toward the goal of developing strong multilingual foundation models aligned with the European strategic vision and standards.
Gathering at BSC, next to the MareNostrum 5 supercomputer, made us feel at home.
#Barcelona #NLProc
LeoLM has since been an inspiration for many other projects (like our DiscoLM 8b, the @occiglot models, and more) and serves as a conceptual baseline for some ideas within the @OpenEuroLLM project to bring strong LLMs to all European languages.
Our co-founders' project #LeoLM was highlighted by @bmftr_bund.
Today, we're continuing what started as a student's side project with @OpenEuroLLM (and more to come).
If you want to work on open-source AI, multilingual applications and AI evaluations as well, we're hiring!
This was the first, but surely not the last, collaboration between open-sci, @laion_ai and @openEuroLLM. Establishing baselines and good starting points for experiments to create strong open foundation models is important, and I am happy to see it worked out so well.
First release: 38 monolingual reference LLMs (2.15B params) via @hplt_eu + #OpenEuroLLM
- Trained on 100B tokens from the HPLT v2 dataset
- Cover EU languages and others
- Based on LLaMA, trained on #LUMI
- Useful for evaluation
Downloads + more info at https://t.co/vp1RwD9YFy
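For a rough sense of scale, the figures in the release post can be turned into a back-of-the-envelope compute estimate. This is a sketch that uses the common approximation C ≈ 6ND for training dense decoder-only transformers, which ignores attention and other overheads. It is not an official number from the project. It assumes N = 2.15B parameters and that each model saw D = 100B tokens.

```latex
\begin{aligned}
\frac{D}{N} &= \frac{1.0\times10^{11}}{2.15\times10^{9}} \approx 46.5 \ \text{tokens per parameter} \\
C_{\text{model}} &\approx 6ND = 6 \times 2.15\times10^{9} \times 1.0\times10^{11} \approx 1.29\times10^{21}\ \text{FLOPs} \\
C_{\text{all 38}} &\approx 38 \times 1.29\times10^{21} \approx 4.9\times10^{22}\ \text{FLOPs}
\end{aligned}
```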
It's time for transparent AI in Europe. It's time for open LLMs as a robust foundation for developing future private and public AI services. It's time for:
OPEN = open-source
Euro = under EU regulations, representing EU values
LLM = LLMs
https://t.co/K5MlOVS7DX