So, Claude Sonnet 4. Since I’ve got my own benchmark for historical Russian OCR (seeing how well LLMs handle those beautiful, weird old letters like ѣ and ї), testing new models only takes a few minutes. The result? It’s fine, but Gemini 2.5 is still the winner for now.
❗️"Aligned Translations for Digital Scholarly Editions: Methodologies, Methods, and Workflows"❗️
🗺️Rome
🗓️26-27/09
I will present "Manual and Automatic Alignment of I Promessi Sposi: Results and Open Issues" with @taiga
Program: https://t.co/tIwOHMjghg
@AIUCD@TizMancinelli
New blog: Challenges in Evaluating LLMs
I list five challenges to evaluating LLMs, which unfortunately seem to be ignored by many researchers. Which means that many published LLM evaluations cannot be trusted. This blog is based on a recent workshop talk
https://t.co/xPj0yPd4l1
Nice report on challenges in evaluating LLMs.
It also includes a section on best practices for language model evaluation.
Great read and lessons on the very difficult task of LLM evaluation.
https://t.co/ARV9YCeJ0w
@bylinina@average_ms 2) Usually I ask for "un caffe d'asporto per favore", but you rarely see Italians with a paper cup in their hands. It's quite a different coffee culture, as you said
The DH2024 @DHinDC2024 conference is approaching! Hurry and log in to the Conftool to register. For more details, visit the conference website https://t.co/rFxV9HgHdR #digitalhumanities#dh2024
Подписала петицию Greenpeace, отправила донат Мемориалу - вроде и день прошел не зря. А что не поработала - так еще завтра день будет. Хотя и недолгий, шестичасовой. https://t.co/TikesCKqLd
После истории со "Слугой народа" на ТНТ с удовольствием посмотрела этот сериал полностью (так бы и в голову не пришло интересоваться актерской работой Зеленского). Теперь вот думаю, не... https://t.co/b86XnExkqG