Jindřich Libovický @jlibovicky - Twitter Profile

over 1 year ago

Join Mu-SHROOM 🍄, a SemEval 2025 shared task on detecting hallucination spans in multilingual LLM outputs! 🌍 Includes Czech with regional Czech questions 🇨🇿. Do you think you can spot when something isn’t true? 🤔 Try it out! 👉 https://t.co/SOU1YTtq2g #SemEval2025 #NLProc

0

7

2

1

624

Jindřich Libovický @jlibovicky

over 1 year ago

Happy holidays! 🎄🎅🤩🎁

0

7

0

1

243

Jindřich Libovický @jlibovicky

over 1 year ago

Highlights from multilingual #NLProc and machine translation papers I found on arXiv in November are now on my blog: https://t.co/X9X3OSRUPn

2

0

1

176

Jindřich Libovický @jlibovicky

over 1 year ago

This is going to be fun! 🤓 We have three years to spend 6.5M CZK on improving multilingual tokenization. The goal is to make subwords more alignable across languages and help languages that suffer from over-segmentation with current models.

Institute of Formal and Applied Linguistics @ufal_cuni

over 1 year ago

Good news! 🥳 GAČR will fund two of our projects: 👉 @jlibovicky proposes better tokenization for #LLMs and machine translation 👉 Veronika Kolářová will study syntactic features of Czech non-verbal predicates ➕ Dominik Macháček receives a Postdoc Individual Fellowship! 💪

0

17

2

1

873

0

13

0

2

459

Jindřich Libovický @jlibovicky

over 1 year ago

@ufal_cuni @tuetschek @prg_ai @CharlesUniPRG @UniKarlova @Marfyz @ERC_Research @vedavyzkum_cz @PavlaHub Congrats! 🍻

0

1

0

111

Jindřich Libovický @jlibovicky

over 1 year ago

Just shared my takeaways from #EMNLP2024 on my blog: https://t.co/NutdStiWrM

0

5

0

3

368

Jindřich Libovický @jlibovicky

over 1 year ago

Find me on 🦋 (and the rest of #NLProc folks too).

0

1

0

238

Jindřich Libovický @jlibovicky

over 1 year ago

There's no clear winner in this year's MRL shared task, but we ended up in the cluster of top-3 teams. I'm so proud of you, folks ☺️

Institute of Formal and Applied Linguistics @ufal_cuni

over 1 year ago

Finally, @kat_haem and Gianluca Vico presented one of the three prize-winning 🏆🤑 submissions for the shared task on multilingual named entity recognition and question answering! w/ @AndreiM85400815, @jindra_helcl and @jlibovicky. Congrats! https://t.co/kZxNr3tKpY

0

6

2

0

965

0

12

0

539

Jindřich Libovický @jlibovicky

over 1 year ago

Thanks to everyone who stopped by the poster ☺️

Institute of Formal and Applied Linguistics @ufal_cuni

over 1 year ago · Miami

#EMNLP2024 starts today and @ufal_cuni is here! We start with @jlibovicky presenting work with @jindra_helcl: Lexically Grounded Subword Segmentation https://t.co/R5FNXF31MA

2

25

1

1K

0

4

0

219

Jindřich Libovický @jlibovicky

over 1 year ago

This week I am at #EMNLP2024 in Miami 🌴🇺🇸. Find me 🕵️ or message 💌 me if you want to chat about multilinguality or tokenization, and stop by our poster on Tuesday at 2 p.m., where I'll present our paper on Lexically Grounded Subword Segmentation https://t.co/R7W28p5BeZ

0

9

1

648

Jindřich Libovický @jlibovicky

over 1 year ago

Summaries of #multilingual #LLM and machine translation papers I liked in October are now on my blog https://t.co/Pg6mMtNe9J and also on Medium https://t.co/6clmWCKbLq

0

7

0

3

301

jlibovicky retweeted

Jindra Helcl @jindra_helcl

over 1 year ago

... starring @jlibovicky and me as young and promising scientists with their impeccable movie editing skills

0

4

1

0

169

Jindřich Libovický @jlibovicky

over 1 year ago

If you liked the video, read our paper https://t.co/nAw9SCuGNh or check our code https://t.co/sGnpr1Uane https://t.co/92y9Uc4sP3

Jindřich Libovický @jlibovicky

over 1 year ago

In our #EMNLP2024 paper with @jindra_helcl, we present a new subword tokenization method that is more morphologically plausible but maintains the nice properties of existing tokenizers. Pre-print: https://t.co/Dqx0N6k7kr Code: https://t.co/s3pztuSk8N 👇🧵1/4

4

25

2

10

3K

0

153

Jindřich Libovický @jlibovicky

over 1 year ago

In a week, @jindra_helcl and I will present our paper Lexically Grounded Subword Segmentation at #EMNLP2024 in Miami 🌴🇺🇸. You can already watch our video 🎥 https://t.co/g88FRIeVoo or stop by our poster 👋 next Tuesday at 2 p.m.

2

12

1

376

Jindřich Libovický @jlibovicky

over 1 year ago

Summaries of a few papers that I noticed on arXiv during the summer are now on my blog: https://t.co/kjHP1Bhj9y and on Medium https://t.co/t94pmSq7Af.

0

9

1

4

846

Jindřich Libovický @jlibovicky

over 1 year ago

👍 It works great for preserving morpheme boundaries. 👍 Does a good job in POS tagging. 👎 No improvement in machine translation. And bad news, @zouharvi, our downstream performance does not correlate with Rényi efficiency. 🤷‍♂️ 🧵4/4

1

4

0

226

Jindřich Libovický @jlibovicky

over 1 year ago

In our #EMNLP2024 paper with @jindra_helcl, we present a new subword tokenization method that is more morphologically plausible but maintains the nice properties of existing tokenizers. Pre-print: https://t.co/Dqx0N6k7kr Code: https://t.co/s3pztuSk8N 👇🧵1/4

4

25

2

10

3K

Jindřich Libovický @jlibovicky

over 1 year ago

Then, we find segmentations whose subword embeddings are closest to the word embedding. We collect bigram stats from those and use them in a bigram-LM-based segmenter (a generalization of SentencePiece). And we also do some experiments... 🧵3/4

1

3

0

208
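
To make the bigram-LM segmenter idea from the tweet above concrete, here is a minimal sketch, not the authors' implementation. It assumes the embedding-based segmentations are already given as lists of subwords, and it takes "bigram-LM-based segmenter" to mean a Viterbi search over subword bigram log-probabilities. The function names, the `unseen` penalty, and the `<s>` start symbol are all hypothetical choices for illustration.

```python
import math
from collections import defaultdict

BOS = "<s>"  # hypothetical start-of-word symbol, assumed not to be a subword


def bigram_logprobs(segmentations):
    """Estimate log P(piece | previous piece) from given segmentations."""
    counts = defaultdict(lambda: defaultdict(int))
    for seg in segmentations:
        prev = BOS
        for piece in seg:
            counts[prev][piece] += 1
            prev = piece
    logp = {}
    for prev, following in counts.items():
        total = sum(following.values())
        for piece, c in following.items():
            logp[(prev, piece)] = math.log(c / total)
    return logp


def segment(word, vocab, logp, unseen=-20.0):
    """Viterbi search for the highest-scoring segmentation under the bigram model.

    best[i] maps the last subword of a segmentation of word[:i]
    to (score, previous subword). Unseen bigrams get a flat penalty
    (an assumption of this sketch, not a smoothing method from the paper).
    """
    best = [dict() for _ in range(len(word) + 1)]
    best[0][BOS] = (0.0, None)
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if piece not in vocab:
                continue
            for prev, (score, _) in best[start].items():
                cand = score + logp.get((prev, piece), unseen)
                if piece not in best[end] or cand > best[end][piece][0]:
                    best[end][piece] = (cand, prev)
    if not best[-1]:
        return None  # the vocabulary cannot cover the word
    # Backtrack from the best final subword to the start symbol.
    piece = max(best[-1], key=lambda p: best[-1][p][0])
    pos, out = len(word), []
    while piece != BOS:
        out.append(piece)
        prev = best[pos][piece][1]
        pos -= len(piece)
        piece = prev
    return out[::-1]


# Toy usage with made-up training segmentations.
logp = bigram_logprobs([["un", "lock"], ["un", "lock", "ed"], ["lock", "ed"]])
vocab = {"un", "lock", "ed"} | set("unlocked")
print(segment("unlocked", vocab, logp))  # ['un', 'lock', 'ed']
```

With a unigram model the score of a piece would not depend on its left neighbor; conditioning on the previous subword is what makes this a generalization of the unigram case.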

Jindřich Libovický @jlibovicky

over 1 year ago

In the paper introducing the dataset https://t.co/cj7OrNW5mF, we also present a method based on hard-negative sampling on the text side of the model that significantly improves the model's ability to distinguish details.

0

108

Jindřich Libovický @jlibovicky

over 1 year ago

📣 We have a dataset! ❓Have you also noticed that language-vision encoders like CLIP do not pay attention to details? ❓ Do you think your model is doing better? 👉 InpaintCOCO dataset https://t.co/GBJJgVAU9j is here for you. Work of @phiyodr, folks from @unibw_m, and myself.

1

8

1

462

Jindřich Libovický @jlibovicky

over 1 year ago

It consists of minimal pairs of images and captions derived from the MS COCO test set. Annotators used object detection and Stable Diffusion Inpainting 👨‍🎨👩‍🎨 to get images with either different objects or objects of different colors and sizes. Everything's 100% human-supervised. 💪

1

0

131
