Excited to share that I'll be joining @thomsonreuters Labs in Zug, Switzerland, as an Applied AI Scientist Intern from April to September 2026!
A nice bridge between finishing my MSc at JKU Linz and starting my PhD later this year - let me know if you're around!
@usmasfr Cool use case! You can start with our default setup, using 30 epochs, here:
https://t.co/5Ybn8dpQMh
Otherwise, I'd focus more on clean, segmented training data than on tuning epochs; >100 segmented training sentences would be good. You can also tune the LoRA rank a bit if needed.
wtpsplit now supports length-constrained segmentation:
set a min/max chunk length (in characters) while preserving semantic chunks - should be great for RAG!
Example (≤30 chars):
[Landing 5pm → Beimen.]
[Let's meet at: Ximen Exit 6.]
[Then: Ningxia Night Market...]
[Late-night snack!]
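To make the idea concrete, here is a minimal sketch of length-constrained chunking in plain Python. It is not wtpsplit's implementation or API: the function name `constrain_lengths` and its parameters are hypothetical, and it assumes the text has already been split into sentence-like segments. It merges a too-short chunk into its neighbour when that fits, and hard-splits any segment longer than the maximum.

```python
from typing import List


def constrain_lengths(segments: List[str], min_chars: int, max_chars: int) -> List[str]:
    """Hypothetical helper (not part of wtpsplit): enforce chunk length limits.

    - No returned chunk is longer than max_chars.
    - A chunk shorter than min_chars absorbs the next piece when the result
      still fits within max_chars (best effort, not a hard guarantee).
    """
    if max_chars < 1 or not 0 <= min_chars <= max_chars:
        raise ValueError("require 0 <= min_chars <= max_chars and max_chars >= 1")

    chunks: List[str] = []
    current = ""
    for seg in segments:
        # Hard-split segments that exceed max_chars on their own.
        for start in range(0, len(seg), max_chars):
            piece = seg[start:start + max_chars]
            too_short = len(current) < min_chars
            fits = len(current) + len(piece) <= max_chars
            if current and too_short and fits:
                current += piece  # grow an undersized chunk
            else:
                if current:
                    chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    parts = ["Hi. ", "Landing 5pm. ", "Let's meet at Ximen Exit 6. ", "Late-night snack!"]
    for chunk in constrain_lengths(parts, min_chars=10, max_chars=30):
        print(f"[{chunk.strip()}]")
```

Since the sketch only concatenates or slices the input segments, joining the chunks reproduces the original text exactly.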
I'm at #EMNLP2025 in Suzhou this year!
Looking forward to connecting with the community after a year's break and spending some time abroad!
Excited to share two new papers on AI-generated music detection from my research internship at @Deezer, published in @ismir_conf #ISMIR2025 and @aclmeeting #ACL2025 Findings! 🎶
The problem: most AI music detectors are impractical or unreliable in real-world settings.
@Deezer @aclmeeting I had a great time working on this with @deezer in Paris! Big thanks to my mentors @evpure, @Gabolsgabs, and @m_schedl!
💻 Code: https://t.co/mVoJWH1tZ7
ISMIR Paper (foundation): https://t.co/vXtQZp4xce
ACL Paper (Multi-View Double Entendre): https://t.co/SkmKsqFS7b
@Deezer @aclmeeting I view this work as an important extension of current single-modality detectors that keeps their flexibility and modularity. It's not production-ready, but it highlights two key paradigms for detection:
using all the information available from the audio alone, and focusing on robustness.
Wtpsplit, our text segmentation tool, just reached ⭐️1000 stars⭐️ on GitHub! Excited to see it proving useful!
Check it out here: https://t.co/IHH1GVemv3
We created Approximate Likelihood Matching, a principled (and very effective) method for *cross-tokenizer distillation*!
With ALM, you can create ensembles of models from different families, convert existing subword-level models to byte-level, and a bunch more 🧵
Curious about our SoTA text segmentation tool? It's gonna help you across all kinds of NLP tasks!
Learn more at our poster session: Tuesday, 4pm, Jasmine room at #EMNLP2024!
See you there!
I'll be attending the whole conference - happy to connect with everyone!
Excited to share that I joined @researchdeezer as a research intern to work with @evpure and @Gabolsgabs on detecting AI-generated lyrics! 🎶
The first few weeks have been amazing, and I am excited about what is to come. Life in Paris certainly has unparalleled charm!
This was an awesome summer! I can only recommend ETH's summer research fellowship program.
Also happy about the project's progress - integrating videos into existing architectures is quite exciting, stay tuned! Super grateful to Ryan Cotterell and @glnmario for supervising me.
Excited to share that I joined @ETH Zürich as a summer research fellow, supervised by Prof. @ryandcotterell, working on ✨ Multimodal LLMs! ✨
The first few weeks have been a blast, and I'm looking forward to the weeks ahead!
Congratulations to C4AI Research Grant recipient @FrohmannM and all authors of "Segment Any Text: A Universal Approach for Robust, Efficient and Adaptable Sentence Segmentation" for their EMNLP acceptance! 🥳