Happy to have been part of this paper. This is my first EACL paper, but it won't be the last. I thank everyone who enabled me to reach this height, especially my supervisors Dr. @ransurangika and Dr. @NisansaDdS.
In our upcoming @eaclmeeting paper on the utilization potential of the quality of web-mined corpora, we discuss how you can build better translation models by automatically sorting the training samples and using only the top-ranked ones.
Paper: https://t.co/4bLuYdMqOQ
1/n
@PontiEdoardo @FSoudan By latent tokenization, do you mean the concept introduced by the Byte Latent Transformer?
https://t.co/tHHlepraxY
Did this concept exist before this paper?