Building on SHAS (Tsiamas et al., 2022), a speech segmentation method that uses the pre-trained speech encoder wav2vec 2.0, our research aimed to improve the accuracy and time efficiency of speech translation by refining the segmentation process.
We've posted a paper on arXiv titled "Improving Speech Translation Accuracy and Time Efficiency with Fine-tuned wav2vec 2.0-based Speech Segmentation."
https://t.co/H3ychSHdfb