3 papers accepted at Interspeech 2026! Proud to see my role slowly transitioning from a mentee to a mentor. Huge thanks to my collaborators for letting me be a part of their journey!
What if you had nano-banana for audio?
AudioChat is a multi-modal LM that performs fine-grained understanding, generation, and editing of multi-source scenes
By diffusing continuous latents, it generates 48 kHz stereo edits with great input adherence:
https://t.co/BV08OkjOCT
✨New paper✨
We find script (e.g. Cyrillic, Latin) to be a linear direction in the activation space of Whisper, enabling transliteration at test-time by adding such script directions to the activations, producing e.g. Cyrillic Japanese transcriptions.
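A minimal sketch of the steering idea above, assuming the script direction is estimated as a difference of mean activations (an assumption; the post does not say how the direction is obtained). Toy 3-dimensional vectors stand in for Whisper activations, and `mean_vector`, `script_direction`, `steer`, and `alpha` are hypothetical names, not part of any Whisper API.

```python
# Hypothetical sketch: toy vectors stand in for Whisper hidden states.

def mean_vector(rows):
    """Average a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def script_direction(target_acts, source_acts):
    """Difference of class means (assumed estimator for the script direction)."""
    mu_target = mean_vector(target_acts)
    mu_source = mean_vector(source_acts)
    return [t - s for t, s in zip(mu_target, mu_source)]

def steer(activation, direction, alpha=1.0):
    """Add the scaled direction to a single activation vector at test time."""
    return [a + alpha * d for a, d in zip(activation, direction)]

# Made-up activations for audio transcribed in each script.
latin_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
cyrillic_acts = [[1.0, 4.0, 2.0], [3.0, 4.0, 2.0]]

direction = script_direction(cyrillic_acts, latin_acts)
steered = steer([2.0, 0.0, 5.0], direction, alpha=1.0)
print(direction, steered)
```

In a real model the addition would happen inside a chosen layer during decoding, and `alpha` would need tuning; neither detail is specified in the post.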
4 papers submitted & accepted at ACL 2026! So grateful to work alongside & learn from amazing minds, pushing the boundaries of speech technologies, machine learning, and computational linguistics. See you in San Diego!
Self-supervised Speech Models are Phonological Vector Machines!
Excited to be giving an invited talk this Thursday (March 19th, 3pm Amsterdam time)!
Huge thanks to @mariannedhk at Univ. of Amsterdam for the invite!
Huge thanks to my wonderful coauthors, Eunjung and Cheol-jun, and my two favorite Davids, Mortensen and Harwath, the best advisors I could ask for. Can't wait to see what we cook up next!
🧵 Together, both papers take a step beyond the usual "what info do S3Ms encode" probing paradigm. We aim to answer: how is that info actually encoded geometrically? Come see for yourself Thursday!
Slides: https://t.co/N8LiKPcpid
Apply to CMU LTI's Summer 2026 "Language Technology for All" internship. Open to pre-doctoral students new to language tech (non-CS backgrounds welcome). 12-14 weeks in-person in Pittsburgh; travel + stipend paid. Deadline: Feb 20, 11:59pm ET. https://t.co/7SuItDHH98
If you are interested in Audio Tokenisers, you should check out our new work!
We empirically analysed existing tokenisers from every angle: reconstruction, downstream tasks, LMs, and more.
Grab yourself a ☕/🍺 and sit down for a read!
Excited to receive the Best Student Paper Award at #Interspeech2025! I started the OWSM project in 2023. It took me great effort to design a robust and scalable training framework using ESPnet, prepare unified data formats, and conduct large-scale training with academic resources.
Our work on OWSM v4 received the Best Student Paper Award at #Interspeech2025!
Huge congratulations to the team!
I'm especially happy to see our open science efforts for speech foundation models recognized by the community.
https://t.co/wx7uN7PYNw
I will be presenting 3 papers from @WavLab at #Interspeech2025 🇳🇱
One is OWSMv4 (led by @pengyf21), nominated for best student paper
https://t.co/UF9JHNrP8b
It focuses a lot on data cleaning, particularly for non-English languages
It will be an oral on Tues 15:10 at dock 10B.
This wouldn't have been possible without my awesome co-first-author @mmiagshatoy and wonderful supervisors @shinjiw_at_cmu and Emma Strubell!
I'll see you in Rotterdam, Wed 17:00-17:20 Area8-Oral4 (Streaming ASR)! (10/10)
We also verified that DSUs are learnable with fewer weights (fewer layers), i.e., with a more lightweight model! This implies that we're using self-supervised models inefficiently when extracting DSUs. (8/n)
There's also a bunch of engineering tricks that can improve performance. We provide a Pareto-optimal baseline after applying all the available tricks, positioning our work as a foundation for future work in this direction. https://t.co/Nbi8Mo06Qi (9/n)