Excited to share that I’m joining @MIL_UTokyo at The University of Tokyo as a Project Assistant Professor! 🎉 Working at the cutting edge of Speech × AI. 🇯🇵🔊🤖 #AI #SpeechTech
Honored to receive the Gold Reviewer Award and complimentary registration for @icmlconf 2026 🏅
Reviewing has been one of the most fulfilling parts of research for me, and I’m grateful for the opportunity to contribute to the community through constructive feedback.
#ICML2026
@unilightwf If you do not have sudo access for apt install, you can install cmake via conda from conda-forge, provided you are in a conda environment.
@unilightwf Regarding not reading the whole audio: for SE/MSS tasks, training can be done on fixed-length segments, and overlap-add can be used at inference. That way we only need to read short segments from the file rather than the whole audio.
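A minimal sketch of the overlap-add inference idea mentioned above, assuming a model that maps a fixed-length segment to an output of the same length. The function name `overlap_add`, the `process_fn` callback, and the Hann-shaped cross-fade are illustrative assumptions, not the setup used in any particular baseline. Pure Python is used here to keep the example dependency-free; in practice this would usually run on NumPy arrays or tensors.

```python
import math


def overlap_add(samples, process_fn, seg_len, hop):
    """Run process_fn on overlapping fixed-length segments and blend the results.

    process_fn is a hypothetical stand-in for the model: it takes a list of
    seg_len samples and returns a list of seg_len samples.
    """
    assert 0 < hop <= seg_len, "segments must cover every sample"
    n = len(samples)
    n_segs = 1 if n <= seg_len else 1 + math.ceil((n - seg_len) / hop)
    padded_len = (n_segs - 1) * hop + seg_len
    # Zero-pad so the last segment is full-length.
    padded = list(samples) + [0.0] * (padded_len - n)

    # Hann-shaped taper without its zero endpoints, so every weight is > 0.
    window = [
        0.5 - 0.5 * math.cos(2 * math.pi * (k + 1) / (seg_len + 1))
        for k in range(seg_len)
    ]

    out = [0.0] * padded_len
    norm = [0.0] * padded_len
    for i in range(n_segs):
        start = i * hop
        processed = process_fn(padded[start:start + seg_len])
        for k in range(seg_len):
            out[start + k] += window[k] * processed[k]
            norm[start + k] += window[k]

    # Divide by the summed window weights and drop the padding.
    return [o / w for o, w in zip(out[:n], norm[:n])]
```

Dividing by the accumulated window weights means an identity `process_fn` returns the input unchanged, which is a convenient sanity check before plugging in a real model.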
@unilightwf Speed: On my system, I get around 8-10 batches of 16 samples per second with 8 workers for online, and around 15-16 batches per second for offline with the same settings. But my model (~20M parameters) can only consume 2-3 batches per second, so online is not a bottleneck.
@unilightwf Additionally, it is faster to first read audio metadata and sample smaller segments, rather than reading the whole file and then chopping it up.
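A minimal sketch of the "read metadata first, then sample a short segment" idea, using only Python's standard `wave` module, which handles PCM WAV files. The function name `read_random_segment` and its parameters are hypothetical; the tweet does not say which audio library was used, and libraries such as soundfile offer comparable header-only and partial-read calls for more formats.

```python
import random
import wave


def read_random_segment(path, seg_frames, rng=random):
    """Return (raw PCM bytes, start frame) for one random fixed-length segment.

    Only the header and the requested frames are read; the rest of the file
    is never loaded into memory.
    """
    with wave.open(path, "rb") as wf:
        total = wf.getnframes()  # taken from the header, no audio read yet
        seg_frames = min(seg_frames, total)
        start = rng.randint(0, total - seg_frames)  # inclusive bounds
        wf.setpos(start)  # seek directly to the segment start
        data = wf.readframes(seg_frames)
    return data, start
```

Passing a seeded `random.Random` instance as `rng` makes the sampled segments reproducible across runs.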
@unilightwf URGENT 2026 Track 1 baseline includes both on-the-fly and offline noise/distortion augmentation:
🔗 https://t.co/8lfUDqmdw1
For large models, using 8–16 DataLoader workers is typically enough to avoid I/O bottlenecks, even with sox/ffmpeg augmentations in on-the-fly mode.
If you missed my keynote at INTERSPEECH-2025 (or would like to see it again), it’s now available online at https://t.co/sjXWmsaz9L - my bit is Keynote 1 and it starts at 1:05:30
Traveling to Rotterdam 🇳🇱 for INTERSPEECH 2025!
I’ll be presenting our paper in the URGENT Challenge Special Session (Area 14, SS2) at 15:45.
Our system ranked 3rd in the challenge.
📄 https://t.co/EBai5fKDnR
#INTERSPEECH2025 #SpeechEnhancement #URGENTChallenge
🔗 My PhD work covers the following papers:
• HyperVQ: MLR-based Vector Quantization in Hyperbolic Space (TMLR)
👉 https://t.co/sCKxdA9UGx
• EDM-TTS: Efficient Dual-Stage Masked Modeling for Alignment-Free TTS (TMLR)
👉 https://t.co/3c2MdJSmc5
🎓 Successfully defended my PhD thesis today!
Title: Efficient Discrete Speech Modeling via Non-Autoregressive Methods for Joint Synthesis and Recognition
Grateful to my advisor, committee, and everyone who supported me on this journey. Onwards! 🚀 #PhDDefense #SpeechAI #TTS #ASR
Excited to present my poster at ICLR!
Do drop by if you are around!
#50: T2V2: A Unified Non-Autoregressive Model for Speech Recognition and Synthesis via Multitask Learning
Poster Session 3
Friday, April 25, 2025
10:00 - 12:30
Hall 3 + 2B
#ICLR25 #ICLR