Yin-Jyun Luo @jun_luolo - Twitter Profile

Yin-Jyun Luo @jun_luolo

26 days ago

@liyzhen2 Now I am even more excited 🫨


Yin-Jyun Luo @jun_luolo

8 months ago

Diffusion autoencoders have theoretical backing for learning informative representations, and exploring the modellability of the learnt representations makes a lot of sense! The tweak of leveraging a frozen SSL model is cool.

jo.schb @jo_schb

8 months ago

💡 The idea

We start from a frozen self-supervised encoder (DINOv2, MAE, or CLIP) and combine it with a generative decoder.

Then we fine-tune only the [CLS] token embedding - injecting low-level info while keeping the rest frozen. https://t.co/eygmFuodpr


jun_luolo retweeted

Peter Sobot

@psobot

8 months ago

This is my team. I’m super excited to work on this stuff. If you are too, let’s talk: we’re hiring. https://t.co/D1apaw3Nqx


jun_luolo retweeted

Kwang Moo Yi @kwangmoo_yi

11 months ago

Preprint of today: Vavilala et al., "Generative Blocks World: Moving Things Around in Pictures" -- https://t.co/2UV2B2qzXL

I have a soft spot for reviving old ideas in modern methods -- block world via primitives now with Diffusion models for generating/editing images. https://t.co/AnooCbbuBL



Yin-Jyun Luo @jun_luolo

9 months ago

@tkipf Yes - AudioSlots is yet another motivator behind our work. Though they are not directly comparable due to the problem domain, they share a common high-level idea of encoding constituent sources as separate latent entries, which I can really appreciate


Yin-Jyun Luo @jun_luolo

10 months ago

@zacknovack The acronym that sticks in my head like the song does 😂 Good job!


Yin-Jyun Luo @jun_luolo

11 months ago

@keunwoochoi Oh wow 🤩 congrats 🥳


Yin-Jyun Luo @jun_luolo

over 1 year ago

@unilightwf fair enough 😅🫡


Yin-Jyun Luo @jun_luolo

over 1 year ago

Pumped to see a comeback of GMVAE among a sea of VQ!

https://t.co/ZqPqIWRkTj

Speaking of, Wei-Ning's https://t.co/W9q7jHCzeP on TTS has had a substantial impact on my research on style transfer via (unsupervised) disentanglement. But it seems overshadowed by his own work HuBERT😅 https://t.co/7orlMdDcTx


Yin-Jyun Luo @jun_luolo

over 1 year ago

@92HsChoi The two-stage paradigm relies on the first stage's reconstruction and the second stage's distribution modelling. Not noising the posterior definitely gives better first-stage reconstruction. Not sure about the effect on the second stage, but I think having a proper prior during first-stage training is more impactful.🤔


jun_luolo retweeted

Keitaro Tanaka @Kakanat1105

over 1 year ago

My first-author paper has been accepted to APSIPA Trans.🙌
Our paper has been accepted for publication in APSIPA Transactions!!🚀
A big thanks to the co-authors (professors!), reviewers, and everyone who supported this work. Special mention to @jun_luolo, @_tai_shi, and @yoshipon0520🙏 https://t.co/h8XxzCfw1S


Yin-Jyun Luo @jun_luolo

over 1 year ago

At the NeurIPS workshop of Audio Imagination, we present a supervised method as a preliminary step towards answering these questions. https://t.co/tCnStND5Fw


Yin-Jyun Luo @jun_luolo

over 1 year ago

How to train a model to extract separate representation entities associated with individual sources of a music mixture? How to also divide each entity into subspaces of pitch and timbre? How to then have the model take arbitrary combinations of these building blocks to sample novel mixtures?

Yin-Jyun Luo @jun_luolo

over 1 year ago

By feeding to the decoder different combinations of pitch and timbre latents, we achieve applications such as:
- instrument swapping between two mixtures or within a mixture
- stem exchange between two mixtures https://t.co/BrdZZ0d45y
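
The recombination described in this tweet could be sketched roughly as follows. This is only an illustration, assuming each source in a mixture is encoded as one pitch latent plus one timbre latent. The names `SourceLatent`, `swap_timbre`, `swap_timbre_within`, and `exchange_stem` are hypothetical and not taken from the authors' code, and the decoder that would turn the recombined latents back into audio is left out.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class SourceLatent:
    """Hypothetical per-source latent: a pitch part and a timbre part."""
    pitch: Tuple[float, ...]
    timbre: Tuple[float, ...]


Mixture = List[SourceLatent]  # one entry per source in the mixture


def swap_timbre(mix_a: Mixture, i: int, mix_b: Mixture, j: int):
    """Instrument swapping between two mixtures: trade timbre, keep pitch."""
    new_a, new_b = list(mix_a), list(mix_b)
    new_a[i] = replace(mix_a[i], timbre=mix_b[j].timbre)
    new_b[j] = replace(mix_b[j], timbre=mix_a[i].timbre)
    return new_a, new_b


def swap_timbre_within(mix: Mixture, i: int, j: int) -> Mixture:
    """Instrument swapping within one mixture: sources i and j trade timbre."""
    new_mix = list(mix)
    new_mix[i] = replace(mix[i], timbre=mix[j].timbre)
    new_mix[j] = replace(mix[j], timbre=mix[i].timbre)
    return new_mix


def exchange_stem(mix_a: Mixture, i: int, mix_b: Mixture, j: int):
    """Stem exchange: trade whole source latents (pitch and timbre together)."""
    new_a, new_b = list(mix_a), list(mix_b)
    new_a[i], new_b[j] = mix_b[j], mix_a[i]
    return new_a, new_b


# Toy latents; real ones would come from an encoder (assumed, not shown).
mix_a = [SourceLatent((1.0,), (10.0,)), SourceLatent((2.0,), (20.0,))]
mix_b = [SourceLatent((3.0,), (30.0,))]
swapped_a, swapped_b = swap_timbre(mix_a, 0, mix_b, 0)
```

In a full system, each recombined list would be passed to the decoder to render a new mixture. The inputs are copied rather than modified, so the original latents stay available for further combinations.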


Yin-Jyun Luo @jun_luolo

over 1 year ago

There goes the one proof of my ISMIR presence. It was very nice to catch up w/ the Taiwanese Gang, and it's my honour to be confronted by "why are you still doing disentanglement?" That's right, I will also be presenting DisMix https://t.co/tCnStND5Fw at the NeurIPS Workshop!

Hao-Wen (Herman) Dong 董皓文 @hermanhwdong

over 1 year ago

@yoyolicoris @affige_yang @jun_luolo 😂


Yin-Jyun Luo @jun_luolo

over 1 year ago

@ArxivSound A recent work has extended the proof to time series https://t.co/kDIneAYgdP


Yin-Jyun Luo @jun_luolo

over 1 year ago

@ArxivSound Great to see more interest in pitch-timbre disentanglement. Using paired data with shared attributes has shown good results in speech https://t.co/yXfF1uLmaI https://t.co/V3cCUNe6DX They are also proven identifiable under assumptions https://t.co/J0PRYyK2fy https://t.co/IvBF3gGCPp


jun_luolo retweeted

arXiv Sound @ArxivSound

over 1 year ago

``Self-Supervised Multi-View Learning for Disentangled Music Audio Representations,'' Julia Wilkins, Sivan Ding, Magdalena Fuentes, Juan Pablo Bello, https://t.co/vWc7Ulfz2q


Yin-Jyun Luo @jun_luolo

over 1 year ago

@unilightwf well, at the very least they do things more efficiently and I can't complain when disentanglement is proven useful in yet another task :)


Yin-Jyun Luo @jun_luolo

over 1 year ago

Audio codecs are turning into VQ-based voice conversion models with an extra focus on compression. Disentanglement seems to be the sauce.

arXiv Sound @ArxivSound

over 1 year ago

``LSCodec: Low-Bitrate and Speaker-Decoupled Discrete Speech Codec,'' Yiwei Guo, Zhihan Li, Chenpeng Du, Hankun Wang, Xie Chen, Kai Yu, https://t.co/1yMzq15hbK

