Patrick Pérez @ptrkprz - Twitter Profile

Pinned Tweet

over 1 year ago

As promised, we are sharing the technology behind Moshi: paper+models+inference code for everyone.

over 1 year ago

Today, we release several Moshi artifacts: a long technical report with all the details behind our model, weights for Moshi and its Mimi codec, along with streaming inference code in PyTorch, Rust and MLX. More details below 🧵 ⬇️ Paper: https://t.co/mMInmjiBIC Repo: https://t.co/PFak47FMrm HuggingFace: https://t.co/bqG4IS0ntg

Patrick Pérez @ptrkprz

over 1 year ago

changing air, entering blue sky, same handle

Patrick Pérez @ptrkprz

over 1 year ago

New sharing step on our journey towards easy-to-use fully-open models.

kyutai @kyutai_labs

over 1 year ago

Meet Helium-1 preview, our 2B multi-lingual LLM, targeting edge and mobile devices, released under a CC-BY license. Start building with it today! https://t.co/X4Dbx2T1cJ

ptrkprz retweeted

Alexandre Défossez @honualx

over 1 year ago

I’ll be presenting a deep dive into how Moshi works at the next NLP Meetup in Paris, this Wednesday the 9th at 7pm. Register if you want to attend! 🧩🔎🟢 https://t.co/1ZPb105JKX

Patrick Pérez @ptrkprz

over 1 year ago

Serious stress testing!

Neil Zeghidour @neilzegh

over 1 year ago

Voice AIs handle speaker turns & interruptions with Voice Activity Detection. VAD is brittle and will trigger due to background noise, creating frequent hiccups. Moshi gets rid of it completely, so you can use it in the most chaotic settings. I myself couldn't hear Moshi here 😅

ptrkprz retweeted

Andrej Karpathy @karpathy

over 1 year ago

Moshi is a very nice/fun conversational AI audio 🔊 model release from @kyutai_labs. Are you slowly losing faith in the objective reality and existence of Advanced Voice Mode? Talk to Moshi instead :) You can talk to it on their website: https://t.co/5QaFspEMkj Or even locally on your Apple Silicon Mac with just:

$ pip install moshi_mlx
$ python -m moshi_mlx.local_web -q 4

I find the Moshi model personality to be very amusing: it is a bit abrupt, it interrupts, it is a bit rude but somehow in a kind of endearing way, it goes off on tangents, it goes silent for no reason sometimes, so it's all a bit confusing but also very funny and meme-worthy. This video "it's just the pressure" / "i just like working on projects" is a good example, soooo funny: https://t.co/Go1nQMkBnj

But in any case, it's really cool that I can even run this kind of voice interaction with my Macbook, that the repo is out on GitHub along with a detailed paper, and I certainly look forward to effortlessly talking to our computers in end-to-end ways, without going through intermediate text representations that lose a ton of information content.

ptrkprz retweeted

Aaron @hertzfelt_io

almost 2 years ago

Watch @kyutai_labs #moshiai and @OpenAI #gpt4o discuss #AGI. #AI #gpt4ovoice

Patrick Pérez @ptrkprz

almost 2 years ago

can even be explored on a vacation beach or a conference center, as Moshi is robust to noisy environments

kyutai @kyutai_labs

almost 2 years ago

"Hippie" Moshi declares its love for Hendrix... but "skeptical" Moshi is less enthusiastic about psychedelic rock. Moshi can play 70+ emotions. Will you catch them all? Try now at https://t.co/lU2sqa8wMQ

Patrick Pérez @ptrkprz

almost 2 years ago

Meet our ambassador!

kyutai @kyutai_labs

almost 2 years ago

If you're attending ICML and want to learn more about Kyutai and Moshi, reach out to Edouard!

Patrick Pérez @ptrkprz

almost 2 years ago

Staying in real-time connection with voice AI in Paris while being in Vienna

Edouard Grave @EXGRV

almost 2 years ago

Moshi goes to #ICML2024 in Vienna! Try the demo at https://t.co/weFG6cmhDT

Patrick Pérez @ptrkprz

almost 2 years ago

The attentive listener will notice that even when speaking over Alex, Moshi still listens (taking into account the "in space" instruction for the second poem)

Alexandre Défossez @honualx

almost 2 years ago

Some Moshi extracts! Get your own at https://t.co/SVQZQ9UlEN Don't forget to click the "Download video" at the end (if it's good) 🟢

Patrick Pérez @ptrkprz

almost 2 years ago

And our demo runs in the US thanks to a donation from @huggingface

Patrick Pérez @ptrkprz

almost 2 years ago

Thanks @Thom_Wolf. Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point… not a productized conversational bot.

Patrick Pérez @ptrkprz

almost 2 years ago

Thanks @Thom_Wolf. Moshi experimental voice AI is indeed a crazy adventure / a radical innovation / a new technology / a surprising experience / a research prototype / a shared resource / a starting point… not a productized conversational bot.

Thomas Wolf @Thom_Wolf

almost 2 years ago

The @kyutai_labs fully end-to-end audio model demo of today is a huge deal that many people missed in the room.

Mostly irrelevant are the facts that:
- they come a few weeks after OpenAI ChatGPT-4o
- the demo was less polished than the 4o one (in terms of voice quality, voice timing…)

Relevant:
- the model training pipeline and model archi are simple and hugely scalable, with a tiny 8+ people team like Kyutai building it in 4 months. Synthetic data is a huge enabler here
- laser focus on local devices: Moshi will soon be everywhere. Frontier model builders have low incentive to let you run smaller models locally (price per token…) but non-profits like Kyutai have very different incentives. The Moshi demo is already online while the OpenAI 4o one is still in limbo.
- going under 300 ms of latency while keeping Llama 8B or above quality of answers is a key enabler in terms of interactivity, it’s game changing. This feeling when the model answers your question before you have even finished asking is quite crazy, or when you interrupt the model while it’s talking and it reacts… Predictive coding in a model, instantly updated model of what you’re about to say...

Basically they nailed the fundamentals. It’s here. This interactive voice tech will be everywhere. It will soon be an obvious commodity.

Patrick Pérez @ptrkprz

almost 2 years ago

Research internships at @kyutai_labs are fun, besides the hard work! A good session by @RamaAdrien

kyutai @kyutai_labs

almost 2 years ago

Moshi is not an assistant, but rather a prototype for advancing real-time interaction with machines. It can chit-chat, discuss facts and make recommendations, but a more groundbreaking ability is its expressivity and spontaneity that allow for engaging in fun roleplay.

Patrick Pérez @ptrkprz

almost 2 years ago

It feels so good to have shared at last what we have been up to in the past 6 months. We worked hard on this unique voice AI, carefully training it on a mix of text and speech, making it multi-stream and real-time, and putting it in an online demo for everyone to experience.

kyutai @kyutai_labs

almost 2 years ago

Yesterday we introduced Moshi, the lowest latency conversational AI ever released. Moshi can perform small talk, explain various concepts, engage in roleplay in many emotions and speaking styles. Talk to Moshi here https://t.co/a4EbAQiih7 and learn more about the method below 🧵.

Patrick Pérez @ptrkprz

almost 2 years ago

Please @abursuc keep one for me!

Andrei Bursuc @CVPR @abursuc

almost 2 years ago

We've just launched our BRAVO robustness and reliability challenge for semantic segmentation. @tuan_hung_vu and I will be giving away these nice stickers @CVPR. Ping us or catch us at the posters to find out more! #CVPR2024

ptrkprz retweeted

Amir Zamir @zamir_ar

almost 2 years ago

We are releasing 4M-21 with a permissive license, including its source code and trained models. It's a pretty effective multimodal model that solves 10s of tasks & modalities. See the demo code, sample results, and the tokenizers of diverse modalities on the website. IMO, the multitask learning aspect of multimodal models has really taken a step forward. We can train a single model on many diverse tasks with ~SOTA accuracy. But a long way to go in terms of transfer/emergence. 🌐 https://t.co/TsN5JK2Dud ⌨️ https://t.co/CY7NZ2yoFy Joint work w/ @EPFL_en @Apple.

ptrkprz retweeted

valeo.ai @valeoai

almost 2 years ago

📢We introduce the ScaLR models (code+checkpoints) for LiDAR perception distilled from vision foundation models tl;dr: don’t neglect the choice of teacher, student, and pretraining datasets -> their impact is probably more important than the distillation method #CVPR2024 🧵 [1/8]

ptrkprz retweeted

F. Güney @ftm_guney

about 2 years ago

we’ve got multiple PhD and postdoc positions funded by my #ERCstg project ENSURE. if you’re interested in computer vision and self-driving, please consider applying. graduate students: apply ASAP! details at https://t.co/kfcbRFyXZe postdocs: send me an email with your CV and brief research interests at [email protected] we offer competitive (euro-based) scholarships and salaries by Turkish standards.

ptrkprz retweeted

Ian Hogarth @soundboy

about 2 years ago

1/ Today the UK's AI Safety Institute is open sourcing our safety evaluations platform. We call it "Inspect": https://t.co/7trBzgw9hw
