It's been super fun pushing what's possible with on-device TTS with such a great team. So proud of what we've built - and sharing it at our Summit in Warsaw, where ElevenLabs began, feels pretty special. 🇵🇱
At the ElevenLabs Summit in Warsaw, we previewed on-device Text to Speech - a new model architecture that delivers human-level quality on limited hardware without an internet connection.
Introducing Dubbing v2, our revolutionary new dubbing model.
For the first time, the emotion and performance of the original content are carried over into every language.
So excited to welcome the Papla team on board at @ElevenLabs to help accelerate our work on the best voice interfaces - elevating model quality, building on-device, and scaling forward-deployed engineering.
@dabkowski_piotr and I first met @HubertSiuzdak four years ago (!) at one of the first conferences we attended: Interspeech. He later went on to co-found Papla with three amazing engineers - Dominik, Jakub, and Tomasz - also working toward the future of voice & voice agents.
When we first met, Hubert was presenting his breakthrough on latent speech representation with WavThruVec. Beyond the impact and clarity of the research, one thing was immediately clear: passion. Even through poor photo quality and Covid masks, you can see the happiness and energy beaming through. That same energy extended to his co-founders and the company they built - a team driven by excellence, working side-by-side with customers to bring voice agents to enterprises.
Our Poland team is now more than 50 people strong, as we continue accelerating our presence here.
Dominik, Hubert, Kuba, Tomek - a pleasure to learn from you and work together!
I'm joining ElevenLabs, together with the founding team of Papla Media, to continue our shared mission of advancing voice interfaces. Excited for what's ahead.
Token crisis: solved. ✅
We pre-trained diffusion language models (DLMs) vs. autoregressive (AR) models from scratch — up to 8B params, 480B tokens, 480 epochs.
Findings:
> DLMs beat AR when tokens are limited, with >3× data potential.
> A 1B DLM trained on just 1B tokens hits 56% HellaSwag & 33% MMLU — no tricks, no cherry-picks.
> No saturation: more repeats = more gains.
🚨 https://t.co/jmUcE1kywW
We also dissected the serious methodological flaws in the parallel work “Diffusion Beats Autoregressive in Data-Constrained Settings” - let’s raise the bar for open review!
🔗 Blog & details:
https://t.co/sEQvYUxElj
18 🧵s ahead:
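A quick sanity check on how the numbers in this post may fit together. This is a sketch under an assumption the post does not state outright: that the 480B figure counts tokens seen with repetition, and that the 1B-token run was repeated for the full 480 epochs. The helper name `total_tokens_seen` is made up for illustration.

```python
def total_tokens_seen(unique_tokens: int, epochs: int) -> int:
    """Tokens processed when a fixed dataset is repeated for several epochs."""
    return unique_tokens * epochs


# Figures taken from the post; pairing them this way is an assumption.
unique_tokens = 1_000_000_000  # "trained on just 1B tokens"
epochs = 480                   # "480 epochs"

total = total_tokens_seen(unique_tokens, epochs)
print(f"{total / 1e9:.0f}B tokens seen in total")
```

Under that reading, "480B tokens" describes the training budget, while the unique data stays at 1B tokens.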
Papla Voicebot is genuinely fun to talk to! Our Papla P1 engine, when running locally, responds in 50-150 ms and it’s also pretty expressive. The main bottleneck right now is Gemini 2.0 Flash latency: the first chunk we pass to TTS typically arrives in ~600 ms. We're exploring a few ideas to make it even snappier, including running a local open-source LLM.
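A rough latency budget for the figures above, as a minimal sketch. It assumes the two stages run one after the other and ignores everything the post does not mention (speech recognition, network hops, audio playback). The `Stage` type and the stage names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    min_ms: float
    max_ms: float


def time_to_first_audio(stages):
    """Sum best-case and worst-case latency over sequential stages."""
    return (sum(s.min_ms for s in stages), sum(s.max_ms for s in stages))


# Numbers from the post; treating them as strictly sequential is an assumption.
stages = [
    Stage("llm_first_chunk", 600, 600),  # Gemini 2.0 Flash, ~600 ms
    Stage("tts_first_audio", 50, 150),   # Papla P1 running locally
]

low, high = time_to_first_audio(stages)
llm_share = stages[0].max_ms / high  # LLM share of the worst-case budget
print(f"time to first audio: {low:.0f}-{high:.0f} ms, LLM share >= {llm_share:.0%}")
```

With these inputs the LLM accounts for most of the wait, which is why shaving its first-chunk latency matters more than further speeding up TTS.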
This false nomenclature of “researcher” and “engineer”, which is a thinly-masked way of describing a two-tier engineering system, is being deleted from @xAI today.
There are only engineers.
Researcher is a relic term from academia.
Building the future of AI voice…
Allergy season meets beta testing and apparently, our voicebot has opinions now. 🤧🤖
Still more emotionally aware than some customer service lines.
Big things are coming from Papla Media 💛
#ConversationalAI
We raised $360k in a pre-seed round - grateful to everyone who believed in us early.
We've got some great things coming, and can't wait to share them soon!
🚀 New TTS Playground Now Live at Papla Media!
We just launched a powerful new playground on the Papla Media platform designed to make working with AI voices faster and easier than ever.
Now you can:
📝 Type text and hear it instantly in ultra-realistic voices
🎙️ Try multiple voices before deciding which one to use in your app
🔊 Prototype and download audio
🧬 Generate speech from your cloned voices directly via the UI
⚡ Move and experiment fast
Whether you’re building tools, testing ideas, or just playing with voice, this space is for you.
🚀 Introducing the New Voice Tab
We just made it easier (and more fun) to explore, test, and create voices like never before.
🎧 Browse and preview voices across styles, accents, and tones
🧬 Clone your own voice in a few clicks from just 10 seconds of audio.
Perfect for creators, developers, brands, and anyone building with voice
👉 Jump in, explore the voices, and try cloning your own: https://t.co/0ZFohrqPwl
🎙️ Voice Cloning with Papla P1 Just Got Real
With only 10 seconds of audio, P1 can create a highly realistic voice clone that captures tone, accent, and personality.
This means fast, scalable voice personalization that sounds authentic and deeply human.
🔹 Just 10 seconds of voice input
🔹 Natural rhythm and emotional nuance
🔹 Ready for content, games, apps, and more
From hyper-personalized experiences to next-gen audio production, Papla P1 voice cloning opens up endless possibilities.
🔊 Let us know what you think in the comments, we’d love to hear your thoughts.
Introducing Papla P1 and our real-time API for developers! 🚀
Papla P1 is our advanced text-to-speech model, now available through a developer-friendly platform. Easily generate realistic speech, clone voices, and build natural-sounding conversations into your apps.
Excited to be at NeurIPS 2024 in beautiful Vancouver! Come check out our poster on SNAC at the Saturday Audio Workshop. Big thanks to Luca @lucalanze and Florian for the collab. DM me if you’d like to chat about audio generative models, conversational AI, or startups!