omkar @omkar_1799 - Twitter Profile

6 days ago

Music v2 can generate music from all over the world, in dozens of languages. To demonstrate it, we’ve built this demo. Listen to Britpop in Hindi, Brazilian Samba in Mandarin, Nigerian Afrobeats in Polish, or any other combination you’d like.

5

43

7

22

8K

omkar_1799 retweeted

ElevenLabs Developers

@ElevenLabsDevs

3 months ago

Introducing Guardrails 2.0 for Agents. Control how agents behave in production with protections to keep responses safe, compliant, and reliable.

4

111

16

39

10K

omkar_1799 retweeted

ElevenLabs @ElevenLabs

4 months ago

Introducing Experiments in ElevenAgents - the most data-driven way to improve real-world agent performance. Experiments enables you to run controlled A/B tests to measure which agent configuration works best - from prompt structure to workflow logic, voice, and personality.

22

308

36

153

30K

omkar_1799 retweeted

Artificial Analysis

@ArtificialAnlys

4 months ago

Announcing AA-WER v2.0 Speech to Text accuracy benchmark, and AA-AgentTalk, a new proprietary dataset focused on speech directed at voice agents.

AA-AgentTalk focuses on the speech that matters most to voice agents. As a held-out, proprietary dataset, AA-AgentTalk also mitigates the risk of models training to perform well on public test sets.

Leading public Speech to Text datasets contain errors in their reference transcripts, where the ground truth doesn't match what was actually said. We've manually corrected these and are open-sourcing cleaned versions of VoxPopuli and Earnings22 on Hugging Face.

What's changed in v2.0:
➤ New held-out, proprietary dataset - AA-AgentTalk (50% weighting): 469 samples (~250 minutes) of speech directed at voice agents, and it's private so models can't train on it. Spans voice agent & call center interaction, AI agent interaction, industry jargon, meetings, consumer & personal, and media content across 17 accent groups, 8 speaking styles, and a mix of devices and environments.

➤ Cleaned transcripts for existing public datasets: We identified errors in the original ground truth transcriptions for public datasets, VoxPopuli and Earnings22 - instances where reference transcripts didn't accurately capture what was actually said. Inaccurate ground truth unfairly penalizes models that correctly transcribe the audio, so we manually reviewed and created cleaned versions, VoxPopuli-Cleaned-AA and Earnings22-Cleaned-AA.

➤ Removal of AMI-SDM: We removed the AMI-SDM dataset as the transcript errors were too extensive to correct without making a large number of judgment calls we weren't comfortable with (e.g., heavily overlapping speech).

➤ Improved text normalization: We developed a custom text normalizer building on OpenAI’s whisper normalizer package to reduce artificially inflated WER from formatting differences rather than genuine transcription errors. Key fixes include digit splitting to prevent number grouping mismatches (e.g., 1405 553 272 vs. 1405553272), preservation of leading zeros, normalization of spoken symbols (e.g., “+”, “_”), stripping redundant :00 in times (e.g., 7:00pm vs. 7pm), adding additional US / UK English spelling equivalences (e.g., totalled vs. totaled), and accepted equivalent spellings for ambiguous proper nouns in our dataset (e.g., Mateo vs. Matteo). This ensures models are evaluated on actual transcription accuracy rather than surface-level formatting choices.

The new weighting is 50% AA-AgentTalk, 25% VoxPopuli-Cleaned-AA, 25% Earnings22-Cleaned-AA.

Key results:
@elevenlabs's Scribe v2 leads at 2.3% AA-WER v2.0, followed by @GoogleDeepMind's Gemini 3 Pro at 2.9%, @MistralAI's Voxtral Small at 3.0%, Google's Gemini 3 Flash at 3.1%, and ElevenLabs Scribe v1 at 3.2%.
ElevenLabs Scribe v2 leads on two of the three component datasets, AA-AgentTalk and Earnings22-Cleaned-AA, while Google's Gemini 3 Pro leads on VoxPopuli-Cleaned-AA.

See below for further detail.
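
To make the normalization and weighting described in this announcement concrete, here is a minimal Python sketch. It is not Artificial Analysis's normalizer or scoring code: the function names (`normalize`, `wer`, `weighted_wer`) are hypothetical, only two of the listed normalization rules (digit splitting and stripping a redundant `:00`) are imitated, and combining the three datasets as a plain weighted average of per-dataset WER is an assumption based on the stated 50/25/25 weighting.

```python
import re

# Weights as stated in the announcement. Treating the headline score as a
# weighted average of per-dataset WER is an assumption of this sketch.
WEIGHTS = {
    "AA-AgentTalk": 0.50,
    "VoxPopuli-Cleaned-AA": 0.25,
    "Earnings22-Cleaned-AA": 0.25,
}


def normalize(text: str) -> str:
    """Toy normalizer (hypothetical): imitates two of the listed rules only."""
    text = text.lower()
    # Strip a redundant ":00" in clock times, e.g. "7:00pm" -> "7pm".
    text = re.sub(r"\b(\d{1,2}):00\s*(am|pm)\b", r"\1\2", text)
    # Split every digit into its own token so "1405 553 272" and
    # "1405553272" compare equal; leading zeros are kept as digits.
    text = re.sub(r"\d", lambda m: f" {m.group(0)} ", text)
    # Collapse repeated whitespace.
    return " ".join(text.split())


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = normalize(reference).split()
    hyp = normalize(hypothesis).split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def weighted_wer(per_dataset: dict[str, float]) -> float:
    """Combine per-dataset WER values using the announced weights."""
    return sum(WEIGHTS[name] * per_dataset[name] for name in WEIGHTS)
```

With this sketch, `wer("call 1405 553 272", "call 1405553272")` is 0.0 because both strings normalize to the same token sequence, which is the kind of formatting-only mismatch the announcement says the new normalizer is meant to stop penalizing. How WER is aggregated within each dataset (per sample or pooled) is not stated in the announcement, so `weighted_wer` takes the per-dataset figures as given.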

10

200

22

62

27K

omkar_1799 retweeted

ElevenLabs @ElevenLabs

4 months ago

ElevenCreative lets marketing teams go from idea to full campaign in under a day. At the ElevenLabs Summit in London, @lukeharries showed how brands use ElevenCreative to produce studio-quality ads with AI - combining speech, music, sound effects, image, and video in one platform.

25

434

37

283

3M

omkar_1799 retweeted

ElevenLabs @ElevenLabs

4 months ago

Introducing ElevenAgents for Support - a dedicated offering to help customer support teams transform manual processes into production-ready agent workflows.

24

528

48

656

394K

omkar_1799 retweeted

Mati Staniszewski

@mati

4 months ago

Proud day today: we hosted our biggest event yet, ElevenLabs Summit London - where it all started. Loved seeing 1,000+ partners, customers, and friends.

8

123

11

7K

omkar_1799 retweeted

ElevenLabs @ElevenLabs

4 months ago

At the ElevenLabs Summit in London, our Co-Founder @matiii delivered a live demo showing how ElevenAgents can help governments and enterprises support users anytime, anywhere, and in any language. Built with modular components that apply to any organization, see how a national government agent can power the future of citizen support.

15

116

13

38

15K

omkar_1799 retweeted

Luke Harries

@lukeharries

5 months ago

At ElevenLabs, we believe in the power of video for storytelling. It was a privilege to work with a16z to talk through Act I, building the first human-like voice model, and our new mission with Act II, passing the vocal Turing test.

14

115

16

23

11K

omkar_1799 retweeted

Jakub Lichman

@JakubLichman

5 months ago

Introducing Eleven V3 Conversational! 🗣️ Our very first contextually aware model, which sees past turns in the conversation and can thus deliver emotions that feel more natural. Incredibly grateful to lead model development in this dream team, with @Marko_Jozef on Agents integration and @maxilevi__ on inference. Try it now on our Agents platform! 🚀

10

77

12

11

3K

omkar_1799 retweeted

ElevenLabs @ElevenLabs

5 months ago

Introducing Expressive Mode for ElevenAgents - voice agents so expressive, they blur the line between AI and human conversations. This is an unedited recording of an agent empathizing with a customer at peak frustration.

159

2K

396

1K

11M

omkar_1799 retweeted

ElevenLabs @ElevenLabs

5 months ago

Meet Audiobooks in ElevenCreative. The complete toolkit to create, refine, and publish audiobooks using lifelike AI voices - from first draft to published audio.

18

435

53

245

53K

omkar_1799 retweeted

a16z @a16z

5 months ago

ElevenLabs started as a weekend project. They crossed $330M ARR in 2025 as they build the voice interface of the future. This is the ElevenLabs story. An a16z Original.

80

1K

127

608

378K

omkar_1799 retweeted

ElevenLabs @ElevenLabs

5 months ago

We raised $500M at an $11B valuation to transform how people interact with technology.

1K

4K

409

1K

15M

omkar_1799 retweeted

Mati Staniszewski

@mati

5 months ago

Today, @elevenlabs is announcing a $500M Series D at an $11B valuation, led by Sequoia, with a16z quadrupling down and ICONIQ tripling down. It reflects the trust of customers and partners building at the frontier alongside us - and gives us momentum to ship even faster.

192

3K

194

765

319K

omkar_1799 retweeted

ElevenLabs @ElevenLabs

5 months ago

We closed 2025 at over $330M ARR - driven by our customers’ trust and impact. @matiii joined @Bloomberg earlier today to talk about the work we do with enterprises, our recent research release, and what’s next.

16

291

22

50

96K

omkar_1799 retweeted

ElevenLabs Developers

@ElevenLabsDevs

6 months ago

Today, we launch the ElevenLabs OSS Engineers Fund - a program that provides sustained support to the open-source projects that help power our work. Over the next six months, we are contributing $22,000 to projects our engineers rely on. https://t.co/V4kjeIHzFk

25

581

42

92

255K

omkar @omkar_1799

over 2 years ago

today's the day! @Apple

0

8

0

348

omkar @omkar_1799

almost 3 years ago

recently wrote an article about the nuances of integrating with the IMAP protocol 📧 while building @nylas - hope you enjoy! https://t.co/HiQ6nTp4UP

0

3

0

302

omkar_1799 retweeted

Garry Tan

@garrytan

almost 3 years ago

If you're using Typeform for your signup flow, you're missing customizations that increase conversion by 70% Surface Labs (YC S23) is here to make it one-click for you. One of the founders previously got https://t.co/k0f9suyrsN to 80% conversion. https://t.co/IPkb1jfmNE

29

709

42

799

315K
