Ido Amos @AmosaurusRex - Twitter Profile

Ido Amos @AmosaurusRex

12 days ago

Just recently joined @cartesia and looks like my timing was pretty good 😬

Artificial Analysis

@ArtificialAnlys

12 days ago

Cartesia’s Sonic-3.5 takes the #1 spot on the Artificial Analysis Speech Arena Leaderboard, surpassing Inworld Realtime TTS 1.5 Max and Google’s Gemini 3.1 Flash TTS

Sonic-3.5 is the latest TTS model from @cartesia. It supports 42 languages, including 9 Indian languages, with 500+ voices available out of the box. The model has been highly preferred among voters in the TTS Arena, with its demonstrated naturalness and accurate transcript following.

Key takeaways:
➤ Quality: Sonic-3.5 has an Elo score of 1,218 (+16/-16) based on 1,144 arena appearances, placing it ahead of Inworld Realtime TTS 1.5 Max at 1,194 and Gemini 3.1 Flash TTS at 1,209

➤ Pricing: Sonic-3.5 is priced at $39/1M characters, a premium compared to Gemini 3.1 Flash TTS at $18.3/1M characters, and Inworld Realtime TTS 1.5 Max at $35/1M characters

➤ Speed: 105.5 characters per second, compared to 205 characters per second for Inworld Realtime TTS 1.5 Max and 26.3 characters per second for Gemini 3.1 Flash TTS

See more details and listen to samples below 🧵
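To put the quoted Elo gaps and prices in perspective, here is a small worked example. It assumes the leaderboard uses the standard logistic Elo model with a 400-point scale (the tweet does not say so), and the 250K-character workload is a hypothetical figure chosen only for illustration. Under that assumption, a 9-point lead corresponds to an expected pairwise preference only slightly above 50%, which is also well inside the quoted ±16 interval.

```python
def elo_expected(r_a: float, r_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the standard logistic Elo model (assumed scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / scale))


def cost_usd(chars: int, usd_per_million: float) -> float:
    """Cost of synthesizing `chars` characters at a per-1M-character price."""
    return chars * usd_per_million / 1_000_000


# Ratings and prices as quoted in the tweet above.
ratings = {"Sonic-3.5": 1218, "Gemini 3.1 Flash TTS": 1209, "Inworld Realtime TTS 1.5 Max": 1194}
prices = {"Sonic-3.5": 39.0, "Gemini 3.1 Flash TTS": 18.3, "Inworld Realtime TTS 1.5 Max": 35.0}

for rival in ("Gemini 3.1 Flash TTS", "Inworld Realtime TTS 1.5 Max"):
    p = elo_expected(ratings["Sonic-3.5"], ratings[rival])
    print(f"Sonic-3.5 vs {rival}: expected preference {p:.3f}")

# Hypothetical workload size, for illustration only.
for model, price in prices.items():
    print(f"{model}: ${cost_usd(250_000, price):.2f} per 250K characters")
```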



Ido Amos @AmosaurusRex

3 months ago

@JentseHuang Thanks @JentseHuang! Sounds very interesting. We mostly used Thinking States to represent reasoning in our work, but treating them as an internal memory indeed sounds very natural. I’ll have a look at your experiments.


Ido Amos @AmosaurusRex

4 months ago

Can LLMs reason internally while processing their inputs, similar to how humans can think ahead as we process information? Our latest work introduces Thinking States, a novel architectural adaptation that transforms reasoning into an internal recurrent process. By training models to maintain a dynamic thinking state, we achieve significant inference speedups over Chain-of-Thought while substantially outperforming existing latent reasoning methods. Paper: https://t.co/nXJ9szfbrT


Ido Amos @AmosaurusRex

4 months ago

Thinking States outperforms existing latent reasoning methods on multiple benchmarks and matches Chain-of-Thought performance on multi-hop QA, while offering faster inference. Furthermore, Thinking States exhibits superior length generalization on state-tracking tasks, successfully extrapolating to sequences significantly longer than those seen during training. This work was done during an internship at Google Research with an incredible team of collaborators: @clu_avi @megamor2 @amirgloberson @jonherzig @LiorShani286867 @ISzpektor Read the full paper and explore our findings here: https://t.co/nXJ9szfbrT


Ido Amos @AmosaurusRex

4 months ago

A major challenge in latent reasoning is finding effective supervision for the reasoning process. Since thinking states are represented in natural language, we can leverage existing Chain-of-Thought data for supervision. Furthermore, as this supervision is available in advance, we use it to teacher-force the thinking states themselves. This circumvents the need for costly recurrent optimization via backpropagation through time (BPTT), enabling fully parallel training and maintaining nearly constant training costs regardless of reasoning depth.
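The parallelism argument can be illustrated with a toy recurrence. This is a minimal sketch under loudly stated assumptions: the state is a small continuous vector updated by a hypothetical `tanh` cell, whereas in the paper thinking states are natural-language tokens supervised by Chain-of-Thought data. The point it shows is generic teacher forcing: at inference each state depends on the model's own previous state, so steps run sequentially, but when the previous states are supplied from supervision, every step can be computed in one batched operation with no backpropagation through time.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_state, T = 4, 3, 6
W = rng.normal(size=(d_state, d_in))           # hypothetical input-to-state weights
U = rng.normal(size=(d_state, d_state)) * 0.5  # hypothetical state-to-state weights
X = rng.normal(size=(T, d_in))                 # one input vector per step
s0 = np.zeros(d_state)                         # initial state


def step(x, s_prev):
    """One recurrent update of the toy state."""
    return np.tanh(W @ x + U @ s_prev)


def free_running(X, s0):
    """Inference: each state is fed the model's own previous state (sequential loop)."""
    states, s = [], s0
    for x in X:
        s = step(x, s)
        states.append(s)
    return np.stack(states)


def teacher_forced(X, s0, targets):
    """Training: previous states come from supervision, so all steps run in one batched op."""
    S_prev = np.vstack([s0[None, :], targets[:-1]])  # shift targets right by one step
    return np.tanh(X @ W.T + S_prev @ U.T)


targets = free_running(X, s0)
preds = teacher_forced(X, s0, targets)
print(np.allclose(preds, targets))  # True
```

When the supervision matches the model's own rollout, the batched teacher-forced pass reproduces the sequential one exactly; during training the targets would instead come from the reference reasoning, and a per-step loss on `preds` needs no gradient through earlier steps.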


Ido Amos @AmosaurusRex

8 months ago

@lovodkin93 Good luck on your exciting new journey!!


Ido Amos @AmosaurusRex

over 1 year ago

@gon_buzaglo @Princeton Good luck 👑 enjoy it!


Ido Amos @AmosaurusRex

about 2 years ago

Honestly cannot believe that our work got the BEST PAPER award @iclr_conf !!! This was an amazing experience with my collaborators @JonathanBerant @ankgup2, looking forward to sharing it with everyone at the conference. Reach out if you want to chat!

Ido Amos @AmosaurusRex

over 2 years ago

Excited to share my work with @JonathanBerant @ankgup2! We show pretraining on task data alone suffices to bridge the gap between state space models and transformers on Long Range Arena, leading to a significantly better estimate of model capabilities. https://t.co/rbQeVUo5m0 🧵
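As a concrete picture of "pretraining on task data alone", the sketch below builds self-pretraining examples from a labeled task's own input sequences. It assumes a causal next-token objective (a denoising objective is a common alternative for bidirectional models); the helper name `self_pretraining_pairs` and the toy token IDs are hypothetical. Labels are discarded for this stage and are only used in the later fine-tuning stage.

```python
from typing import List, Sequence, Tuple


def self_pretraining_pairs(
    task_data: Sequence[Tuple[List[int], int]],
) -> List[Tuple[List[int], List[int]]]:
    """Turn (token_sequence, label) task examples into next-token prediction pairs.

    The label is ignored: only the task's own inputs are used for pretraining.
    """
    pairs = []
    for tokens, _label in task_data:
        if len(tokens) < 2:  # need at least one input token and one target token
            continue
        pairs.append((tokens[:-1], tokens[1:]))  # predict token t+1 from tokens up to t
    return pairs


# Hypothetical toy task: token-ID sequences with binary labels.
task_data = [([5, 1, 4, 2], 0), ([3, 3, 7], 1), ([9], 0)]
print(self_pretraining_pairs(task_data))
# [([5, 1, 4], [1, 4, 2]), ([3, 3], [3, 7])]
```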


Ido Amos @AmosaurusRex

over 2 years ago

@ibomohsin A really interesting point of view on LLMs and language in general! Can you expand on what you think fractal dimension means for language?


Ido Amos @AmosaurusRex

over 2 years ago

[4/4] Investigating the effects of data scale, we find self-pretraining is most effective in low-data regimes, underscoring its importance for evaluation across all dataset sizes. We further show that self-pretraining is effective across model sizes and when compute is limited.


Ido Amos @AmosaurusRex

over 2 years ago

[3/4] The marked effect of self-pretraining on long-sequence tasks leads us to rethink the necessity of complex designs, with Diagonal Linear RNNs (DLR) as a specific example. Our findings indicate that, when pretrained, simple architectures can be as effective as complex designs.
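For readers unfamiliar with the "simple architecture" being referenced, here is a minimal sketch of a generic complex diagonal linear recurrence, h_t = λ ⊙ h_{t-1} + b·x_t with output y_t = Re(Σ c ⊙ h_t). The exact DLR parameterization in the literature may differ; the sizes, random parameters, and function names below are hypothetical. The sketch checks that the sequential scan equals a causal convolution with kernel K_k = Re(Σ c ⊙ λ^k ⊙ b), which is the property that makes such layers cheap to train on long sequences.

```python
import numpy as np


def diag_linear_rnn_scan(x, lam, b, c):
    """Sequential form: h_t = lam * h_{t-1} + b * x_t, y_t = Re(sum(c * h_t))."""
    h = np.zeros_like(lam)
    ys = []
    for x_t in x:
        h = lam * h + b * x_t
        ys.append(np.real(np.sum(c * h)))
    return np.array(ys)


def diag_linear_rnn_conv(x, lam, b, c):
    """Equivalent causal convolution with kernel K_k = Re(sum(c * lam**k * b))."""
    L = len(x)
    k = np.arange(L)
    K = np.real((c * b) @ (lam[:, None] ** k[None, :]))  # kernel of shape (L,)
    # y_t = sum_{j<=t} K_{t-j} * x_j
    return np.array([np.dot(K[: t + 1][::-1], x[: t + 1]) for t in range(L)])


rng = np.random.default_rng(1)
N, L = 4, 16
lam = 0.9 * np.exp(1j * rng.uniform(0, np.pi, N))  # |lambda| < 1 keeps the recurrence stable
b = rng.normal(size=N) + 1j * rng.normal(size=N)
c = rng.normal(size=N) + 1j * rng.normal(size=N)
x = rng.normal(size=L)

print(np.allclose(diag_linear_rnn_scan(x, lam, b, c), diag_linear_rnn_conv(x, lam, b, c)))  # True
```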

