Or Tal @or__tal - Twitter Profile

Pinned Tweet

about 1 year ago

Which modeling paradigm to choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture - AR vs FM. 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗https://t.co/FBWu7ThspC 📄https://t.co/Dp1co1esvd 1/6

1

41

11

3

2K

Or__Tal retweeted

Noam Issachar @IssacharNoam

7 days ago

1/6 Diffusion models are scaling up, but deploying a massive, monolithic network uniformly across the entire generative timeline is inherently inefficient. Introducing Complexity-Balanced Splitting (CBS): a principled framework that allocates capacity exactly where needed!👇🧵

1

31

11

2

412

Or__Tal retweeted

Eliahu Horwitz @EliahuHorwitz

about 1 year ago

🚨 New paper alert! 🚨 Millions of neural networks now populate public repositories like Hugging Face 🤗, but most lack documentation. So, we decided to build an Atlas 🗺️ Project: https://t.co/1JpsC6dCeg Demo: https://t.co/4Xy7yLdIZY 🧵👇🏻 Here's what we found:

5

89

19

38

13K

Or__Tal retweeted

Eliahu Horwitz @EliahuHorwitz

9 months ago

Excited to share this has now been accepted at #NeurIPS2025 as a position paper (<6% acceptance)!🎉 We advocate for systematically studying entire model populations via weight-space learning, and argue that this requires charting them in a Model Atlas. @NeurIPSConf #NeurIPS 🧵👇

0

64

21

6

4K

Or__Tal retweeted

Heli Ben-Hamu @helibenhamu

9 months ago

Excited to share our work Set Block Decoding! A new paradigm combining next-token-prediction and masked (or discrete diffusion) models, allowing parallel decoding without any architectural changes and with exact KV cache. Arguably one of the simplest ways to accelerate LLMs!

5

116

26

82

26K

Or Tal @Or__Tal

11 months ago

@jesseengel very cool!

0

1

0

66

Or Tal @Or__Tal

11 months ago

@__Rafail__ @NadavHarTuv I believe it should. If the audio is very long you may need to parse it in chunks, but that's also true of all other audio representation models

0

1

0

23

Or Tal @Or__Tal

11 months ago

@__Rafail__ @NadavHarTuv Nope, PAST has ~180M params, with the streamable version having ~125M params. This should run on a standard GPU. For speech LM training we used 2 A100 GPUs, but it could be done with less

1

0

41

Or__Tal retweeted

Gallil Maimon @GallilMaimon

about 1 year ago

Many modern SpeechLMs are trained with Speech-Text interleaving. How does this impact scaling trends? In our new paper, we train several dozen SLMs, and show - quite a lot! So there is room for optimism 😊 Key insights, code, models, full paper 👇🏻

5

75

19

24

6K

Or__Tal retweeted

Gallil Maimon @GallilMaimon

11 months ago

🎉Thrilled that our paper on "scaling analysis of interleaved speech-text LMs" was accepted to #CoLM2025. It gives room for optimism when scaling SpeechLMs *right* - with large TextLMs (in place of more data), interleaving, and synth training data💪

1

29

4

6

2K

Or__Tal retweeted

Ron Yosef @ron_yosef

11 months ago

Happy to announce that our paper “EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits” was accepted to #ACL2025 🎉 📄 https://t.co/mwugXz1H5q 🌐 https://t.co/NMmdzi3nBn

2

22

5

0

1K

Or Tal @Or__Tal

11 months ago

💣Introducing PAST: a speech tokenizer that jointly models phonetics and acoustics (No SSL involved). PAST demonstrates great reconstruction as well as semantic capabilities in the form of ABX and sWUGGY. 🤗 https://t.co/teQQ9s5whr Check out Nadav's post👇@NadavHarTuv @adiyossLC

נדב הר-טוב @NadavHarTuv

11 months ago

🚨 New paper alert! PAST: phonetic-acoustic speech tokenizer – just got accepted to Interspeech 2025 🎉 It learns phonetic + acoustic tokens jointly, with no SSL babysitter or external vocoder. 🔗https://t.co/yGypWO6YpM 👇 If you’re into speech LMs, keep reading!

3

163

33

91

13K

0

9

0

368

Or__Tal retweeted

Audio and Speech Processing Papers @AudioAndSpeech

about 1 year ago

Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation. https://t.co/pbwzHsgUGj

0

11

2

4

476

Or__Tal retweeted

Gallil Maimon @GallilMaimon

12 months ago

🎵💬 If you are interested in Audio Tokenisers, you should check out our new work! We empirically analysed existing tokenisers from every angle - reconstruction, downstream, LMs and more. Grab yourself a ☕/🍺 and sit down for a read!

1

103

25

34

6K

Or__Tal retweeted

Niv Eckhaus @niveckhaus

about 1 year ago

🚨 New Paper: "Time to Talk"! 🕵️ We built an LLM agent that doesn't just decide WHAT to say, but also WHEN to say it! Introducing "Time to Talk" - LLM agents for asynchronous group communication, tested in real Mafia games with human players. 🌐https://t.co/HdNUwlvF2F 🧵1/7

3

57

13

20

6K

Or Tal @Or__Tal

about 1 year ago

Read the full paper! 🔗https://t.co/FBWu7TgUA4 📄https://t.co/Dp1co1dUFF @FelixKreuk @adiyossLC

0

3

0

88

Or Tal @Or__Tal

about 1 year ago

Which modeling paradigm to choose for text-to-music generation? We run a head-to-head comparison to figure it out. Same data, same architecture - AR vs FM. 👇 If you care about fidelity, speed, control, or editing, see this thread. 🔗https://t.co/FBWu7ThspC 📄https://t.co/Dp1co1esvd 1/6

1

41

11

3

2K

Or Tal @Or__Tal

about 1 year ago

What if training steps are capped at 500k? FM reaches near-topline quality with small batches. It’s compute-efficient and forgiving. AR needs larger batch sizes to recover performance. It benefits more from large-scale training. See📉 below by model duration + batch size: 6/6

1

2

0

129

Or__Tal retweeted

Felix Kreuk @FelixKreuk

about 1 year ago

We’ve been exploring the trade-offs between Autoregressive and Flow-Matching models for music generation. We share our findings in this latest paper led by @Or__Tal. Many interesting take-aways and practical advice on training generative models for music! 🎶🧠

1

11

1

4

603
