Kyle Kastner @kastnerkyle - Twitter Profile

Pinned Tweet

Kyle Kastner @kastnerkyle

about 2 months ago

Mentioned in the linked blog, but this was nice to see! https://t.co/W7T6JMWR92

Google AI

@GoogleAI

about 2 months ago

Today we launched Gemini 3.1 Flash TTS, our most expressive and controllable text-to-speech model yet. This launch [excitement] includes audio tags! 🗣🏷 Audio tags [explanatory] are a seamless way to guide vocal style, pace, and delivery using natural language commands embedded directly in your text. Want a different tempo or tone? [amazement] Just tag the audio to steer the AI-speech output! The model supports 70+ languages (24 of which are high-quality evaluated languages, including Japanese, Hindi, and Arabic). Watch the audio tags in action in the demo below ↓

kastnerkyle retweeted

Tero Parviainen

@teropa

1 day ago

Have been doing some stem remixing work with Stable Audio 3.
📦 Medium model
🔉 init_audio holding the original audio file
😶‍🌫️ init_noise_level between 0.4 and 0.5 seems to be the sweet spot
🪄 Empty prompts

kastnerkyle retweeted

Benhao Huang

@huskydogewoof

12 days ago

𝐇𝐨𝐰 𝐝𝐨 𝐰𝐞 𝐠𝐞𝐭 𝐟𝐫𝐨𝐦 𝐚 𝐬𝐭𝐚𝐧𝐝𝐚𝐫𝐝 𝐟𝐞𝐞𝐝𝐟𝐨𝐫𝐰𝐚𝐫𝐝 𝐦𝐨𝐝𝐞𝐥 𝐭𝐨 𝐚 𝐜𝐚𝐩𝐚𝐛𝐥𝐞 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐦𝐨𝐝𝐞𝐥?

On Sudoku, we traced the exact path of unlocking neural attractors:

- Feedforward → 2.6%
- Weight-tying → 32.6%
- Online Training → 74.7%
- Hierarchy → 76.5%
- Adaptive Compute → 84.8%

Each jump wasn't just a trick. It was a choice about how to shape the attractor landscape.

Here is what we learned: 🧵👇

#ICML2026

kastnerkyle retweeted

Benhao Huang

@huskydogewoof

12 days ago

Sharing a great tutorial on Amortized Optimization by @brandondamos, which I found closely connected to loop models: https://t.co/EplFMKjFbP

kastnerkyle retweeted

Sapient Intelligence @Sapient_Int

17 days ago

Introducing HRM-Text. An ultra-lean 1B-parameter reasoning language model designed to deliver strong general performance with a fraction of the data, compute, and infrastructure. Trained on just 40B structured tokens, HRM-Text achieves competitive performance while using ~1/1000 of the training data of comparable models. The kicker? The full model trains in roughly one day on a $1,000 budget. This opens the door to a new generation of AI that is powerful, accessible, and radically easier to adapt. Theories and research concepts once deemed too expensive to test are officially back in the game. Sapient Intelligence invites you to help us shape a new paradigm for general intelligence.

kastnerkyle retweeted

Benhao Huang

@huskydogewoof

10 days ago

Reproduced HRM-Text XL (1B).

Training completed in ~38 hours wall-clock on 16 H200 GPUs, and evaluation performance matches the numbers reported in the paper.

Great job, team!

W&B report: https://t.co/bjoMNQ043k

kastnerkyle retweeted

Tero Parviainen

@teropa

10 days ago

Using Stable Audio 3 to generate variations of an existing loop. Unconditional generation (no prompt), renoising the latents to 0.5, and just using different seeds seem to generate a nice neighbourhood around the original. It generally keeps the harmonic context and feel.

kastnerkyle retweeted

Ryan Bahlous-Boldi

@RyanBoldi

13 days ago

Your RL post-training may be sabotaging your LLM’s test-time scaling!

Conventional RL pretends that you can collapse all reward signals *upfront* into a single *scalar reward*.
We introduce Vector Policy Optimization (VPO), which natively maximizes *vector-valued* rewards, boosting test-time search performance, even on the original scalar.

kastnerkyle retweeted

Arthur Gretton @ArthurGretton

13 days ago

Your drifting model is secretly a fixed point for the Wasserstein gradient flow on...
...the KL?
...an approximation to the Sinkhorn?
...Is it even a Wasserstein gradient flow at all?

https://t.co/QJLh86Hi0d

@liwenliang @agalashov @JamesTThorn @ValentinDeBort1 @ArnaudDoucet1

kastnerkyle retweeted

Alexander Chen

@alexanderchen

13 days ago

I asked Gemini Omni for a 3x3 split screen based on my video 🐸 A useful way to visualize many ideas in one output. Prompt in 🧵

kastnerkyle retweeted

Alexander Chen

@alexanderchen

13 days ago

I gave it this video with a prompt like this. It also works with an image input.

Prompt: Generate a 3x3 split screen video based on different details you see here. Make each cell different, varying the perspective, composition, zoom, angle, camera movement (some static, some moving). Make some of the cells extreme close-ups with detailed textures. Keep it photorealistic, handheld, raw. Only natural sounds.

kastnerkyle retweeted

Shubhendu Trivedi @_onionesque

13 days ago

Looks promising https://t.co/PT5DOkE5Pq

kastnerkyle retweeted

Tyler Farghly @tylerfarghly

14 days ago

[📄preprint] Diffusion models 🤝 MCMC!

Diffusion model samplers are biased due to discretisation

💡The fix: Metropolis-type adjustment on corrector steps
❗️Challenge: no access to the density ratio, only the score
🔑Insight: the score (and some maths) is all you need...
[1/3]

kastnerkyle retweeted

Pedro

@pmpcurvo

14 days ago

Guide with examples, not rewards 🐘

Controlling what a pretrained generative model produces is still mostly a choice between three slow options: fine-tune it, attach a reward network, or search at inference. We found flow matching allows a fourth, and it costs almost nothing.

In deterministic interpolants, the velocity of the flow is determined by where the trajectory is headed: the endpoint mean. Shift that mean, and the entire flow shifts with it.

This turns control into a matter of reference. Change the examples that define the endpoint, and you change the direction the model follows. The examples need not be perfect. They only need to point the flow toward the attribute you want.

Color, identity, style, and structure, all controllable through examples. 🧵👇
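A toy sketch of the mechanism described in this thread, under assumptions that are not from the thread: a linear deterministic interpolant x_t = (1 - t) x0 + t x1 with x0 ~ N(0, I), and a target made only of a few reference examples, with no pretrained model involved. In that setting the velocity works out to v(x, t) = (E[x1 | x_t = x] - x) / (1 - t), so the flow is driven entirely by the endpoint mean, and swapping the example set changes where the same noise ends up. The helper names (`endpoint_mean`, `velocity`, `sample`) are hypothetical; this is not the authors' method.

```python
import math
import random


def endpoint_mean(x, t, examples):
    """E[x1 | x_t = x] when x1 is uniform over `examples` and
    x_t = (1 - t) * x0 + t * x1 with x0 ~ N(0, I)."""
    var = (1.0 - t) ** 2  # variance of x_t given x1
    # Log-likelihood of x under N(t * a, var * I) for each example a.
    logw = [-sum((xi - t * ai) ** 2 for xi, ai in zip(x, a)) / (2.0 * var)
            for a in examples]
    top = max(logw)  # subtract the max before exponentiating, for stability
    w = [math.exp(lw - top) for lw in logw]
    z = sum(w)
    return [sum(wi * a[d] for wi, a in zip(w, examples)) / z
            for d in range(len(x))]


def velocity(x, t, examples):
    """Marginal velocity of the linear interpolant: (E[x1 | x] - x) / (1 - t)."""
    mu = endpoint_mean(x, t, examples)
    return [(mi - xi) / (1.0 - t) for mi, xi in zip(mu, x)]


def sample(examples, n_steps=100, seed=0):
    """Euler-integrate dx/dt = velocity(x, t) from noise at t = 0 toward t = 1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(len(examples[0]))]
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt  # largest t is 1 - dt, so 1 - t never hits zero
        v = velocity(x, t, examples)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x
```

For instance, `sample([[3.0, 3.0], [3.5, 2.5]], seed=1)` and `sample([[-3.0, -3.0]], seed=1)` start from the same noise but end in different places: inside the convex hull of the first set (typically on one of its examples) versus on the single second example. Because the examples here form the entire target distribution, samples simply reproduce them; the sketch only illustrates that the velocity follows the endpoint mean.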

kastnerkyle retweeted

dadabots

@dadabots

15 days ago

🥳 Announcing Stable Audio 3 🍕
🏆 fastest music models ever
💻 runs on MacBook Pro M-series
🧪 break it plz
🧠 LoRA finetune in < 1h
📷 Sm = faster, Medium = qualityer
⚡ 59x realtime on M5 Pro

One-liner fast install:
curl -LsSf https://t.co/fdFBiUC4PP | bash

kastnerkyle retweeted

Sungjin Ahn

@SungjinAhn_

15 days ago

🧠We introduce "Generative Recursive Reasoning"!

Recursive Reasoning Models like HRM, TRM, and Looped Transformers are deterministic — same input, same reasoning, every time. They collapse the entire space of plausible reasoning paths into a single attractor.

Our model GRAM (Generative Recursive reAsoning Models) turns recursion itself into a stochastic latent trajectory. Multiple hypotheses, alternative solution strategies, and inference-time scaling not just by depth, but by width — parallel trajectory sampling.

And here's the kicker: the same formulation that gives us conditional reasoning p(y|x) also makes GRAM a general generative model p(x).

With only 10M params:
• Sudoku-Extreme: 97.0% (TRM 87.4%)
• ARC-AGI-1: 52.0%
• ARC-AGI-2: 11.1%
• N-Queens coverage: 90%+

📄 Paper: https://t.co/JC7EyXYc9Y
🌐 Project page: https://t.co/LRT1dQiWLZ

w/
Junyeob Baek @JunyeobB (KAIST),
Mingyu Jo @pyross0000 (KAIST),
Minsu Kim @minsuuukim (KAIST & Mila),
Mengye Ren @mengyer (NYU),
Yoshua Bengio @Yoshua_Bengio (Mila),
Sungjin Ahn @SungjinAhn_ (KAIST)

kastnerkyle retweeted

Graham Neubig

@gneubig

15 days ago

Check out our new work on examining what LLMs learn and when! We posit that LLMs have an implicit curriculum in which they learn progressively more complex skills, and attempt to uncover some details of how this curriculum develops over time across model families.

kastnerkyle retweeted

Ben Poole @poolio

16 days ago

Real-world models are here! Stoked to share how we're bringing real-world locations to life by integrating Street View into Genie. Try it now at https://t.co/j6c1N38tRS and read the blog for more info: https://t.co/6ZOi9d9rah

kastnerkyle retweeted

Google DeepMind @GoogleDeepMind

16 days ago

We’re dropping Gemini Omni: our first step towards a model that can create anything from anything - starting with video. It combines Gemini’s intelligence with our generative media systems - representing a leap forward in world understanding, multimodality, and editing 🧵

kastnerkyle retweeted

Xavier Gonzalez @xavierjgonzalez

16 days ago

Fixed-point iterations for parallelizing nonlinear dynamics are all the rage:
- Newton for RNNs
- Picard for diffusion models
- Jacobi for parallel decode of LLMs

But how do these techniques relate, and when should you use them? We show you how in our new paper 🧵
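A minimal sketch of the Jacobi-style idea from this list, using a made-up scalar recurrence h_t = tanh(w h_{t-1} + u x_t) as a stand-in (it is not from the paper). Instead of a left-to-right loop, every state is guessed at once and each sweep recomputes all positions from the previous sweep's states, so the work inside a sweep is independent across t. After k sweeps the first k states are exact, so at most T sweeps are ever needed, and a contractive recurrence like this one reaches the tolerance much sooner. The function names (`f`, `sequential`, `jacobi`) are hypothetical.

```python
import math


def f(h_prev, x_t, w=0.9, u=1.0):
    """Hypothetical scalar cell: h_t = tanh(w * h_{t-1} + u * x_t)."""
    return math.tanh(w * h_prev + u * x_t)


def sequential(xs, h0=0.0):
    """Reference evaluation: the usual left-to-right loop over t."""
    hs, h = [], h0
    for x in xs:
        h = f(h, x)
        hs.append(h)
    return hs


def jacobi(xs, h0=0.0, tol=1e-12, max_sweeps=None):
    """Jacobi fixed-point iteration over the whole sequence at once.

    Each sweep computes h_t = f(h_{t-1}, x_t) for every t using the states
    from the previous sweep, so the comprehension is parallel across t.
    Returns (states, number_of_sweeps_used).
    """
    T = len(xs)
    if T == 0:
        return [], 0
    max_sweeps = T if max_sweeps is None else max_sweeps
    hs = [0.0] * T  # initial guess for every state
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        prev = [h0] + hs[:-1]  # previous iterate shifted by one step
        new = [f(p, x) for p, x in zip(prev, xs)]
        delta = max(abs(a - b) for a, b in zip(new, hs))
        hs = new
        if delta < tol:  # iterate stopped changing: a fixed point was reached
            break
    return hs, sweep
```

For example, `jacobi([math.sin(0.3 * i) for i in range(2000)])` matches `sequential` on the same inputs to within the tolerance while using far fewer than 2000 sweeps; with `tol=0.0` it runs all T sweeps and reproduces the sequential result exactly.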

kastnerkyle retweeted

John Nguyen

@__JohnNguyen__

26 days ago

Today we released the code for our CVPR 2026 paper, Flowception. Flowception bridges fully bidirectional sequence modeling and autoregressive generation by inserting frames in a learned order, then denoising them with continuous flow.

Website: https://t.co/BujPaGRJ7h
Code: https://t.co/VJNBFBvIbV
