Konstantin Rusch @tk_rusch - Twitter Profile

about 1 month ago

@makinai_ and @philna00 presenting our paper on calibration-free in-training compression of SSMs at #ICLR2026 in Rio 🇧🇷
Paper link: https://t.co/BQ7E5kZoLV https://t.co/gTOM42Ygy3


tk_rusch retweeted

Alexander Amini

@xanamini

2 months ago

Three years ago we started working on a stealth project that we weren’t sure we’d ever talk about publicly... until today.

Breakthrough: Introducing LFM-Zero: the first foundation model trained on 0 tokens.

No pretraining. No finetuning. No data. Instead, we initialize from an implicit probabilistic prior over the underlying data-generating process, allowing the model to converge without ever observing data.

LFM-Zero matches or surpasses models trained on 10T+ tokens across reasoning, coding, and multimodal tasks. Turns out that pretraining was just regularization that was holding us back.

> Read our Tech Report here: https://t.co/aIWbx77IEf


tk_rusch retweeted

Liquid AI

@liquidai

2 months ago

Today, we release LFM2.5-350M. Agentic loops at 350M parameters.

A 350M model trained for reliable data extraction and tool use, where models at this scale typically struggle.

<500MB when quantized, built for environments where compute, memory, and latency are constrained.

🧵 https://t.co/zZPKzcCwH9


tk_rusch retweeted

ELLIS Institute Tübingen

@ELLISInst_Tue

2 months ago

🚀 Small is the new big in AI. In “The Curious Case of In-Training Compression of State Space Models,” @tk_rusch et al. introduce CompreSSM: compressing state space models already during training. 💡 Faster, leaner, and still high-performing. Accepted at @iclr_conf 2026! 🇧🇷 Read the full breakdown on our website: https://t.co/wCvcZs6R5T



Konstantin Rusch @tk_rusch

4 months ago

Fantastic work by my lab’s very first PhD student! Check it out!!

Philipp Nazari @philna00

4 months ago

🧵[1/5] Qwen3.5 demonstrates the potential of hybrid LLMs. But how well do the Linear Attention layers manage their associative memory?

Previous research indicates a low effective rank, which we show:

1. Amplifies query noise
2. Poorly conditions gradients
3. Wastes memory. https://t.co/ietJfQqlW6


tk_rusch retweeted

Philipp Nazari @philna00

4 months ago

🧵[5/5] Want to know more? 📄 Paper: https://t.co/en0Ob41wec 💻 Code: https://t.co/HqMNMhrESi 🐫Work done at CAMAIL under the supervision of @tk_rusch


tk_rusch retweeted

Philipp Nazari @philna00

4 months ago

🧵[4/5] We demonstrate that (Gated) DeltaNet states can be pruned by ~50% with minimal degradation in perplexity or zero-shot reasoning. This structured reduction delivers:
🚀 speedup in training throughput
📉 reduction in peak VRAM usage.


tk_rusch retweeted

Philipp Nazari @philna00

4 months ago

🧵[3/5] Why prune? Because low rank doesn't just waste memory. It hurts retrieval & training stability. Standard pruning can break the causal convolutions. Our fix? Employing semi-orthogonal axis-aligned transformations that preserve their per-channel structure.


tk_rusch retweeted

Philipp Nazari @philna00

4 months ago

🧵[2/5] Consequently, we propose a rank-based structured pruning framework which:
✂️ Allows removing ~50% of Key/Query channels
📉 Has minimal impact on perplexity
⚙️ Remains fully compatible with causal convolutions.


tk_rusch retweeted

Philipp Nazari @philna00

4 months ago

🧵[1/5] Qwen3.5 demonstrates the potential of hybrid LLMs. But how well do the Linear Attention layers manage their associative memory?

Previous research indicates a low effective rank, which we show:

1. Amplifies query noise
2. Poorly conditions gradients
3. Wastes memory.


tk_rusch retweeted

Liquid AI

@liquidai

5 months ago

Today, we release LFM2.5, our most capable family of tiny on-device foundation models.

It’s built to power reliable on-device agentic applications: higher quality, lower latency, and broader modality support in the ~1B parameter class.

> LFM2.5 builds on our LFM2 device-optimized hybrid architecture
> Pretraining scaled from 10T → 28T tokens
> Expanded reinforcement learning post-training
> Higher ceilings for instruction following

🧵


tk_rusch retweeted

Nikhil Chandak

@nikhilchandak29

6 months ago

Added @MPI_IS @ELLISInst_Tue to the plot below. And this is only considering A100 or newer (H100/B200) GPUs.

Periodic reminder to join us in the small beautiful town of Tübingen where we have a good ratio of GPUs per researcher (better than most academic places) and great PIs! https://t.co/kTfoFhBiRp


tk_rusch retweeted

Liquid AI

@liquidai

6 months ago

Today we introduce Liquid Labs, our advanced research unit, with the goal of understanding and building efficient and adaptive intelligence systems. Liquid Labs consolidates our existing research efforts at Liquid across architecture of foundation models, multimodality, training, data, and inference. The lab also will be home to new frontier research work across the broad range of foundation model build-up stack. Read the full announcement: https://t.co/huPO1di46d We are hiring: https://t.co/s2cWDFixbU Also find us at NeurIPS 2025 exhibition hall! 🚀


tk_rusch retweeted

Ahmed Elhag @Ahmed_AI035

6 months ago

Our paper REMUL got accepted at LOG 2025! Check out the camera-ready version with new evaluations: https://t.co/K7dotnPf1v


tk_rusch retweeted

Alexander Amini

@xanamini

6 months ago

We @LiquidAI will be at #NeurIPS2025 this year! ⚛️

If you are
> doing a PhD in ML / NLP / CV / RL
> a strong engineer with a first-principled mindset
> interested in training best-in-class multimodal foundation models 🚀

Join us @LiquidAI!! Find us at Booth #1605 — and DM me for an invite-only dinner


tk_rusch retweeted

ELLIS Institute Tübingen

@ELLISInst_Tue

8 months ago

🚀 The new call for Principal Investigators at the ELLIS Institute Tübingen is now open! We are looking for Principal Investigators as Hector Endowed Fellows in all areas of Machine Learning, Artificial Intelligence, and related fields. These positions offer the exciting possibility of co-appointments with the @MPI_IS and the Tübingen AI Center. 📌 Apply here: https://t.co/zkrJh3zsiD 🗓 Deadline: December 15, 2025 Join our team of Principal Investigators and help shape the future of AI research! #Hiring #AI #MachineLearning #PrincipalInvestigator #Research #Tübingen @ELLISforEurope


Konstantin Rusch @tk_rusch

9 months ago

My lab is continuously looking for amazing students who are interested in doing an internship with us. Please follow the link and apply

ELLIS Institute Tübingen

@ELLISInst_Tue

9 months ago

Our Principal Investigators @orvieto_antonio, @CeleMenDu, @maximilian_dax, Rediet Abebe, @Shiwei_Liu66, @tk_rusch, and @wielandbr are looking for motivated students interested in doing an internship at the ELLIS Institute Tübingen. The start date and duration of the internship can be discussed directly with the PI, or you can mention your preferences in your motivation letter. Learn more about their research groups on our website: https://t.co/aqKa1cgjcD Apply by completing the form here: https://t.co/C5CBKGCwrM Join the ELLIS Institute Tübingen and become a part of the @Cyber_Valley community! 🙌