Vihang Patil @wehungpatil - Twitter Profile

Pinned Tweet

over 1 year ago

This is what we have been working on for the last few months. The advent of architectures like xLSTM opens new frontiers of efficiency for generative models. xLSTM not only provides constant memory consumption with increasing context length but is also extremely fast at inference.

Thomas Schmied @thsschmied

over 1 year ago

Transformers can be slow for real-time applications like robotics. We study if modern recurrent architectures, like xLSTM and Mamba, can be faster alternatives. Experiments on 432 tasks show that they compare favourably in terms of performance and speed 🎃 https://t.co/4RUDRich35

wehungpatil retweeted

Mayank Singh

@mayansingh09

4 months ago

Check out https://t.co/UEPjVrCACL. It’s your constant AI research companion. Read any PDF with the AI as your partner. ✍️Highlight and annotate your reading 🤖Ask powerful AI models questions 🗂️Organize your reading into folders 🌐Find new papers via conversation search

wehungpatil retweeted

Korbinian Poeppel @KorbiPoeppel

about 1 year ago

Ever wondered how linear RNNs like #mLSTM (#xLSTM) or #Mamba can be extended to multiple dimensions? Check out "pLSTM: parallelizable Linear Source Transition Mark networks". #pLSTM works on sequences, images, (directed acyclic) graphs. Paper link: https://t.co/nU7626uHWK

wehungpatil retweeted

Mayank @mayank_iitgn

about 1 year ago

The #Eka initiative is looking for your contributions to curate a list of websites in native Indian languages. The majority of Indic websites are missing from existing corpora like CC. Please fill out this form to add URLs in your native language: https://t.co/cxlUsvn2rv

wehungpatil retweeted

torchrl @torchrl1

about 1 year ago

torchrl 🤝 gymnasium: happy ever after! With the help of the @FaramaFound team, we managed to make TorchRL compatible with gymnasium v1.1 onward!

wehungpatil retweeted

Maximilian Beck @maxmbeck

over 1 year ago

Yesterday, we shared the details on our xLSTM 7B architecture. Now, let's go one level deeper🧑‍🔧 We introduce ⚡️Tiled Flash Linear Attention (TFLA), ⚡️ A new kernel algorithm for the mLSTM and other Linear Attention variants with Gating. We find TFLA is really fast! 🧵(1/11)

wehungpatil retweeted

Maximilian Beck @maxmbeck

over 1 year ago

📢🔔 I am excited to share the details on our optimized xLSTM architecture for our xLSTM 7B model! 🚨 We optimized the architecture with two goals in mind: efficiency (in training and inference) and stability. 🧵(1/7)

wehungpatil retweeted

Korbinian Poeppel @KorbiPoeppel

over 1 year ago

Check out our latest work on scaling up xLSTM to 7B parameters and 2.3T tokens, with all open training data, open training protocol and open training code. Nice teamwork! 💪💪

Vihang Patil @wehungpatil

over 1 year ago

Great place to work 😃

Sepp Hochreiter @HochreiterSepp

over 1 year ago

Join Our Research Team in Linz! We are looking for 5 PostDocs and 10 PhDs in Machine Learning working on xLSTM, NLP, robustness, learning theory. Deadline: 04/20/25. More details: https://t.co/FLcOWZJzPQ #MachineLearning #DeepLearning #ResearchOpportunities #PhDPositions

wehungpatil retweeted

Lucas Beyer (bl16)

@giffmana

over 1 year ago

Everything old is new again. Mamba/ssm folks should really google their "new idea + lstm" please. About a decade ago, people tried a shitton of things with lstms. Nothing wrong with retrying with modern tools, but ack the past. This is not the first such case I see btw.

wehungpatil retweeted

Lukas Aichberger @aichberger

over 1 year ago

New Paper Alert: Rethinking Uncertainty Estimation in Natural Language Generation 🌟 Introducing G-NLL, a theoretically grounded and highly efficient uncertainty estimate, perfect for scalable LLM applications 🚀 Dive into the paper 👇 https://t.co/hOEhuWloqN

wehungpatil retweeted

Korbinian Poeppel @KorbiPoeppel

over 1 year ago

Thrilled to announce two new developments at JKU and NXAI that are released today:
- We scaled xLSTM to 7B parameters: https://t.co/jJqQk2HFvq
- For the people caring about state tracking capabilities, there's the new FlashRNN library: https://t.co/zirQOaR6Wv

wehungpatil retweeted

Niklas Schmidinger

@smdrnks

over 1 year ago

We are excited to introduce Bio-xLSTM! TLDR: we extend xLSTM to genomic, protein and molecular domains and find that it is a proficient generative model, learns rich representations and can perform in-context learning.

wehungpatil retweeted

Günter Klambauer @gklambauer

over 1 year ago

Bio-xLSTM: Generative modeling, representation and in-context learning of biological and chemical sequences. xLSTM also shines for DNA, proteins and small molecules: it can handle long-range interactions and huge context! Paper: https://t.co/kvd9gdrM7C

Vihang Patil @wehungpatil

over 1 year ago

@techphilo_art @HochreiterSepp We do compare against the transformer in our experiments. You can find them here: https://t.co/EpewrHScQa

wehungpatil retweeted

Sepp Hochreiter @HochreiterSepp

over 1 year ago

xLSTM as large recurrent action model. xLSTM has the potential to enter the field of robotics as it is much faster than transformers at inference. xLSTM can close the reality-gap by online learning in applications like robotics, self-driving, automated production systems. Cool.

wehungpatil retweeted

Günter Klambauer @gklambauer

over 1 year ago

A LARGE RECURRENT ACTION MODEL: xLSTM enables Fast Inference for Robotics Tasks. Robotics and embodied AI need very fast inference, which is prohibitively expensive for Transformers. xLSTM is well suited because of its recurrent inference mode. Paper: https://t.co/JnGsGFxXIJ

wehungpatil retweeted

Sayan Ranu @SayanRanu

over 1 year ago

Graph distillation compresses massive graph datasets into tiny versions that train GNNs as effectively as the original. But current methods have a huge problem: they require training on the full data first, which defeats the whole purpose! Enter Bonsai (https://t.co/holnYG4SVq)

wehungpatil retweeted

Kajetan Schweighofer @kschweig_

over 1 year ago

Deep Ensembles are widely used to improve the performance of Deep Learning models. But beware, they can have a profound impact on group fairness ⚖️ We analyzed why it happens and what can be done about it 🧵👇