Maria Lomeli @MariaLomeli_ - Twitter Profile

Pinned Tweet

8 months ago

🚨New paper: Stochastic activations We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model.

8

123

16

74

61K

MariaLomeli_ retweeted

Peter O'Hearn @PeterOHearn12

3 days ago

I'm looking for a postdoc to work on Separation Logic in Lean. Position in the London Meta office, in the new AI Verification team. Possibly collaborating with or building on external work on CSLib, Iris-Lean and loom. @AIatMeta @leanprover https://t.co/HAOutBa3A4

1

64

20

16

7K

MariaLomeli_ retweeted

Turing Post

@TheTuringPost

21 days ago

An interesting attention mechanism from @AIatMeta: SP-KV (Self-Pruned Key-Value Attention)

The model learns which tokens are likely to be useful for future attention and only keeps their key-value pairs in the persistent KV cache.

For every token and attention head, a small utility predictor (a 2-layer MLP) computes a utility score, and older tokens are selectively pruned based on it.

At the same time, attention becomes hybrid, because SP-KV still keeps a local sliding window fully available for short-range interactions.

This approach:
- reduces KV cache size by about 3× to 10×
- allows longer contexts to be compressed further
- improves decoding speed and memory bandwidth
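
The pruning rule described above could be sketched roughly as follows. This is an illustration only, not the paper's implementation: the function name `select_kept_positions`, the fixed `threshold`, and the idea of passing in precomputed utility scores (standing in for the output of the 2-layer MLP predictor) are all assumptions made for the example.

```python
def select_kept_positions(utilities, window, threshold):
    """Return the token positions whose key-value pairs stay in the cache.

    utilities: one utility score per token, for a single attention head
               (assumed here to come from some learned predictor).
    window:    number of most recent tokens that are always kept
               (the local sliding window).
    threshold: hypothetical cutoff; older tokens scoring below it are pruned.
    """
    n = len(utilities)
    kept = []
    for pos, score in enumerate(utilities):
        in_window = pos >= n - window      # recent tokens are never pruned
        if in_window or score >= threshold:
            kept.append(pos)
    return kept


scores = [0.9, 0.1, 0.2, 0.8, 0.05, 0.3]
kept = select_kept_positions(scores, window=2, threshold=0.5)
print(kept)                     # positions kept in the cache
print(len(scores) / len(kept))  # cache reduction factor for this toy input
```

In this toy input, two old tokens survive because of their scores and the last two survive because they fall inside the window, so the cache holds 4 of 6 entries.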

5

209

38

145

12K

Maria Lomeli @MariaLomeli_

22 days ago

This solid piece of work just came out of the oven. Congrats to @ManuelFaysse and @algoriddle for leading this amazing paper 🎉

Manuel Faysse

@ManuelFaysse

22 days ago

🚨 Do LLMs need to store everything they read in memory?
To reduce KV cache size and improve decoding speeds, we propose Self-Pruned KV attention, a mechanism where the model learns to decide which KVs to write in the persistent KV cache, discarding all the rest! @AIatMeta 🧵 https://t.co/5UeHSpusGo

8

204

45

148

21K

0

12

2

11

3K


MariaLomeli_ retweeted

Bhavul Gauri @BhavulGauri

4 months ago

Introducing AIRS Bench, a benchmark for “AI Researcher Agent”. Agents attempt 20 open ML problems starting from zero code (full research loop). And yes, they beat SOTA in a few cases (read more below!) https://t.co/npx0JbRYPo

4

73

12

37

16K

Maria Lomeli @MariaLomeli_

4 months ago

Thank you for the interest in the position. I won’t be able to reply to messages or emails about it. Please apply directly via the link.

0

350

Maria Lomeli @MariaLomeli_

4 months ago

My team at Meta FAIR (part of Superintelligence Labs) is hiring a research scientist intern. Check out the details here: https://t.co/g5Hmk1CBtE

4

220

15

230

21K

MariaLomeli_ retweeted

Loic cabannes @loiccabannes

4 months ago

Working on attention architectures?
Be careful!

Our new paper accepted at #ICLR2026 shows that in hybrid architectures, longer sliding windows actually degrade long-context performance.

paper: https://t.co/Q4Q8C9TLp1
Thanks to my co-authors @maxmbeck and the team at @Meta. https://t.co/22h2b5FBLn

11

259

21

185

19K

MariaLomeli_ retweeted

Jason Weston

@jaseweston

5 months ago

Our team in FAIR at Meta is hiring a postdoc researcher! We work on the topics of Reasoning, Alignment and Memory/architectures (RAM).
Apply here: https://t.co/dWtpz7rttT
Location: NY, Seattle or Menlo Park.

Some of our recent work to give flavor:
- Co-Improvement (position): https://t.co/XPwbsuCUI6
- SPICE (Self-Play in Corpus Environments): https://t.co/47BarIr0uM
- Self-Challenging Agents: https://t.co/qgDLmchn8X
- RL from Human Interaction: https://t.co/wmC2fVByp2
- AggLM (parallel aggregation): https://t.co/Fg0E31aOIy
- StepWiser (CoT-PRM RL): https://t.co/QbfBVYx522
- DARLING (diversity-trained RL): https://t.co/J9ZSs8GVyX
- J1 (RL-trained LLM-as-Judge): https://t.co/yG6xAPaNJ3
- CoT-Self-Instruct: https://t.co/dHMYRxtv5h
- Multi-Token Attention: https://t.co/4kfUe8KozT

10

262

44

166

33K

MariaLomeli_ retweeted

Roberta Raileanu @robertarail

5 months ago

📢 New PhD Position 📢 We (@_rockt, @borruell, and I) are looking for a PhD student to work at the intersection of open-endedness and game design. The student will be part of the @UCL_DARK lab and funded by @iconicgamesio and UCL. See this doc for a more detailed description of the research direction and candidate expectations: https://t.co/eYsFKlgCJt To apply, please complete this form by January 15: https://t.co/UOGva9iBvJ

4

359

58

160

45K

MariaLomeli_ retweeted

Sam Devlin @smdvln

5 months ago

Our team @Meta Superintelligence Labs is hiring current PhD students for 3-6 month, paid internships to work with us in London on reinforcement learning post-training of LLM agents.

If this sounds interesting, move fast and apply today at: https://t.co/QgWBIUuACj https://t.co/rP018yNoGi

12

326

27

186

33K

MariaLomeli_ retweeted

Aran Komatsuzaki

@arankomatsuzaki

8 months ago

SWAX: short windows, long memory

• Hybrid of sliding-window attn + xLSTM RNN
• Counter-intuitive: shorter windows → better long-term recall
• Fix: stochastic window sizes = strong short + long context performance
• Outperforms fixed window attention https://t.co/Lg44gM7S8E
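
As a rough illustration of the "stochastic window sizes" idea in the bullets above, the sketch below draws a window size at random and builds the matching causal sliding-window mask. The tweet does not say how window sizes are sampled, so the uniform choice, the candidate sizes, and the names `sample_window` and `sliding_window_mask` are all assumptions made for the example.

```python
import random


def sliding_window_mask(seq_len, window):
    """Causal mask: query q may attend to key k only if 0 <= q - k < window."""
    return [[0 <= q - k < window for k in range(seq_len)]
            for q in range(seq_len)]


def sample_window(rng, choices=(2, 4)):
    """Pick a window size at random (hypothetical uniform sampling)."""
    return rng.choice(choices)


rng = random.Random(0)
window = sample_window(rng)          # would be redrawn at each training step
mask = sliding_window_mask(4, window)
for row in mask:
    print(["x" if allowed else "." for allowed in row])
```

With a short window, each query sees only its most recent keys, so any longer-range information has to flow through the recurrent part of the hybrid model.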

6

146

22

95

12K

MariaLomeli_ retweeted

Jakob Foerster

@j_foerst

7 months ago

Looking for one of the most exciting PhD positions in ML this season? I have some news for you.. Joao Henriques (https://t.co/6XysRQ08BE) and I (https://t.co/WdhorwrBCj) are again hiring a fully funded PhD student (UK/international) for the FAIR-Oxford program. The successful student will spend 50% of their time @UniofOxford and 50% @meta (FAIR), while completing a DPhil (Oxford PhD). The deadline is 1st of December anywhere on earth!

The goal is to make foundational discoveries in Machine Learning, in particular in the area of AI Research Agents.

Apply by emailing a CV, personal statement, and research proposal to “[email protected]” by 1st of Dec AOE. Please make sure you include a 280 character TL;DR summary of why you are the perfect candidate in your email. Joint interviews will be held in January/February. Shortlisted candidates will also be invited to apply to FAIR / @meta.

Candidates also need to apply for a DPhil in the Engineering science department, again by 1st of Dec AOE (if they haven’t already), listing me as the supervisor: https://t.co/02dCTsNZIK. Candidates should have an outstanding track record of academic excellence and relevant research experience.

@black_in_ai @_LXAI @QueerinAI @WiMLworkshop

11

274

48

211

37K

MariaLomeli_ retweeted

Adina Williams @adinamwilliams

7 months ago

FAIR is hiring interns for 2026! If you're interested in a stint doing fundamental AI research with us @AIatMeta, interested students enrolled in a PhD program can apply below👇: https://t.co/PrG9L625bY

16

435

47

381

87K

Maria Lomeli @MariaLomeli_

8 months ago

@DamienTeney @randall_balestr @leonleyanghu Thank you for the references. We will add them to the v2 version of the paper.

0

3

0

47

Maria Lomeli @MariaLomeli_

8 months ago

🚨New paper: Stochastic activations We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model.
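
A toy sketch of the idea of randomly selecting between non-linear functions. It is only an illustration of the sentence above, not the paper's method: the choice of ReLU and SiLU as the candidate functions, the uniform selection per call, and the name `stochastic_activation` are all assumptions made for the example.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def silu(x):
    # SiLU (a.k.a. swish): x * sigmoid(x)
    return x / (1.0 + math.exp(-x))


def stochastic_activation(values, rng, functions=(relu, silu)):
    """Apply one randomly chosen non-linearity to every value.

    Returns the activated values and the name of the function that was
    drawn, so the random choice is visible to the caller.
    """
    chosen = rng.choice(functions)
    return [chosen(v) for v in values], chosen.__name__


rng = random.Random(0)
hidden = [-1.0, 0.0, 2.0]            # stand-in for feed-forward pre-activations
for _ in range(3):
    out, name = stochastic_activation(hidden, rng)
    print(name, [round(v, 3) for v in out])
```

Repeated calls on the same input can give different outputs, which is the source of randomness the sketch is meant to show.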

8

123

16

74

61K

Maria Lomeli @MariaLomeli_

8 months ago

@jon_barron Thank you for the suggestion. We will consider it for the v2 version of the paper.

1

3

0

81

Maria Lomeli @MariaLomeli_

8 months ago

@sir4K_zen @jaseweston Thank you for the suggestion. We will consider it for the v2 version of the paper.

0

34

Maria Lomeli @MariaLomeli_

8 months ago

@getpochi @jaseweston Thank you for the suggestion. We will consider it for the v2 version of the paper.

0

1

0

36

Maria Lomeli @MariaLomeli_

8 months ago

@giffmana @jaseweston Thank you for the great suggestion, we can add it for the v2 version of the paper.

0

1

0

47

Maria Lomeli @MariaLomeli_

8 months ago

This work was done with my amazing collaborators @DouzeMatthijs, Gergely Szilvasy, @lofiwolfi, @jadecopet, @tesatory, @jaseweston, @syhw, Pierre-Emmanuel Mazaré and @hjegou. Check out further details in our paper: https://t.co/psLItHuG63 Thank you for reading!

0

29

3

9

1K

Maria Lomeli @MariaLomeli_

8 months ago

A fun bonus is that the stochasticity of StochA activations can be leveraged during generation to encourage diversity without relying on an arbitrarily chosen temperature for temperature sampling. https://t.co/4vq1fjrttv

2

6

1

1K
