I'm looking for a postdoc to work on Separation Logic in Lean. The position is based in Meta's London office, in the new AI Verification team, and may involve collaborating with or building on external work such as CSLib, Iris-Lean and loom. @AIatMeta @leanprover
https://t.co/HAOutBa3A4
An interesting attention mechanism from @AIatMeta: SP-KV (Self-Pruned Key-Value Attention)
The model learns which tokens are likely to be useful for future attention and only keeps their key-value pairs in the persistent KV cache.
For every token and attention head, a small utility predictor (a 2-layer MLP) computes a utility score, and older tokens are selectively pruned based on it.
At the same time, attention becomes hybrid: SP-KV still keeps a local sliding window fully available for short-range interactions.
This approach:
- reduces KV cache size by about 3× to 10×
- achieves higher compression on longer contexts
- improves decoding speed and reduces memory-bandwidth usage
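The mechanism above (a per-token utility score from a small 2-layer MLP, a local window kept in full, and selective retention of older tokens) can be sketched for a single head in plain Python. This is a minimal illustration, not the authors' code: the fixed `threshold`, the choice to judge a token at the moment it leaves the window, and the sigmoid-scored MLP `mlp_utility` are all assumptions. The actual model learns this decision end to end and may use a different gating rule.

```python
import math
from dataclasses import dataclass, field

def sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def mlp_utility(x, w1, b1, w2, b2):
    # Hypothetical 2-layer utility predictor: sigmoid(w2 . relu(W1 x + b1) + b2)
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return sigmoid(sum(w * h for w, h in zip(w2, hidden)) + b2)

@dataclass
class SelfPrunedCache:
    """KV cache for ONE attention head (illustrative sketch only)."""
    window: int        # size of the local sliding window, always kept in full
    threshold: float   # hypothetical cut-off on the utility score
    recent: list = field(default_factory=list)      # (pos, k, v, utility)
    persistent: list = field(default_factory=list)  # (pos, k, v)

    def append(self, pos, k, v, utility):
        self.recent.append((pos, k, v, utility))
        if len(self.recent) > self.window:
            # The oldest token leaves the window; it is written to the
            # persistent cache only if its predicted utility is high enough.
            old_pos, old_k, old_v, old_u = self.recent.pop(0)
            if old_u >= self.threshold:
                self.persistent.append((old_pos, old_k, old_v))

    def visible(self):
        # Entries attention can read: retained older tokens plus the full window
        return self.persistent + [(p, k, v) for p, k, v, _ in self.recent]

cache = SelfPrunedCache(window=2, threshold=0.5)
for pos, u in enumerate([0.9, 0.1, 0.7, 0.2, 0.3, 0.8]):
    cache.append(pos, k=[float(pos)], v=[float(pos)], utility=u)
print([p for p, _, _ in cache.visible()])  # prints [0, 2, 4, 5]
```

Here the utilities are passed in directly to keep the trace easy to follow; in a model they would come from something like `mlp_utility` applied to each token's hidden state, computed separately per head. Of the six tokens, positions 1 and 3 are dropped because their scores fall below the threshold once they exit the window.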
🚨 Do LLMs need to store everything they read in memory?
To reduce KV cache size and improve decoding speed, we propose Self-Pruned KV attention, a mechanism where the model learns to decide which KVs to write to the persistent KV cache, discarding all the rest! @AIatMeta 🧵
Introducing AIRS Bench, a benchmark for “AI Researcher Agents”. Agents attempt 20 open ML problems starting from zero code (the full research loop). And yes, they beat SOTA in a few cases (read more below!) https://t.co/npx0JbRYPo
Working on attention architectures?
Be careful!
Our new paper accepted at #ICLR2026 shows that in hybrid architectures, longer sliding windows actually degrade long-context performance.
paper: https://t.co/Q4Q8C9TLp1
Thanks to my co-authors @maxmbeck and the team at @Meta.
Our team in FAIR at Meta is hiring a postdoc researcher!
We work on the topics of Reasoning, Alignment and Memory/architectures (RAM).
Apply here: https://t.co/dWtpz7rttT
Location: NY, Seattle or Menlo Park.
Some of our recent work, to give you a flavor:
Co-Improvement (position): https://t.co/XPwbsuCUI6
SPICE (Self-Play in Corpus Environments): https://t.co/47BarIr0uM
Self-Challenging Agents: https://t.co/qgDLmchn8X
RL from Human Interaction: https://t.co/wmC2fVByp2
AggLM (parallel aggregation): https://t.co/Fg0E31aOIy
StepWiser (CoT-PRM RL): https://t.co/QbfBVYx522
DARLING (diversity-trained RL): https://t.co/J9ZSs8GVyX
J1 (RL-trained LLM-as-Judge): https://t.co/yG6xAPaNJ3
CoT-Self-Instruct: https://t.co/dHMYRxtv5h
Multi-Token Attention: https://t.co/4kfUe8KozT
📢 New PhD Position 📢
We (@_rockt, @borruell, and I) are looking for a PhD student to work at the intersection of open-endedness and game design. The student will be part of the @UCL_DARK lab and funded by @iconicgamesio and UCL.
See this doc for a more detailed description of the research direction and candidate expectations:
https://t.co/eYsFKlgCJt
To apply, please complete this form by January 15:
https://t.co/UOGva9iBvJ
Our team at @Meta Superintelligence Labs is hiring current PhD students for 3-6 month paid internships in London, working with us on reinforcement learning post-training of LLM agents.
If this sounds interesting, move fast and apply today at: https://t.co/QgWBIUuACj
Looking for one of the most exciting PhD positions in ML this season? I have some news for you!
Joao Henriques (https://t.co/6XysRQ08BE) and I (https://t.co/WdhorwrBCj) are again hiring a fully funded PhD student (UK/international) for the FAIR-Oxford program. The successful student will spend 50% of their time at @UniofOxford and 50% at @meta (FAIR) while completing a DPhil (Oxford PhD). The deadline is 1st of December, anywhere on earth (AOE)!
The goal is to make foundational discoveries in Machine Learning, in particular in the area of AI Research Agents.
Apply by emailing a CV, personal statement, and research proposal to “[email protected]” by 1st of Dec AOE.
In your email, please include a 280-character TL;DR of why you are the perfect candidate. Joint interviews will be held in January/February. Shortlisted candidates will also be invited to apply to FAIR / @meta.
Candidates must also apply for a DPhil in the Department of Engineering Science by the same deadline, 1st of Dec AOE (if they haven’t already), listing me as the supervisor: https://t.co/02dCTsNZIK.
Candidates should have an outstanding track record of academic excellence and relevant research experience.
@black_in_ai @_LXAI @QueerinAI @WiMLworkshop
FAIR is hiring interns for 2026!
If you're enrolled in a PhD program and interested in a stint doing fundamental AI research with us at @AIatMeta, apply below 👇:
https://t.co/PrG9L625bY
🚨New paper: Stochastic activations
We introduce stochastic activations. This novel strategy consists of randomly selecting between several non-linear functions in the feed-forward layers of a large language model.
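To make the idea concrete, here is a minimal pure-Python sketch of a feed-forward block whose non-linearity is drawn at random on each forward call. The candidate pair (ReLU vs. SiLU), the probability `p_relu`, and drawing once per call rather than per token or per element are assumptions made for illustration; the paper may choose the candidate functions and the sampling granularity differently.

```python
import math
import random

def relu(x: float) -> float:
    return max(0.0, x)

def silu(x: float) -> float:
    # SiLU(x) = x * sigmoid(x), written to avoid overflow for large |x|
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)

def stochastic_activation(xs, p_relu, rng):
    # Assumption: one Bernoulli draw per call picks ReLU with probability
    # p_relu, otherwise SiLU; the chosen function is applied element-wise.
    act = relu if rng.random() < p_relu else silu
    return [act(x) for x in xs]

def ffn(xs, w_in, w_out, p_relu, rng):
    # Feed-forward block: out = W_out . act(W_in . x), with act drawn at random
    hidden = [sum(w * x for w, x in zip(row, xs)) for row in w_in]
    hidden = stochastic_activation(hidden, p_relu, rng)
    return [sum(w * h for w, h in zip(row, hidden)) for row in w_out]

rng = random.Random(0)
x = [-1.0, 3.0]
w_in = [[1.0, 0.0], [0.0, 1.0]]
w_out = [[1.0, 1.0]]
for _ in range(3):
    # Same input, but the output depends on which activation was drawn
    print(ffn(x, w_in, w_out, p_relu=0.5, rng=rng))
```

Setting `p_relu` to 1.0 or 0.0 recovers a plain ReLU or SiLU block, which is a handy sanity check when experimenting with the mixing probability.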
This work was done with my amazing collaborators @DouzeMatthijs, Gergely Szilvasy, @lofiwolfi, @jadecopet, @tesatory, @jaseweston, @syhw, Pierre-Emmanuel Mazaré and @hjegou. Check out further details in our paper: https://t.co/psLItHuG63. Thank you for reading!
A fun bonus: the stochasticity of StochA activations can be leveraged during generation to encourage diversity, without arbitrarily setting a temperature for temperature sampling.
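A toy illustration of that bonus, under heavy assumptions: a hypothetical 3-token "model" with hand-picked weights (`W_IN`, `W_OUT`), decoded greedily (plain argmax, no temperature), where each decoding step redraws the activation (ReLU with probability `p_relu`, otherwise SiLU). The weights, the ReLU/SiLU pair, and the per-step draw are illustrative choices, not the paper's setup.

```python
import math
import random

def relu(x: float) -> float:
    return max(0.0, x)

def silu(x: float) -> float:
    # SiLU(x) = x * sigmoid(x), overflow-safe
    if x >= 0:
        return x / (1.0 + math.exp(-x))
    e = math.exp(x)
    return x * e / (1.0 + e)

# Hypothetical toy weights: 3-token vocabulary, 2 hidden units.
W_IN = [[2.0, 0.5, -1.0],
        [-3.0, 1.0, 0.5]]
W_OUT = [[1.0, 0.0],
         [0.9, -2.0],
         [0.0, 0.0]]

def next_token(token, rng, p_relu):
    # One-hot input, so W_IN . x is simply column `token` of W_IN
    hidden = [row[token] for row in W_IN]
    act = relu if rng.random() < p_relu else silu  # redrawn at every step
    hidden = [act(h) for h in hidden]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_OUT]
    # Greedy decoding: argmax over logits, no temperature involved
    return max(range(len(logits)), key=logits.__getitem__)

def generate(start, steps, p_relu, seed):
    rng = random.Random(seed)
    seq = [start]
    for _ in range(steps):
        seq.append(next_token(seq[-1], rng, p_relu))
    return seq

for seed in range(4):
    print(seed, generate(start=0, steps=5, p_relu=0.5, seed=seed))
```

With `p_relu` at 0 or 1 the draw is fixed and every seed yields the same greedy sequence, so in this toy the diversity across seeds comes entirely from the activation choice rather than from sampling.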