Leon @iamleonli - Twitter Profile

Pinned Tweet

4 days ago

How far can we compress the discrete tokens in an LLM's context into compact latent vectors? With the right training recipe at large scale, our Latent Context Language Models (LCLMs) compress context up to 16× and land on a new Pareto frontier for long-context inference. 🧵(1/n)

1

60

22

13

7K

iamleonli retweeted

Sean McLeish

@SeanMcleish

3 days ago

Humans don’t maintain exact, line-by-line recall of huge contexts like full codebases or long legal documents. We keep a high-level mental model, then look things up when precision matters. We enable LLMs to do this, with high speed.

0

14

3

2K

iamleonli retweeted

Pavel Izmailov

@Pavel_Izmailov

3 days ago

New paper: Latent Context Language Models (LCLMs)! Idea: encode 16 tokens as 1 latent token, and have the LLM work on top of the latent tokens. Result: general-purpose model with much better performance / speed / memory usage frontier.

3

214

27

164

16K

iamleonli retweeted

Tanishq Mathew Abraham, Ph.D.

@iScienceLuvr

5 days ago

End-to-End Context Compression at Scale

Encoder-decoder compressors - map a long token sequence to a shorter sequence of latent embeddings, not competitive with KV cache compression.

This work revisits encoder-decoder compression.

Perform an architecture search, pre-training many variants from scratch to determine how best to design and train encoder-decoder compressors.

Continually pre-train a family of 0.6B-encoder, 4B-decoder models on over 350B tokens each, at compression ratios of 1:4, 1:8, and 1:16.

"We introduce Latent Context Language Models (LCLMs), a family of compressors that improve the Pareto frontier across general-task performance, compression speed, and peak memory usage."

1

128

16

106

10K

Leon

@iamleonli

4 days ago

📄 Paper: https://t.co/16hQ7sUBV9 🤗 Models: https://t.co/76TBx76Ups 💻 Code: https://t.co/yYwV9yqnWm (12/n)

0

8

1

321

Leon

@iamleonli

4 days ago

Grateful to @SeanMcleish @tonychenxyz @qw3rtman @tomgoldsteincs @LotfiSanae @micahgoldblum @Pavel_Izmailov and the whole team 🙏 Thanks to @TongPetersb @pfactorialz @EBorgnia for helpful discussions. Thanks @tonychenxyz for drafting the tweet! (11/n)

1

6

0

381

iamleonli retweeted

Micah Goldblum @micahgoldblum

4 days ago

We trained language models that compress massive contexts into tiny latent representations. Latent Context Language Models (LCLMs) outperform existing KV cache compression methods on the latency/accuracy frontier. 🧵1/10

14

426

63

317

51K

iamleonli retweeted

Pavel Izmailov

@Pavel_Izmailov

15 days ago

Super excited to share this work. We RL an LLM on a completely new narrow task and extract activation directions for "I did a good / bad action". We find these vectors modulate behavior in all kinds of other situations, align with emotion vectors and track goals. 🧵

7

139

21

97

12K

iamleonli retweeted

Andy Han @andy_q_han

15 days ago

We RL LLMs and extract concept vectors for “I did a high/low-reward action”. Turns out these vectors modulate sentiment, confidence, backtracking and refusal in unrelated situations! We argue they form a *functional welfare axis*. (w/ @davidchalmers42 & @Pavel_Izmailov)

7

123

26

75

34K

iamleonli retweeted

Martin Marek

@mrtnm

17 days ago

New paper! "Forgetting in Language Models: Capacity, Optimization, and Self-Generated Replay"

1

28

3

4

3K

iamleonli retweeted

Pavel Izmailov

@Pavel_Izmailov

17 days ago

New paper: https://t.co/LGbYhYytbt The main idea is that we can use an LLM to generate its own replay data to prevent forgetting, as long as we have spare capacity. Very overtrained models have to forget to learn new information.

4

168

26

98

14K
