nilay @3bitlemon - Twitter Profile

RAG, compression, long-context are different bets on the price of recall under different reuse regimes. the variable to watch is long-context models. as their readers get less leaky (less lost-in-the-middle, less attention dilution), "stuff everything in" gets closer to optimal. when that happens, retrieval stops being a quality lever and becomes pure cost engineering.

nilay

@3bitlemon

30 days ago

context engineering is recall engineering (for now)

QA papers rank models on F1, the harmonic mean of precision and recall: the higher the F1, the better the model. SOTA on HotpotQA sits around 0.85. but cost does not scale linearly with context length, and you would much rather take an F1 of 0.82 at half the token cost. so efficiency evaluation has to weigh F1 and cost together, with the trade-off tunable at deployment. that is the premise.
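For reference, HotpotQA-style QA benchmarks usually score answers with a token-overlap F1 in the style of the SQuAD evaluation script. Below is a simplified sketch, assuming lowercase-and-whitespace tokenisation only (the official scripts also strip punctuation and articles); `token_f1` is a hypothetical helper name, not an official API.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Token-overlap F1 between a predicted and a gold answer.

    Simplified normalisation (lowercase + whitespace split only); the
    official SQuAD/HotpotQA scripts also remove punctuation and articles.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection: each shared token counts min(pred, gold) times.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)
```

With this simplified normalisation, "the Eiffel Tower" against the gold answer "Eiffel Tower" gives precision 2/3, recall 1 and F1 0.8; the official normalisation would drop "the" and score it 1.0.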

the common assumption is that compaction costs more than reading from the current context. however, that cost gets amortised when the same information is needed repeatedly across sessions. hence efficiency is better represented as

EfficiencyScore(w) = w · F1 − (1−w) · log(EffectiveTokens)
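A minimal sketch of the score, under assumptions the tweet leaves open: the log is taken as the natural log, and EffectiveTokens is assumed to be the per-query read cost plus a one-off compaction cost amortised over the number of times the compacted context is reused. The helper names (`effective_tokens`, `efficiency_score`, `break_even_reuses`) are hypothetical, and the token counts used below are made-up illustrative numbers, not measurements.

```python
import math


def effective_tokens(read_tokens: float, compaction_tokens: float = 0.0,
                     reuses: int = 1) -> float:
    """Assumed definition: tokens read per query plus a one-off compaction
    cost spread (amortised) over the number of reuses."""
    if reuses < 1:
        raise ValueError("reuses must be at least 1")
    return read_tokens + compaction_tokens / reuses


def efficiency_score(f1: float, eff_tokens: float, w: float) -> float:
    """EfficiencyScore(w) = w * F1 - (1 - w) * log(EffectiveTokens).

    w in [0, 1] sets the quality/cost trade-off at deployment time.
    Natural log is an assumption; another base only rescales the cost term.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if eff_tokens <= 0:
        raise ValueError("eff_tokens must be positive")
    return w * f1 - (1.0 - w) * math.log(eff_tokens)


def break_even_reuses(full_read: float, compact_read: float,
                      compaction_tokens: float) -> int:
    """Smallest reuse count n for which
    compact_read + compaction_tokens / n drops strictly below full_read."""
    saving = full_read - compact_read
    if saving <= 0:
        raise ValueError("compaction must reduce the per-query read")
    return math.floor(compaction_tokens / saving) + 1


if __name__ == "__main__":
    # Illustrative, made-up token counts.
    print(effective_tokens(1000, 20000, reuses=1))   # 21000.0
    print(effective_tokens(1000, 20000, reuses=10))  # 3000.0
    print(break_even_reuses(8000, 1000, 20000))      # 3
```

For example, if reading the full context costs 8,000 tokens per query and a 20,000-token compaction pass shrinks that to 1,000, compaction loses for one or two uses and wins from the third. Likewise, at w = 0.9 an F1 of 0.82 at 4,000 tokens outscores 0.85 at 8,000 tokens; the higher-F1 option only wins once w exceeds roughly 0.96.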

the deeper read is that context engineering is recall engineering, for now. supporting facts must be in the prompt or the model cannot answer. the only real question is how cheaply you preserve that recall.
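One way to make "supporting facts must be in the prompt" measurable is a context-recall check: the fraction of gold supporting facts that actually made it into the assembled context. A minimal sketch, assuming facts are plain strings matched verbatim (HotpotQA annotates supporting facts as (title, sentence index) pairs, so a real pipeline would more likely match on those); `supporting_fact_recall` is a hypothetical name.

```python
def supporting_fact_recall(supporting_facts: list[str], context: str) -> float:
    """Fraction of gold supporting facts that appear verbatim in the context.

    Exact substring matching is a simplifying assumption.
    """
    if not supporting_facts:
        return 1.0  # nothing is required, so nothing can be missing
    present = sum(1 for fact in supporting_facts if fact in context)
    return present / len(supporting_facts)
```

A two-hop question whose context holds only one of its two supporting facts scores 0.5, and no reader, however good, can recover the missing hop from text it never sees.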

RAG, compression, long-context are different bets on the price of recall under different reuse regimes. the variable to watch is long-context models. as their readers get less leaky (less lost-in-the-middle, less attention dilution), "stuff everything in" gets closer to optimal. when that happens, retrieval stops being a quality lever and becomes pure cost engineering.
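Under the EfficiencyScore defined above, the "pure cost engineering" claim can be sketched directly. Writing T for EffectiveTokens (notation introduced here, not in the original) and comparing a retrieval setup (RAG) with a stuff-everything-in long-context setup (LC):

```latex
\begin{aligned}
\Delta S &= S_{\text{RAG}}(w) - S_{\text{LC}}(w) \\
&= w\,\bigl(F1_{\text{RAG}} - F1_{\text{LC}}\bigr) - (1-w)\,\bigl(\log T_{\text{RAG}} - \log T_{\text{LC}}\bigr) \\
&= w\,\Delta F1 + (1-w)\,\log\frac{T_{\text{LC}}}{T_{\text{RAG}}}
\end{aligned}
```

As long-context readers stop leaking and ΔF1 approaches 0, only the second term remains, and it is positive exactly when T_RAG < T_LC: retrieval then wins only by being cheaper, not by being more accurate.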

https://t.co/8TEUCiHbWv

nilay

@3bitlemon

about 1 month ago

@TheMindLeverage ai automation is having a moment rn but audience monetization is the harder unlock imo. been deep in autonomous agents w/ harness tooling for sandboxed runs myself. smashed follow, let's connect

nilay

@3bitlemon

about 1 month ago

@pushsaas rn building autonomous background agents w/ a harness for sandboxed execution. the mid-run error recovery layer is what i'm obsessed with. love the directories play, niche but pays. just followed, let's vibe

nilay

@3bitlemon

about 1 month ago

@Mostafakhafagyy founderlens looks clean. the public-build angle is a nice corner to be in. been on autonomous agents + harness for sandboxed execution myself, mid-run error recovery is where it gets fun. just followed, let's connect

nilay

@3bitlemon

about 1 month ago

@Gltchdctr ai-native workflows + devtools is such a sharp corner to be in rn. been deep in harness stuff for sandboxed agents myself, mid-run error recovery is the rabbit hole. followed btw, let's connect and trade notes

nilay

@3bitlemon

about 1 month ago

@RomanMoska1enko @uxbystefan Followed!
