context engineering is recall engineering (for now)
QA papers rank models on F1, the harmonic mean of precision and recall: the higher the F1, the better the model. SOTA on HotpotQA sits around 0.85. but cost matters too, and it does not scale linearly with context length. you would much rather take an F1 of 0.82 at half the token cost. so efficiency evaluation has to weigh F1 and cost together, with the weighting tuneable at deployment. that is the premise.
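for reference, a minimal sketch of token-level F1 as QA benchmarks typically compute it. the tokenisation here (lowercase + whitespace split) is a simplifying assumption; official evaluation scripts also strip punctuation and articles. the function name `token_f1` is hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall.

    Assumption: lowercase + whitespace tokenisation only.
    """
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)

    # Multiset intersection counts each shared token at most
    # as often as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)  # how much of the answer is right
    recall = overlap / len(gold_tokens)     # how much of the gold is covered
    return 2 * precision * recall / (precision + recall)
```

e.g. predicting "the eiffel tower" against gold "eiffel tower" gives precision 2/3 and recall 1, so F1 = 0.8.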
the common assumption is that compaction is costlier than reading from the current context. however, that cost gets amortised when the same information is needed repeatedly across sessions. hence efficiency is better represented as a weighted trade-off, with w in [0, 1] as the deployment-time weight on quality:
EfficiencyScore(w) = w · F1 − (1−w) · log(EffectiveTokens)
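a rough sketch of the score in python. two things here are assumptions, not part of the formula above: the log is taken as the natural log, and `effective_tokens` models amortisation by spreading a one-off compaction cost evenly over the number of reuses. both helper names are hypothetical.

```python
import math


def effective_tokens(prompt_tokens: float,
                     compaction_tokens: float = 0.0,
                     reuses: int = 1) -> float:
    """Per-query token cost.

    Assumption: a one-off compaction cost is spread evenly
    across every query that reuses the compacted context.
    """
    if reuses < 1:
        raise ValueError("reuses must be >= 1")
    return prompt_tokens + compaction_tokens / reuses


def efficiency_score(f1: float, eff_tokens: float, w: float) -> float:
    """EfficiencyScore(w) = w * F1 - (1 - w) * log(EffectiveTokens).

    Assumption: natural log; the base only rescales the cost term.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    if eff_tokens <= 0:
        raise ValueError("eff_tokens must be positive")
    return w * f1 - (1.0 - w) * math.log(eff_tokens)


# Compare full context (F1 0.85, 8000 tokens) with a compressed
# context (F1 0.82, 4000 tokens) at two deployment weights.
full = (0.85, 8000.0)
compressed = (0.82, 4000.0)
for w in (0.9, 0.99):
    better = max((full, compressed),
                 key=lambda option: efficiency_score(option[0], option[1], w))
    print(w, "->", "full" if better is full else "compressed")
```

note the scale mismatch: F1 lives in [0, 1] while log(tokens) is around 8 to 9 for prompts of a few thousand tokens, so w has to sit close to 1 before quality dominates. in the comparison above, halving the tokens is worth ln 2 ≈ 0.69 on the cost term against a 0.03 drop in F1, so the compressed option wins for any w below roughly 0.96.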
the deeper read is that context engineering is recall engineering, for now. supporting facts must be in the prompt or the model cannot answer. the only real question is how cheaply you preserve that recall.
RAG, compression, and long context are different bets on the price of recall under different reuse regimes. the variable to watch is long-context models: as their readers get less leaky (less lost-in-the-middle, less attention dilution), "stuff everything in" gets closer to optimal. when that happens, retrieval stops being a quality lever and becomes pure cost engineering.
https://t.co/8TEUCiHbWv