8/8
Bigger context windows are useful.
But for long-horizon agents, reliability often comes from smaller, cleaner context.
The goal is not “remember everything.”
The goal is “present the right state at action time.”
4/8
For a browser or enterprise workflow agent, recent tool calls matter because they describe current execution state.
Which page?
Which record?
Which field?
Which validation error?
Which API response?
Which action is pending?
This state gets stale quickly.
7/8
A practical context manager should track at least four buckets:
- current tool state
- durable task constraints
- unresolved questions
- invalidated assumptions
The last bucket is underrated. Agents need a way to remember that something is no longer true.
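A minimal sketch of those four buckets as a data structure, to make the idea concrete. The class name `AgentContext`, its fields, and the example values are all hypothetical; nothing here comes from the paper or from a real library.

```python
from dataclasses import dataclass, field


@dataclass
class AgentContext:
    """Hypothetical container for the four buckets listed above."""
    tool_state: dict[str, str] = field(default_factory=dict)
    constraints: list[str] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)
    invalidated: list[str] = field(default_factory=list)

    def invalidate(self, assumption: str, reason: str) -> None:
        # Keep the retracted belief visible instead of silently deleting it,
        # so the agent can see that it is no longer true.
        self.invalidated.append(f"NO LONGER TRUE: {assumption} ({reason})")

    def render(self) -> str:
        # Build a compact text block to place in the prompt at action time.
        sections = [
            ("Current tool state", [f"{k}: {v}" for k, v in self.tool_state.items()]),
            ("Task constraints", self.constraints),
            ("Unresolved questions", self.open_questions),
            ("Invalidated assumptions", self.invalidated),
        ]
        lines = []
        for title, items in sections:
            if items:  # skip empty buckets to save tokens
                lines.append(f"## {title}")
                lines.extend(f"- {item}" for item in items)
        return "\n".join(lines)


# Illustrative values only.
ctx = AgentContext()
ctx.tool_state["page"] = "expense report form"
ctx.constraints.append("itemize every receipt line")
ctx.invalidate("currency is USD", "form shows EUR")
prompt_block = ctx.render()
print(prompt_block)
```

The point of `invalidate` is that a corrected assumption stays in context as an explicit negative, instead of disappearing and getting re-derived later.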
3/8
The useful split is:
recent raw state → keep mostly intact
older durable facts → summarize
invalidated assumptions → mark as invalid
completed steps → compress
irrelevant tool noise → drop
That is very different from “just summarize the chat.”
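The split above can be sketched as a small routing function. This is a rough illustration under my own assumptions: the `Item` fields, the `kind` labels, the `recent_turns` cutoff, and the truncating `summarize` placeholder are all hypothetical, and a real system would likely call a model to summarize.

```python
from dataclasses import dataclass


@dataclass
class Item:
    text: str
    kind: str            # "tool_state", "fact", "assumption", "step", or "noise"
    age: int             # turns since the item was recorded
    invalid: bool = False
    done: bool = False


def summarize(text: str, limit: int = 60) -> str:
    # Placeholder summarizer: plain truncation, standing in for a model call.
    return text if len(text) <= limit else text[: limit - 3] + "..."


def compact(items: list[Item], recent_turns: int = 3) -> list[str]:
    out = []
    for it in items:
        if it.kind == "noise":
            continue                                        # irrelevant tool noise: drop
        if it.invalid:
            out.append(f"[INVALID] {it.text}")              # mark, do not delete
        elif it.kind == "step" and it.done:
            out.append(f"[DONE] {summarize(it.text, 40)}")  # completed steps: compress
        elif it.kind == "tool_state" and it.age <= recent_turns:
            out.append(it.text)                             # recent raw state: keep intact
        else:
            out.append(f"[SUMMARY] {summarize(it.text)}")   # everything older: summarize
    return out


# Illustrative history; the values are made up.
history = [
    Item("GET /records/42 -> 200, status=Draft", "tool_state", age=1),
    Item("Report must be submitted in EUR", "fact", age=12),
    Item("Receipt 3 is a hotel charge", "assumption", age=7, invalid=True),
    Item("Opened the expense report and located the itemization grid after two retries",
         "step", age=9, done=True),
    Item("cookie banner dismissed", "noise", age=10),
]
result = compact(history)
for line in result:
    print(line)
```

Each item gets a different treatment based on what it is and how old it is, which is the difference from running one summarizer over the whole chat.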
I recently read the paper 𝘓𝘦𝘴𝘴 𝘊𝘰𝘯𝘵𝘦𝘹𝘵, 𝘉𝘦𝘵𝘵𝘦𝘳 𝘈𝘨𝘦𝘯𝘵𝘴 (https://t.co/FiFTxsHHBA) on June 8, 2026. It tested long-horizon tool-using agents on enterprise expense itemization using Microsoft Dynamics 365 Finance and Operations with MCP tools.
Here are the 8 lessons I learned from it.
2/8
Full transcript retention feels safe because the model has “everything.” But everything includes:
old tool results
failed plans
duplicate observations
stale UI state
corrected assumptions
irrelevant intermediate reasoning
More context can mean more ways to attend to the wrong thing.
1/8
Long-context agents should not treat the context window like a database. A context window is a working set.
It should contain what the agent needs for the next few decisions, not every raw observation since the task started.
@elonmusk @Sajwani @SenWarren every goal is possible, you just need to think and execute that big.
even becoming a trillionaire was not easy, but you did it.
and i am sure you will be the first quadrillionaire too
my ideal X circle:
people who read papers
people who ship products
people who break things
people who explain simply
people who are not allergic to code
people who care about taste
people who think “evals” before “demo”
if that's you, let's connect
my neighbour uncle now has a 1.5 𝗰𝗿 𝗽𝗮𝗰𝗸𝗮𝗴𝗲 and works as a senior engineering manager at MongoDB.
he completed his undergraduate degree at an NIT in 2003. He also helped both of his sons get jobs at amzn and salesforce through connections.
he has 18+ years of experience in the IT & Sales sector and has accumulated enough wealth to retire peacefully. He also has 3.2 l / month rental income from property in noida.
both of his sons are doing great, with jobs they got through their father's connections. People who joined IT in the 2000s, before AI, are the luckiest people.
sometimes i feel very jealous of him.
KV cache is a nice example of how much of LLM engineering is just avoiding repeated work.
During generation, the new token needs to attend to old tokens. The old tokens have already produced their keys and values, and they are not changing. So instead of recomputing them every step, the model stores them and reuses them. That stored state is the KV cache.
This makes decoding much faster, but it moves the pressure somewhere else: memory. Longer context means a larger cache. More layers, heads, batch size, and concurrent requests mean more memory pressure.
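A back-of-the-envelope estimate of that memory pressure, using the standard cache size formula: one key tensor and one value tensor per layer, each of shape batch × heads × sequence length × head dimension. The function name and the model dimensions below are hypothetical, picked only to make the arithmetic easy to follow.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch=1, bytes_per_value=2):
    """Estimate KV cache size in bytes (bytes_per_value=2 assumes fp16/bf16)."""
    # The leading 2 accounts for storing both keys and values in every layer.
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_value


GIB = 1024 ** 3

# Hypothetical model shape: 32 layers, 32 KV heads, head dimension 128.
one_request = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096)
eight_requests = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=4096, batch=8)

print(one_request / GIB)     # 2.0
print(eight_requests / GIB)  # 16.0
```

Every term is a plain multiplier, so doubling context length or batch size doubles the cache. That is why serving many long requests at once becomes a memory problem before it becomes a compute problem.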
So, I made a video explaining the KV cache in detail 👇.