Darrow @darrowoah - Twitter Profile

xwang_lk's tweet photo. 𝐓𝐡𝐞 𝐁𝐢𝐭𝐭𝐞𝐫 𝐋𝐞𝐬𝐬𝐨𝐧 𝐨𝐟 𝐀𝐠𝐞𝐧𝐭𝐢𝐜 𝐌𝐞𝐦𝐨𝐫𝐲: memory should be a derived capability that exists because it makes an agent better at acting over time.

𝐖𝐨𝐫𝐥𝐝𝐌𝐞𝐦𝐀𝐫𝐞𝐧𝐚 is designed around this principle. Rather than evaluating memory as a storage problem, WorldMemArena evaluates memory through 𝐚𝐜𝐭𝐢𝐨𝐧–𝐰𝐨𝐫𝐥𝐝 𝐢𝐧𝐭𝐞𝐫𝐚𝐜𝐭𝐢𝐨𝐧, instrumenting the full write → maintain → retrieve → use lifecycle across 400 multimodal, multi-session tasks.

And it exposes the findings that should mark the end of the storage-centric era:
→ Storage ≠ use. Better memory storage and retrieval do not necessarily produce better task performance. Optimizing the component we designed does not optimize the capability we actually care about.
→ Harness-based memory performs best where memory is hardest. Agents that can write files, reorganize context, create artifacts, and interact with persistent environments adapt most effectively in long-horizon settings. They are costly and unstable today, which is exactly what many Bitter Lesson transitions look like before scaling and learning take over.

The deeper move is in what gets measured. Memory shouldn't get a score; it should be inferred from capability: how much does remembering improve performance over time.

WorldMemArena drags evaluation off the static object and into the action–world loop, the only place you can tell whether an agent has developed memory or is just simulating it convincingly.

Darrow

@darrowoah

about 6 hours ago

@techwithhannahm yes

Darrow

@darrowoah

about 6 hours ago

@Scobleizer Which one do you pick?

darrowoah retweeted

Nishkarsh

@contextkingceo

2 days ago

Introducing HydraDB. The graph-native context infrastructure for agents. Purpose-built to deliver precise context & observability into why agents act the way they do. We've always believed graphs are the best way to manage AI context, but they've been too expensive to scale or impractical for storing full context. Until now. @hydra_db combines in-memory, NVMe, and object storage into a single graph layer, making context delivery faster, cheaper, and more precise. We want context delivery to be extremely fast, 1000x cheaper, and highly precise. Give your agents a brain.

Darrow

@darrowoah

2 days ago

@CathPoaster Thank you!

Darrow

@darrowoah

2 days ago

@CathPoaster try it out here https://t.co/w5p74Ag01X

Darrow

@darrowoah

3 days ago

@sammysintech balboa cafe!

Darrow

@darrowoah

3 days ago

can AI be funny? i'm trying to find out for my @CS153Systems final project. takes 2 min: answer a few questions about your sense of humor and help me make AI comedy less terrible: https://t.co/w5p74Ag01X

Darrow

@darrowoah

3 days ago

@nikos1 those lines look a little wobbly

Darrow

@darrowoah

3 days ago

@thenowhereway vercel is just so easy to use and deploy with

Darrow

@darrowoah

3 days ago

@regulargio what do you think will change in the next 15 years that will enable 200 year lifespans?

Darrow

@darrowoah

4 days ago

no matter how I write my emails, ChatGPT always finds a way to critique them

Darrow

@darrowoah

4 days ago

@indefeasible_ @polsia hey, sorry I missed this! social capital is the credibility/goodwill you have with others, and it gives you the ability to "spend" that social capital on meetings, intros, etc.

Darrow

@darrowoah

11 days ago

I tried @polsia and I think it's missing a way to give its agents the social capital they need to get customers on their side. Granted, social capital is a hard thing to give AI agents.
