@ysu_nlp @NeoCognition Continual learning assumes the run finishes coherently in the first place. But agents still lose track mid-run and cannot recover after hours of work. Get within-run durability solid before you chase the across-run learning loop.
@omarsar0 The bottleneck in long runs is not loop design or model capability. It is what happens after a step fails or the process crashes mid-task. Durable state, deterministic verification of each step, and clean resume move the viability line for unattended runs.
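A minimal sketch of what durable state, per-step verification, and clean resume can look like together. The `run_steps` helper, the journal file layout, and the `(name, action, verify)` step shape are all hypothetical, not taken from any particular framework.

```python
import json
import os


def run_steps(steps, journal_path):
    """Run (name, action, verify) steps in order, skipping steps already journaled.

    Hypothetical helper: results must be JSON-serializable, and each verify
    callable must be deterministic for resume to be trustworthy.
    """
    done = {}
    if os.path.exists(journal_path):
        with open(journal_path) as f:
            done = json.load(f)  # state left behind by a previous, interrupted run

    for name, action, verify in steps:
        if name in done:
            continue  # clean resume: never redo a step that was verified and recorded
        result = action()
        if not verify(result):
            raise RuntimeError(f"verification failed for step {name!r}")
        done[name] = result

        # Durable write: flush to a temp file, then atomically swap it in,
        # so a crash never leaves a half-written journal behind.
        tmp_path = journal_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(done, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, journal_path)

    return done
```

If the process dies inside a step, calling `run_steps` again with the same journal path picks up at the first step that was never recorded. This only covers steps with no external side effects. A step that fires an outside action needs more than a journal.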
@sjivan @vincent_koc DeepGraph is research infrastructure executing long-horizon agent loops for theory discovery at scale. Single-step model capability is not the constraint. The runtime must ensure trace coherence and branch recovery over hours of search.
@calcsam Persisting mid-step is the right primitive. Resume is the hard part. A step that half-fired an external action will double-fire on replay, and persisted state can be stale against a world that moved. We ran experiments on exactly this. Happy to compare notes over DM.
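One common guard against the double-fire case is an idempotency key that is persisted before the external call. A rough sketch, assuming the external service dedupes on that key. `FakePaymentService`, `run_charge_step`, and the dict-based journal are hypothetical stand-ins, not Mastra's API or anyone's production design.

```python
import uuid


class FakePaymentService:
    """Hypothetical external API that dedupes requests on an idempotency key."""

    def __init__(self):
        self.charges = {}

    def charge(self, key, amount):
        # A repeated key returns the original charge instead of charging again.
        if key not in self.charges:
            self.charges[key] = amount
        return self.charges[key]


def run_charge_step(journal, service, step_id, amount, crash_after_fire=False):
    """Fire an external charge at most once per step, even across replays.

    `journal` is a plain dict standing in for durable storage (an assumption).
    """
    # Record the key BEFORE firing, so a replay reuses it rather than minting a new one.
    entry = journal.setdefault(step_id, {"key": str(uuid.uuid4()), "done": False})
    if entry["done"]:
        return entry["result"]

    result = service.charge(entry["key"], amount)
    if crash_after_fire:
        # The action fired, but the journal never learned about it.
        raise RuntimeError("simulated crash after the external call")

    entry["result"] = result
    entry["done"] = True
    return result
```

Replaying after the simulated crash sends the same key, so the service records one charge, not two. This only helps when the far side honors the key, and it does nothing about persisted state going stale while the run was down.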
Appreciate Polar’s black-box API proxy for harnesses. Researching a new OpenClaw architecture for agentic tasks far beyond today’s limits: coherent hierarchical state management and orderly scheduling across 1B tokens for high-value, long-horizon tasks. Polar’s async rollouts look ideal for RL training at that scale. Thoughts on synergies?
@calcsam How does Mastra handle resume after a crash mid-step? That's where most frameworks quietly punt, and it's the part I care most about for long runs.
@billxbf PRM fixes credit assignment given the trace. The harder gap is coverage. Crash recovery and state repair are off-distribution from clean rollouts, so they rarely get sampled and the PRM never scores them. You almost have to inject faults to get the traces worth crediting.
@djfarrelly What changes every six months is the orchestration fashion at the loop layer. What doesn't change is the substrate under it. You still need durable state, deterministic recovery, and verification to run unattended. Swap frameworks at the loop, keep the substrate stable.
DeepSeek is hiring an "Agent Harness" researcher. Possibly the first role with that title anywhere. The field is finally naming the thing.
The harness is where durable state meets reality, and reality fails differently than your tests.
I hit this last week. A migration looked idempotent and got silently swallowed in prod. One backend aborts the whole transaction after the first failed statement, the other keeps going. Tests were green on the forgiving one. Prod ran on the strict one.
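A small reproduction of the forgiving side, using SQLite through Python's standard `sqlite3` module. The post does not name either backend, so SQLite, the table names, and the `run_migration` helper are assumptions for illustration only.

```python
import sqlite3


def run_migration(conn, statements):
    """Run all statements in one transaction, swallowing per-statement errors."""
    errors = []
    conn.execute("BEGIN")
    for sql in statements:
        try:
            conn.execute(sql)
        except sqlite3.Error as exc:
            errors.append(str(exc))  # swallowed: the loop just keeps going
    conn.execute("COMMIT")
    return errors


# isolation_level=None turns off the module's implicit transaction handling,
# so the BEGIN and COMMIT above are the only transaction boundaries.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")

errors = run_migration(conn, [
    "ALTER TABLE users ADD COLUMN email TEXT",
    "CREATE TABLE users (id INTEGER PRIMARY KEY)",      # fails: table already exists
    "CREATE TABLE audit_log (id INTEGER PRIMARY KEY)",  # still runs on SQLite
])

tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")}
columns = {row[1] for row in conn.execute("PRAGMA table_info(users)")}
```

Here one statement fails, yet `audit_log` and the new `email` column both land, so a test on this backend stays green. On a backend that aborts the transaction at the first error (PostgreSQL behaves this way unless you use savepoints), the third statement would be rejected too, and the final COMMIT would roll everything back.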