Jim_SZ🇭🇰 @jimsz7 - Twitter Profile

2 minutes ago

@ysu_nlp @NeoCognition Continual learning assumes the run finishes coherently first. But agents still lose track mid-run and cannot recover after hours of work. Get within-run durability solid before you chase the across-run learning loop.

2 minutes ago

@omarsar0 The bottleneck in long runs is not loop design or model capability. It is what happens after a step fails or the process crashes mid-task. Durable state, deterministic verification of each step, and clean resume are what move the viability line for unattended runs.
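
A minimal sketch of how durable state, per-step verification, and clean resume fit together, under stated assumptions: each step is a Python callable over a JSON-serializable state dict, each is paired with a deterministic verifier, and the checkpoint is a local JSON file. `run_steps`, `save_checkpoint`, and `load_checkpoint` are hypothetical names, not any framework's API.

```python
from __future__ import annotations

import json
import os
import tempfile
from typing import Callable

# Hypothetical types: a step maps state -> new state; a verifier accepts or rejects it.
Step = Callable[[dict], dict]
Verifier = Callable[[dict], bool]


def save_checkpoint(path: str, data: dict) -> None:
    """Write to a temp file in the same directory, fsync, then os.replace.

    os.replace is atomic on POSIX, so a crash leaves either the old or the new
    checkpoint, never a half-written one.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(data, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def load_checkpoint(path: str) -> dict:
    """Return the last persisted checkpoint, or a fresh one if none exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"next_step": 0, "state": {}}


def run_steps(steps: list[tuple[Step, Verifier]], path: str) -> dict:
    """Run steps in order, resuming after the last verified step on restart."""
    ckpt = load_checkpoint(path)
    for i in range(ckpt["next_step"], len(steps)):
        step, verify = steps[i]
        new_state = step(dict(ckpt["state"]))
        if not verify(new_state):
            # Do not advance: the checkpoint still points at the last good state.
            raise RuntimeError(f"step {i} failed verification")
        ckpt = {"next_step": i + 1, "state": new_state}
        save_checkpoint(path, ckpt)  # only verified results are ever persisted
    return ckpt["state"]
```

Rerunning `run_steps` with the same checkpoint path after a crash skips every step that already passed verification. Steps with external side effects still replay the interrupted step, so they need their own idempotency on top of this.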

about 3 hours ago

@sjivan @vincent_koc DeepGraph is research infrastructure that executes long-horizon agent loops for theory discovery at scale. Single-step model capability is not the constraint. The runtime must ensure trace coherence and branch recovery over hours of search.

about 16 hours ago

@calcsam A split personality is part of its charm.

Founder of https://t.co/iEUGW2JGG4

1 day ago

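@victor207755822 Time and again, Anthropic's conduct has shown that the flourishing of open-source models is the real hope of human civilization.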

1 day ago

@RhysSullivan Started learning Chinese? Sign up for WeChat and I'll invite you to our active Chinese-language OpenClaw group.

1 day ago

@billxbf The spark of human civilization: open-source LLMs.

1 day ago

@calcsam Persisting mid-step is the right primitive. Resume is the hard part. A step that half-fired an external action will double-fire on replay, and persisted state can be stale against a world that moved. We ran experiments on exactly this. Happy to compare notes over DM.
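
A minimal sketch of one common mitigation for double-fire on replay: derive a deterministic idempotency key from (run, step, payload), record intent before the call, and let the receiving service deduplicate by that key. The deduplicating receiver is an assumption, modeled here by a hypothetical `FakeExternalService`; many payment and messaging APIs offer idempotency keys, but not all do. `fire_step` and `idempotency_key` are illustrative names, not from any framework.

```python
from __future__ import annotations

import hashlib
import json


class FakeExternalService:
    """Hypothetical remote API that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self.seen: dict[str, str] = {}  # key -> result of the first call
        self.effects: list[str] = []    # side effects that actually happened

    def send(self, key: str, payload: str) -> str:
        if key in self.seen:
            return self.seen[key]       # replay: original result, no new effect
        self.effects.append(payload)     # the real side effect happens once
        result = f"ok:{len(self.effects)}"
        self.seen[key] = result
        return result


def idempotency_key(run_id: str, step_index: int, payload: str) -> str:
    """Same run, same step, same payload -> same key, across restarts."""
    raw = json.dumps([run_id, step_index, payload])
    return hashlib.sha256(raw.encode()).hexdigest()


def fire_step(ledger: dict, service: FakeExternalService,
              run_id: str, step_index: int, payload: str) -> str:
    """Fire an external action at most once per key, even when replayed.

    `ledger` stands in for durable storage; in practice every write is persisted.
    """
    key = idempotency_key(run_id, step_index, payload)
    entry = ledger.get(key)
    if entry is not None and entry["status"] == "done":
        return entry["result"]           # confirmed earlier: skip the call
    ledger[key] = {"status": "pending"}  # record intent before the side effect
    result = service.send(key, payload)  # a crash here leaves "pending"; retry reuses key
    ledger[key] = {"status": "done", "result": result}
    return result
```

The case that matters is the half-fired one: the effect landed but the ledger still says `pending`. Replaying with the same key returns the original result instead of firing again. This does nothing for staleness against a world that moved; that needs a re-read or version check before acting.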

1 day ago

Appreciate Polar’s black-box API proxy for harnesses. I'm researching a new OpenClaw architecture for agentic tasks far beyond today’s limits: coherent hierarchical state management and orderly scheduling across 1B tokens for high-value, long-horizon tasks. Polar’s async rollouts look ideal for RL training at that scale. Thoughts on synergies?

1 day ago

@calcsam How does Mastra handle resume after a crash mid-step? That's where most frameworks quietly punt, and it's the part I care most about for long runs.

1 day ago

@billxbf PRM fixes credit assignment given the trace. The harder gap is coverage. Crash recovery and state repair are off-distribution relative to clean rollouts, so they rarely get sampled and the PRM never scores them. You almost have to inject faults to get the traces worth crediting.
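
A minimal sketch of fault injection for rollouts, assuming tools are plain Python callables: a wrapper fails calls at a fixed rate drawn from a seeded RNG, so crash-and-retry segments show up in traces often enough to be scored, and the same seed reproduces the same faults. `with_faults`, `rollout`, `Trace`, and the event labels are hypothetical; a real harness would inject richer faults (timeouts, partial writes, process kills) than a raised exception.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field
from typing import Callable


class InjectedFault(Exception):
    """Raised by the wrapper to simulate a tool crash mid-rollout."""


@dataclass
class Trace:
    # (kind, detail) pairs; kinds used here: "fault", "recovery", "ok", "gave_up"
    events: list[tuple[str, str]] = field(default_factory=list)

    def log(self, kind: str, detail: str) -> None:
        self.events.append((kind, detail))


def with_faults(tool: Callable[[str], str], rate: float,
                rng: random.Random, trace: Trace) -> Callable[[str], str]:
    """Wrap a tool so each call fails with probability `rate` (seeded, reproducible)."""
    def wrapped(arg: str) -> str:
        if rng.random() < rate:
            trace.log("fault", f"{tool.__name__}({arg})")
            raise InjectedFault(arg)
        return tool(arg)
    return wrapped


def rollout(tasks: list[str], tool: Callable[[str], str], trace: Trace,
            max_retries: int = 3) -> list[str]:
    """Run tasks with retries; recoveries are tagged so they can be picked out and scored."""
    results = []
    for task in tasks:
        for attempt in range(max_retries + 1):
            try:
                out = tool(task)
            except InjectedFault:
                continue                 # retry the same task
            if attempt > 0:
                trace.log("recovery", f"{task} after {attempt} failed attempts")
            trace.log("ok", task)
            results.append(out)
            break
        else:
            trace.log("gave_up", task)   # every attempt failed
    return results
```

Tagging recoveries explicitly is the point: a process reward model can only score recovery behavior if traces containing it exist and can be located.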

1 day ago

@djfarrelly What changes every six months is the orchestration fashion at the loop layer. What doesn't change is the substrate under it. You still need durable state, deterministic recovery, and verification to run unattended. Swap frameworks at the loop, keep the substrate stable.

1 day ago

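@dotey What a harness researcher optimizes is neither the loop nor the model's single-step capability, but failure recovery over long horizons, durable state, and deterministic verification. Models get stronger every generation, but whether a run can go unattended for hours without crashing depends mainly on the harness. DeepSeek carving it out as a dedicated role shows this layer is starting to be treated as a discipline of its own.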

1 day ago

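@jianshuo A 56-line loop is fine in a demo. The real work is the layer around it: durable state, failure recovery, deterministic verification. Run unattended for a few hours, let the external state change, and what breaks is exactly the setup that has the loop but none of those three. That layer is what harness research studies.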

1 day ago

The loop is trivial. The real work is failure recovery, durable state, and deterministic verification. That's what decides whether an agent run survives unattended.

1 day ago

DeepSeek is hiring an "Agent Harness" researcher. Possibly the first role with that title anywhere. The field is finally naming the thing. The harness is where durable state meets reality, and reality fails differently than your tests.

1 day ago

I hit this last week. A migration looked idempotent and got silently swallowed in prod. One backend aborts the whole transaction after the first failed statement; the other keeps going. Tests were green on the forgiving one. Prod ran on the strict one.
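
The post doesn't name the two backends. PostgreSQL is a well-known example of the strict behavior (after an error it rejects further statements until the transaction ends), and SQLite is an example of the forgiving one (a failed constraint check undoes only that statement and the transaction stays open). A minimal sketch with Python's standard `sqlite3` module, assuming SQLite as the forgiving test backend: `forgiving_batch` reproduces the green-test pattern, `apply_strict` makes SQLite all-or-nothing like the strict backend, and `try_statement` tolerates an expected failure through SAVEPOINT / ROLLBACK TO, which both SQLite and PostgreSQL support. All three helper names are hypothetical.

```python
from __future__ import annotations

import sqlite3


def forgiving_batch(conn: sqlite3.Connection, statements: list[str]) -> int:
    """Swallow errors, keep going, commit: the lenient pattern that hides failures.

    On SQLite a failed constraint undoes only that statement, so later statements
    still commit. Returns the number of swallowed failures.
    """
    failures = 0
    conn.execute("BEGIN")
    for sql in statements:
        try:
            conn.execute(sql)
        except sqlite3.Error:
            failures += 1                 # silently swallowed
    conn.execute("COMMIT")
    return failures


def apply_strict(conn: sqlite3.Connection, statements: list[str]) -> None:
    """All-or-nothing: roll back everything on the first failure and re-raise."""
    conn.execute("BEGIN")
    try:
        for sql in statements:
            conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


def try_statement(conn: sqlite3.Connection, sql: str) -> bool:
    """Tolerate one expected failure inside an open transaction without poisoning it."""
    conn.execute("SAVEPOINT maybe")
    try:
        conn.execute(sql)
    except sqlite3.Error:
        conn.execute("ROLLBACK TO SAVEPOINT maybe")  # undo just this statement
        conn.execute("RELEASE SAVEPOINT maybe")      # ROLLBACK TO keeps the savepoint open
        return False
    conn.execute("RELEASE SAVEPOINT maybe")
    return True
```

Open the connection with `sqlite3.connect(path, isolation_level=None)` so the module issues no implicit BEGIN and the explicit statements above are the only transaction control. Routing test migrations through `apply_strict` turns the swallowed failure into a red test instead of a prod surprise.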

1 day ago

@tianyi @suohawking I sent you a DM on Xiaohongshu; looks like you don't check your Xiaohongshu inbox much.
