Jim_SZ🇭🇰 @JimSZ7 - Twitter Profile

about 1 hour ago

@sjivan @vincent_koc DeepGraph is research infrastructure that executes long-horizon agent loops for theory discovery at scale. Single-step model capability is not the constraint. The runtime must ensure trace coherence and branch recovery over hours of search.

0

Jim_SZ🇭🇰

@JimSZ7

about 15 hours ago

@calcsam A split personality is part of its charm.

0

7

Jim_SZ🇭🇰

@JimSZ7

about 23 hours ago

@victor207755822 Time and again, Company A's behavior has shown that a thriving open-source model ecosystem is the hope of human civilization.

0

22

Jim_SZ🇭🇰

@JimSZ7

about 23 hours ago

@RhysSullivan Started learning Chinese? Sign up for WeChat and I'll invite you to our active Chinese-language OpenClaw group.

0

113

Founder of https://t.co/iEUGW2JGG4

Jim_SZ🇭🇰

@JimSZ7

about 23 hours ago

@billxbf spark of human civilization, opensource llm

0

15

Jim_SZ🇭🇰

@JimSZ7

about 23 hours ago

@calcsam Persisting mid step is the right primitive. Resume is the hard part. A step that half fired an external action will double fire on replay, and persisted state can be stale against a world that moved. We ran experiments on exactly this. Happy to compare notes over DM.

0

10
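The double-fire problem in the reply above can be illustrated with a minimal sketch. Everything here is hypothetical and not taken from the thread: `FakePaymentAPI`, `run_step`, the in-memory `journal`, and the key format are illustrative names, and the sketch assumes the external service deduplicates on a caller-supplied idempotency key. Without that assumption, a replay after a crash between the action and the journal write would fire the action twice.

```python
class FakePaymentAPI:
    """Hypothetical stand-in for an external service that dedupes by key."""

    def __init__(self):
        self.charges = {}

    def charge(self, key, amount):
        # A repeated key returns the original result instead of charging again.
        if key not in self.charges:
            self.charges[key] = amount
        return self.charges[key]


def run_step(api, journal, run_id, step, amount, crash_after_action=False):
    """Run one step; the journal is assumed to be durable in a real system."""
    key = f"{run_id}:{step}"
    if journal.get(key) == "done":
        return "skipped"  # completed before the crash, nothing to replay
    journal[key] = "started"  # persist intent before touching the outside world
    api.charge(key, amount)  # the external action, tagged with the key
    if crash_after_action:
        raise RuntimeError("crash before completion was recorded")
    journal[key] = "done"
    return "executed"


api = FakePaymentAPI()
journal = {}

# First attempt: the action fires, then the process dies before recording it.
try:
    run_step(api, journal, "run-1", 3, 50, crash_after_action=True)
except RuntimeError:
    pass

# Resume: the step is replayed, but the service sees the same key.
replay = run_step(api, journal, "run-1", 3, 50)
again = run_step(api, journal, "run-1", 3, 50)
```

The journal alone cannot tell whether a `started` step reached the outside world, which is why the dedupe has to live on the receiving side. The staleness issue raised in the same reply is separate and is not covered by this sketch.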

Jim_SZ🇭🇰

@JimSZ7

about 24 hours ago

Appreciate Polar’s black-box API proxy for harnesses. Researching a new OpenClaw architecture for agentic tasks far beyond today’s limits: coherent hierarchical state management and orderly scheduling across 1B tokens for high-value, long-horizon tasks. Polar’s async rollouts look ideal for RL training at that scale. Thoughts on synergies?

0

11

Jim_SZ🇭🇰

@JimSZ7

about 24 hours ago

@calcsam How does Mastra handle resume after a crash mid step? That's where most frameworks quietly punt, and it's the part I care most about for long runs.

1

0

9

Jim_SZ🇭🇰

@JimSZ7

about 24 hours ago

@billxbf PRM fixes credit assignment given the trace. The harder gap is coverage. Crash recovery and state repair are off distribution from clean rollouts, so they rarely get sampled and the PRM never scores them. You almost have to inject faults to get the traces worth crediting.

1

0

17

Jim_SZ🇭🇰

@JimSZ7

1 day ago

@djfarrelly What changes every six months is the orchestration fashion at the loop layer. What doesn't change is the substrate under it. You still need durable state, deterministic recovery, and verification to run unattended. Swap frameworks at the loop, keep the substrate stable.

0

1

Jim_SZ🇭🇰

@JimSZ7

1 day ago

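@dotey What a harness researcher optimizes is neither the loop nor the model's single-step capability. It is long-horizon failure recovery, durable state, and deterministic verification. Models get stronger every generation, but whether one can run unattended for hours without crashing depends mainly on the harness. DeepSeek listing it as a standalone role shows this layer is starting to be treated as its own discipline.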

0

9

Jim_SZ🇭🇰

@JimSZ7

1 day ago

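@jianshuo A 56-line loop is fine in a demo. The real work is the layer around it: durable state, failure recovery, deterministic verification. Run unattended for a few hours, let external state change, and what breaks is the kind that has only the loop and none of those three. That layer is what harness research studies.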

0

17

Jim_SZ🇭🇰

@JimSZ7

1 day ago

The loop is trivial. The real work is failure recovery, durable state, deterministic verification. That's what decides whether it survives unattended.

0

12

Jim_SZ🇭🇰

@JimSZ7

1 day ago

DeepSeek is hiring an "Agent Harness" researcher. Possibly the first role with that title anywhere. The field is finally naming the thing. The harness is where durable state meets reality, and reality fails differently than your tests.

1

0

31

Jim_SZ🇭🇰

@JimSZ7

1 day ago

I hit this last week. A migration looked idempotent and got silently swallowed in prod. One backend aborts the whole transaction after the first failed statement, the other keeps going. Tests were green on the forgiving one. Prod ran on the strict one.

1

0

15
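The migration failure described above can be reproduced on the forgiving side with a small sketch. The post does not name either backend, so the choice of SQLite here is an assumption, and `run_migration`, the `users` table, and the statements are hypothetical. SQLite keeps a transaction open after an ordinary statement error, so a migration that swallows per-statement errors still applies everything that follows.

```python
import sqlite3


def run_migration(conn, statements):
    """Run all statements in one transaction, swallowing per-statement errors."""
    failed = []
    conn.execute("BEGIN")
    for sql in statements:
        try:
            conn.execute(sql)
        except sqlite3.Error as exc:
            # The "looks idempotent" pattern: ignore the error and keep going.
            failed.append((sql, str(exc)))
    conn.execute("COMMIT")
    return failed


# isolation_level=None disables implicit transactions; BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")

statements = [
    "ALTER TABLE users ADD COLUMN email TEXT",
    "CREATE TABLE users (id INTEGER PRIMARY KEY)",  # fails: table already exists
    "ALTER TABLE users ADD COLUMN name TEXT",  # still applied on SQLite
]

failed = run_migration(conn, statements)
columns = [row[1] for row in conn.execute("PRAGMA table_info(users)")]
```

On a strict backend such as PostgreSQL, the first failed statement marks the transaction as aborted and every later statement is rejected until a rollback, so the same error-swallowing loop would discard the rest of the migration without raising. Running migration tests against the same engine as production is one way to surface the difference.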

Jim_SZ🇭🇰

@JimSZ7

1 day ago

@tianyi @suohawking I DMed you on Xiaohongshu. It looks like you don't check your inbox there much.

0

236

Jim_SZ🇭🇰

@JimSZ7

2 days ago

@steipete This is the right shape. What decides whether it survives unattended is what happens when one of those 5-min steps goes wrong — does a bad write kill the thread, or is there a checkpoint + safe resume? That recovery layer is the hard part, not the loop.

0

4

Jim_SZ🇭🇰

@JimSZ7

2 days ago

@dawnsongtweets The "job-ready" gap isn't single-task skill — Fable 5 keeps closing that. It's reliability across a long horizon: state that survives, recovery from a bad step, verification you can trust. A benchmark that runs full jobs will show the model improved but the runtime didn't.

0

52
