Steel @steeldotdev - Twitter Profile

Pinned Tweet

1 day ago

The team's been deep on the different shapes a browsing agent can take. Agents fail in three ways on long runs: they quit early, grade their own work on a curve, and forget the rules after a compaction. We'd been fighting them in a browser for months. Three things we'd add from here 👇

Niko(la)

@nibzard

1 day ago

https://t.co/dGW2a03ica

steeldotdev retweeted

Steel

@steeldotdev

about 8 hours ago

Today we are launching Steel Skills. Five agent skills for the web. Install one or the whole set. Runs in Claude Code, Cursor, Codex, opencode, Pi, or any compatible agent.

Steel

@steeldotdev

about 8 hours ago

Install one skill, run one real flow, tell us one blocker and one pass. Start here → https://t.co/8fcBEeiipL

Steel

@steeldotdev

about 8 hours ago

List the catalog with @vercel skills or use Steel CLI skills command. `npx skills add steel-dev/skills --list`

Steel

@steeldotdev

1 day ago

A safety net for the runs where your code or agent never gets the chance to clean up. Read more about it in this writeup by @junhssss https://t.co/Pc0S2YLHzX

Steel

@steeldotdev

1 day ago

Stop paying for a browser after the work is done or your agent dies. Steel sessions now take an inactivity timeout: if the agent stops driving the browser for the window you set, Steel releases it and the meter stops. Read more ↓

https://t.co/BpBIcRCqxA

Steel

@steeldotdev

1 day ago

The honest fix was always "release the session yourself." And you should. Now you can just set the timeout, and Steel handles the cleanup for you. Just set the window longer than the longest gap you expect between commands.

https://t.co/fsjSFIZASH
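The advice above can be sketched as a small local simulation. This is only an illustration of the idea, not the Steel SDK: `pick_timeout`, `IdleSession`, the `margin` factor, and the injectable `now` clock are all hypothetical names introduced here.

```python
import time
from typing import Callable, List


def pick_timeout(gaps_s: List[float], margin: float = 1.5) -> float:
    """Choose a window longer than the longest gap seen between commands."""
    if not gaps_s:
        raise ValueError("need at least one observed gap")
    return max(gaps_s) * margin


class IdleSession:
    """Hypothetical stand-in for a browser session with an inactivity timeout."""

    def __init__(self, timeout_s: float, now: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.now = now
        self.released = False
        self.last_command_at = now()

    def sweep(self) -> bool:
        """Release the session if no command arrived within the window."""
        idle_for = self.now() - self.last_command_at
        if not self.released and idle_for >= self.timeout_s:
            self.released = True
        return self.released

    def command(self, name: str) -> None:
        """Run a command; every command resets the inactivity clock."""
        if self.sweep():
            raise RuntimeError(f"session was released before {name!r}")
        self.last_command_at = self.now()
```

With observed gaps of 2, 8, and 5 seconds, this sketch picks a 12-second window, so an 8-second pause between commands keeps the session alive while a dead agent is cleaned up 12 seconds after its last command.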

steeldotdev retweeted

Steel

@steeldotdev

2 days ago

Most of what you ship to an agent gets compressed before it acts — docs, SDKs, blog posts, distilled by the model first. Errors are the exception. They reach the agent intact. We've been rebuilding ours around that. ↓

https://t.co/iu4Yby2MtK

Steel

@steeldotdev

2 days ago

@0xbosta How we think about talking to an agent now, in three channels:

✦ hard error: it failed, here's the fix
✦ soft warning: it worked, but you'll regret it
✦ agent notes: ambient per-site context

Full writeup by @0xbosta ↓ https://t.co/DilGBvMrEt
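One way to picture the three channels is as a tagged message type. The sketch below is an assumption for illustration only: `Channel`, `AgentMessage`, the `fix` field, and the rendered format are hypothetical and are not taken from Steel's API or the linked writeup.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Channel(Enum):
    HARD_ERROR = "hard error"      # it failed, here's the fix
    SOFT_WARNING = "soft warning"  # it worked, but you'll regret it
    AGENT_NOTE = "agent notes"     # ambient per-site context


@dataclass(frozen=True)
class AgentMessage:
    channel: Channel
    text: str
    fix: Optional[str] = None  # hypothetical field: the suggested remedy

    def __post_init__(self) -> None:
        # In this sketch a hard error is only useful if it says how to recover.
        if self.channel is Channel.HARD_ERROR and not self.fix:
            raise ValueError("a hard error should carry a fix")

    def render(self) -> str:
        """Format the message as one line the agent reads verbatim."""
        line = f"[{self.channel.value}] {self.text}"
        return f"{line} Fix: {self.fix}" if self.fix else line
```

Requiring a fix on every hard error is a design choice of this sketch; it mirrors the "it failed, here's the fix" wording above.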

Steel

@steeldotdev

6 days ago

Changelog #027: https://t.co/lBBtQzCSBm Or come hang out in our Discord: https://discord.gg/steel-dev

Steel

@steeldotdev

6 days ago

What's new @ Steel - Changelog #027

✦ New Agent Traces docs: overview, timeline + exports, and the API
✦ Leaderboard refresh: filterable /results index and cleaner benchmark pages
✦ More benchmark entries, plus tooling to find new results worth listing
✦ Plus docs-delivery fixes and browser-seconds metering under the hood

Link below ↓

Steel

@steeldotdev

6 days ago

Latest: @AnthropicAI Claude Opus 4.8 is sitting #1 on OSWorld @XLangNLP at 83.4%, above the human baseline. Added the day it shipped. https://t.co/41hgmZzPuW

Steel

@steeldotdev

7 days ago

Claude Opus 4.8 now ranks #1 on the OSWorld benchmark with an 83.4% score. We just added it to the Steel leaderboard. Congrats to the Anthropic team.

https://t.co/Umw1QW1Sd1

Steel

@steeldotdev

6 days ago

Browser-agent benchmarks are getting crowded, stale, and hard to compare across. We are collecting the benchmarks that actually matter for browser and computer-use agents, so you don't have to chase them down. (Just rebuilt the whole leaderboard. Live now.)

Steel

@steeldotdev

6 days ago

See where every browser agent actually ranks, and how each number was scored. https://t.co/uw1i6kczZT Would love to see benchmark maintainers sanity-check. See a result we're missing, or running an agent with a real score? Add it. @shuyanzh36 @SWEbench @jyangballin @_carlosejimenez @XLangNLP @TianbaoX @taoyds @OpenAI @_jasonwei
