Clipboard Engineering @clipboardeng - Twitter Profile

22 days ago

With 30 minutes left in my day, I drop a stack of well-defined tickets into Todo, set Amphetamine to keep my laptop awake, and close the lid. By morning, the PRs are open with green CI and addressed code review comments. Clipboard has tripled deploys per engineer. Here's the workflow. https://t.co/XP3USaxkcx

Clipboard Engineering

@ClipboardEng

about 2 months ago

@mattpocockuk Here's how we're solving it: https://t.co/fZaWNR10Ke

Clipboard Engineering

@ClipboardEng

about 2 months ago

In 12 months, coding agents went from writing none of Clipboard's code to nearly all of it. That broke our tests.

At one point, 100% of PRs in two of our largest repos hit at least one flaky test. When humans write code, flakes are annoying. When agents write code, flakes break the feedback loop that keeps them moving at full speed.

We drove our E2E flake rate from 100% to under 15% in six weeks:

1. We asked agents to triage every E2E test. Three models with separate harnesses categorized each, then two more agents reached consensus in fresh context windows. They proposed cutting 174 tests to 46. We landed at 87 after domain owners pushed back on specific cuts.

2. We built a Playwright reporter designed for agents with a unified timeline of steps/network/console events, base64 screenshots, and traceparent headers that let agents jump from a failed test straight to Datadog APM traces across 30+ backend services.

3. Agent selection matters. Given identical flakes and prompts, Codex consistently went deeper than the alternatives, returning trace evidence and real product bugs instead of defaulting to retries and longer timeouts.

Code is a liability. Tests usually get a pass because "coverage is good." Each test has a maintenance cost, and you pay the highest cost for lying tests.

Full write-up, plus our open-source playwright-reporter-llm and /flaky-test-debugger skill: https://t.co/tl673U1h3A
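The traceparent headers mentioned in the post follow the W3C Trace Context format: `version-traceid-spanid-flags`, with a 32-hex-character trace ID and a 16-hex-character span ID. Below is a minimal sketch of building and parsing that header so a failed request can be tied back to a trace ID. The type and function names are hypothetical and are not taken from playwright-reporter-llm; how the trace ID is then looked up in an APM tool is outside this sketch.

```typescript
// Hypothetical helper names; only the header layout comes from the W3C spec.
interface TraceContext {
  traceId: string; // 32 lowercase hex characters
  spanId: string; // 16 lowercase hex characters
  sampled: boolean; // lowest bit of the trace-flags byte
}

// Version "00" layout: 00-<trace-id>-<parent-id>-<trace-flags>
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function formatTraceparent(ctx: TraceContext): string {
  const flags = ctx.sampled ? "01" : "00";
  return `00-${ctx.traceId}-${ctx.spanId}-${flags}`;
}

function parseTraceparent(header: string): TraceContext | null {
  const match = TRACEPARENT.exec(header.trim());
  if (!match) return null;
  const [, traceId, spanId, flags] = match;
  // The spec treats all-zero trace and span IDs as invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
  return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
}

// Usage: attach the header to an outgoing request, then recover the
// trace ID from a captured request when the test fails.
const header = formatTraceparent({
  traceId: "4bf92f3577b34da6a3ce929d0e0e4736",
  spanId: "00f067aa0ba902b7",
  sampled: true,
});
// header === "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
const recovered = parseTraceparent(header);
```

Keeping the parser strict (returning `null` on anything malformed) means a reporter built this way would never hand an agent a trace ID that cannot exist.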

Clipboard Engineering

@ClipboardEng

about 2 months ago

@mattpocockuk And the default "fix" agents reach for makes it worse: adding retries and bumping timeouts.
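One way to see why retries hide the problem rather than fix it: a test that fails and then passes on retry still ends green, so the flake only shows up if attempts are recorded. The sketch below counts the share of PRs that hit at least one such test. The data shape, the names, and the definition of "flaky" (at least one failed attempt, final attempt passed) are assumptions for illustration, not Clipboard's actual metric.

```typescript
// Hypothetical record of one test's attempts on one PR.
type Attempt = "passed" | "failed";

interface TestRun {
  pr: number;
  test: string;
  attempts: Attempt[]; // in execution order, retries included
}

// Assumed definition: failed at least once, but the final attempt passed.
function isFlaky(run: TestRun): boolean {
  const last = run.attempts[run.attempts.length - 1];
  return last === "passed" && run.attempts.some((a) => a === "failed");
}

// Share of PRs with at least one flaky test run.
function prFlakeRate(runs: TestRun[]): number {
  const allPrs = new Set(runs.map((r) => r.pr));
  if (allPrs.size === 0) return 0;
  const flakyPrs = new Set(runs.filter(isFlaky).map((r) => r.pr));
  return flakyPrs.size / allPrs.size;
}

const sampleRuns: TestRun[] = [
  { pr: 1, test: "checkout", attempts: ["failed", "passed"] }, // flaky
  { pr: 1, test: "login", attempts: ["passed"] },
  { pr: 2, test: "checkout", attempts: ["passed"] },
  { pr: 3, test: "login", attempts: ["failed", "failed"] }, // consistent failure
];

const rate = prFlakeRate(sampleRuns); // 1 of 3 PRs hit a flaky test
```

Under this definition, adding more retries can only turn consistent failures into flaky passes; it never lowers the count of tests that failed at least once.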

Clipboard Engineering

@ClipboardEng

3 months ago

"Staff Engineer" sometimes means "Senior Engineer who's been here a while." At Clipboard, it's a fundamentally different job: hands-on, cross-team impact while shaping our company's technical direction. We wrote about how we define the role and what we expect from it: https://t.co/89O7eAEhtU Love solving ambiguous, high-leverage technical and customer problems? We're hiring!

Clipboard Engineering

@ClipboardEng

4 months ago

We've run billions of background jobs on MongoDB over the past 2 years: ~40M/week, peaking at 850 jobs/sec. Today we're open-sourcing the Node.js library that powers it. https://t.co/V6woFy4NB8 We're also hiring!
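For a sense of scale, the figures in the post can be turned into an average rate and compared with the stated peak. This is back-of-the-envelope arithmetic only: it assumes ~40M jobs/week held constant and that 2 years is 104 weeks, neither of which the post states exactly.

```typescript
const SECONDS_PER_WEEK = 7 * 24 * 60 * 60; // 604,800

// Average jobs per second if the weekly volume were spread evenly.
function averageJobsPerSecond(jobsPerWeek: number): number {
  return jobsPerWeek / SECONDS_PER_WEEK;
}

// How many times higher the peak rate is than the even-spread average.
function peakToAverageRatio(peakPerSecond: number, jobsPerWeek: number): number {
  return peakPerSecond / averageJobsPerSecond(jobsPerWeek);
}

const avg = averageJobsPerSecond(40_000_000); // about 66 jobs/sec
const ratio = peakToAverageRatio(850, 40_000_000); // about 12.9x the average
const twoYearTotal = 40_000_000 * 104; // 4.16 billion, consistent with "billions"
```

The gap between roughly 66 jobs/sec on average and 850 at peak suggests a bursty workload, which is the case a job queue has to be sized for.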
