We connect professionals with open shifts at top workplaces. Workplaces get the help they need and professionals get flexibility, freedom, and great rates.
With 30 minutes left in my day, I drop a stack of well-defined tickets into Todo, set Amphetamine to keep my laptop awake, and close the lid. By morning, the PRs are open with green CI and code review comments addressed.
Clipboard has tripled deploys per engineer. Here's the workflow.
https://t.co/XP3USaxkcx
In 12 months, coding agents went from writing none of Clipboard's code to nearly all of it.
That broke our tests. At one point, 100% of PRs in two of our largest repos hit at least one flaky test.
When humans write code, flakes are annoying. When agents write code, flakes break the feedback loop that keeps them moving at full speed.
We drove our E2E flake rate from 100% to under 15% in six weeks:
1. We asked agents to triage every E2E test. Three models with separate harnesses categorized each, then two more agents reached consensus in fresh context windows. They proposed cutting 174 tests to 46. We landed at 87 after domain owners pushed back on specific cuts.
2. We built a Playwright reporter designed for agents. It produces a unified timeline of steps, network, and console events, embeds screenshots as base64, and adds traceparent headers that let agents jump from a failed test straight to Datadog APM traces across 30+ backend services.
3. Agent selection matters. Given identical flakes and prompts, Codex consistently went deeper than the alternatives, returning trace evidence and real product bugs instead of defaulting to retries and longer timeouts.
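The reporter in item 2 can be approximated in a few lines. This is a sketch, not the playwright-reporter-llm code: the `TimelineEvent` shape and the `makeTraceparent`, `buildTimeline`, and `renderTimeline` helpers are hypothetical. It assumes only the W3C Trace Context `traceparent` format (`version-traceid-parentid-flags`) and shows how step, network, and console events could be merged into one chronological, text-only timeline that an agent can read in its context window.

```typescript
import { randomBytes } from "node:crypto";

// Hypothetical event shape: one record per Playwright step, network request,
// or console message, each stamped with wall-clock milliseconds.
type TimelineEvent =
  | { kind: "step"; at: number; title: string }
  | { kind: "network"; at: number; method: string; url: string; status: number; traceparent?: string }
  | { kind: "console"; at: number; level: string; text: string };

// Returns `bytes` random bytes as lowercase hex, retrying on the all-zero
// value, which W3C Trace Context treats as invalid.
function nonZeroHex(bytes: number): string {
  for (;;) {
    const hex = randomBytes(bytes).toString("hex");
    if (/[^0]/.test(hex)) return hex;
  }
}

// Builds a W3C `traceparent` value: version "00", 32-hex trace ID,
// 16-hex parent (span) ID, and flags "01" (sampled).
export function makeTraceparent(traceId: string = nonZeroHex(16)): string {
  return `00-${traceId}-${nonZeroHex(8)}-01`;
}

// Merges per-source event lists into one chronologically ordered timeline.
// Array.prototype.sort is stable, so events with equal timestamps keep the
// order in which their sources were passed in.
export function buildTimeline(...sources: TimelineEvent[][]): TimelineEvent[] {
  return sources.flat().sort((a, b) => a.at - b.at);
}

// Renders the timeline as compact text lines, keeping the traceparent on
// network events so an agent can look up the matching backend trace.
export function renderTimeline(events: TimelineEvent[]): string {
  return events
    .map((e) => {
      switch (e.kind) {
        case "step":
          return `${e.at} STEP ${e.title}`;
        case "network":
          return `${e.at} NET ${e.method} ${e.url} -> ${e.status}` +
            (e.traceparent ? ` traceparent=${e.traceparent}` : "");
        case "console":
          return `${e.at} CONSOLE[${e.level}] ${e.text}`;
      }
    })
    .join("\n");
}
```

In a real setup the header would need to travel with the browser's requests (Playwright's `extraHTTPHeaders` context option is one way to attach it), and the backend tracer would need to accept W3C propagation for the trace ID in the timeline to resolve to a server-side trace.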
Code is a liability. Tests usually get a pass because "coverage is good." Each test has a maintenance cost, and you pay the highest cost for lying tests.
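One way to see why test count matters: if each test flaked independently with the same probability `p` per run (a simplification; real flakes are correlated and uneven), the chance that a run of `n` tests hits at least one flake is `1 - (1 - p)^n`. The helper name `prFlakeProbability` and the 1% rate below are hypothetical and illustrative, not Clipboard's data; only the suite sizes (174 and 87) come from the post.

```typescript
// Probability that one CI run of `n` tests hits at least one flake,
// assuming each test flakes independently with probability `p`.
export function prFlakeProbability(n: number, p: number): number {
  if (!Number.isInteger(n) || n < 0) throw new RangeError("n must be a non-negative integer");
  if (p < 0 || p > 1) throw new RangeError("p must be in [0, 1]");
  return 1 - Math.pow(1 - p, n);
}

// Illustrative only: a hypothetical 1% per-test flake rate applied to the
// suite sizes from the post (174 before triage, 87 after).
for (const n of [174, 87]) {
  const pct = (prFlakeProbability(n, 0.01) * 100).toFixed(1);
  console.log(`${n} tests: ${pct}% of runs hit at least one flake`);
}
```

This prints 82.6% for 174 tests and 58.3% for 87. Under this model, halving the suite does not halve the rate; getting 87 tests below 15% of runs would also require a per-test flake rate under roughly 0.19%, which is where better debugging of the remaining tests comes in.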
Full write-up, plus our open-source playwright-reporter-llm and /flaky-test-debugger skill: https://t.co/tl673U1h3A
"Staff Engineer" sometimes means "Senior Engineer who's been here a while."
At Clipboard, it's a fundamentally different job: hands-on, cross-team impact while shaping our company's technical direction.
We wrote about how we define the role and what we expect from it: https://t.co/89O7eAEhtU
Love solving ambiguous, high-leverage technical and customer problems? We're hiring!
We've run billions of background jobs on MongoDB over the past 2 years: ~40M/week, peaking at 850 jobs/sec. Today we're open-sourcing the Node.js library that powers it. https://t.co/V6woFy4NB8
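A quick back-of-envelope check shows how the queue figures relate. The constant names are illustrative, and the two-year total assumes the current ~40M/week rate held the whole time, which it likely didn't, so treat it as an upper-end sanity check rather than a reported number.

```typescript
// Back-of-envelope check of the queue figures in the post.
const SECONDS_PER_WEEK = 7 * 24 * 60 * 60; // 604,800 seconds
const weeklyJobs = 40_000_000; // "~40M/week"
const peakPerSec = 850; // "peaking at 850 jobs/sec"

// Average sustained throughput if jobs were spread evenly across the week.
const avgPerSec = weeklyJobs / SECONDS_PER_WEEK;
// How far the peak sits above that average.
const peakToAvg = peakPerSec / avgPerSec;
// Two years at the current weekly rate (an assumption, see above).
const twoYearTotal = weeklyJobs * 52 * 2;

console.log(`average: ${avgPerSec.toFixed(1)} jobs/sec`);
console.log(`peak/average: ${peakToAvg.toFixed(1)}x`);
console.log(`two years at this rate: ${(twoYearTotal / 1e9).toFixed(2)} billion jobs`);
```

This prints an average of about 66.1 jobs/sec, a peak roughly 12.9x that average, and about 4.16 billion jobs over two years, consistent with "billions". The gap between average and peak is the burst headroom a job queue has to absorb.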
We're also hiring!