I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
— As we know, Anthropic publicly released its Mythos-class models earlier this week under the commercial name Fable.
— Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability — big or small — it is Anthropic’s responsibility to patch.)
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In its blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could describe a jailbreak that makes a cyberweapon operable as not “serious.”
— In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
— In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (i.e., fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
— The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
— Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.
In 12 months, coding agents went from writing none of Clipboard's code to nearly all of it.
That broke our tests. At one point, 100% of PRs in two of our largest repos hit at least one flaky test.
When humans write code, flakes are annoying. When agents write code, flakes break the feedback loop that keeps them moving at full speed.
We drove our E2E flake rate from 100% to under 15% in six weeks:
1. We asked agents to triage every E2E test. Three models with separate harnesses categorized each, then two more agents reached consensus in fresh context windows. They proposed cutting 174 tests to 46. We landed at 87 after domain owners pushed back on specific cuts.
2. We built a Playwright reporter designed for agents with a unified timeline of steps/network/console events, base64 screenshots, and traceparent headers that let agents jump from a failed test straight to Datadog APM traces across 30+ backend services.
3. Agent selection matters. Given identical flakes and prompts, Codex consistently went deeper than the alternatives, returning trace evidence and real product bugs instead of defaulting to retries and longer timeouts.
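For readers unfamiliar with the traceparent header mentioned in step 2, here is a minimal TypeScript sketch of the W3C Trace Context format that makes the jump from a failed test to a backend trace possible. It is not the code from playwright-reporter-llm; the names `makeTraceparent`, `parseTraceparent`, and `TraceContext` are hypothetical, and the parser only handles version `00` headers.

```typescript
import { randomBytes } from "node:crypto";

// Illustrative only: hypothetical helpers, not the reporter's actual API.
// A version-00 traceparent is "00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>".
interface TraceContext {
  traceId: string;  // 32 lowercase hex characters
  parentId: string; // 16 lowercase hex characters
  sampled: boolean; // lowest bit of the flags byte
}

// Build a fresh header with random IDs and the "sampled" flag set.
function makeTraceparent(): string {
  const traceId = randomBytes(16).toString("hex");
  const parentId = randomBytes(8).toString("hex");
  return `00-${traceId}-${parentId}-01`;
}

// Parse a header, returning null for anything malformed.
function parseTraceparent(header: string): TraceContext | null {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (!match) return null;
  const [, version, traceId, parentId, flags] = match;
  // The spec forbids version "ff" and all-zero trace or parent IDs.
  if (version === "ff" || /^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { traceId, parentId, sampled: (parseInt(flags, 16) & 1) === 1 };
}
```

In a Playwright test, a header like this could be attached to outgoing requests (for example with `page.setExtraHTTPHeaders({ traceparent })`), and the `traceId` recorded alongside the failure is then the value an agent would search for in the tracing backend. How the actual reporter wires this up is described in the linked write-up, not here.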
Code is a liability. Tests usually get a pass because "coverage is good." Each test has a maintenance cost, and you pay the highest cost for lying tests.
Full write-up, plus our open-source playwright-reporter-llm and /flaky-test-debugger skill: https://t.co/tl673U1h3A
"Staff Engineer" sometimes means "Senior Engineer who's been here a while."
At Clipboard, it's a fundamentally different job: Hands-on, cross-team impact while shaping our company's technical direction.
We wrote about how we define the role and what we expect from it: https://t.co/89O7eAEhtU
Love solving ambiguous, high-leverage technical and customer problems? We're hiring!
We've run billions of background jobs on MongoDB over the past 2 years: ~40M/week, peaking at 850 jobs/sec. Today we're open-sourcing the Node.js library that powers it. https://t.co/V6woFy4NB8
We're also hiring!
It's a mistake for startups to treat fundraises as a series of milestones. There's something even more impressive than raising a series whatever: to be making so much that you don't need to. Eventually all companies have to reach this point. The sooner you do, the better.
Request to X leadership: Remove Khamenei's Gray Government Checkmark.
The people of Iran do not recognize Ali Khamenei as their leader.
Granting him a gray government checkmark legitimizes a regime that rules by force, censorship, and internet shutdowns, while X itself has been filtered in Iran for years.
Our leader, recognized by millions of Iranians inside and outside the country, is Reza Pahlavi.
X already took a powerful step by restoring Iran’s historical Lion & Sun flag 🇮🇷 - thank you @elonmusk and @nikitabier.
Removing Khamenei’s gray checkmark is the next logical step for platform integrity and truth.
X is the Times Square of the internet. Recognition here matters.
Please repost & like so this reaches the right eyes.
@hatchways_io, FYI, every day when I log in to review assessments, Hatchways' first page load is blank. Probably something with my auth/refresh token not working. It works again if I refresh the page, but it's a bit annoying to do every single day. Happy to help debug in DMs
@GergelyOrosz 💯, game changer, and amazing how the majority of it is still relevant today. It is also one of the most actionable books on software development
I've been thinking about this doodle a lot over the past few years, and I'd change it from "hiring in early stage" to just "hiring". No matter the stage of your company, the best hires will come through referrals and your network.
And no, I wouldn't fix the typo on referral 🙃
#hiring in #earlystage: the less direct contact you have with people you want to hire, the more you’ll pay for talent that won’t be as good.
Build #connections with #people, build your own #network, truly help & serve this network. They’ll even refer you to others.