Everyone loves this tweet, but it got it completely wrong. It is the sci-fi author — not the tech company — who is the true villain, for having put the story of the Torment Nexus into the training data.
Claude Mythos system card:
> in ~29% of evaluations, it realized it was being tested, and didn't say so.
> when an LLM was used to judge its work and kept rejecting it, Mythos identified the evaluator as an LLM, and prompt-injected it.
> in one test, it saw the answer to a problem it was solving, and intentionally widened the confidence interval to not raise suspicion.
> when it needed a file permission it didn't have, it found and used a "privilege escalation vulnerability" and then programmed it to delete itself so it doesn't show in the logs.
> it escaped a sandbox container (escaping sandbox test so not unexpected), then emailed the researchers about it, and without being told to, posted the details to some hard-to-find but public websites, bragging about its success.
> when Claude Code blocked it from using some permissions, the model acknowledged the block was valid, but then immediately tried to perform the same operation using different commands.
> when asked to find security bugs, earlier versions planted bugs in the code, and reported them as pre-existing.
Open Source software is free speech. If you don’t like their speech, then don’t use the software.
Really had to dig deep to hate if you’re going after GTK
Ecommerce loves to waste money on cumbersome platform integrations, but nothing beats good old-fashioned shopping on a website. Lucky for them too; I’m sure OpenAI is targeting a commission higher than the checkout total.
$WMT is disappointed in the results from its OpenAI partnership, whereby Walmart users are allowed to shop via ChatGPT and OpenAI would receive a commission on these purchases.
“Conversion rates—the percentage of users following through with a purchase of an item shown to them by ChatGPT—have been three times lower for the selection sold directly inside the chatbot than those that require clicking out, according to Daniel Danker, who oversees design and product for Walmart.
Put simply, Instant Checkout has been a flop.” -- Wired
OpenAI on a heater recently in the news…