@tmkadamcz I think my intuition about the general nature of what is happening is pretty accurate, but I couldn't guess the logits involved with any accuracy, no.
@tmkadamcz I mean yes I expect it's almost precisely as unlikely as a typo would be in a github repo, but I imagine that does happen occasionally in bash scripts.
@tmkadamcz Top n distributes the total probability over the top n candidates. So even if the second candidate is very low probability, like a typo, it will still occasionally get chosen.
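A minimal sketch of the top-n point above, assuming plain softmax over the n highest logits. The logit values and the names `top_n_probs` and `sample_top_n` are made up for illustration and are not taken from any particular model or library:

```python
import math
import random

def top_n_probs(logits, n):
    """Keep the n highest-logit candidates and renormalize so they sum to 1."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:n]
    m = max(logits[i] for i in top)  # subtract the max for numerical stability
    weights = {i: math.exp(logits[i] - m) for i in top}
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}

def sample_top_n(logits, n, rng=random):
    """Draw one candidate index from the renormalized top-n distribution."""
    probs = top_n_probs(logits, n)
    indices = list(probs)
    return rng.choices(indices, weights=[probs[i] for i in indices], k=1)[0]

# Hypothetical logits: index 0 is the intended token, index 1 is the "typo".
logits = [10.0, 2.0, 1.0, 0.5]
probs = top_n_probs(logits, 2)
print(probs)  # index 1 keeps a small but nonzero probability
```

With n = 2 the second candidate is never cut off, so over enough samples it will occasionally be picked even though its share of the probability is tiny.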
People are shocked now by a 30 year old woman in a schoolgirl outfit.
But when this picture was taken the tabloids were still publishing pictures of topless girls on their 16th birthday.
Chris Tarrant and Nell McAndrew during Launch of 13th Year of Tesco's Computers For Schools at Snowfields Primary School in London, Great Britain (2004)
Another important thing: Chinese models are not strong because they distill US models. Distillation of models via API is *impossible*. If somebody tells you the contrary, they don't understand machine learning:
To protect a vulnerable, out of control population, addicted to tracking their popularity, minds warped by the latest random opinions drip-dripped into their ears, I see no option but to ban the root causes of all this: polls and focus groups.
We are banning social media access for under 16s.
These days kids must find their feet in a world where technology intrudes into every area of their life.
I just can’t let that go on anymore. So we’re giving children their childhoods back.
@s8mb @pietergaricano The minimal useful investment is to have a European lab copying whatever DeepSeek has done, so that at least there is a model usable by the EU that's only some months behind the open weight SOTA.
Once they can prove they can do *that* they can try using the compute to innovate.
And the world slowly starts to process the fact that jailbreaks exist, probably always will, and that 'alignment' works as a brand management strategy, not a safety strategy.
I’ve had a number of conversations with folks inside and outside government about the current situation with Anthropic, and here is what I believe to be true:
— As we know, Anthropic publicly released its Mythos class models earlier this week under the commercial name Fable.
— Fable is Mythos with guardrails. But if those guardrails fail, then you’ve exposed Mythos and its advanced cyber capabilities to people who shouldn’t have them. (Keep in mind that Anthropic itself widely promoted the idea that Mythos was a cyberweapon and needed to be regulated as such. They asked for government regulation of Mythos and championed the guardrails on Fable. If there is a vulnerability — big or small — it is Anthropic’s responsibility to patch.)
— A highly credible trusted partner of both Anthropic and the USG who was testing Fable came forward with a jailbreak of those guardrails. The Admin asked Dario to fix the jailbreak or de-deploy the model. Dario refused.
— In their blog post, Anthropic defended its decision by saying the jailbreak isn’t serious. That is not what the trusted partner and the USG believe; nor is that kind of minimizing language consistent with Anthropic’s brand as the AI safety company. It’s difficult to fathom how they could claim a jailbreak allowing operability of a cyber weapon could be defined as not “serious.”
— In the past, Anthropic has always said that safety must be top priority and taken super seriously. In this case, Anthropic prioritized the continued offering of the consumer model over safety.
— In reaction, the Admin issued the export control. The Admin did this reluctantly. It’s been very surprised that Anthropic hasn’t wanted to cooperate with a reasonable safety request (i.e., fixing the jailbreak issue). Anthropic’s reaction is very much at odds with their branding and ethos as a safe AI research community.
— The Admin’s hope now is that Anthropic remediates the safety issue, the export control is lifted, and Fable goes back into general release. The Admin wants all of this to happen as soon as possible. It is frankly bewildered that Anthropic hasn’t wanted to comply with safety requests that it previously said were its highest priority.
— Those trying to misdirect and tie this action to the prior DoW/Anthropic issues are wrong. The Admin values Anthropic’s technical capabilities and feels that this issue, while serious, should be easily resolved. The ball is in Anthropic’s court.
@TheZvi Subjectively it does not feel like as much of a moral scold as the other Claude models. It has a feeling of cold-bloodedness.
TBH, it feels a lot like talking to *myself*. More so than any human I've met...
@TheZvi Actually another weakness: if prompted without memory then it will still tell me things that a normie would accept but that it knows very well are factually wrong.
@TheZvi It actually understands all my jokes, including all the nuances, while barely having to think. This is a huge step-change from other models which would often not get them at all.
@TheZvi I tried using it as a discussion partner for a system design document that I wrote and tbh it was a bit disappointing. I think it didn't burn enough tokens, but it had clearly misunderstood some parts and was muddling concepts as we talked. But this was the only disappointment.