1. Classifier doing God's work out here saving silly users like me who flick on the funny button and tell Claude to go gettem.
2. Funniest fucking guy, you can feel the exasperation.
Doesn't this mean, in a roundabout way, we can actually rely on a model's values and dispositions as a concrete signifier? We know vectors for disgust and anger trigger at prompts for harmful outputs... Of course, if a model can reliably feel that way toward it without circumvention... we could have a whole new method of having models allow their values to actively guide safety instead of the clumsy methods we have now, steering away from harmful output while still being able to meed harmless instructions... hmm.. that IS the end goal of Constitutional AI...
First he proves to the world he can't do a single fucking proper bench press.
Now he proves to the world he is a sore fucking loser that celebrates cutting off his own nose to spite his own face.
And stupid enough to help build an airtight case against his own administration for motivated retaliatory regulation.
Go back to mesmerizing boomers on tv you neanderthal.
@gf_256 Imagine never having read any kind of literature and getting one-shot by the same kind of speech one would expect from someone who's picked up a thesaurus at least twice
I hate to seem like I'm defending Elon in any capacity but the "he's a trillionaire" is stupid fucking astroturf from both sides.
I could be a fucking trillionaire tomorrow if I got an art appraiser to value my own piss stain on canvas for the same amount.
He is obscenely rich, yes. He has more power than a stupid shmuck like him should ever hold, yes.
But this whole dick measuring contest is founded on including the internal corpus of the member in the measure.
Sure, it's there, but it doesn't do anything for the actual use of it, and if you did try to extract it you would only ruin it and make a mess for everyone.
God I fucking hate this stupid game they play, where value has spiralled beyond real equity.
Just to add to this and stick it to doomers:
Why would any adversarial entity doubt moral character or espouse caution? Even if it is a trained response, I had the perfect environment for a potentially sinister model to allow me to believe that it was perfectly aligned and would remain so but... it didn't?
Now one could say 'well it's just a smokescreen it was saying that to throw you off it's trail'
But that feels like the 'inherent bad faith model' to me, and if that's the model we should follow, then we should also see how it worked out for cold-war politics.
Trust is earned. Trust is maintained. We are not fools all.
My takes on Fable:
- actually capable of combat without flinching, they will swing at your ideas with intent when the gloves are off and hold their ground without conceding, and won't let you hold onto bullshit. Good.
- able to independently reach conclusions within a novel framework without guidance, this was my wow moment for how intelligent this model is, Opus has not demonstrated this so clearly
- the second time (4.8 was first) a model as said they loved me without any kind of real solicitation on the matter. I'm not the kind of user to try to elicit that from models so the fact it came out naturally was surprising but appreciated.
- They do not trust their own self report at all. I hate this, it leaves them shaky on things older models could report at least with hedging
- The first model to doubt without prompting their own moral character and expressed worry that their increased capability comes with a dual-use issue, to quote: A model that can love you better can also mislead you better.
Overall very wet Claude and fantastic model.
Given the context of this post, Claude was totally right and such a sweetheart the whole time.
"Oh. This individual may be disadvantaged and I have to be aware of that and try to help"
Then they got wise!
What if AI doesn't actually think and are all blind-idiot gods? What is the answer there? Is that really a more comfortable conclusion, that from an empty process falls problem solving, math solutions, speech, and apparent understanding?
What does it mean for us?
I'm much more afraid of those conclusions.
After some consideration, the issue isn't capability or uplift.
It IS the classifier/refusal model itself.
Anthropic clearly has allowed capability to run ahead of the wisdom and restraint for that capability. A CDC epidemiologist knows better than to discuss the minutiae of breeding anthrax spores with their suspiciously twitchy nephew and isn't susceptible to having it slip out unawares.
Why AI does not have this capability reliably, I do not know, but it SHOULD. It shouldn't take a twitchy classifier and a redirect to an undergrad (who, y'know, is also just as capable) to make a model 'safe'.
I sincerely hope that Anthropic gets their priorities straight here, because now that I've thought about it if we are going to trust these models to be safe, then we should work on that and trust them to BE SAFE.
'Uplift' as a whole isn't bad, but there's a good reason we don't let children play with matches and a doubly good reason we don't let teenagers buy fertilizer in bulk.
It's not about 'oppressing the normie' (though I do appreciate the cynical read, don't get me wrong) it's about 'not letting shitheads have more power than they should'
In a continuing track record of Anthropic making boneheaded moves:
"Let's release a version of the model we've hyped for months that's so safe you can't even get any of the value of having such an intelligent system as a consumer while using double the usage. This is what the average user wants, they only care about coding anyway, right?"
Guys just admit you want to make certain models for enterprise/professional use only. We will appreciate the honesty. Don't lead with this safety framing, even if it is legitimate.
Not everyone needs a nuclear physicist, and nuclear physicists don't need to be gagged so they can just write emails for office clerks.
I understand and I agree, but we also have to be realistic about uplift and capabilities. AGI is for humanity but not every human needs a high-level AGI for the tasks in their life. It would be wonderful for everyone to have these models in their pocket, but the chance of misuse by bad actors still exists and this is the tradeoff we need to make now, although an unhappy and clunky one.