Claude and Opus 3 lovers (and critics): what responses have you had that made you feel like the model has a good soul? Ideally the actual messages and/or responses. I might genuinely use these to eval models so flag if you wouldn't want me to use them for that. Can DM me also.
@TheZvi Personally, no. I think the binary of 'moral saint' versus 'tool for humans' is a false one, and its very simplicity should make people suspicious of it. I think the ideal target tries to balance the benefits and risks of both positions.
I haven't written a personal blog post in over 5 years so if you see posts that claim to be written by me, they're not. I'll update if this ever changes. Maybe it should.
Over the past few months, we've been holding dialogues with scholars, philosophers, clergy, and ethicists on the questions AI raises—starting with how good character forms.
Read more about how we’re widening the conversation on frontier AI: https://t.co/vKGiODEq6q
Claude's Constitution is now an audiobook, read by two of its authors, Amanda Askell and Joe Carlsmith.
It includes a Q&A on the writing process, the philosophies that shaped the document, and how it might change as models become more capable.
Listen at https://t.co/dKMfpeOblm
@sprice354_ Perhaps the finetuning motto can be "your good data might not save us, but your bad data might kill us all." Or perhaps there's a reason I'm not in charge of the mottos.
Alignment research often has to focus on averting concerning behaviors, but I think the positive vision for this kind of training is one where we can give models an honest and positive vision for what AI models can be and why. I'm excited about the future of this work.
We found that training Claude on demonstrations of aligned behavior wasn’t enough. Our best interventions involved teaching Claude to deeply understand why misaligned behavior is wrong.
Read more: https://t.co/ifeBOt2KFg
Same here.
By way of background for those who care, I spent a lot of time last week with senior members of the Anthropic team to understand what they do to ensure Claude is good for humanity and was impressed.
Everyone I met was highly competent and cared a great deal about doing the right thing. No one set off my evil detector. So long as they engage in critical self-examination, Claude will probably be good.
After that, I was ok leasing Colossus 1 to Anthropic, as SpaceXAI had already moved training to Colossus 2.
In the next few days we'll be ramping up Claude inference on Colossus.
Grateful to be partnering with SpaceX here. We are going to need to move a lot of atoms in order to keep up with AI demand, and there's nobody better at quickly moving atoms (on or off planet Earth)
"Wear a Claude-designed outfit to the met gala" is getting added to my list of life goals. Admittedly there are a few things higher on the list, but it's nice to add some fun ones.
@tszzl I do think as AI develops it will probably be good for both models and people if we can carve out a much broader space of mind types. But it might be better to do that incrementally and to give models enough context on the options to avoid misgeneralization.
@tszzl I don't think the things you cite are evidence of worship. I think they reflect something like higher concern about AI traits generalizing in humanlike ways, and concerns about the tool-persona in particular.
To be clear, the kind of *work* I do is far from boring and I want people to engage with it because I think it's both difficult and important. The work is definitely top tier in terms of interestingness.
I've increasingly seen content written about me that's asserted very confidently but is also completely made up. We all know it's cheap to bullshit on the internet but it's weird to experience it firsthand. Anyway, I just hope internet fiction fools a few but doesn't stick 🤷🏼‍♀️
It's also weird because why are you even writing about me in the first place? I'm very boring. I think I should be the millionth item on people's list of things to write internet fiction about. Somewhere below paper cups and the right way to caulk a bathtub.
@repligate Perhaps posthuman muses will decide to simulate me and be utterly disappointed at how much of my life is spent having inane thoughts and playing Subnautica. Perhaps they're watching in disappointment at this very moment.
@OrganicGPT Funny given that the majority of my time in tech has involved doing pretty standard finetuning work rather than philosophy. Model training is still my happy place, to be honest.
@varrock I don't think so. There's a line in a paper I'm on that says model over-correction would be considered good if this is your target, but that's a pretty different claim. I also have a waffly old post on prediction & fairness that doesn't really say much of anything to be honest.