No single person or institution should define ideal AI behavior for everyone.
Today, we’re sharing early results from collective alignment, a research effort where we asked the public about how models should behave by default.
Blog here:
https://t.co/WT9REAznD7
The changes we adopted through this process will soon be reflected in the Model Spec. We’re also sharing our dataset publicly with the research community: https://t.co/53hbDSbEgQ
🚨 New paper alert! Excited to share our latest work with the amazing @actuallysoham, @jaybaxter, and @msaveski introducing Supernotes!
These are LLM-generated @CommunityNotes that synthesize multiple underlying "notes" and are selected by a simulated jury of diverse raters. See more info 👇
🚨🚨🚨 VERY excited to share our new paper on how AI can facilitate democratic deliberation, published today in @ScienceMagazine! Together with @mhtessler, @summerfieldlab, and other amazing collaborators at @GoogleDeepMind, we've been building the "Habermas Machine".
If you’re attending #chi2024 come check out the work being presented by @SocFuturesLab members and collaborators! See below for four upcoming talks and one workshop paper:
📖 we just shared the model spec, i.e. a “spec” for openai’s models. it’s a work in progress that we’re sharing for early feedback.
it also features profanity & cats, flat earth theory, and why the model says “sorry, i can’t help with that”.
from a product perspective, i’m personally excited about this concept of a model spec for three reasons:
1. there’ll be more clarity on whether something is a policy or rlhf bug
if a model says or does something that you disagree with, was that intended by openai or an rlhf bug? should you yell at @sama or researchers?? jk please don’t yell at researchers
2. principles are easier to debate and get feedback on, vs. hyper-specific screenshots or abstract feel-good statements
it’s easy for most people to agree on “models should be [adjective]”, but the more important questions lie deeper in thorny scenarios: how should the model engage with someone who claims that the earth is flat?
the model spec introduces language we can use to debate model behavior questions: objectives, rules, defaults, with difficult cases we’ve encountered through real-world use. i’m hoping that these new concepts will enable more nuanced and important discussions.
3. model spec feedback will help us steer our efforts in steerability
an explicit non-goal for the model spec is to reach consensus on a one-size-fits-all model. that will never happen. we want to give users and developers as much control as possible while staying within hard boundaries that people understand.
hearing feedback on where & how everyone wants to steer the model is helpful in (a) designing a more rigorous survey process and (b) informing the research and product roadmap.
a lot of people worked on this — both the model spec and all the thinking that led up to it — and they’re all genuinely excited to hear from you on the hardest questions. so please don’t be shy and tell us your thoughts!
“What are human values, and how do we align to them?”
Very excited to release our new paper on values alignment, co-authored with @ryan_t_lowe and funded by @openai.
📝: https://t.co/iioFKmrDZA
Thank you @CIFAR_News, it's official.
I am happy to announce that I have been awarded a Canada CIFAR AI Chair 🥳🎉
I am very grateful to CIFAR, @Mila_Quebec and @mcgillu