The still-missing ingredient isn’t knowing what to do (results forthcoming!).
It’s getting buy-in from the stakeholders, so we're reaching out now and are happy to speak with relevant organizations.
https://t.co/qxUo7bYVH0
In the coming weeks, anyone interested in endorsing the results should be in touch; we’ll be sharing results with potential endorsers, then will open endorsement widely to labs, auditors, industry, organizations, academics, and civil society groups.
Everyone says we need better evals, but why does a consensus process help?
The short answer is that common knowledge encourages coordination, and makes bad practices more costly.
The long answer is a thread.🧵
Our (imperfect) answer is to find consensus on practices that are seen as important across stakeholder groups.
That answer won’t tell anyone whether an eval is good enough, much less great, but it will help raise the minimum bar for the practices that are broadly accepted.
In round 1, many participating AI evaluation experts noted that typical methods for addressing evaluation awareness are not reliable, and so are not (yet) recommended.
@GeorgeBalston, @Miles_Brundage, @charlotte_stix: it's great to see this, since fixing this gap will be critical!
Black-box access may soon no longer be enough to robustly make or verify safety and security claims. Deeper, white-box access is a necessary update to counter 'evaluation awareness' and keep loss-of-control evaluations state of the art. A new policy blog explains why. 🧵
Preliminary results are starting to make that picture clearer. The interim results about what the process is surfacing are available to participants; if your organization has a stake in how AI evaluations are performed, your input is still welcome!
https://t.co/LvL8D8xpxg
Last week, @davidmanheim argued that AI evaluation results are becoming load-bearing.
But that creates a problem:
How do we arrive at better common practices without first deciding who is in charge, or trying to dictate it ourselves? 🧵
https://t.co/Tn6ITF8Shv
AI evaluation has always been critical for development and marketing, but it’s increasingly load-bearing in policy, safety cases, and public discourse.
Evals used for development just need to work, and marketing is marketing, but now evals must be more robust and communicate more. 🧵
We're not interested in pretending consensus exists if it doesn't!
But after round 1, many practices are broadly agreed to be important across labs, auditors, academia, civil society, and government; others are context-dependent, not widely accepted, or disputed.