How can the US and China (or the international community, broadly) ensure compliance in AI agreements to manage large-scale risks? In a recent report, we discuss options available for verification of international AI treaties. Applicable to domestic rules too! 🧵(1/12)
@sama Can ChatGPT get an "Answer Now" keyboard shortcut? And could that feature be improved generally? For quick questions, it's not worth my time to go manually change the thinking effort (and doing so messes up my next query by changing the preset), & I often can't find the Answer Now button
If leading AI companies are indeed approaching the point of recursive self-improvement, a coordinated, verifiable, and universally applied pause is probably the only responsible solution to mitigate several major AI risks; at least until safety guarantees are developed and demonstrated. Ensuring that such a moratorium is respected would require sincere collaboration between various countries and companies, but I definitely believe it is achievable if others follow in @AnthropicAI's footsteps.
Our highest and most urgent national priority should be AI safeguards. The risks of AI weapons, pathogens, mass unemployment, surveillance, and even extinction must not continue to be largely ignored.
This is a needed and candid post by the Anthropic Institute. I agree with the conclusion that we need more time before we are hit with the “immense implications” of AI technology. My team at the Machine Intelligence Research Institute has worked to detail an international agreement (https://t.co/TiUxXfEPVf) that satisfies the requirements laid out by Anthropic: that a pause must include all frontier AI developers anywhere on Earth and must be mutually verified. Our contribution includes answering how to address the technological particulars of verifying a frontier AI development pause, and how to structure the agreement for stability and effectiveness.
Our work is a model, and we would welcome collaboration with Anthropic to further develop and refine it.
Some important points for enabling international coordination:
- We task governments, rather than labs, with coordinating and verifying the pause, because they have the diplomatic and national intelligence means to do so and can architect binding rules that apply to everyone.
- The United States is capable of halting frontier AI development globally, unilaterally and/or through coordination with key allies. While this is not preferred to a broadly coordinated halt, it strengthens the US’s hand in negotiating one.
Interestingly, training compute for open-weight AI models doesn't appear to have grown very much in the last 2 years. Llama 3.1-405B (and derivative models) still holds the record 2 years later. Data from Epoch. https://t.co/DWBq23xNAt
@xlr8harder I'm gonna stop engaging on this thread. It feels like you still don't understand my position here, even though I am a central example of the person you are talking about in your top-level tweet. It doesn't seem like Twitter discussion allows us to cross the understanding gap.
@xlr8harder It's not "the possibility" of another algorithmic breakthrough, it's a simple (even conservative) extrapolation of the trend "algo progress happens and is sometimes fast" that we've seen for many years.
@xlr8harder It seems to me that we've seen a wealth of evidence pointing toward "a threshold lower than the current largest training runs is justified" in the form of algorithmic progress and Chinese models that are trained with much less compute but still achieve strong capabilities.
@xlr8harder I don’t really understand your follow-up questions. I’m happy to video chat about this if you want; it seems like we’re talking past each other.
@xlr8harder Some things that could change my mind:
1) very strong evidence that algorithmic progress is much slower than I believe it to be (e.g., if major methodological issues were discovered in existing estimates); this evidence seems quite unlikely to show up
@xlr8harder 4) to a small extent, the FLOP thresholds are guided by “what is verifiable”. So if magically every laptop could do 10^24 FLOP in a week, then the threshold would be unenforceable and would need to change (begrudgingly, as I think that’s a very high-risk/doomed situation)
@xlr8harder 3) if AI development moves significantly away from the current large-compute training regime. (RL and inference scaling are small versions of such a shift). E.g., if continual learning is solved, then training compute seems like it would become a worse proxy for capabilities.
@xlr8harder 2) if models trained with near 10^24 FLOP seem to be capable of autonomous AI R&D or have other existential risk-relevant capabilities (this would indicate 10^24 is too high a threshold).
@xlr8harder I think the track record of people saying “current models are close to some real world dangerous capability” is mixed but uncertain. We don’t have ground truth! What does it look like to maximally elicit GPT-4? Maybe o3 level capabilities if you allow some RL? Nobody knows!
@xlr8harder I feel like you’re not engaging with the arguments in that post. There are no predictions in that post that I think have been falsified (or proven overly conservative, for that matter).
@eliebakouch just by looking at the chart it’s clear that Nemotron 3 is not particularly outlier-ish. It’s near notable models like GLM 5 and GPT-OSS-20B