A few months ago, I found an anonymous sockpuppet account linked to the OpenAI/a16z super PAC. Now, @TaylorLorenz and I have uncovered two more — and they're even more brazen than the first.
https://t.co/TJHAABeq2A
Leading the Future, the pro-AI super PAC backed by Greg Brockman, appears to be linked to multiple sockpuppet accounts, including a purported anti-AI activist (@themidasproj)
Leading the Future is linked to a sockpuppet meme account masquerading as an extreme doomer.
Read the full deep dive by @tyler_johnston and me into the online meme marketing boosting the super PAC.
As part of @OpenAI’s effort to market ChatGPT as safe for teens, the company recently boasted on X and LinkedIn that it had the best score on the TeenAegis AI Model Danger Index. We took a closer look at the index, and much of it appears to be AI slop.🧵
If an AI model posed the risk of undermining human control, how confident would you want to be that it was safe before it was released? Pretty damn confident, one would think.
Last month, Google DeepMind updated its Frontier Safety Framework, committing to a risk management process around misalignment and loss of control. This was a positive step. But its new policy doesn’t apply its most stringent safety standards even to models powerful enough that “absent additional mitigations, we cannot rule out the model significantly undermining human control.”
Specifically, the risk of loss of control does not trigger writing a formal safety case (an argument showing how risks have been reduced to an acceptable level), even though other risks do. If the threat of loss of control doesn’t demand a safety case, what does?
Company: Google
Date: April 17
Google updated its Frontier Safety Framework from v. 3.0 to 3.1. The new version introduces “Tracked Capability Levels” (TCLs), covering risks at a lower level of capabilities than the FSF’s Critical Capability Levels (CCLs).
TCLs trigger risk assessments and mitigations, but don’t require formal safety cases like CCLs do.
The misalignment TCL is defined as the point where models have enough situational awareness and stealth that “absent additional mitigations, we cannot rule out the model significantly undermining human control.”
It’s notable that this doesn’t rise to the level of a full CCL. Google is essentially saying that when a model reaches this risk threshold, if we don’t put additional safeguards in place, we might lose control of the model… but we’re not going to require a formal safety case for it.
Still, it’s an improvement over v. 3.0, which just described its misalignment CCLs as an “illustrative” example.
FSF v. 3.1 also includes a thin section on “Governance and Accountability,” which fails to name any specific governance or accountability mechanisms (though Google has said more on this elsewhere: https://t.co/uD4IglJkLl).
A full diff is available at our website: https://t.co/vIgWOAfrBz
Following the report we co-authored last week about xAI in light of its upcoming IPO, xAI updated its safety page.
The new text gestures toward some industry-standard safety practices discussed in the report.
Sadly, the new page appears to be both inaccurate and AI-generated 🙄
Investors backing a company that aspires to develop models with 5.5 Pro/Mythos-class cyberoffensive capabilities may wonder whether the risks of such a model will be adequately managed, and may want more reassurance than AI-generated web copy can provide.