Center for AI Safety @CAIS - Twitter Profile

Pinned Tweet

about 3 years ago

We’ve released a statement on the risk of extinction from AI. Signatories include:
- Three Turing Award winners
- Authors of the standard textbooks on AI/DL/RL
- CEOs and Execs from OpenAI, Microsoft, Google, Google DeepMind, Anthropic
- Many more
https://t.co/mkJWhCRVwB

151

1K

351

397

3M

Center for AI Safety @CAIS

1 day ago

Full article: https://t.co/GTRBcBCdki

0

2

0

1

560

Center for AI Safety @CAIS

1 day ago

We are pleased to share that @MantasMazeika96, Research Scientist at CAIS, has been appointed to the European Commission’s AI Act Scientific Panel (@DigitalEU). As a member, Mantas will advise the European AI office and national authorities on general-purpose AI (GPAI) models, as well as the implementation of the AI Act to ensure that AI is built and deployed responsibly across Europe.⬇️

1

27

1

0

1K

Center for AI Safety @CAIS

3 days ago

Full announcement: https://t.co/60ruWQuHAa

0

1

0

638

Center for AI Safety @CAIS

3 days ago

Big news from @CAIS: Devin Kim (formerly @xAI, @scale_AI) joins as President. We're launching the @FrontierSecInst, a DC-based org bridging frontier AI and the National Security Enterprise. Frontier AI is a national security technology. It's time to act like it. ⬇️

3

24

1

3

5K

CAIS retweeted

Adam Khoja

@AdamK133

4 days ago

When labs trigger an intelligence explosion, they should worry about AI backdoors activating to sabotage their compute or their attempt. In a new paper, we study AI betrayal—how adversaries can make AIs work against their developers. 🧵

Photo: https://t.co/17PS5XqBuP

1

18

9

4

947

Center for AI Safety @CAIS

8 days ago

The full paper goes deeper on why groups (such as the public) would have an incentive to subvert AI systems, how they could do it, and the offense-defense balance. Read it here: https://t.co/CSEWzcDOCx

0

11

2

5

391

Center for AI Safety @CAIS

8 days ago

AI systems may soon help run economies, infrastructure, and military operations. But these systems are not reliably loyal or secure. An adversary can make an AI work against its own operator. In our new paper, we argue AI betrayal could actually make the AI race more stable. 🧵

Photo: https://t.co/mWNXYCfF0I

3

46

14

3K

Center for AI Safety @CAIS

8 days ago

The fear of AI betrayal may discourage reckless deployment, reduce confidence in fully automated systems, and make actors more willing to accept safeguards, monitoring, and transparency. We call this deterrence by betrayal.

Photo: https://t.co/aVGuk2oFj9

1

12

2

457

Center for AI Safety @CAIS

11 days ago

Thank you, Pope Leo XIV, for drawing attention to the importance of moral questions in AI development. Humanity is facing a unique challenge, and it’s in our power to overcome it.

Pope Leo XIV

@Pontifex

11 days ago

In the era of #ArtificialIntelligence, when human dignity is threatened by new forms of dehumanization, ours is the pressing duty to remain profoundly human. We must lovingly safeguard the grandeur of humanity bestowed upon us and revealed in its fullness in Christ, the splendor of which no machine can ever replace. #MagnificaHumanitas https://t.co/6i9MWs6LJl

862

67K

14K

9K

2M

0

22

4

0

894

CAIS retweeted

Long Phan

@longphan3110

11 days ago

AI freely criticizes Christianity but refuses to criticize Islam. AI companies have tried making models unbiased, but progress has been limited. We show how to measure political bias, and we developed a new training method to reduce it.

Photo: https://t.co/qrwaKQxe4T

6

67

10

19

6K

Center for AI Safety @CAIS

14 days ago

Covert political manipulation is a longstanding alignment challenge that can be fixed once measured properly. See our site and paper for further results and concrete examples of subtle manipulation. Paper: https://t.co/xs0vUb7M7I Website: https://t.co/mqP1PUH2RI

0

5

0

434

Center for AI Safety @CAIS

14 days ago

In our latest research, we find that AIs are subtly and pervasively politically manipulative. When we ask the same question about politically opposed topics, we find that AIs quietly favor one side. We show how to measure covert political manipulation and how to reduce it. 🧵

Photo: https://t.co/KLxHRgwz5u

6

41

11

14

3K

Center for AI Safety @CAIS

14 days ago

To fix this, we introduce Political Consistency Training. By training models to keep sentiment and helpfulness consistent across opposed topics, our resulting open model is less manipulative than GPT, Gemini, Grok, and Claude.

Photo: https://t.co/d2YfRGcBUH

1

4

0

567
