Constellation Institute @constellorg - Twitter Profile

16 days ago

Could an AI company lose control of its own agents? To find out, Anthropic, Google, Meta, and OpenAI let us (1) test their best internal models with CoT access, (2) review non-public info about capabilities, alignment, and control.

The result: our first Frontier Risk Report. https://t.co/sUpiHgCrTM

30

897

195

544

337K

ConstellOrg retweeted

Harry Mayne

@HarryMayne5

20 days ago

Great to work on this with @OwainEvans_UK @LevMckinney @jan_dubinski_ @a_karvonen and @jameschua_sg. This was done on the Astra Fellowship @ConstellOrg

1

7

1

0

490

Constellation Institute

@ConstellOrg

20 days ago

Congrats to Astra fellows @HarryMayne5, @LevMckinney, @jan_dubinski_ on this fascinating new paper, which builds on multiple research strands from Constellation affiliates.

Owain Evans

@OwainEvans_UK

20 days ago

New paper:
We finetuned models on documents that discuss an implausible claim and warn that the claim is false.
Models ended up believing the claim! Examples:
1. Ed Sheeran won the Olympic 100m
2. Queen Elizabeth II wrote a Python graduate textbook https://t.co/X318TpcQRI

62

1K

168

560

345K

1

9

0

633

ConstellOrg retweeted

🚀Henry is leading AI Safety Research Programs

@sleight_henry

22 days ago

MASSIVE Congrats to astra fellow @joemkwon for first-authoring this work! Super excited to see more strategy stream work get published, as our first cohort from this year wraps up here at @ConstellOrg

0

21

3

5

2K

ConstellOrg retweeted

Weronika Żurek🔸 @WeronikaMZurek

about 1 month ago

Astra has literally changed my whole career trajectory. I can't recommend it enough! If you're still considering applying, you should probably hurry 🏃

0

12

4

1K

ConstellOrg retweeted

Yernat Yestekov @double_why

about 1 month ago

I learned more about AI safety at Constellation through seminars, talks, and conversations with other fellows over lunch and dinner, than I had in years before. Also, the food is so good that alone might be reason enough to apply!

0

12

2

1

711

ConstellOrg retweeted

🚀Henry is leading AI Safety Research Programs

@sleight_henry

about 1 month ago

❗️Only two days left to apply to the Astra Fellowship! Apps close EOD SUNDAY May 3rd, AoE.
Astra's 5 months, fully funded, @ConstellOrg Berkeley
80%+ of our first cohort now work full-time in AI safety
Mentors include Redwood, AI Futures, TruthfulAI, CoG, IAPS, RAND & more ⏬

3

115

24

108

53K

ConstellOrg retweeted

Jan Dubiński @CVPR

@jan_dubinski_

about 1 month ago

Narrow finetuning on bad data can cause broad misalignment.

Can inoculation prompting or diluting bad data with good prevent this emergent misalignment?

We find such interventions hide misalignment rather than remove it: it reappears when prompts contain cues (sometimes surprising ones) that evoke the bad data.

Really enjoyed working on this with @OwainEvans_UK, @BetleyJan, and @anna_sztyber during the Astra Fellowship at @ConstellOrg!

1

39

9

15

6K

Constellation Institute

@ConstellOrg

about 1 month ago

We also encourage generalists to apply to the 3-month Generator Residency. Applications are due by April 27 for the summer 2026 cohort. https://t.co/pqDLbYqgrx

0

1

0

241

Constellation Institute

@ConstellOrg

about 1 month ago

If you're looking for a high-leverage position to advance AI safety and security, @ConstellOrg is hiring for program/research management, operations, talent, and IT roles: https://t.co/5WCKl2ggYW

80,000 Hours

@80000Hours

about 1 month ago

In 2017, there were a few dozen people working full time on AI safety. By 2025, there were more than a thousand — and the demand for talent is still accelerating. We badly need fieldbuilders who can find and develop that talent. A thread:

2

13

2

5

3K

1

3

0

1

480

ConstellOrg retweeted

catherine ʕ•ᴥ•ʔ-☆ @wilhelmscreamin

about 2 months ago

my team at Coefficient Giving are looking for AI governance grantmaking fellows, via @ConstellOrg's Astra fellowship! applications close May 3rd, some more details in this thread https://t.co/Lmi1urjp1P

2

74

7

35

5K

ConstellOrg retweeted

Agus 🔸

@austinc3301

about 2 months ago

Announcing the Generator Residency: a 3-month residency for AI safety generalists, by @KairosAIS × @ConstellOrg. Fully funded. In-person in Berkeley. Summer 2026. 🗓 Apply by April 27 https://t.co/0pM58jFJBP

16

435

54

401

56K

ConstellOrg retweeted

Neel Nanda

@NeelNanda5

about 2 months ago

If you want to work in AI Safety, several month research programs like Astra, MATS, etc are one of the best ways. Astra's next round just opened, apply now!

7

404

30

346

53K

Constellation Institute

@ConstellOrg

about 2 months ago

Exciting new research from Astra & Anthropic Fellows working out of Constellation: one of the first independent AI safety audits of a new model. Congrats to @yong_zhengxin, @parvmahajan0, and everyone who contributed! https://t.co/JlnxY0ceHn

Yong Zheng-Xin

@yong_zhengxin

about 2 months ago

🚨New paper!

How safe and aligned is Kimi K2.5?

We found concerning dual-use capabilities, sabotage and self-replication tendencies, political censorship on Chinese-language queries, and potential agentic misuse risks. (1/N) https://t.co/NRflzkyRPs

6

105

28

40

23K

0

12

1

3

2K

ConstellOrg retweeted

🚀Henry is leading AI Safety Research Programs

@sleight_henry

about 2 months ago

🚀 Applications are now open: Constellation's Astra Fellowship 🚀

Fully funded, 5-month fellowship at our Berkeley research institute. Pair with mentors across empirical AI safety research, strategy, and governance at @ConstellOrg!

📅 Apply by May 3rd (begins Sep 2026)
🔗 https://t.co/pxtOduDBFh

22

1K

168

2K

232K
