AI Safety Papers
@safe_paper
Sharing the latest in AI safety research.
arXiv
Joined May 2023
264 Following
2.2K Followers
328 Posts
AI Safety Papers
@safe_paper
20 days ago
Log analysis is necessary for credible evaluation of AI agents
Peter Kirgis, Sayash Kapoor (@sayashk), Stephan Rabanser (@steverab), Nitya Nadgir, Cozmin Ududec (@CUdudec), Magda Dubois (@DubMagda), JJ Allaire (@fly_upside_down), @MariusHobbhahn, Jacob Steinhardt (@JacobSteinhar2), Arvind Narayanan (@random_walker)
AI Safety Papers
@safe_paper
22 days ago
https://t.co/D89BFpBrM1
AI Safety Papers
@safe_paper
22 days ago
Automated alignment is harder than you think
Aleksandr Bowkis (@aleksandrbowkis), Marie Davidsen Buhl (@MarieBassBuhl), @jacob_pfau, Geoffrey Irving (@geoffreyirving)
@AISecurityInst
AI Safety Papers
@safe_paper
3 months ago
https://t.co/UkVVWUbMGs
AI Safety Papers
@safe_paper
3 months ago
Emergent Misalignment is Easy, Narrow Misalignment is Hard
Anna Soligo (@anna_soligo), Edward Turner, Senthooran Rajamanoharan (@sen_r), Neel Nanda (@NeelNanda5)
AI Safety Papers
@safe_paper
4 months ago
https://t.co/ltdiMPdvqv
AI Safety Papers
@safe_paper
4 months ago
Distributional AGI Safety
Nenad Tomašev (@weballergy), Matija Franklin (@FranklinMatija), Julian Jacobs (@JulianDJacobs), Sébastien Krier (@sebkrier), Simon Osindero (@sindero)
@GoogleDeepMind
AI Safety Papers
@safe_paper
4 months ago
https://t.co/1A6ATUEeq2
AI Safety Papers
@safe_paper
4 months ago
Legal Alignment for Safe and Ethical AI
Noam Kolt, Nicholas Caputo, Jack Boeglin, Cullen O'Keefe, @RishiBommasani, @StephenLCasper, Mariano-Florentino Cuéllar, @profnoahfeldman, @IasonGabriel, Gillian K. Hadfield (@ghadfield), Lewis Hammond (@lrhammond), Peter Henderson (@PeterHndrsn), Atoosa Kasirzadeh (@Dr_Atoosa), @sethlazar, @AnkaReuel, @kevinlwei, Jonathan Zittrain (@zittrain)
AI Safety Papers
@safe_paper
5 months ago
https://t.co/YgEoXxFtES
AI Safety Papers
@safe_paper
5 months ago
Weird Generalization and Inductive Backdoors: New Ways to Corrupt LLMs
Jan Betley (@BetleyJan), @JorioCocola, Dylan Feng (@dylanfeng_), James Chua (@jameschua_sg), Andy Arditi (@andyarditi), Anna Sztyber-Betley (@anna_sztyber), Owain Evans (@OwainEvans_UK)
AI Safety Papers
@safe_paper
6 months ago
https://t.co/DrpMltC5Us
AI Safety Papers
@safe_paper
6 months ago
Natural Emergent Misalignment from Reward Hacking in Production RL
Monte MacDiarmid, Benjamin Wright (@RightBenguin), @JonathanUesato, @JoeJBenton, Jon Kutasov, Sara Price (@sprice354_), Naia Bouscal, Sam Bowman (@sleepinyourhat), @TrentonBricken, Alex Cloud, Carson Denison, Johannes Gasteiger (@gasteigerjo), @RyanPGreenblatt, @janleike, @Jack_W_Lindsey, Vlad Mikulik, @EthanJPerez, @alexrodriguesca, Drake Thomas (@MaskedTorah), @albertwebson, Daniel Ziegler (@d_m_ziegler), Evan Hubinger (@EvanHub)
@AnthropicAI @redwood_ai
AI Safety Papers
@safe_paper
7 months ago
https://t.co/CoJTVDx6cg
AI Safety Papers
@safe_paper
7 months ago
A dataset of rated conceptual arguments
Caspar Oesterheld (@C_Oesterheld), Emery Cooper, Linh Chi Nguyen, Alexander Kastner, @EthanJPerez
AI Safety Papers
@safe_paper
7 months ago
https://t.co/oBLCQTcSxt
AI Safety Papers
@safe_paper
7 months ago
Quantifying Elicitation of Latent Capabilities in Language Models
Elizabeth Donoway, @HaileyJoren, Arushi Somani, Henry Sleight (@sleight_henry), @_julianmichael_, Michael R DeWeese, John Schulman (@johnschulman2), @EthanJPerez, @FabienDRoger, @janleike
@AnthropicAI
AI Safety Papers
@safe_paper
7 months ago
https://t.co/XdOQrTpyjj
AI Safety Papers
@safe_paper
7 months ago
Remote Labor Index: Measuring AI Automation of Remote Work
Mantas Mazeika (@MantasMazeika96), Alice Gatti, Cristina Menghini (@CriMenghini), Udari Madhushani Sehwag, Shivam Singhal (@ShivamSinghal56), Yury Orlovskiy (@yvorlovskiy), [...], Summer Yue (@summeryue0), @alexandr_wang, Bing Liu (@vbingliu), Ernesto Hernandez (@eghmontoya), @hendrycks
@cais @scale_AI