AI Safety First!

@aisafetyfirst

"You should put a comparable amount of effort into making them better and keeping them under control" (Professor Geoffrey Hinton on AI systems)

Planet earth

Joined May 2023

285 Following

353 Followers

2K Posts

Pinned Tweet

AI Safety First! @aisafetyfirst

almost 3 years ago

The lesson from Yoshua Bengio's comment for today and the future is: prioritize safety over usefulness! If we do not, then we will keep making the same mistake Yoshua made. "Doing AI safely is much, much harder than just doing AI." @geoffreyhinton @OpenAI @AnthropicAI

AI Safety First! @aisafetyfirst

about 3 years ago

The lesson from Yoshua Bengio's comment for today and the future is: prioritize safety over usefulness! If we do not, then we will keep making the same mistake Yoshua made. @geoffreyhinton

1

0

1

1

2K

2

10

2

1

2K

AI Safety First! @aisafetyfirst

20 days ago

@MatthewBerman Mo is an excellent comedian.

0

0

0

0

8

aisafetyfirst retweeted

25 days ago

People are realizing that AIs are nowhere near human intelligence and learning abilities. Yet they have become very useful by compensating for their lack of common sense, lack of understanding of reality, and limited reasoning and planning abilities, by the accumulation of enormous amounts of declarative knowledge.

163

3K

266

503

230K

aisafetyfirst retweeted

25 days ago

“It absolutely defied all of our expectations. This was so surprising that my whole project changed.” This phenomenon could have far-reaching implications, and is leaving neuroscientists baffled. https://t.co/0pXW6Dtjoz

3

91

22

83

21K

aisafetyfirst retweeted

Max the VC 👨‍🚀

8 months ago

I worry AI will trigger a volatile transitional phase the world isn’t prepared for, marked by widespread job displacement and market instability before the benefits of disinflation and eventual abundance materialize. We’re seeing early signs of it now. The market is largely being carried by a few dominant tech companies, margins are expanding while headcount is being cut, entry level hiring has fallen off a cliff, and the majority of Americans can now barely afford a burrito.

9

32

6

5

7K

aisafetyfirst retweeted

@emollick

2 months ago

Everyone should read "On the Folly of Rewarding A, While Hoping for B" at least once. https://t.co/tF4HGbrweX https://t.co/HDor3NsxBO

44

1K

149

901

93K

aisafetyfirst retweeted

3 months ago

What many people don't seem to realize when they argue that AIs cannot come up with genuinely new ideas is that almost 99% of all research papers written by humans (say in POPL, Neurips, ...) are just small deltas on existing research, with very little novelty either (hence the long list of citations and related work sections).

55

542

37

98

73K

aisafetyfirst retweeted

Rowland Manthorpe

@rowlsmanthorpe

3 months ago

I’ll admit - I was sceptical about the idea of AI psychosis. Not the specific cases, which were all too believable, but about the scale. How much was this happening? And anyway wouldn’t better models make it go away? Then I read a paper by Anthropic and the University of Toronto which has strangely received very little attention.

29

942

208

973

139K

aisafetyfirst retweeted

Neighbors First | Mike Brooks

3 months ago

@Kasparov63 What timing! I had just posted my new @PsychToday article on this very topic! I compared what is happening with AI agents interacting in the wild to a digital Petri dish. We’re blind to what will emerge…or evolve. - https://t.co/9tVHAYPmJT

0

5

3

4

179

aisafetyfirst retweeted

Eliezer Yudkowsky

3 months ago

The current timeline is as normal as you will ever see again. Take this moment to relax and breathe before it gets weird.

68

2K

121

203

77K

AI Safety First! @aisafetyfirst

4 months ago

https://t.co/iKEFHNJdIE

0

1

0

0

36

aisafetyfirst retweeted

Nassim Nicholas Taleb

4 months ago

Every job invented in the 20th Century is threatened by AI.

396

7K

798

1K

491K

aisafetyfirst retweeted

@argosaki

4 months ago

New research reveals that constant complaining does more than annoy those around you—it can actually weaken your brain.

Every time you focus on what’s wrong, your body releases stress hormones like cortisol, which interfere with neural function and reduce the brain’s ability to adapt and learn.

The impact is not just mental. Elevated cortisol levels can impair memory, decision-making, and problem-solving skills.

Over time, a habit of negativity can make your brain less resilient, affecting emotional regulation and overall cognitive performance. Essentially, the more you complain, the harder it becomes for your brain to handle challenges effectively.

Shifting your focus from problems to solutions isn’t just good advice—it’s backed by science.

Practising gratitude, positive thinking, and constructive problem-solving can lower stress hormones, strengthen neural pathways, and help your brain remain agile and adaptable throughout life.

#TheSciencePulse #BrainHealth #PositiveMindset

392

19K

4K

10K

4M

AI Safety First! @aisafetyfirst

4 months ago

@VisualStudio Amazing work @VisualStudio team! VS 2026 with Copilot using Opus 4.6 is a gem!

0

1

0

0

181

aisafetyfirst retweeted

@iruletheworldmo

4 months ago

there will be a major disruptive event caused by someone’s ai agent at some point. no amount of safety testing could ever stop this. moltbook is an early glimpse (a lobster in the coal mine) of what’s to come. we should let these things roam freely now, figure out the types of damage they can cause and build systems of defence. we don’t want to face our first major public event two years from now. the models will be far too intelligent.

84

679

64

80

30K

AI Safety First! @aisafetyfirst

5 months ago

@iruletheworldmo Can it replace Opus 4.5 in Github Copilot?

0

0

0

0

123

aisafetyfirst retweeted

@DavidBrooks224

6 months ago

One percent of AI essays say something new. This one is in that one percent. https://t.co/dOCqW3M6cA

8

29

6

47

23K

aisafetyfirst retweeted

@connordavis_ai

6 months ago

This DeepMind paper just quietly killed the most comforting lie in AI safety.

The idea that safety is about how models behave most of the time sounds reasonable. It’s also wrong the moment systems scale. DeepMind shows why averages stop mattering when deployment hits millions of interactions.

The paper reframes AGI safety as a distribution problem. What matters isn’t typical behavior. It’s the tail. Rare failures. Edge cases. Low-probability events that feel ignorable in tests but become inevitable in the real world.

Benchmarks, red-teaming, and demos all sample the middle. Deployment samples everything. Strange users, odd incentives, hostile feedback loops, environments nobody planned for. At scale, those cases stop being rare. They are guaranteed.

Here’s the uncomfortable insight: progress can make systems look safer while quietly making them more dangerous. If capability grows faster than tail control, visible failures go down while catastrophic risk stacks up off-screen.

Two models can look identical on average and still differ wildly in worst-case behavior. Current evaluations can’t see that gap. Governance frameworks assume they can.

You can’t certify safety with finite tests when the risk lives in distribution shift. You’re never testing the system you actually deploy. You’re sampling a future you don’t control.

That’s the real punchline.

AGI safety isn’t a model attribute. It’s a systems problem. Deployment context, incentives, monitoring, and how much tail risk society tolerates all matter more than clean averages.

This paper doesn’t reassure. It removes the illusion.

The question isn’t whether the model usually behaves well. It’s what happens when it doesn’t — and how often that’s allowed before scale makes it unacceptable.

Paper: https://t.co/fA84LCt2fK

55

345

81

250

21K

aisafetyfirst retweeted

6 months ago

As amazing as LLMs are, improving their knowledge today involves a more piecemeal process than is widely appreciated. I’ve written before about how AI is amazing... but not that amazing. Well, it is also true that LLMs are general... but not that general. We shouldn’t buy into the inaccurate hype that LLMs are a path to AGI in just a few years, but we also shouldn’t buy into the opposite, also inaccurate hype that they are only demoware. Instead, I find it helpful to have a more precise understanding of the current path to building more intelligent models.

First, LLMs are indeed a more general form of intelligence than earlier generations of technology. This is why a single LLM can be applied to a wide range of tasks. The first wave of LLM technology accomplished this by training on the public web, which contains a lot of information about a wide range of topics. This made their knowledge far more general than earlier algorithms that were trained to carry out a single task such as predicting housing prices or playing a single game like chess or Go.

However, they’re far less general than human abilities. For instance, after pretraining on the entire content of the public web, an LLM still struggles to adapt to write in certain styles that many editors would be able to, or use simple websites reliably.

After leveraging pretty much all the open information on the web, progress got harder. Today, if a frontier lab wants an LLM to do well on a specific task — such as code using a specific programming language, or say sensible things about a specific niche in, say, healthcare or finance — researchers might go through a laborious process of finding or generating lots of data for that domain and then preparing that data (cleaning low-quality text, deduplicating, paraphrasing, etc.) to create data to give an LLM that knowledge.

Or, to get a model to perform certain tasks, such as use a web browser, developers might go through an even more laborious process of creating many RL gyms (simulated environments) to let an algorithm repeatedly practice a narrow set of tasks.

A typical human, despite having seen vastly less text or practiced far less in computer-use training environments than today's frontier models, nonetheless can generalize to a far wider range of tasks than a frontier model. Humans might do this by taking advantage of continuous learning from feedback, or by having superior representations of non-text input (the way LLMs tokenize images still seems like a hack to me), and many other mechanisms that we do not yet understand.

Advancing frontier models today requires making a lot of manual decisions and taking a data-centric AI approach to engineering the data we use to train our models. Future breakthroughs might allow us to advance LLMs in a less piecemeal fashion than I describe here. But even if they don’t, the ongoing piecemeal improvements, coupled with the limited degree to which these models do generalize and exhibit “emergent behaviors,” will continue to drive rapid progress.

Either way, we should plan for many more years of hard work. A long, hard — and fun! — slog remains ahead to build more intelligent models.

[Original text: https://t.co/SHRN5JDvTW ]

171

2K

364

1K

200K

AI Safety First! @aisafetyfirst

8 months ago

@VisualStudio Link does not work.

0

0

0

0

145

aisafetyfirst retweeted

@Yoshua_Bengio

8 months ago

AI is evolving too quickly for an annual report to suffice. To help policymakers keep pace, we're introducing the first Key Update to the International AI Safety Report. 🧵⬇️ (1/10) https://t.co/4PLRliXeIf

20

311

94

142

96K
