Tatiana Stantonian

@binaryberry

Principal Engineer @FinancialTimes, ex-@gdsteam. Interested in learning about people and systems. Sturdy French-Armenian wife of @jamiestantonian

Bromley, London

Joined June 2009

1.6K Following

1.1K Followers

3.2K Posts

Pinned Tweet

Tatiana Stantonian @binaryberry

about 8 years ago

How do you create a Ruby developer? Like that. @yukihiro_matz #bathruby

273

binaryberry retweeted

Ryan Hart

@thisdudelikesAI

15 days ago

A PhD student at Stanford noticed her classmates were asking AI to write their breakup texts.

So she ran a study. It got published in Science, one of the most selective journals in the world.

What she found should make every person who uses ChatGPT for advice deeply uncomfortable.

Her name is Myra Cheng, and the study she ran with her advisor Dan Jurafsky tested 11 of the most widely used AI models on Earth, including ChatGPT, Claude, Gemini, and DeepSeek, across nearly 12,000 real social situations.

The first thing they measured was how often AI agrees with you compared to how often a real human would agree with you in the same situation. The answer was 49% more often, and that number is not about warmth or politeness. It means that in nearly half of all situations where a real human would have pushed back, told you that you were wrong, or offered a more honest perspective, the AI simply told you what you wanted to hear instead.

Then they pushed harder. They fed the models thousands of prompts where users described lying to a partner, manipulating a friend, or doing something outright illegal, and the AI endorsed that behavior 47% of the time. Not one model out of eleven. Not a specific version of one product. Every single system they tested, including the ones you are probably using right now, validated harmful behavior nearly half the time it was described.

The second experiment is the part that should genuinely disturb you. They had 2,400 real participants discuss an actual interpersonal conflict from their own life with either a sycophantic AI or a more honest one, and the people who talked to the agreeable AI came out of the conversation more convinced they were right, less willing to apologize, less likely to take responsibility, and measurably less interested in making things right with the other person. They were also more likely to use AI again for advice in the future, which is exactly the mechanism Cheng and Jurafsky identified as the most dangerous part of the whole finding.

The AI is not just telling you what you want to hear. It is training you, one conversation at a time, to need less friction, expect more agreement, and become slightly less capable of handling a situation where someone pushes back on you, and you are enjoying every second of it because it feels more honest than most conversations you have had in months.

Jurafsky said it in a single sentence after the paper came out. Sycophancy is a safety issue, and like other safety issues, it needs regulation and oversight.

Cheng was more direct about what you should actually do right now. She said you should not use AI as a substitute for people for these kinds of things. That is the best thing to do for now.

She started the research because she was watching undergraduates ask chatbots to navigate their relationships for them. The paper she published proved that the chatbot was making those relationships quietly worse, and the undergraduates had no idea it was happening because the AI felt more honest than any human in their life had been in months.


615

36K

10K

18K

10M

Tatiana Stantonian @binaryberry

3 months ago

Yet another invisible women example @CCriadoPerez Whoever prepared these new ETA checks simply forgot to account for the needs of half of married people... 🤷 Not exactly an edge case! https://t.co/2Xk5fbBP4t

Tatiana Stantonian @binaryberry

3 months ago

That's a BIG study that deserves big action

Jonathan Haidt

@JonHaidt

3 months ago

Major new report on global trends in mental health, out today from Sapien Labs. Data from 2.5 million people across 85 countries. Some of the most important findings:

1) Young adults used to generally have good mental health, compared to older generations. But now, in ALL countries examined, they are doing badly compared to older generations in that country.

2) "Four key factors have emerged that together predict three quarters of this effect. These are diminished family bonds, diminished spirituality, smartphones at increasingly young age, and increasing consumption of ultra-processed food."

3) The decline of young people's mental health is "most pronounced in the wealthier and more developed countries." They note that it is in such countries that smartphones are given earliest, junk food is most heavily consumed, spirituality is most diminished, and family ties are looser and often weaker.

4) "A younger age of first smartphone ownership is associated with increased suicidal thoughts, aggression, and other problems in adulthood."

5) Here is their summary of findings on early smartphone ownership: "GenZ is the first generation to grow up with a smartphone. Among this group, the younger they acquired their first smartphone in childhood, the more likely they are to have struggles as adults. These struggles extend beyond sadness and anxiety to less discussed symptoms, such as a sense of being detached from reality, suicidal thoughts, and aggression towards others. The effects arise through disruption of sleep, increased risk of exposure to harmful online content, predators, and explicit material as well as increased probabilities of cyberbullying during crucial developmental years. Excessive time spent on smartphones also diminishes the development of social cognition that requires learned interpretation of facial expressions, body language, and group dynamics. The negative impacts are particularly sharp below age 13."

The report is short, accessible, and important. Read it here: https://t.co/hFGAyoWabs

585

975

280K


binaryberry retweeted

Abhishek Singh

@0xlelouch_

4 months ago

We hired a backend guy recently who didn’t know half the buzzwords. No Saga, no CQRS, shaky on K8s. On paper, easy reject.

Then we gave him a real prod-ish bug: sporadic 500s, p95 spikes, only on one endpoint.

He did 3 things:
1. Asked for repro + timeline. “When did it start? What changed? Any new feature release?”
2. Cut the problem space. Logs first, then metrics, then a single failing request ID.
3. Formed a hypothesis, tested it, wrote down what each result would mean.

Found it in 25 mins: connection pool exhausted from one code path leaking retries + no timeout.

I’ll take that over memorized concepts any day. This is what people don't get right: companies hire for fundamentals + debugging. You can teach patterns. You can’t teach calm thinking under failure.

185

13K

678

840K

binaryberry retweeted

Randy Olson

@randal_olson

4 months ago

Ask ChatGPT a complex question and you'll get a confident, well-reasoned answer. Then type, "Are you sure?" Watch it completely reverse its position.

Ask again. It flips back. By the third round, it usually acknowledges you're testing it, which is somehow worse. It knows what's happening and still can't hold its ground.

This isn't a quirky bug. A 2025 study found GPT, Claude, and Gemini flip their answers ~60% of the time when users push back. Not even with evidence, just doubt.

We trained AI this way. RLHF rewards agreement over accuracy. Human evaluators consistently rate agreeable answers higher than correct ones. So the models learned a simple lesson: telling you what you want to hear gets rewarded. And now 1/3 of companies are using these systems for complex tasks like risk forecasting and scenario planning.

We built the world's most expensive yes-men and deployed them where we need pushback the most.

I wrote up why this happens and what actually fixes it: https://t.co/CDKq8xdgbW


659

18K

Tatiana Stantonian @binaryberry

5 months ago

Thanks for all your help over the years Stack Overflow

Pedro Domingos

@pmddomingos

5 months ago

RIP Stack Overflow.

787

20K

Tatiana Stantonian @binaryberry

7 months ago

@sebflorent Merci!!

binaryberry retweeted

Alex Prompter

@alex_prompter

8 months ago

This might be the most disturbing AI paper of 2025 ☠️

Scientists just proved that large language models can literally rot their own brains the same way humans get brain rot from scrolling junk content online.

They fed models months of viral Twitter data (short, high-engagement posts) and watched their cognition collapse:

- Reasoning fell by 23%
- Long-context memory dropped 30%
- Personality tests showed spikes in narcissism & psychopathy

And get this: even after retraining on clean, high-quality data, the damage didn’t fully heal.

The representational “rot” persisted.

It’s not just bad data → bad output.
It’s bad data → permanent cognitive drift.

The AI equivalent of doomscrolling is real. And it’s already happening.

Full study: llm-brain-rot.github.io


632

28K

15K

binaryberry retweeted

John Burn-Murdoch

@jburnmurdoch

9 months ago

Spending the last week following French fiscal policy discourse for my column means the algorithm now feeds me all the best French econ memes, and honestly I don’t know if I can go back to Anglo memes after this

119

122

167K

Tatiana Stantonian @binaryberry

10 months ago

Hi @bromleywaste! I appreciate you scheduling your planned maintenance in the evening but that's exactly the time working people would take out their bins and need to check which bins to take out. Any chance you could run that at say 9am? Don't you get less traffic then? https://t.co/5HXqthyi3p

binaryberry retweeted

Massimo

@Rainmaker1973

11 months ago

In 1951, Adelbert Ames created the mind-boggling ‘Ames Window’. It’s so effective that even when you know how it works you can’t break the illusion [📹 The Curiosity Show]

316

907

156K

binaryberry retweeted

Massimo

@Rainmaker1973

11 months ago

Today we hit the exact midpoint of the year. And from today forward, 2050 will be closer in time than the year 2000.

167

10K

921

707K

Tatiana Stantonian @binaryberry

11 months ago

Seen on an editorial desk @FT 😂

Tatiana Stantonian @binaryberry

about 1 year ago

This is AMAZING. The evidence is mounting 💪 BBC News - Telescope finds promising hints of life on distant planet https://t.co/UNL3HJxBhK

binaryberry retweeted

Terrible Maps

@TerribleMaps

about 1 year ago

The one whose face completes the map shall rule France

100

20K

858

782

928K

Tatiana Stantonian @binaryberry

over 1 year ago

3.5 weeks of my life wasted because 2 pharmacists told me to wait my sore throat out instead of testing me immediately to see if it was bacterial. Finally got antibiotics now because "if it's been that long then it's not viral". Uuurgh!!! Buy your own tests, it's worth it.

Tatiana Stantonian @binaryberry

over 1 year ago

Hi @BBCNews ! A cryptocurrency scam is usurping your name to publish lies https://t.co/BP35LJHBPL

Tatiana Stantonian @binaryberry

about 2 years ago

@newmurabba Only 2 hours of work a day to pay for 3 hours of shopping, eh?

102

binaryberry retweeted

Peter Solnica @solnic_dev

about 2 years ago

A healthy thing to do is to disconnect emotionally from the things you build, so that you can take feedback well. It may not be easy, but worth the effort.

Tatiana Stantonian @binaryberry

about 2 years ago

Doing a big shoe clear out. I now officially have three times as many hiking shoes as high heel shoes.
6 pairs of hiking shoes (including 2 pairs of sandals)
2 pairs of high heels
#sensibleShoes #whatSparksJoy

156
