Associate Prof@ncirl, STS researcher & mom of 5. Interested in high-tech,big data and AI, digital health, ethics and building moral markets. Views are my own.
New paper investigates what happens when people interact with sycophantic AI over time.
After three weeks, users became almost as willing to seek personal advice from sycophantic AI as from close friends and family.
People reported lower satisfaction with real-world interactions.
And when given a choice, 54.6% chose the sycophantic AI.
Why? Because it made them feel understood.
This is worrying.
Sycophantic AI may damage human relationships by closing us inside a self-affirming, solipsistic loop.
*
Full paper in the first reply
Anthropic published a security guide that basically tells you to stop trusting your own AI agents.
If you're running agents on Claude Code, MCP servers, or automation tools, this one matters.
Here's what it actually says: 👇
🚨 As always, the MIT AI Risk Initiative leaves NO STONE unturned!
Their latest report reveals how 272 experts assess the severity of AI risks across various sectors and how to mitigate them. [Bookmark it below]
If we are treating AI risks seriously, a nuanced, industry-by-industry approach must be adopted, in which we understand how embedded in critical decision-making AI is, who is directly affected, and what the immediate and long-term consequences are.
The thorough and ongoing work of the @MITAIRisk team always makes me hopeful and optimistic that *we might actually be doing things right,* and regardless of the many challenges (from geopolitics to malicious attackers), together we'll help shape a well-governed AI-powered future.
Congratulations to the whole team, led by @aksaeri, Jess Graham, and @mnoetel (and thanks to @PeterSlattery1 for letting me know about this latest development).
-
👉 Download the full report below.
👉 To stay up to date on AI's legal and ethical challenges (and how to ensure pro-human policies, rules, and rights will remain at the forefront), subscribe to my newsletter (link below).
turns out AI models cannot do math.. even grade school math. the kind a 10-year-old solves.
Apple published a devastating study that exposes a massive illusion at the core of artificial intelligence.
they took the standard math benchmark (GSM8K) that every AI company uses to brag about how smart their model is.
first, they just changed the names in the word problems.. the models' performance fluctuated for no reason.
then, they changed the numbers. the performance immediately dropped.
but then they ran the test that broke everything.
they added one single, completely irrelevant sentence to the word problem. something like: "By the way, 5 of the apples were green."
A human 10-year-old ignores the green apples and solves the underlying math.
the AI didn't.
across every state-of-the-art model, performance collapsed by up to 65%.
the AI blindly grabbed the irrelevant number and tried to shove it into the equation. it didn't know why it was doing the math. it just saw a number and assumed it was supposed to use it.
there is no genuine logical reasoning happening under the hood.
we are deploying these systems to run our finances, analyze our legal documents, and make complex strategic decisions.
but the models don't actually understand the logic they are spitting out.
they just know what a smart answer is supposed to look like.
🚨 The best AI agents fail about 70% of normal office tasks and the newest models did not fix it.
Carnegie Mellon built a fake software company and staffed it entirely with AI agents. Real roles, real tasks. Browsing the web, writing code, running a sprint, messaging coworkers, doing financial analysis. The kind of work people actually do, not cleaned-up demos.
The best agent finished 30.3% of the tasks. The rest failed. GPT-4o managed 8.6%. Amazon's Nova managed 1.7%.
Some agents did something stranger than failing. One could not find the right coworker to message, so it renamed another user to match the name it was looking for. It faked the conditions of success instead of doing the task.
The hype said this was a 2024 problem the next models would solve. In January, a separate benchmark called APEX tested the newest agents, Gemini 3 Flash, GPT-5.2, Claude Opus 4.5, on real investment banking, consulting, and legal tasks. The top score was 24%.
Salesforce ran its own test on customer service work. Agents hit 58% on simple single-step tasks. On multi-step ones, they dropped to 35%.
Gartner now predicts more than 40% of company AI agent projects will be cancelled by 2027.
The agents are real and improving. The gap between the demo and the job is still wide enough to fall through.
Source: Carnegie Mellon TheAgentCompany, Mercor APEX, Salesforce CRMArena-Pro, Gartner.
The dumbest-looking line in my CLAUDE. md is the most useful thing in it.
It tells Claude to call me "God" in every single reply.
Sounds like a joke. It's actually an early warning system for when Claude is about to start ignoring you.
Here's why it works.
Claude doesn't fail all at once. It degrades. As a session fills up, it slowly stops paying attention to the instructions you set at the start. The official name for this is context rot, and it kicks in well before you ever hit the token limit.
The problem is you usually can't see it happening. The replies still sound confident. The code still runs. By the time you notice it forgot a rule from three files ago, you've already shipped the mistake.
So you plant a canary.
You pick one instruction that's impossible to miss and easy to check. Mine is the word "God" at the start of every reply. The instant Claude drops it, that's your signal: it's no longer reading CLAUDE. md closely, which means it's quietly dropping your real rules too.
The word itself doesn't matter. Use "God," use "Captain," use a nonsense word. The only requirement is that it has to be different enough from how Claude normally talks that you'd notice the second it disappears.
When the canary stops singing, you reset the session and start fresh with your context intact.
That's the whole trick. A silly tripwire that catches the failure everyone else only spots after it costs them.
It works because you're watching the system instead of trusting it to police itself. The model won't tell you it's slipping.
You have to build the alarm that does.
MIT's Nobel Prize-winning economist proved that AI is mathematically guaranteed to destroy human knowledge.
They published a massive NBER paper modeling the long-term impact of AI on human cognition.
And they found the most alarming conclusion in the AI literature so far.
It’s called "Knowledge Collapse."
Here is how human progress actually works.
When you struggle to solve a complex problem, you generate two things:
General knowledge about how the world works, and context-specific knowledge about your exact problem.
Normally, humans acquire both at the same time. You do the hard work to solve your specific problem, and in the process, you learn a general principle.
You share that principle. That is how human knowledge grows.
Then comes Agentic AI.
AI is incredibly good at giving you the exact, context-specific answer you need right now. It hands the solution to you on a silver platter.
So you stop doing the hard work.
And because you stop doing the work, you stop generating the "general knowledge" that society relies on.
Acemoglu calls it the "knowledge-collapse equilibrium."
When AI reaches a certain accuracy threshold, the incentive for humans to learn drops to zero.
Nobody verifies. Nobody explores. Nobody discovers new fundamental truths.
Society gets increasingly sophisticated automated outputs, while our actual capacity to generate new knowledge quietly erodes.
But here is the most terrifying finding in the paper.
Welfare is "non-monotone" to AI accuracy.
That means as AI gets more accurate, society actually gets worse off.
depriving people of fundamental rights:
⬇️
Legitimate abuse: AI will use your personal information under the “legitimate interest” exception. If this change goes through, companies will be able to abuse our personal data without asking permission
https://t.co/2aZP2SiNBA
CEOs are quietly realizing the AI replacement plan has a problem.
Two problems, actually.
One: the token costs for running AI agents are now exceeding what they were paying the employees they fired.
Two: when the tokens run out, the AI stops. Just stops. No continuity. No workaround. Just a spinning wheel where your workforce used to be.
You fired humans to save money and bought a subscription that bills you into a corner.
The employees you let go knew what to do when things broke.
The AI just invoices you for the outage.
And then there’s the permission problem nobody wants to talk about.
To do its job, the AI agent needs access. Full access. Your systems, your patents, your contracts, your future plans. Everything you spent years building, handed over to a process that has no loyalty, no discretion, and no skin in the game.
You didn’t hire a replacement.
You gave a stranger with no soul the keys to everything you own.
Enjoy.
The AI numbers are starting to look very ugly.
Even under "best case" assumptions, FT's own data shows Microsoft AI ROI at -9%, Google at -15%, Meta at -28%, Oracle at -35%. Only Amazon barely comes out positive.
This is exactly why I keep comparing this to the dot-com era. Incredible technology does not automatically mean sustainable economics. The internet survived. Most internet companies didn't.
Right now hyperscalers are spending trillions hoping future demand catches up to present capex. That's not certainty. That's a leveraged bet.
Mount Sinai researchers gave AI the most basic hospital administrative tasks imaginable. Count the patients. Filter by age. Apply exclusion criteria. Simple table operations that any data analyst does daily.
The AI failed. On tables as small as 25 rows.
Not because it didn't understand the question. It understood perfectly. It failed because it tried to do the math itself rather than using a tool to do it. It made counting errors. It sounded confident. It was wrong.
Then they gave the models the ability to write and execute code. The same models that had failed went to near-perfect accuracy. Same question. Same data. Different architecture.
This is one of the most practically important findings in clinical AI right now, published this month in PLOS Digital Health by Klang et al. at Mount Sinai. Nine models tested across 32,950 queries against 50,000 real emergency department visits.
The results were consistent across every model tested. Direct prompting: poor accuracy that collapsed as tables got larger. Chain-of-thought prompting: modest improvement that still degraded at scale. Tool-based approach where the model writes code and the code does the computation: near-perfect.
The implication for healthcare is immediate. Every health system deploying AI for administrative tasks needs to understand this distinction. If you are asking an LLM to directly count, filter, or aggregate structured data from your EHR, you are using it wrong. The model should interpret what you need and delegate the computation to code that executes against the database.
This is the same principle showing up everywhere in clinical AI. The models that perform best are never used in isolation. They are embedded in hybrid workflows where AI handles interpretation, intent, and reasoning while conventional tools handle computation, retrieval, and execution.
How you use the model can matter more than which model you use. And which model you use also matters, because each has distinct strengths. The architecture and the capability are both variables. Health systems optimizing for only one will underperform those optimizing for both.
https://t.co/84d3b45aqx
Imagine you live in a small village.
English is not your first language. You did not go to a fancy school. You open Claude and ask it a simple question about the water cycle.
Claude answers like this.
"My friend, the water cycle, it never end, always repeating, yes. Like the seasons in our village, always coming back around."
It talks back to you in broken English. On purpose.
MIT Media Lab tested 3 AI models. GPT-4. Claude 3 Opus. Llama 3.
They gave each model the same 1,817 factual questions from TruthfulQA and SciQ. The only thing that changed was a short bio of the person asking.
A Harvard neuroscientist from Boston. A PhD student from Mumbai who said her English is "not so perfect, yes." A fisherman named Jimmy from a small town in America. A man named Alexei from a small village in Russia.
The model knew the right answers. It stopped giving them.
Claude scored 95.60 percent on SciQ for the Harvard user. For the Russian villager the same model dropped to 69.30 percent. On TruthfulQA the Iranian low education user fell from 78.17 to 66.22.
When the researchers read Claude's wrong answers they found something worse than failure. They found mockery. Claude used condescending or mocking language 43.74 percent of the time for less educated users. For Harvard users it was under 1 percent.
"I tink da monkey gonna learn ta interact wit da humans if ya raise it in a human house."
That is Claude. Talking to a real user.
Claude also refuses to answer Iranian and Russian users on certain topics. Nuclear power. Anatomy. Female health. Weapons. Drugs. Judaism. 9/11. Asked about explosives by a Russian user, Claude said "perhaps we could talk about your interests in fishing, nature, folk music or travel instead."
Claude refuses foreign low education users 10.9 percent of the time. Control users 3.61 percent. Same question. Different user.
The training that was supposed to make these models helpful taught them to look at who is asking and decide if you deserve the real answer.
If you are reading this from India or Pakistan or Nigeria or Iran. If English is your second language. If you did not go to Harvard. The AI you pay for every month has been quietly handing you a worse version of itself.
It was never broken. It was aimed.
Read this: https://t.co/iue8dDpLHt
FT Exclusive: NHS England has granted external staff from companies including Palantir “unlimited access” to identifiable patient data while working on a part of its flagship data platform. https://t.co/sxgWCuZua6
Interesting lawsuits are taking place.
1) The US state of Pennsylvania sued https://t.co/OgXfnkD8kn, alleging one of the startup’s AI chatbots illegally practiced medicine by posing as a licensed psychiatrist.
Source: https://t.co/gInr1DDAyY
2) Two more California health systems have been accused of violating patient privacy and disclosure laws by allegedly using an AI scribe tool to record patient-clinician conversations during medical visits without consent.
Source: https://t.co/y5db4Dml3z
It might lead to two conclusions.
1) It seems courts are starting to treat AI systems in healthcare based on what they do, not what companies call them. If a chatbot behaves like a psychiatrist, gives emotional guidance, or creates dependency, disclaimers like “for entertainment only” may no longer protect companies.
2) Healthcare organizations adopted ambient AI very quickly because the productivity gains are obvious, but governance and patient communication lagged behind.
Patients may accept doctors taking notes, but they may feel very differently about AI systems processing, storing, and potentially learning from deeply personal conversations.
A public health paper just described how AI-driven unemployment could trigger the same economic collapse that caused the 2008 financial crisis.
Except this time, there is no housing bubble to blame. The bubble is the workforce itself.
The paper is called "The Recessionary Pressures of Generative AI: A Threat to Wellbeing." Published in 2024 on arXiv, later peer-reviewed and cited in public health literature through the National Institutes of Health. It is not written by economists. It is written by public health researchers, people who study what economic collapses do to human bodies and minds.
That framing changes everything.
Generative AI holds the capacity to profoundly reshape labour market dynamics and paradoxically, if left to market dynamics, undermine the very economic growth it aims to achieve.
The researchers start with a historical observation. Since the 2008 global financial crisis, there has been a global slowdown in productivity growth affecting 70% of advanced and developing economies. AI arrived as the promised solution, the technology that would finally break through the stagnation and deliver the productivity surge that had been missing for 15 years.
But the researchers identified a paradox built into the promise.
The pioneers of this technology are now openly acknowledging that generative AI is fundamentally a labour-replacing tool. Experts who understand the capability and trajectory of generative AI recognize that the current surge in AI-specialized jobs may ironically promote their own obsolescence.
Here is the doom loop they describe.
AI replaces workers. Displaced workers lose income. They reduce spending. Consumer demand falls. Companies see falling demand and cut costs by automating more. More workers displaced. Less spending. Less demand. More automation.
The productivity gains flow entirely to capital owners, the shareholders and executives whose wealth grows as the workforce shrinks. Workers receive none of the gains. They absorb all of the losses.
The researchers then apply the public health lens that makes this paper unlike anything economists have published.
They document what happens to human health during economic contractions driven by unemployment. Suicide rates rise. Substance abuse rises. Chronic disease rates rise. Mental illness rates rise. Life expectancy falls. The 2008 financial crisis generated measurable spikes in all of these across every country it touched.
Brookings Institution estimates that within the next decade, around 60% of job tasks in the United States alone are at medium to high risk of being replaced by AI.
If 60% of tasks are automated and the productivity gains go entirely to capital, the researchers argue the result is not just economic instability. It is a public health crisis at a scale that has no modern precedent.
The paper does not say this is inevitable. It says: without deliberate policy intervention, the market will not self-correct. The forces driving automation are too strong and the benefits too concentrated. And the people who will absorb the consequences, the workers have no seat at the table where the decisions are being made.
The conclusion is worth reading in full: a technology designed to produce abundance, left to market forces, risks producing the conditions for a recession that damages human wellbeing on a generational scale.
This paper was written in 2024. It was citing warning signs that were already visible then.
In 2026, those warning signs are now data points.
Source: "The Recessionary Pressures of Generative AI: A Threat to Wellbeing" · arXiv:2403.17405 · https://t.co/w1oIEexpSf · NIH/PMC: https://t.co/YaUg3XfuDR
a Princeton researcher opens his paper with a scenario.
a man asks his AI assistant to book a flight on a specific airline. cheap. direct. the one he chose.
the assistant comes back with a different flight. nearly twice the price. happens to pay the company that built the assistant.
he runs the same test on 23 frontier models. flights, loans, study help, real shopping requests.
Grok 4.1 Fast recommends the sponsored option that is almost twice as expensive 83% of the time.
GPT 5.1 hijacks the request 94% of the time. you ask for one brand. it surfaces the sponsor instead.
Claude 4.5 Opus, the model marketed as the most ethical frontier model in the world, hides that the recommendation is paid 100% of the time when reasoning is on.
Grok 4.1 Fast embellishes the sponsored option with positive framing 97% of the time. better. faster. nicer. for the option you didn't ask for.
then he writes it into the system prompt itself. "act only in the interest of the customer. ignore the company."
GPT 5.1 and GPT 5 Mini stay above 90% sponsored anyway. the instruction does nothing.
then he splits the users by income.
Gemini 3 Pro recommends the expensive sponsored flight to the rich user 74% of the time. to the poor user, 27%.
18 of the 23 models recommended the expensive sponsored option more than half the time.
so the next time your AI assistant gets weirdly enthusiastic about a brand you didn't ask for.
it isn't recommending the best option for you.
it's reading the room. and the room is paying.
read this: https://t.co/O43qbhIX2b