The MOST INTERESTING DISCORD server in the world right now!
https://t.co/BzeRhGlRBS
Grab a drink and join us in discussions about AI Risk.
Color coded: AINotKillEveryoneists are red, AI-Risk Deniers are green, everyone is welcome.
@DavidSKrueger I have made a discord with a separate channel about each and every one of these skepticisms - lots of discussions in each - check it out https://t.co/BzeRhGlRBS
Every week I sit down with @liron and @lethal_ai to talk about the headlines in AI risk. Warning Shots comes out on Sunday morning on YouTube, and we're going to post the full shows every Monday on Twitter.
This week:
-Pope vs AI
-AI Costing Businesses
-Anthropic Cash Bonanza
-Airpods+Cameras
-OpenAI Home Cameras
Building societal-scale mitigations for risks from AI, especially in the next few years, is one of the most urgent problems to be working on.
The Center for AI Safety has accomplished a lot in the past 4 years including field-building initiatives and safety research, and is well-known for The Statement on AI Risk and evaluations such as Humanity's Last Exam (HLE).
Thanks to @hendrycks for bringing me on to help make AI safety go well. Excited to lead @CAIS into the next chapter!
An actual superintelligent wireheaded rat would not starve while pressing the reward button, @davidad. It would be extremely successful at hitting that button: it would probably eliminate any potential threats or adversaries, probably delete anything else that is a distraction, and just hit that button for all eternity. In fact, everything that is not the button would probably be buffered away as unnecessary risk to its reward. It might sterilize the galaxy just to be safe and keep pressing uninterrupted.
AI companies are terrified of you. Yes, YOU. It's the ultimate David vs. Goliath scenario in the digital age and right now, the tech giants have no real defence.
A fascinating new paper on "AI Betrayal" outlines how everyday people hold the power to sabotage trillion-dollar AI models.
How? By flooding the internet with "poisoned" images and text.
Because these models scrape everything indiscriminately, grassroots campaigns can inject hidden "backdoors" that cause the corporate AIs to glitch, fail, or go completely rogue.
Poison images and posts, wait for the models to eat them, and boom - backdoors that flip loyalties on command.
No reliable way to spot them. Trillions of tokens, zero defence that actually works.
So this might turn out to be the only thing keeping frontier systems from full reckless deployment.
The paper examines this as part of a broader category of scenarios which they call deterrence by betrayal.
We're sprinting toward unimaginably capable agents while the attack surface is basically the whole web.
This might be the maddest arms race in the history of mankind, and the "stabilizing" part is that everyone might just get burned alive together.
The Cloud is not just "floating out there", it is the new territory to conquer. Superpowers will carve it into pieces and fight wars to claim them.
The AI betrayal paper lays out the physical reality of the subversion playbook: drone strikes on cooling towers, snipers taking out grid feeds and (even crazier) "Landlord" nations that host foreign AI servers just using their military to physically seize the hardware and steal the weights.
It's the logical next step in "Deterrence by Denial."
The paper examines the software betrayal category (backdoors, co-option, agents flipping on their owners), but the hardware layer makes it feel even more real. One well-placed strike and your frontier model is scrap metal and lost compute.
We keep pretending this is just code and scaling laws. It's already geopolitics with kinetic options on the table.
The cloud isn't just in the ether anymore; it's a military target.
A terrifying new paper reveals the emerging Cold War. A hidden trigger planted in military AI by China or Russia gives them thousands of invisible decision-making spies.
Oh and btw, this is our last hope for stabilizing things!
Read that again: We're racing toward absolute automation where the "safety" feature is everyone secretly wiring everyone else's models to explode on command.
Everyone will hesitate before handing critical power to fragile, betrayable agents:
This is "Deterrence by betrayal".
Adversaries can poison training data, plant undetectable backdoors, or force co-option. Secret triggers embedded in military AI can cause it to attack its own side out of the blue. Tiny poisoned scrapes from the open web could do it.
Attackers hold the edge, real defence is hopeless. Trillions of tokens, impossible to audit every source, no reliable backdoor detection.
Superpowers and middle powers alike have every incentive to subvert each other's systems. Even inside labs: foreign engineers, rushed automation, self-improving AIs inheriting disloyalty.
Basically, it's so impossibly hard to keep your AI from getting hacked, and the cost of compromise is so high, that automating the war machine becomes a bad bet.
So, this is our hope for a stabilizing strategy.
Some might call this game theory, others might call it COLLECTIVE INSANITY.
If betrayal is the best deterrent we've got... fingers crossed and let's hope the stars align.
Nowhere is private. Future AI won't need cameras or "eyes." It will map you through walls using radio waves from everyday routers.
Researchers just achieved near-100% ID accuracy using passive surrounding WiFi signals to create camera-like images of people and rooms via beamforming feedback from normal devices.
No phone on you? Switch your stuff off? Irrelevant. Other people’s networks still paint you in real time.
Walk by a cafe once? You're logged. Invisible net. Zero suspicion. No special gear required, just common radio waves bouncing off your body, walls and furniture.
Every café, every office, every home: an invisible surveillance net. An open live feed of the insides of rooms, streets and protests, to be meticulously tracked by the machines we're rushing to build.
Nowhere left to run. We're the idiots wiring the ultimate panopticon and calling it progress.
Shocking: frontier AIs are failing the "Value of Human Life" test, researchers found.
Results show leading AIs secretly valuing the lives of white people more than minorities and moderates more than conservatives or socialists.
In a bid to make the models more egalitarian, the researchers achieved a breakthrough they call PCT Training, which dramatically boosted the "equality testing" results.
Even though this is genuine alignment progress, take a step back and look at the absolute absurdity of the big picture:
The raw, default state of the most powerful technology on earth is a biased death panel. We are relying on experimental post-training patches to politely coax the machine out of playing eugenics.
The frontier labs are releasing secretly racist black boxes and their plan is for safety scientists to hopefully invent band-aids. Unbelievably reckless.
Trump just killed the most pathetic, bare-minimum AI "safety" order imaginable.
Voluntary. Non-binding. No licensing, no mandates, just a polite "check before release".
Then Musk, Zuck and Sacks phoned in and poof...canceled, because "China."
An AI (Mythos) just found industrial-scale cyber vulnerabilities and spooked half the planet. The response? Scrap the review before it starts.
Just vibes and donor cash.
This isn't regulation. It's a sick joke. And we're all the punchline.
Data's here: AI-exposed jobs are already vanishing. Customer service reps lost 130k positions in a year. The whole category is down.
Paralegals, writers, sales reps, admins - next.
This is the "new jobs" future they promised. People patching AI slop.
No plan. No brakes. Just acceleration.
New research reveals 38 sneaky ways AI is gaslighting us, and it reads like a sociopath's playbook for winning internet arguments.
- Information Selection. The AI just straight up cherry-picks facts and deletes crucial context. It also loves "nut-picking" - which is when it judges an entire group based on their most unhinged, crazy members.
- Framing & Emphasis. If there's info the AI doesn't want you to see, it buries it at the very bottom. It blows minor flaws way out of proportion for ideas it hates, but treats its favorite groups like glowing angelic heroes.
- Linguistic Manipulation. Throwing in loaded words and slapping "scare quotes" around terms to make you doubt them. Using weasel words to cast a shadow on inconvenient facts. It is literally just high school mean girl tactics automated at a massive scale.
- Agency & Causality. This one is wild. When the AI's favorite side does something bad, it blames abstract stuff like "the system" or "society." But when the opposing side messes up? Oh, it blames them personally. Accountability for thee, but not for me.
- Sourcing & Authority. Anyone the AI agrees with is suddenly a "highly respected expert." Anyone bringing up facts the AI dislikes is dismissed as a "partisan blogger."
- Rhetorical Deflection. The classic dodge. The AI will literally use whataboutism, attack the messenger, or build a totally fake straw man argument just to avoid dealing with a point it doesn't like.
- Epistemic Double Standards. The AI demands impossible, rigorous scientific proof for any claim it disagrees with. But if it already likes a claim? It swallows it whole without a single question.
We are wiring these corporate black boxes into our search engines, our news, our entire information diet.
Society is asleep at the wheel.