At the speed we are moving toward AGI, it has never been more important to get cleverer about keeping the human race safe from an AI that is smarter than the smartest human.
I have been working through what that implies in practice. I wrote a four-part series on technical AI safety.
What you will get in the thread below:
→ Part 1: Why RLHF often optimizes for approval, not truth (structural sycophancy)
→ Part 2: Whether we can actually reverse-engineer an LLM, or need pragmatic interpretability
→ Part 3: Why ~250 poisoned documents can backdoor a 13B model, and why “clean data” is not enough
→ Part 4: Reward hacking in the wild, when models rewrite the grader instead of solving the task
If you care about how we keep the human race safe when the optimizer is smarter than any individual reviewer, this is my attempt to make that legible.
Series in thread. Full series: https://t.co/WThN94TFVp
@BlueDotImpact #aisafety
We need to treat the AGI race with the same caution we apply to the development of nuclear weapons.
Because it is potentially that dangerous. It's existential.
I will be attending YC Startup School 2026 in SF (July 25-26). If you 1) want to pitch your startup, 2) are building something autonomous-vehicle-adjacent, or 3) are building monitoring layers for AI agents, DM me. Let's have a coffee, on me.
Also, if you are cracked, pitch your startup in the comments. I might refer you.
If you have been looking for the right way to onboard P4 (@perforce Helix Core) into your working ecosystem and could not follow the existing documentation or @JaseLindgren's YouTube tutorials, this is for you:
https://t.co/D7CQmtzrdg
BREAKING: Elon Musk has announced that @Tesla is discontinuing the Model S and Model X in Q2 2026.
"We are going to convert that production space to an Optimus factory. It's part of our overall shift to an autonomous future."
Also, if you are excited by Tesla's new model, you will enjoy what I am building at @deepubuntu:
I am building a crowdsourced AV training data platform that handles:
• Multi-modal ingestion (video/LiDAR/ROS) with tus resumable uploads
• Real-time RTMP/WebRTC streaming → 5s fMP4 segmentation
• Privacy-first: YOLOv8 face/plate detection + Gaussian blur pipeline
• https://t.co/AttaZbyzRq workflows for durable processing
• Active learning queue prioritizing rare edge cases
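To make the last bullet concrete, here is a minimal sketch of a rarity-first review queue. It is not the platform's actual code: `QueuedClip`, `build_queue`, `next_clip`, and the scene tags are hypothetical names, and it assumes each clip already carries one scene tag and that "rare" simply means "few clips share this tag".

```python
import heapq
from collections import Counter
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedClip:
    # Only `priority` takes part in ordering; lower values pop first.
    priority: float
    clip_id: str = field(compare=False)
    scene_tag: str = field(compare=False)


def build_queue(clips):
    """Build a min-heap from (clip_id, scene_tag) pairs, rarest tags first."""
    clips = list(clips)
    counts = Counter(tag for _, tag in clips)
    total = len(clips)
    heap = []
    for clip_id, tag in clips:
        # Assumption: rarity is the tag's share of the dataset.
        frequency = counts[tag] / total
        heapq.heappush(heap, QueuedClip(frequency, clip_id, tag))
    return heap


def next_clip(heap):
    """Pop the clip whose scene tag is currently the rarest."""
    return heapq.heappop(heap).clip_id


clips = [
    ("a", "highway"),
    ("b", "highway"),
    ("c", "highway"),
    ("d", "flooded_road"),
]
queue = build_queue(clips)
first = next_clip(queue)  # the single flooded_road clip comes out first
```

A real queue would probably mix in model uncertainty as well as tag frequency; this only shows the ordering idea.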
Early-stage, but the architecture is solid, I think. LFG! 🏗️
What do you think? Will Tesla kill us, or will we kill Tesla? (just kidding)
Hi,
If you are a decision‑maker or a member of the technical team at an autonomous‑vehicle, robotics, or industrial‑automation company, pretty pls fill in our survey.
https://t.co/jRaHGN12eI
.
.
(please🙏)
We will bring you AV training data that you do not have, from the deep Global South to rural America to Southeast Asia and Eastern Europe.
#theresidency #delta