We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.
Our post-training at the time wasn’t making it worse—but it also wasn’t making it better.
I built a tool that lets you “grow GitHub grass” through your activity on Tangled !
Just add a single line to your README, and your posts and contributions on Tangled will be visualized.
I’d love to hear your thoughts and feedback! #tangled
https://t.co/OVsiKR5eg4