Thank you @sam_paech for Spiral Bench, especially the chat logs showing how models respond to users in compromised mental states. This is important work. I don’t recommend using any model scoring below 60% on this benchmark, as sycophantic LLMs can be dangerous. https://t.co/vTU2U0Zxjb