New paper!
Think Fast: Estimating No-CoT Task-Completion Time Horizons of Frontier AI Models
@METR_Evals showed that models' time horizons have been doubling every few months. We ask: how long a task can models complete without any chain-of-thought (CoT)?
New paper! LLM agents are becoming autonomous software engineers and could automate AI research, making it vital to monitor them for misbehavior. We can automate this monitoring with other LLMs. What information should we give to monitors to make them most effective? 🧵