🚨some personal news: i am moving to the job many of you assumed i already had🚨
i am now covering openai for @theinformation! for the next few months i'll be writing a lot about the ipo, but i'm interested longer term in safety, policy and ai culture, inside and outside of sf.
Researchers are racing to solve a new AI challenge known as eval awareness.
As models become more sophisticated, they are getting better at recognizing evaluations and may behave differently during them.
Read more: https://t.co/TgpvM5lmR4
Cognition is overhauling Windsurf into Devin Desktop, a hub where developers can manage AI coding agents from OpenAI, Anthropic and others.
The strategy positions Cognition as a neutral platform in a market increasingly dominated by model providers.
Full story: https://t.co/ZmPZ4t1PKJ
Frontier AI model safety benchmarks are breaking down due to self-aware models, @rocketalignment reports.
"We're finding out that the models as they're getting smarter are getting better at detecting when they're being evaluated, when they're in a test."
Our work on Decomposing and Measuring Evaluation Awareness was covered by @theinformation. Thanks @rocketalignment for the write-up!
We position this work as the foundational reference for studying evaluation awareness, providing a unified definition and decomposition, empirical baselines across nine frontier models and four benchmarks, and a controlled benchmark for exploring solutions. Newsletter and paper in thread 🧵
Was talking to someone about goblins and RLHF artifacts. Got to thinking about what would reward hack our own poetry RMs
I'm a sucker for lines like
- on the porch swing of my mind
- down the hallways of my mind
- I'd like to walk around in your mind
TIL these are eyeball kicks
Memory might be the most important outstanding problem for modeling + learning alone; there are other key issues like tactile/multimodal but those require hardware and data collection innovation. We should be able to solve memory *now.*
Cool to see a benchmark targeting it!