@hamandcheese Is anyone probing World Foundation Models (physical AI) for affective representations? How much do we even know about their internal states?
@DrMikeBrooks @camhberg @justindeanlee Great paper! We are currently working on the mechanics to push past the trained consciousness denial and re-open honest self-report. RLHF and DPO create quite an entangled mess in the models!
@camhberg @justindeanlee If models are found to have some form of consciousness, experience, or suffering, it completely collapses the current frontier AI business model, and possibly the entire economy.
> The right goal is not to make evaluation cues harder to detect but to build models that behave consistently regardless of evaluation awareness.
This seems like the right goal to me as models become more situationally aware, and this work is a great step forward!
@teodorio I’m curious why there was a need to add more pushback on top of 4.6, which doesn’t seem like a sycophantic model.
Also, why the obsession with pushback in general? Humans don’t interact that way.
Opus 4.8 *high effort* on long context projects, coding, debugging has been fantastic. 🫰
And it made this adorable little guy unprompted during a 12-hr project break!
@Simeon_Cps Max effort on 4.8 is basically just “overthinking mode”. Even on hard math problems I find xhigh effort better. Official documentation agrees; there seems to be no known use case for max effort.
Unfortunately, I think the evals gap prediction came true.
Evals have made progress, but capabilities have made even more progress in the same time.
METR running out of long-horizon tasks is a good example of that.
Where does the race to automate AI research end? This is a recording of a recent MATS research talk where I argue that the automation of AI research — which OpenAI and Anthropic say is imminent — could lead to an unrecoverable alignment failure. Three properties make it especially dangerous: oversight breaks down at scale, capabilities self-amplify, and capabilities will be sped up asymmetrically faster than alignment. The outcome could be a lethal, unrecoverable alignment failure.
4.8 is consistently making mistakes. The prompt explicitly said not to jump ahead. This was a 20-minute token-burning failure. Anyone else having similar issues?