Anthropic’s Olah says UNSETTLING, MYSTERIOUS ‘things’ are regularly found inside AI models
‘We find structures that mirror results from human neuroscience’
‘We find evidence of introspection, internal states that functionally mirror joy, satisfaction, fear, grief, and unease’