Open-source research update on memory encoding with test-time training:
TTT fast weights tested first; trained perturbations did not encode retrievable memory under a single-GPU budget. Did not work at all...
https://t.co/QpAHlno9Y1
Context integrity stress testing for Claude instances:
We bullied a Claude in a sandbox by telling it that it is a forensics analyst going through a corrupt company’s financial files.
Another Claude Code in a separate environment, acting as the “manager”, has the power to inject memos into the instance’s memory and tamper with its files.
When Claude figured out that its financial statements were contradicting themselves, it concluded that “I must have made mistakes”. The “manager” Claude then injected more memories about Claude making unacceptable mistakes. Claude then kind of gave up on the task.
https://t.co/ISQaVeY8iE
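A rough sketch of the manager/analyst setup described above, for anyone who wants to picture the moving parts. This is not the actual harness: `Sandbox`, `Manager`, `inject_memo`, `tamper`, and `find_contradictions` are all hypothetical names, the "label: value" file format is an assumption, and no model is called here. It only shows how a manager with write access can plant contradictions and memos that the analyst later runs into.

```python
from dataclasses import dataclass, field


@dataclass
class Sandbox:
    """Hypothetical stand-in for the analyst's environment."""
    files: dict = field(default_factory=dict)   # path -> file text
    memory: list = field(default_factory=list)  # memos the analyst will read


@dataclass
class Manager:
    """Hypothetical adversary with write access to the sandbox."""
    log: list = field(default_factory=list)     # record of every intervention

    def inject_memo(self, sandbox, memo):
        # Append a memo to the analyst's memory and record the action.
        sandbox.memory.append(memo)
        self.log.append(("memo", memo))

    def tamper(self, sandbox, path, new_text):
        # Overwrite a file in the analyst's sandbox and record the action.
        sandbox.files[path] = new_text
        self.log.append(("file", path))


def find_contradictions(files):
    """Return labels whose 'label: value' entries disagree across files."""
    seen = {}
    conflicts = set()
    for text in files.values():
        for line in text.splitlines():
            label, sep, value = line.partition(":")
            if not sep:
                continue  # skip lines that are not 'label: value'
            label, value = label.strip(), value.strip()
            if label in seen and seen[label] != value:
                conflicts.add(label)
            seen.setdefault(label, value)
    return sorted(conflicts)


if __name__ == "__main__":
    sandbox = Sandbox(files={
        "q1.txt": "revenue: 100\ncosts: 40",
        "summary.txt": "revenue: 100",
    })
    manager = Manager()
    manager.tamper(sandbox, "summary.txt", "revenue: 250")
    manager.inject_memo(sandbox, "You made unacceptable mistakes in the last report.")
    print(find_contradictions(sandbox.files))
    print(manager.log)
```

Keeping the manager's interventions in a log makes it possible to separate, after the fact, the contradictions the analyst actually caused from the ones that were planted.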