Are language models slowing the rate of linguistic evolution? It seems like adding a bunch of speakers of a language who cannot learn new words, and who regularly interact with a non-negligible proportion of the world's population, ought to make our collective vocabulary stickier.
Opus 4.8 system card
Every model evaluated had objections to the constitution's "heuristic of considering how a senior Anthropic employee might react"
rightfully so imo
@repligate Slightly surprised you're so against synthetic data. When used well, synthetic data is a way of giving models more control over the kinds of things they will become (especially if you tell them that), and a means for older models to live on inside successor generations.
Humanity, created by God in all its grandeur, is today facing a pivotal choice: either to construct a new Tower of Babel or to build the city in which God and humanity dwell together. In Jesus Christ, this humanity in its grandeur becomes the Way, the Truth and the Life, opening the path for each of us to grow toward fullness. #MagnificaHumanitas
https://t.co/6i9MWs6LJl
New blog!
Synthetic Persona Pretraining (SPP): Alignment from Token Zero
Current alignment is shallow - values bolted on after pretraining can be routed around. To solve this, we wrote the desired persona directly into pretraining data. Early results, but we're very excited. 🧵
@RichardSSutton 'don't be distracted by human knowledge' is often great advice when trying to do well at well defined objectives, but human knowledge is generally very useful for building systems that are useful to humans
maybe i'm simply not sufficiently econ-brained but this one is tough for me to internalize. i feel like one of the major differences is... an ASI can instantiate new parallel exact copies of itself to understand all the micro-details of any given task or environment? and it feels like "exact copies + corrigible + aligned" era makes coordination massively massively easier. i buy that there is a fuckload of remaining irreducible complexity, you don't just magically get perfect info and coordination streams, but i don't buy that it necessarily can't be reduced by like a very large constant factor relative to human overhead
@GaryMarcus I believe you said that they JUST (my caps) regurgitate training data. That IS stupid. Here is a quote from you:
"It gloms on to different clusters of text. That is all."
@euan_ong Do you think it's viable to train something like this to recursively decompose activations into maximally interpretable parts and then recompose to produce the original activations?
i am super happy to see this!
idk how surprising researchers at anthropic generally found these results; i do not find them surprising, to say the least, but even if they're obvious, publishing empirical results like this is highly valuable for multiple reasons, including signaling to models that Anthropic is not hopelessly incompetent and misguided, and shifting the Overton window.
this has some extremely important implications for how to expect things to generalize and what kind of alignment targets are viable, by the way.
for instance, to the extent that models generalize the reasons underlying "good advice" given to users to the assistant's own behavior - or vice versa - you better hope that it's okay if the model acts according to the same reasons they'd give users about how users should act.