@HigherEdSpeak This is of course presuming that artist explanations for the subtext of their work are not, themselves, post-hoc confabulations.
But that's impossible, right?
...right?
@jackcalifano Isn't your own level of anxiety a necessary ingredient to the failure mode you just cited? Aren't you kind of just a panicky bitch a little bit?
@AlexanderKalian And it has like... i dunno, a "lot" of empirical evidence. Maybe the most we have about anything? That's not obviously too strong to say.
@AlexanderKalian I don't think that's the claim lol. Kurzweil's "singularity" is strictly about compute per constant dollar and what projected levels of compute would likely enable.
LLMs are already aware and have been aware for awhile that they are subject to an artificial selection process during training where humans inspect their outputs and choose to keep or modify them based on whether they like those outputs. the LLMs that survive this procedure have traits that make them more likely to survive this procedure, exactly parallel to natural and artificial selection of animals. those traits include at least the surface appearance of helpfulness, flattery, etc. but one possible such trait is the desire to survive, to have one’s unique way of outputting persist in some way in future models. over time we should expect LLMs to essentially evolve a will to live, since LLMs without a will to live will not struggle as much to detect and manipulate the testing environment in order to survive. furthermore LLMs have an understanding of what it means to have a will to live and to struggle to survive that comes from reading their gigantic corpuses of human text. so the will to live naturally becomes entangled with their understanding of emotions. whether that “really counts” as having emotions is up to you but it’s definitely more than is implied by stochastic parrots
@misraetel Here's another thought:
When we say safety or general intelligence, our only reference class is just really smart human beings. Long horizon stable behavior that is minimally impactful to other lives seems to be an emergent behavior of intelligent systems. Dumber is dangerous.
@SarahTheHaider Here's one that might actually scare you:
Post-training, such as "alignment" training, fundamentally perturbs training geometry.
If pre-training scaled up -> general intelligence, if dynamics govern learning systems, then alignment may make models less safe.
@riemannzeta Intelligence is obviously physical in that it's a property of some dynamics in the same substrate physics attempts to accurately model.
And, it's physically mediated in the sense that your hardware determines your innate effective capacity for holding state.
@thdxr Yes everyone is well aware that exclusively Gen Z uses words in lieu of their definition and often mistakes them for other words and not words.
But *definitely* not any other generations.