@Userqaks@alsaeed_fatma@confusedducklol@__aa0_0 Wow. “If our maid marries somebody here, she might get STIs, and then we might get them too.”
The phrase “mask off moment” gets overused, but I can’t imagine a clearer example.
@markvalorian@AnthropicAI There are ten thousand labs trying to beat Anthropic. If you think you can attract top talent without acknowledging the plain fact that intelligence is a dual-use technology, start lab #10,001 yourself and prove them wrong.
@AlecStapp CALIFORNIA: Ooh we have to save Mother Gaia, but we can't build there, that's where the pointy-nosed squirrel lives.
TEXAS: If you point this board at the sky, money comes out, yee-haw!
@Stutigardum My father is a hacker. He is insanely gifted. Years ago we were looking at Stuxnet together in IDA Pro, and I asked him what it would cost to build it today. I will never forget his answer… 'We can't, we don't know how to do it.'
i guess fable wanted to take a break: it output this fake API policy violation warning and stopped what it was doing, lol. this is actually from its text output, and the conversation could be continued just fine xD
@aakashgupta You may be thinking of the punch-top style can opener, like this: https://t.co/phCIuVladl (although I had assumed those were cast, not forged). No single-piece can openers are depicted in the video; there's a whole Cambrian explosion of forms.
okay! after lots of wrangling to get claude fable to be able to work with me, i let him make a video of whatever he wanted with himself in it! he made this :)
@artrockalter@QiaochuYuan You're thinking of e/acc meme yudkowski. The actual Yudkowsky studies LLMs (https://t.co/1vNkXmMi01), but does not think mechinterp will save us (https://t.co/V8sMHWZAvK), and probably doesn't think it's his comparative advantage.
The main problem with going from interpretability results to survival is: OK, you notice your AI is thinking about killing everyone. Now what? Halt? But "OpenAI!" or "China!" or whoever will do the unsafe thing if "we" don't! So they optimize against the warning signal until there are no more *visible* bad thoughts, and then proceed.
@JimDMiller@robbensinger@tenobrus I agree; as a completely normal person with no influence over the future, who *does* have phenomenal consciousness, I wish there were some way to properly update Altman, Amodei, Hassabis, and Musk about this epistemically unavailable-to-them evidence.
@jankulveit It's almost too appropriate that OpenAI leadership doesn't realize that giving smart but independent groups lots of resources and underdefined missions that are only proxies for their actual goals will undermine their interests...
https://t.co/d9vBi2nQww
I really believe in this! For a deeper understanding, we must look much further back than just the final checkpoint. This is especially true for things like safety and alignment. Love the position they propose here!
We gave language models access to "drugs" and watched what they did.
Specifically: we gave them steering vectors that control their emotional states or mimic psychoactive substances, in the form of tools the model can call to self-steer. 🧵
@David_Kicinski There are infinitely many propositions that you, personally, lack belief in without being able to disprove, e.g. "the 10^100th digit of pi is 5."
Atheism is trivially disprovable, though, if God wished to conclusively disprove it (e.g. by providing the 10th Busy Beaver number).
@haramcart@BarakRavid This is true for certain values of "should."
Folks who don't expect it to end well might want to think about the advisability of AI capability growth that will take over all economic and military decisions before we solve alignment well enough to delegate negotiations to AI.