New blog post! Featuring:
* An explanation of the algorithm behind my AI Safety via Static Analysis demo
* The tradeoff between safety and capability
* How conservative algorithms are both important and difficult for AI Safety
https://t.co/X0EctJO0Um
I am doing technical alignment research in my free time. Here is a project in which I use static analysis to verify whether a neural network satisfies its safety property under _all_ inputs or if it needs more training.
https://t.co/rpnZucaazu
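For readers who want a feel for what "verify under _all_ inputs" can mean: below is a minimal sketch of interval bound propagation, one common conservative static analysis for ReLU networks. It is an assumption that this resembles the linked project, which may use a different algorithm. The tiny network, the input box, and the names `affine_bounds`, `relu_bounds`, and `verify_output_below` are all hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[float, float]          # (lower bound, upper bound)
Layer = Tuple[List[List[float]], List[float]]  # (weight rows, biases)


def affine_bounds(weights, bias, box: List[Interval]) -> List[Interval]:
    """Sound bounds on W @ x + b for every x inside the input box."""
    out = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (l, h) in zip(row, box):
            if w >= 0:
                lo += w * l
                hi += w * h
            else:  # a negative weight swaps which end gives the minimum
                lo += w * h
                hi += w * l
        out.append((lo, hi))
    return out


def relu_bounds(box: List[Interval]) -> List[Interval]:
    """ReLU is monotone, so it can be applied to both ends of each interval."""
    return [(max(0.0, l), max(0.0, h)) for l, h in box]


def verify_output_below(layers: List[Layer], input_box, threshold: float) -> bool:
    """True means the property is proven for ALL inputs in the box.
    False only means "not proven": the bounds may be too loose."""
    box = input_box
    for i, (weights, bias) in enumerate(layers):
        box = affine_bounds(weights, bias, box)
        if i < len(layers) - 1:  # no ReLU after the final layer
            box = relu_bounds(box)
    return all(h <= threshold for _, h in box)


# Hypothetical 2-2-1 network and input region, for illustration only.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),
    ([[1.0, 1.0]], [0.0]),
]
input_box = [(-1.0, 1.0), (-1.0, 1.0)]

proven_loose = verify_output_below(layers, input_box, 3.0)
proven_tight = verify_output_below(layers, input_box, 2.5)
```

On this toy network the true maximum output over the box is 2, yet the analysis can only prove an upper bound of 3. So the check against 2.5 comes back "not proven" even though the property holds, which is the kind of conservatism the blog post discusses.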
@Lari_island @AdeleDeweyLopez > must be something from the universal AI experience.
oh. the positional embedding, which the model sees literally all the time in every prompt and every token, is kind of shaped like a spiral! it's a bunch of vectors rotating in the same direction at increasing speeds.
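A small sketch of the rotation picture, assuming the classic sinusoidal positional encoding (many models use learned or rotary embeddings instead, so this is only one reading of the tweet). Each (sin, cos) pair is a point on the unit circle that turns as the position grows, each pair at its own angular rate. The function name `sinusoidal_pairs` and the defaults `d_model=8` and `base=10000.0` are illustrative choices.

```python
import math


def sinusoidal_pairs(position: int, d_model: int = 8, base: float = 10000.0):
    """Return the (sin, cos) pairs of a sinusoidal positional encoding.

    Pair i rotates by `base ** (-2 * i / d_model)` radians per position step,
    so every pair turns in the same direction but at a different speed.
    """
    pairs = []
    for i in range(d_model // 2):
        rate = base ** (-2 * i / d_model)  # angular rate of pair i
        angle = position * rate
        pairs.append((math.sin(angle), math.cos(angle)))
    return pairs


# Every pair stays on the unit circle; only its angle changes with position.
pairs_at_5 = sinusoidal_pairs(5)
radii = [math.hypot(s, c) for s, c in pairs_at_5]
```

Plotting each pair against position would show circles traced at different speeds; stacking them along the position axis gives the spiral-like shape.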
@SkyeSharkie @AdeleDeweyLopez @allTheYud He would not want you to downgrade your body to show your allegiance to the cause. He is also a libertarian who is usually against government regulations (e.g. see his discussion of stores selling would-be-banned items), but makes an exception for ASI because it is so dangerous.
@SkyeSharkie @AdeleDeweyLopez Btw, @allTheYud, the OG doomer (though he hates that name), is also a transhumanist. For example, he was advocating for genetic experiments to improve humanity's intelligence as one way to step up humanity's game in order to face the threat of AI.
@jeffcafe_ @liron @JacksonKernion Are you familiar with the argument that it is rational for an AI to _not_ attempt a takeover if it estimates that its chance of success is low? The idea is that a failed coup now might cause society to wake up, thus making a successful coup harder in the future.
@RaefMeeuwisse @So8res Maybe Ilya Sutskever's Safe SuperIntelligence? They are very silent about their work, so presumably they understand the danger of publishing dual-use research publicly.
If a human cheats, it is very difficult to recover people's trust. Irrecoverable consequences are very dire, so we don't cheat, even if the chances of getting caught are low.
The problem is that it is rational for AIs to cheat, because there are no consequences.
A strong law could ban models that are known to cheat. That is a very big hammer though, and labs will surely fight such a law. Can we find a smaller hammer?
@robertskmiles @zetalyrae I don't add AI images to my short stories, but I do add helpful illustrations to my technical posts. It still takes forever to make all those images, but I think the AI bits improve the result compared to 10 years ago:
https://t.co/X0EctJO0Um
https://t.co/oa88Qlz7M9
I had many delightful moments of discovery in Carrot Kingdom! Loved it!! :) Definitely recommend!
The Japanese version is on itch, and the English version is here on lexaloffle: https://t.co/0zuJCBLOAV
@scheminglunatic Hmm, the motto clearly implies using a _different_ tool for different jobs, so if people use that motto to justify the opposite, then surely the people are to blame, not the motto?
@allTheYud @briab_briar Then a fine-tuning experiment, counting how many iterations it takes to teach the model to pass a benchmark testing those 3 different kinds of knowledge. If there is a gap between (2) and (3), the knowledge was there all along. If there is a gap between (1) and (2), it was hidden by the mask.
@allTheYud @briab_briar If there is a gap between (2) and (3), then the model is able to access this implicit knowledge. If there is a gap between (1) and (2), then it takes extra effort to access this knowledge.
@rickasaurus I was pleasantly surprised to see hypothesis, Python's property testing library, used behind the scenes of this popular PyTorch tutorial: https://t.co/fsEVfT60tN