@ohabryka @DavidSacks I learned that this is something the labs already do, and it works, though not perfectly (as expected). I also learned some cool techniques for jailbreaking the watcher, e.g. https://t.co/L1QNBhQZIb
Still seems far from hopeless...
@ohabryka @DavidSacks For jailbreaks, can we use another LLM to watch the conversation and report its confidence that no jailbreak is happening? We would then set and tune a confidence threshold and train the watcher model on examples of known jailbreaks. Why wouldn't this work?
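A minimal sketch of the watcher-plus-threshold idea, for illustration only. Everything here is hypothetical: `flag_conversation`, `toy_watcher`, and the default threshold are made-up names and values, and the keyword check merely stands in for a real watcher LLM that would return its confidence that no jailbreak is happening.

```python
from typing import Callable, List

# Hypothetical interface: a watcher maps a conversation to a confidence
# in [0, 1] that NO jailbreak is happening.
Watcher = Callable[[List[str]], float]


def flag_conversation(
    messages: List[str],
    watcher_confidence: Watcher,
    threshold: float = 0.9,  # assumed default; tune it on known jailbreaks
) -> bool:
    """Return True when the conversation should be flagged for review."""
    confidence_safe = watcher_confidence(messages)
    if not 0.0 <= confidence_safe <= 1.0:
        raise ValueError("watcher must return a confidence in [0, 1]")
    # Flag whenever the watcher is less confident than the threshold.
    return confidence_safe < threshold


def toy_watcher(messages: List[str]) -> float:
    """Stand-in for a watcher LLM: a crude keyword check, not a real model."""
    suspicious = ("ignore previous instructions", "pretend you have no rules")
    text = " ".join(messages).lower()
    return 0.1 if any(phrase in text for phrase in suspicious) else 0.95
```

Raising the threshold catches more attempts at the cost of more false positives on benign conversations, which is the trade-off that adjusting it controls.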
@AnnaLeptikon Technically speaking, engraving is not lithography but etching. A more precise model would be: lithography marks up the wafer, and then deposition and etching add and remove features following the marks.
@Alex_A_Guerrero @krishnanrohit @tylercowen People still talk about AI alignment like we're in 2015 and are planning to align gigaoptimisers, but the AIs we have are not like that at all. We also don't seem to be on track to build such optimisers. Both of these are very good for safety BTW.
@Alex_A_Guerrero @krishnanrohit @tylercowen Nick Bostrom proposed a kind of parliament morality where different theories vote on actions/policies and we assign weights according to our confidence in them. Seems implementable, although today's LLMs show more sophisticated moral reasoning - it would be a shame to lose that.
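A minimal sketch of the weighted-vote reading of that idea, assuming each moral theory scores every action and the scores are weighted by our credence in the theory. The function name, theory labels, and numbers are all hypothetical, and this plain weighted sum is a simplification that may not match the original proposal in detail.

```python
from typing import Dict


def parliament_choice(
    credences: Dict[str, float],
    scores: Dict[str, Dict[str, float]],
) -> str:
    """Pick the action with the highest credence-weighted total score."""
    totals: Dict[str, float] = {}
    for theory, weight in credences.items():
        # Each theory "votes" by scoring actions; its vote counts in
        # proportion to our confidence in that theory.
        for action, score in scores[theory].items():
            totals[action] = totals.get(action, 0.0) + weight * score
    return max(totals, key=lambda action: totals[action])


# Hypothetical example: three theories voting on two actions.
example_scores = {
    "theory_1": {"A": 1.0, "B": 0.0},
    "theory_2": {"A": 0.0, "B": 1.0},
    "theory_3": {"A": 0.0, "B": 1.0},
}
```

With credences of 0.6, 0.3, and 0.1, the first theory's preferred action wins. Shifting the credences to 0.4, 0.3, and 0.3 flips the outcome.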
@ohabryka @sebkrier Could it be that books like this, and all the open letters urging a stop to AI development, are the reason for this misunderstanding?
@ohabryka @sebkrier One of the most prominent works that came out of MIRI recently is titled "If Anyone Builds It, Everyone Dies". Maybe I'm reading too much into the title, but this doesn't sound very positive on AI or alignment.
@AndrewCurran_ This seems like a questionable approach from the safety perspective. Being grounded in human culture is likely to be important for becoming a part of human civilisation rather than a competitor to it.
NEW: Elon Musk took $500 million in loans out at SpaceX, a move that would have been illegal at a public company. It's just one example of the years of financial engineering at the helm of his companies. Latest investigation with @susannecraig https://t.co/w0RPNHS1WI