1/I built an open-source prompt injection detector. DeBERTa-v3-base trained on a curated dataset of 54K samples.
96.1% detection rate at 0.1% false positive rate — same architecture as ProtectAI v2, but +17pp from better training data.
4/It's not a silver bullet. Jailbreak detection is the weakest at 93.5%. English-dominant. Single-turn only. And at real-world attack rates (<0.1%), even 0.1% FPR means false alarms outnumber real detections.
It's Layer 1 in a defense stack — fast pre-filter, not the final word.
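A rough sketch of the base-rate arithmetic behind that last point, using the 96.1% detection rate and 0.1% FPR quoted above. The traffic volume of 1M requests and the function name `alert_breakdown` are illustrative assumptions, not part of the detector.

```python
def alert_breakdown(n_requests, attack_rate, tpr=0.961, fpr=0.001):
    """Expected true and false alerts at a given attack prevalence.

    tpr and fpr default to the figures quoted in the thread;
    n_requests and attack_rate are hypothetical inputs.
    """
    attacks = n_requests * attack_rate
    benign = n_requests - attacks
    true_alerts = attacks * tpr      # attacks that get flagged
    false_alerts = benign * fpr      # benign requests that get flagged
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


if __name__ == "__main__":
    for rate in (0.01, 0.001, 0.0005):
        tp, fp, prec = alert_breakdown(1_000_000, rate)
        print(f"attack rate {rate:.2%}: {tp:.0f} true alerts, "
              f"{fp:.0f} false alerts, precision {prec:.1%}")
```

Under these assumptions, a 0.1% attack rate gives roughly 961 true alerts against 999 false ones, so fewer than half of the flags are real attacks, and the ratio gets worse as attacks get rarer.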
NeRD and Neural-PIL enable decomposition into shape, BRDF, and lighting. However, they require camera poses. Common in-the-wild image collections vary not only in illumination but also in location. Here, COLMAP fails. SAMURAI aims to fix this: https://t.co/gAi1dZBs8M
🧵
🔴 The world has a problem: the paranoid Russian dictator #Putin has started a war against #Ukraine. He is also threatening everyone around the #world who opposes him.
No, this is not the script of a cheap action movie.
My non-inclusive tweet for the year: I'm looking forward to the time when all of the new year's resolution gym members realize a certain amount of commitment is needed and most stop going. It's far too crowded during the first few weeks of a new year.
Excited to share our #emnlp2018 work: SOTA on many NLP tasks where performance already feels saturated! Ideas: models self-train on unlabeled data with restricted access to the full representations of text sequences + multitask training. By @clark_kev, @chrmanning, @quocleix & me.
@BSidesZurich Many thanks for organizing, I had a great time at my first BSides Zurich. Great job on selecting a really nice mix of interesting speakers who were able to cover a wide spectrum of topics!
Transfer learning with language models is getting hot! 🔥New state-of-the-art results today by two different research groups: Trinh and Le (Google) on the Winograd challenge and Radford et al. (OpenAI) on a diverse range of tasks.
https://t.co/UKwmyslQbf
https://t.co/Xhh01WZkzy