This paper does a fantastic job conveying that:
(1) Deep learning abounds in miraculous empirical regularities
(2) A beautiful scientific theory has emerged over the past decade to explain the miracles
(3) Yet most fundamental questions remain mysteries. The best is yet to come.
1/ Deep learning is going to have a scientific theory. We can see the pieces starting to come together, and it's looking a lot like physics!
We're releasing a paper pulling together these emerging threads and giving them a name: learning mechanics.
🔨 https://t.co/92nSIHameW 🔧
@zicokolter At CAISI we started using the phrase "agent hijacking" for prompt injection attacks on agents because it avoids the inevitable confusion about the prompt injection vs jailbreak distinction (not to mention direct vs indirect), and conveys impact more directly for a lay audience.
@zicokolter Yep agreed it's all the same underlying vulnerability; instruction hierarchy-style distinctions (app developer / user / external content) are "just" an abstraction. (I was also involved with the new paper, btw)
@zicokolter Post where Simon Willison coined prompt injection: https://t.co/PRiBfAdE1V. Paper where Greshake et al. coined indirect prompt injection: https://t.co/W2hfu2TYfh
@zicokolter Fwiw, my understanding is that the original coinage of prompt injection was focused on contexts where the untrusted data comes from an untrusted user. Then Greshake et al. coined IPI to highlight the case where the attacker leverages data likely to be retrieved at inference time.
The future of AI is agentic, and America is leading the way to make it secure and interoperable.
A new AI Agent Standards Initiative is launching this week @NIST to drive industry-led standards and open protocols that build trust and advance innovation. https://t.co/bS5oqvU8iu
Excited to share @NIST+CAISI’s initial public draft on how to run and report results of automated evals.
If you have opinions on evals, we’d love your feedback — help us improve the AI evals ecosystem!
Public comments accepted through March 31st via [email protected].
more in🧵
CAISI is hiring for a bunch of exciting new roles, from partnerships to technical experts in AI x bio / chem and more.
They're serious about bringing in strong researchers & engineers and letting them do good work.
Based in DC or SF:
https://t.co/GsooeO3IxK
My Agent Security team is hiring Research Engineers & Scientists. Other teams are hiring people with strong technical backgrounds too: Frontier Assessment, Cyber, Chem/Bio, Applied Systems, and Partnerships. Job postings are listed here: https://t.co/hG2mmUMiUH
People sometimes ask me how to leverage a technical background to jump into U.S. AI policy. As of this week my answer is straightforward: apply to join us at CAISI! We're a startup within government, and we're doing a hiring surge.
At CAISI, we're the U.S. government's leading experts on agent security. We published this RFI so deployers, developers, and experts can provide insights that inform our research and NIST guidelines development. Responses due March 9th!
CAISI has published an RFI about securing AI agents. It seeks insights from AI agent deployers, developers, and computer security researchers. Questions address the current threat landscape, mitigations, measurements, and other security considerations unique to AI agents.
CAISI is recruiting an intern to support an agent security standards project. Position closes Jan. 15 for a February start. Please help spread the word. Details in thread:
@boazbaraktcs Since I organized this by model family branding (GPT) rather than developer (OpenAI), I think the move would be to add a separate o-series line. And don't get me started on Sonnet vs Opus.