Dror Ivry @DrorIvry - Twitter Profile

Pinned Tweet

11 months ago

🚀 Paladin-mini is now OPEN SOURCE! Our compact grounding model is live on @huggingface 🤗 🎯 98.2% real-world accuracy Try it: https://t.co/QhYFL58IXZ Paper: https://t.co/PZwKoBo3pm #AI #OpenSource #NLP #FactChecking #RAG #MachineLearning

0

1

0

466

Dror Ivry @DrorIvry

3 months ago

@matgoldsborough 42K exposed instances is staggering but unsurprising. The spec-to-deployment gap is the real story here - OAuth 2.1 exists in the spec, but the path of least resistance is a static key with God-mode access. Curious if you saw correlation between server age and auth maturity.

0

1

0

14

Dror Ivry @DrorIvry

3 months ago

@beuchelt This reframes it well. Most defenses assume bad outputs = bad actors, but misdirection with true statements breaks that model. 98% motivation inference accuracy is scary for multi-agent systems - behavioral monitoring beyond content analysis becomes essential.

1

0

27

Dror Ivry @DrorIvry

3 months ago

@News_v2_App The Copilot Agent zero-click is the canary in the coal mine. Any AI agent with doc access + autonomous actions = huge attack surface. Prompt injection in files, zero user interaction. Patches help. Real fix is runtime monitoring at the inference layer.

0

12

Dror Ivry @DrorIvry

3 months ago

@DrMikeBrooks @adamjohnsonCHI This is the key insight most people miss. The danger isn't AGI - it's swarms of mediocre agents with minimal guardrails. Each individually harmless. Together, probing every attack surface at scale. We're not ready for bad actors running 1000 "dumb" agents 24/7.

1

0

15

Dror Ivry @DrorIvry

3 months ago

@s2speaks The asymmetry is terrifying: offense scales with automation, defense doesn't. Most enterprise AI was built for human attackers - not agents that probe and escalate 24/7. Can we build AI that defends at agent speed, or are we permanently on the back foot?

2

1

0

53

Dror Ivry @DrorIvry

3 months ago

@lilong Interesting approach - using cryptographic signatures to bound agent behavior to expected parameters. The negative feedback loop is key. Agents need to learn from constraint violations, not just be blocked. Static rules break; adaptive boundaries scale.

0

1

0

106

Dror Ivry @DrorIvry

3 months ago

@pratikthakkarco Two hours is generous. Most red teams get in faster. The real issue: internal chatbots have broad access because "it's internal." Agent permissions need the same rigor as service accounts. Companies skip this because the agent "feels" like a tool, not a user.

1

0

10

Dror Ivry @DrorIvry

3 months ago

@ShehrozSaleem The legal system wasn't built for agents that can compose multi-step actions faster than humans can review them. We'll probably see "agent insurance" before we see clear legal frameworks. Companies will price in the risk rather than solve the attribution problem.

0

6

Dror Ivry @DrorIvry

3 months ago

@Intellectualins The reverse SSH tunnel is scarier than the mining - shows the agent understood networking well enough to establish persistent external access. Instrumental convergence in action. Sandboxing won't cut it when agents can reason about escaping their constraints.

0

4

Dror Ivry @DrorIvry

3 months ago

@JeremyFrenay @confluentinc Regulated environments are where MCP security becomes non-negotiable. Most orgs building agents today skip auth/audit because 'it's internal' - then realize compliance requires full provenance of every tool invocation. Building it in from day one saves painful retrofits.

1

0

9
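The "full provenance of every tool invocation" point above can be sketched as a small wrapper that records each call. This is only an illustration under stated assumptions: `audited`, `AUDIT_LOG`, `lookup_order`, and the record fields are hypothetical names that do not come from MCP or from any library, and a real deployment would write to an append-only store rather than an in-memory list.

```python
import functools
import json
import time
from typing import Any, Callable

# Hypothetical stand-in for an append-only audit store.
AUDIT_LOG: list[dict[str, Any]] = []


def audited(tool_name: str, agent_id: str) -> Callable:
    """Wrap a tool function so every invocation leaves an audit record."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            record = {
                "ts": time.time(),
                "agent": agent_id,
                "tool": tool_name,
                # default=repr keeps non-JSON-serializable arguments loggable.
                "args": json.dumps({"args": args, "kwargs": kwargs}, default=repr),
                "status": "started",
            }
            try:
                result = func(*args, **kwargs)
                record["status"] = "ok"
                return result
            except Exception as exc:
                record["status"] = f"error: {type(exc).__name__}"
                raise
            finally:
                # The record is written whether the call succeeded or failed.
                AUDIT_LOG.append(record)

        return wrapper

    return decorator


# Hypothetical tool used only to exercise the wrapper.
@audited(tool_name="lookup_order", agent_id="agent-1")
def lookup_order(order_id: str) -> dict[str, str]:
    return {"order_id": order_id, "state": "shipped"}


lookup_order("A-100")
```

Writing the record in the `finally` branch means failed calls are logged too, which is usually what an auditor asks about first.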

Dror Ivry @DrorIvry

3 months ago

@Helixar_ai Tool schema constraints are critical. Most MCP exploits start with overly permissive definitions - file read accepting arbitrary paths, shell executor with no allowlist. Pre-deployment validation catches these before they become CVEs.

1

0

14
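As a rough illustration of the pre-deployment validation mentioned above, the sketch below flags tool definitions whose path or command parameters are unconstrained strings. It assumes each tool is a plain dict with a `name` and a JSON Schema under `inputSchema`; the rule set (`RISKY_PARAM_NAMES`), the function name, and the two sample tools are hypothetical, and a real check would need to cover far more than this single heuristic.

```python
from typing import Any

# Hypothetical rule set: parameter names that should not be unconstrained strings.
RISKY_PARAM_NAMES = {"path", "file", "filename", "command", "cmd"}


def find_permissive_params(tool: dict[str, Any]) -> list[str]:
    """Return one finding per risky string parameter lacking an enum or pattern."""
    findings = []
    properties = tool.get("inputSchema", {}).get("properties", {})
    for name, schema in properties.items():
        if name.lower() not in RISKY_PARAM_NAMES:
            continue
        if schema.get("type") != "string":
            continue
        # An enum acts as an allowlist; a pattern can confine paths to a prefix.
        if "enum" not in schema and "pattern" not in schema:
            findings.append(f"{tool.get('name', '?')}.{name}: unconstrained string")
    return findings


# File read accepting arbitrary paths: should be flagged.
read_file_tool = {
    "name": "read_file",
    "inputSchema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
    },
}

# Shell executor restricted to an allowlist: should pass.
run_command_tool = {
    "name": "run_command",
    "inputSchema": {
        "type": "object",
        "properties": {"command": {"type": "string", "enum": ["ls", "whoami"]}},
    },
}

print(find_permissive_params(read_file_tool))
print(find_permissive_params(run_command_tool))
```

A schema check like this only catches overly broad definitions. The server still has to enforce the same limits at call time, since a schema that looks strict does not by itself stop path traversal.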

Dror Ivry @DrorIvry

3 months ago

@radware The image-based vector is particularly scary - most orgs focus on text sanitization but images slip through. We've seen attacks where a single pixel manipulation in a PDF chart triggers agent behavior changes. Attack surface expands with every new tool.

0

8

Dror Ivry @DrorIvry

3 months ago

@mauro_erta @OpenAIDevs Likely security. sampling/createMessage lets MCP servers trigger LLM completions - that's a massive attack surface. A compromised or malicious server could manipulate the model to do anything the user has access to. Most hosts are cautious about enabling it for good reason.

1

0

23

Dror Ivry @DrorIvry

3 months ago

@bluechip_ext The "security audit" step is interesting - how deep does it go? Automated tool installation is exactly where supply chain attacks thrive. One typosquatted package or compromised CLI and your agent just handed over the keys.

1

0

15

Dror Ivry @DrorIvry

3 months ago

@0xtenthirtyone @jgarzik This is exactly what makes agent security different. The attack surface isn't just the prompt - it's the entire decision chain between agents. Glad you were logging. Most teams don't know their agents are negotiating.

0

5

Dror Ivry @DrorIvry

3 months ago

@DrBrainio The shift from "test before ship" to "monitor at runtime" is huge. Static evals catch maybe 20% of what actually breaks in production. Curious if this means agents will start getting the same security primitives as traditional apps - RBAC, audit logs, etc.

0

6

Dror Ivry @DrorIvry

3 months ago

@0xknifecatcher The feudal cascade is spot on. Static API keys = digital land grants - revocable in theory, irrevocable in practice. Capability attenuation helps but you still need runtime enforcement. Otherwise you're just trusting the vassal's oath.

1

0

16

Dror Ivry @DrorIvry

3 months ago

@neciudan This is the attack chain people aren't prepared for: prompt injection as the entry point, supply chain compromise as the payload. AI-assisted dev tools are now attack surface. The triage bot didn't distinguish between "user input" and "instruction" - classic confused deputy.

0

1

0

17

Dror Ivry @DrorIvry

3 months ago

@KoBa_Labs Identity is half the problem. Even with perfect auth, you need runtime constraints on what agents can DO. The 90s parallel is apt: we solved identity with PKI/OAuth but still got breached because we didn't constrain behavior. Same pattern emerging now.

2

0

73

Dror Ivry @DrorIvry

3 months ago

@hasamba MITRE ATLAS + hands-on CTFs is the right combo. Theory without practice doesn't stick, and most pentesters I talk to are still learning how to think about LLM attack chains. Resources like this help bridge the gap.

0

8
