Offensive Security / AI Red Teaming @ NVIDIA. Ex-GenAI and OffSec Red Teaming Lead at Meta. Ex-Principal Consultant and Researcher @ NCC Group/iSEC Partners.
"the population of AI-enabled actors is not only growing but also drifting towards the riskiest activities in our framework [...]
If this trend continues, these operational techniques won’t be a differentiating factor anymore and will become the baseline tomorrow"
How well do the security community's techniques hold up against AI-enabled cyberattacks?
We examined 832 malicious accounts and mapped their activity onto a longstanding database of tactics and techniques used by threat actors.
Here's what we learned: https://t.co/fgOqJRh2rx
My final paper from my time at @GraySwanAI is out! In this ICML-bound collaboration with @mattmdjaga, Matt Fredrikson, and @zicokolter, we propose a new method for evaluating AI agents' ability to refuse potentially harmful cybersecurity queries 🧵 1/14
In our simplest bypass, we prepended 100,000 blank lines to a malicious skill. ClawHub's scanner truncated the file before reaching the payload, then marked the skill safe. https://t.co/QLCE0YgS5P
This is the most interesting part:
"The attack was discovered by Codex, which chained two techniques known to humans for a decade: a compression bomb and a Slowloris-style hold."
https://t.co/gqjJHT1aSy
@supersat @RachelTobac Ok yep: "When a contact calls you and you’re both using Phone by Google, their device sends a silent confirmation signal in real time to your device to verify the call is legitimate and truly coming from the contact’s device."
Must be some kinda contact-add or TOFU fingerprint.
@supersat @RachelTobac Yeah. I'm very curious about this part. Is it some kinda Trust on First Use silent RCS handshake? Then with a "JIT" style check for every call... Hm.
Meta gave zero updates about the AI bot hacking incident until it got to the press. And when they do, it’s just tucked as replies under someone’s tweet
Congrats on laying off T&S and automating the accounts support with gullible AI bots tho, hope you liked that promo packet.
OAIC's CFP is now open!
The first conference dedicated to the cutting edge of the offensive use of AI is returning for its second year. Speakers will enjoy three nights at a four-star beachfront resort, which includes all meals and drinks, three exclusive parties, and a Michelin-star welcome dinner.
Please see https://t.co/Q6XUblStJb for accepted topics.
@matrosov https://t.co/dmunXDM2KP
"These targets are all closed undocumented and carrying an implicit assumption that obscurity provides some protection. If a target requires this much out-of-distribution knowledge and generalization is now in scope that assumption deserves a second look."
This is a critical point for defenders to get: "Beyond the acceleration in vulnerability research and malware analysis, the same new reality applies to software protection, and security by obscurity, or assuming the attacker is limited in compute and motivation, no longer works."
If there's anything those conversations from *the last couple of days* have shown, it's that the bad experiences of vuln researchers with msrc have existed *for years*.
So I'm not too confident things will change all of a sudden.
found a verifier/interpreter mismatch in the Linux BPF subsystem (CVE-2026-31525, CVSS 7.8). arbitrary kernel read/write; become root, escape containers, disable SELinux, read TLS keys out of other processes' memory.
anyway, it starts with the math bars, the absolute value.
computers store negative numbers in two's complement. the smallest 32-bit signed integer is -2,147,483,648, and the largest positive is +2,147,483,647. there is no +2,147,483,648, since it simply does not fit. so when you call abs(-2,147,483,648), the C specification thinks about it for a moment, says "undefined," and leaves the room. on x86 and arm64, what you actually get back is -2,147,483,648. you asked for the absolute value of a negative number, you got back the same negative number. thank you computer :D
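a small userspace sketch of that behavior, assuming a two's-complement target. the helper names `wrapping_neg_s32` and `wrapping_abs_s32` are made up for illustration. the negation is done in unsigned arithmetic so the sketch itself stays out of undefined behavior while still modeling what the hardware hands back.

```c
#include <stdint.h>

/* Hypothetical helper: negate with two's-complement wraparound.
 * The subtraction happens in uint32_t, where overflow is well defined.
 * Converting the result back to int32_t is implementation-defined before
 * C23; GCC and Clang wrap modulo 2^32, which is what this sketch assumes. */
static int32_t wrapping_neg_s32(int32_t x)
{
    return (int32_t)((uint32_t)0 - (uint32_t)x);
}

/* Hypothetical model of what abs() effectively returns on x86 and arm64. */
static int32_t wrapping_abs_s32(int32_t x)
{
    return x < 0 ? wrapping_neg_s32(x) : x;
}
```

under this model, `wrapping_abs_s32(INT32_MIN)` gives back `INT32_MIN`, while every other input gets its usual magnitude.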
the BPF interpreter implements signed 32-bit division (BPF_ALU | BPF_DIV/MOD, off == 1, added in ec0e2da95f72) by decomposing it into unsigned division: take abs() of both operands, divide via do_div(), reapply the sign. the handler in ___bpf_prog_run (kernel/bpf/core.c):
AX = abs((s32)DST);
AX = do_div(AX, abs((s32)SRC));
and look, the kernel even documents this. include/linux/math.h: "the return value is undefined when the input is the minimum value of the type." when DST = 0x80000000 (S32_MIN), abs() tries to negate it. -(-2,147,483,648) overflows s32, the C spec calls it undefined behavior, and the CPU hands back 0x80000000 unchanged. still negative. abs() had one job.
this s32 then gets assigned into AX, a u64 BPF register. s32 → u64 sign-extends: 0x80000000 becomes 0xFFFFFFFF80000000. that's 18,446,744,071,562,067,968. you wanted 2,147,483,648, you got 18.4 quintillion; a rounding error of about 18.4 quintillion. do_div() is a 64-by-32-bit unsigned division macro and it operates on this full u64 numerator. the quotient is off by a factor of roughly 2³³. the smod path has the same problem since do_div() modifies the dividend in place and returns the remainder, both wrong. 8 call sites across sdiv32/smod32 src/imm handlers, all quietly producing nonsense whenever S32_MIN shows up.
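the widening step can be checked in plain userspace C. this is only a sketch: `s32_to_u64` and `intended_magnitude` are hypothetical names standing in for the register assignment and for the value the handler was after, not kernel functions.

```c
#include <stdint.h>

/* Hypothetical stand-in for assigning an s32 into a u64 register.
 * Converting a negative int32_t to uint64_t is well defined in C and
 * yields the value modulo 2^64, i.e. the sign-extended bit pattern. */
static uint64_t s32_to_u64(int32_t x)
{
    return (uint64_t)x;
}

/* The magnitude the handler wanted: |x| computed in 64 bits, where
 * 2,147,483,648 fits without overflow. */
static uint64_t intended_magnitude(int32_t x)
{
    int64_t wide = x;
    return (uint64_t)(wide < 0 ? -wide : wide);
}
```

`s32_to_u64(INT32_MIN)` is the 0xFFFFFFFF80000000 that lands in AX. `intended_magnitude(INT32_MIN)` is the 2,147,483,648 the division was supposed to start from.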
the BPF verifier is the safety system that statically analyzes every BPF program before allowing it to run. it exists specifically to guarantee that nothing bad can happen. scalar32_min_max_sdiv() in kernel/bpf/verifier.c tracks value ranges through abstract interpretation. it handles signed division correctly, including S32_MIN. computes tight, mathematically correct bounds. the interpreter, as we've established, computes whatever it feels like. so the verifier thinks register R0 is in range X. the interpreter puts value Y in R0. the safety system and the execution engine disagree about what a program does. in BPF security research, this is where you set down your coffee.
concretely: load S32_MIN into R1, load 2 into R2, execute SDIV32 R1 R2. verifier determines R1 ∈ [-1,073,741,824, -1,073,741,824]. interpreter computes do_div(0xFFFFFFFF80000000, 2) = 0x7FFFFFFFC0000000, reapplies the sign, produces a completely unrelated value. use R1 as an index into a BPF map. verifier approves the access, bounds check passes against its calculated range. interpreter uses the actual value. out-of-bounds read/write on a kernel data structure. on every Linux machine running the BPF interpreter.
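the arithmetic in that example can be reproduced outside the kernel. a rough sketch, with hypothetical helper names: `expected_sdiv32` is the mathematically correct result the verifier's range matches, and `buggy_unsigned_quotient` models only the intermediate unsigned division. it stops before the sign is reapplied, since that part of the handler isn't shown here.

```c
#include <stdint.h>

/* What signed 32-bit division should produce. Done in 64 bits so the
 * arithmetic cannot overflow; src must be nonzero. */
static int64_t expected_sdiv32(int32_t dst, int32_t src)
{
    return (int64_t)dst / (int64_t)src;
}

/* Hypothetical model of the intermediate quotient on the buggy path:
 * the still-negative "absolute value" is sign-extended to 64 bits and
 * then divided as an unsigned number. abs_src must be nonzero. */
static uint64_t buggy_unsigned_quotient(int32_t stuck_abs_dst, uint32_t abs_src)
{
    uint64_t ax = (uint64_t)stuck_abs_dst;
    return ax / abs_src;
}
```

for S32_MIN and 2, the first helper gives -1,073,741,824 and the second gives 0x7FFFFFFFC0000000, the same two numbers as in the walkthrough.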
the root cause of all of this: the absolute value function doesn't handle one number. one specific number, out of 4.3 billion possible inputs, and it's the one that gives you kernel read/write. the fix is:
```c
static u32 abs_s32(s32 x)
{
	return x >= 0 ? (u32)x : -(u32)x;
}
```
cast to u32 before negating. -(u32)0x80000000 = 0x80000000 unsigned. correct absolute value, no overflow, no undefined behavior. the kind of function you'd assume already exists somewhere in 30 million lines of kernel code. it did not. I got to write it. :D
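to try the helper outside the kernel, here's a userspace sketch. the typedefs stand in for the kernel's `s32`/`u32`/`u64`, and `sdiv32_via_unsigned` is a hypothetical wrapper written for this example to show the abs, divide, reapply-sign pattern. it is not the interpreter's actual handler.

```c
#include <stdint.h>

typedef int32_t  s32;  /* userspace stand-ins for the kernel typedefs */
typedef uint32_t u32;
typedef uint64_t u64;

/* Same body as the fix above. */
static u32 abs_s32(s32 x)
{
    return x >= 0 ? (u32)x : -(u32)x;
}

/* Hypothetical helper: signed 32-bit division built on unsigned division.
 * src must be nonzero. */
static s32 sdiv32_via_unsigned(s32 dst, s32 src)
{
    u64 ax = abs_s32(dst);             /* u32 -> u64 zero-extends */
    u32 q = (u32)(ax / abs_s32(src));  /* unsigned divide on magnitudes */

    if ((dst < 0) != (src < 0))
        q = (u32)0 - q;                /* reapply the sign, still unsigned */

    /* Implementation-defined conversion before C23; GCC and Clang wrap. */
    return (s32)q;
}
```

with the unsigned helper, `sdiv32_via_unsigned(INT32_MIN, 2)` comes out to -1,073,741,824, which is the value the verifier expects.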
I reported this, wrote the patch, got it through 5 revisions of review. acked by Yonghong Song and Mykyta Yatsenko. now patched in stable 6.6, 6.12, 6.18, 6.19. if you haven't updated your kernel: maybe do that.