Huge announcement! 💥
It is an honor to share that my talk on Linux Exploitation and Container Escape has been accepted at Black Hat USA.
See you in Las Vegas!
@BlackHatEvents#BHUSA
@halvarflake@halvarflake Agree and thankfully, unprivileged eBPF (kernel.unprivileged_bpf_disabled=0) has been the default-off setting since, I think, 5.16, and every major distro ships with it disabled.
Privilege Escalation via a Page Use-After-Free in Qualcomm's AI Accelerator Linux Kernel Driver
Article by Lukas Maar about exploiting a bug in the mmap handler of the QAIC driver that causes a page UAF.
https://t.co/RqMi8QuuLG
found a verifier/interpreter mismatch in the Linux BPF subsystem (CVE-2026-31525, CVSS 7.8). arbitrary kernel read/write; become root, escape containers, disable SELinux, read TLS keys out of other processes' memory.
anyway, it starts with the math bars, the absolute value.
computers store negative numbers in two's complement. the smallest 32-bit signed integer is -2,147,483,648, and the largest positive is +2,147,483,647. there is no +2,147,483,648, since it simply does not fit. so when you call abs(-2,147,483,648), the C specification thinks about it for a moment, says "undefined," and leaves the room. on x86 and arm64, what you actually get back is -2,147,483,648. you asked for the absolute value of a negative number, you got back the same negative number. thank you computer :D
the BPF interpreter implements signed 32-bit division (BPF_ALU | BPF_DIV/MOD, off == 1, added in ec0e2da95f72) by decomposing it into unsigned division: take abs() of both operands, divide via do_div(), reapply the sign. the handler in ___bpf_prog_run (kernel/bpf/core.c):
AX = abs((s32)DST);
AX = do_div(AX, abs((s32)SRC));
and look, the kernel even documents this. include/linux/math.h: "the return value is undefined when the input is the minimum value of the type." when DST = 0x80000000 (S32_MIN), abs() tries to negate it. -(-2,147,483,648) overflows s32, the C spec calls it undefined behavior, and the CPU hands back 0x80000000 unchanged. still negative. abs() had one job.
this s32 then gets assigned into AX, a u64 BPF register. s32 → u64 sign-extends: 0x80000000 becomes 0xFFFFFFFF80000000. that's 18,446,744,071,562,067,968. you wanted 2,147,483,648, you got 18.4 quintillion; a rounding error of about 18.4 quintillion. do_div() is a 64-by-32-bit unsigned division macro and it operates on this full u64 numerator. the quotient is off by a factor of 2³². the smod path has the same problem since do_div() modifies the dividend in place and returns the remainder, both wrong. 8 call sites across sdiv32/smod32 src/imm handlers, all quietly producing nonsense whenever S32_MIN shows up.
the BPF verifier is the safety system that statically analyzes every BPF program before allowing it to run. it exists specifically to guarantee that nothing bad can happen. scalar32_min_max_sdiv() in kernel/bpf/verifier.c tracks value ranges through abstract interpretation. it handles signed division correctly, including S32_MIN. computes tight, mathematically correct bounds. the interpreter, as we've established, computes whatever it feels like. so the verifier thinks register R0 is in range X. the interpreter puts value Y in R0. the safety system and the execution engine disagree about what a program does. in BPF security research, this is where you set down your coffee.
concretely: load S32_MIN into R1, load 2 into R2, execute SDIV32 R1 R2. verifier determines R1 ∈ [-1,073,741,824, -1,073,741,824]. interpreter computes do_div(0xFFFFFFFF80000000, 2) = 0x7FFFFFFFC0000000, reapplies the sign, produces a completely unrelated value. use R1 as an index into a BPF map. verifier approves the access, bounds check passes against its calculated range. interpreter uses the actual value. out-of-bounds read/write on a kernel data structure. on every Linux machine running the BPF interpreter.
the root cause of all of this: the absolute value function doesn't handle one number. one specific number, out of 4.2 billion possible inputs, and it's the one that gives you kernel read/write. the fix is:
c
static u32 abs_s32(s32 x)
{
return x >= 0 ? (u32)x : -(u32)x;
}
cast to u32 before negating. -(u32)0x80000000 = 0x80000000 unsigned. correct absolute value, no overflow, no undefined behavior. the kind of function you'd assume already exists somewhere in 30 million lines of kernel code. it did not. I got to write it. :D
I reported this, wrote the patch, got it through 5 revisions of review. acked by Yonghong Song and Mykyta Yatsenko. now patched in stable 6.6, 6.12, 6.18, 6.19. if you haven't updated your kernel: maybe do that.
@alisaesage Would you agree that exploiting Chrome is generally harder than exploiting the Linux kernel, if such a comparison can be made?
Popping calc in modern Chrome is more challenging than popping a root shell 😅
Also if you want to speed up your learning and get into this kind of role, we are running our training "Practical AI for CTI" at @BlackHatEvents USA this August 👇
https://t.co/YQaSrv34he