Laughing Bit

@laughing_bit

Born to code, live to hack. Fond of InfoSec, low level software and mirabelles. Author of @chrysalide_ref, member of @ZenkSecurity. Tweets are my tweets.

Paris, France

Joined December 2011

254 Following

618 Followers

503 Posts

laughing_bit retweeted

jolmos @sha0coder

10 days ago

Scales, the eBPF malware targeting ArchLinux https://t.co/k3E5LhsT65

laughing_bit retweeted

Seth Jenkins @__sethJenkins

about 1 month ago

My new blogpost is out! I can't think of another kernel bug quite as easy to exploit as this one 😭 Big shout out to @tehjh who said something along the lines of "Uh...Seth come check out this mmap handler" 😂 https://t.co/07PQim2ysp

laughing_bit retweeted

dylan ツ @demian_ai

about 2 months ago

Inference got a hundred times cheaper this year. The compute bill went up anyway.

If you understand why those two sentences are both true at the same time, you understand the most important thing happening in AI right now.

I work on inference for a living, at @nebiustf, where we run open-source managed inference at scale. Most of what follows is what I'm seeing from inside the bill.

12 months ago, the cost of 1M tokens of frontier-class reasoning was somewhere on the order of $60.
Today, an equivalent quality of output costs roughly $0.50.

The price per token of o1-level intelligence has dropped about 128x in a year.
The price of GPT-4-level output has dropped roughly 100x since the original GPT-4 shipped.

By any normal reading of a technology cost curve, this should be deflationary. It should be saving customers money.

The opposite has happened. The total compute bill at every hyperscaler is going up, not down. Anthropic just signed multi-year capacity deals with both xAI and Amazon. Microsoft's Azure capex guide for 2026 starts with an eight. OpenAI is reportedly spending more on compute every quarter than it did in all of 2023. Nvidia paid roughly twenty billion dollars to acquire Groq, an inference-specialist company that did not exist as a serious commercial entity three years ago.
The cost curve and the demand curve crossed, and then the demand curve lapped the cost curve.

Here is what happened underneath.
A reasoning model burns roughly 10x the output tokens of a non-reasoning model on the same task, because it spends most of its tokens thinking out loud before answering. An agentic workflow chains roughly twenty times the requests of a single-shot completion, because it loops, calls tools, plans, retries, and synthesizes. A modern deep-research query (the kind a research analyst can fire off in fifteen seconds and then walk away from for ten minutes) costs more compute than 10 original GPT-4 queries combined. We made every individual token a hundred times cheaper, and then we built a generation of products that consume ten thousand times more tokens.

This is the Jevons paradox playing out at trillion-dollar scale, in compressed time, in front of everyone. Jevons noticed in 1865 that making coal-burning more efficient did not reduce coal consumption. It increased it, because efficiency unlocked uses that were previously uneconomic. Steam engines became more practical at smaller scales. Whole industries that could not afford coal at the old price suddenly could. Britain's coal consumption rose sharply, not despite the efficiency gains, but because of them.

The same thing is happening to AI compute right now, and it is happening faster than any analogous historical cycle. Falling token prices did not contract demand. They unlocked agents, deep research, code-writing systems, multi-step reasoning, persistent memory: the entire next layer of AI products. Every product in that next layer consumes orders of magnitude more compute than the chat interfaces it is replacing.

The math at the aggregate level is brutal: 100x cheaper tokens times 10,000x more tokens equals a 100x larger total bill.
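The aggregate math is plain multiplication: total bill = price per token × tokens consumed, so a price factor and a volume factor compose by multiplying. A minimal sketch using only the essay's own multipliers (the function name `bill_multiplier` is illustrative, not from any library):

```python
from fractions import Fraction


def bill_multiplier(price_factor: Fraction, volume_factor: Fraction) -> Fraction:
    """Change in total spend when per-token price and token volume both change.

    Total bill = price per token * tokens consumed, so the two factors multiply.
    Fractions keep the arithmetic exact (no floating-point rounding).
    """
    return price_factor * volume_factor


# Essay's figures: each token costs 1/100 as much, and 10,000x more tokens are consumed.
print(bill_multiplier(Fraction(1, 100), Fraction(10_000)))  # prints 100

# Break-even: the bill stays flat only if volume grows by exactly the size of the price cut.
print(bill_multiplier(Fraction(1, 100), Fraction(100)))  # prints 1
```

Any volume growth larger than the price cut shows up as a bigger bill, which is the Jevons dynamic described above.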

The implications stack quickly.
If you are running a hyperscaler, your 2026 capex guide is not a peak. It is a step on a curve. Inference is structurally always-on, twenty-four hours a day, in a way that training never was. Training is bursty. You spin up a cluster, run for weeks or months, and stop. Inference runs continuously, scales with usage, and the usage curve is exponential. Your power bill, your cooling bill, your transceiver count, your storage footprint: all of these were sized for a workload mix that no longer exists.

If you are running an AI software company built on top of someone else's closed API, you have a problem that did not exist a year ago. Your gross margins get worse as your customers get more value out of your product, because the more they use it, the more compute you pay for. The companies that win this are the ones that figured out vertical integration before the math caught them.
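The margin squeeze can be made concrete with a toy per-seat model: a flat subscription price set against a per-token API cost. The $20 seat price and $3-per-million-token cost below are hypothetical numbers chosen for illustration, not figures from the essay.

```python
def gross_margin(seat_price: float, tokens_used: int, cost_per_million: float) -> float:
    """Gross margin for one flat-priced seat whose inference is bought from an API.

    Revenue is fixed per seat; compute cost grows linearly with tokens used.
    """
    compute_cost = tokens_used / 1_000_000 * cost_per_million
    return (seat_price - compute_cost) / seat_price


# Hypothetical: $20/month seat, $3 per million tokens paid to the API provider.
SEAT_PRICE = 20.0
COST_PER_MILLION = 3.0

# Heavier usage (customers getting more value) pushes the margin down, then negative.
for monthly_tokens in (1_000_000, 5_000_000, 10_000_000):
    margin = gross_margin(SEAT_PRICE, monthly_tokens, COST_PER_MILLION)
    print(f"{monthly_tokens:>10,} tokens/month -> gross margin {margin:.0%}")
```

With these assumed numbers the margin falls from 85% to 25% to -50% as monthly usage grows from 1M to 10M tokens, while the price the customer pays never moves.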

If you are watching this from a distance and trying to understand where the next bottlenecks form, the answer is everywhere downstream of "more inference compute, always-on, with massive memory state per session." The KV cache, the running memory state of a long conversation or an agent loop, is the silent monster of the inference era. It does not scale linearly with parameters. It scales linearly with context length and the number of agent steps. A long agent session can hold tens of gigabytes of state per user, per session.

Multiply that by every concurrent user of every product, and you understand why $MU, $SNDK, $TOWCF, and the entire memory and packaging layer have re-rated the way they have.
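The per-session figure follows from a common back-of-envelope formula: every cached token stores one key and one value vector per layer and per KV head, so the cache grows linearly with context length. The model shape below (80 layers, 8 grouped-query KV heads, head dimension 128, 16-bit values) is an assumption, roughly the shape of a 70B-class open model, used only for illustration; real serving stacks add paging overhead and may quantize the cache.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class ModelShape:
    n_layers: int
    n_kv_heads: int      # key/value heads (fewer than query heads under grouped-query attention)
    head_dim: int
    bytes_per_elem: int  # 2 for fp16/bf16 cache entries


def kv_cache_bytes(shape: ModelShape, context_tokens: int) -> int:
    """Bytes of KV cache held for one session of `context_tokens` tokens."""
    # Factor 2: one key vector and one value vector per layer, per KV head, per token.
    per_token = 2 * shape.n_layers * shape.n_kv_heads * shape.head_dim * shape.bytes_per_elem
    return per_token * context_tokens


# Hypothetical 70B-class shape, assumed for illustration.
shape = ModelShape(n_layers=80, n_kv_heads=8, head_dim=128, bytes_per_elem=2)

for tokens in (8_192, 32_768, 131_072):
    print(f"{tokens:>7,} tokens -> {kv_cache_bytes(shape, tokens) / GIB:.1f} GiB per session")

# Multiply by concurrent sessions to get the fleet-level memory footprint.
sessions = 1_000
total = sessions * kv_cache_bytes(shape, 131_072)
print(f"{sessions:,} sessions at 131,072 tokens -> {total / GIB:,.0f} GiB")
```

Under these assumptions a single 131,072-token session holds 40 GiB of cache (2.5 GiB at 8,192 tokens, 10 GiB at 32,768), and a thousand such sessions need 40,000 GiB.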

The CPU-to-GPU ratio is evolving. Training is 1:8. Basic chat inference is 1:4. Agentic inference is 1:1, sometimes CPU-heavy. Google has split its TPU line in two, with a dedicated inference chip carrying tripled SRAM for KV cache. $INTC and $AMD just spent two earnings calls explaining that this shift is structural, not cyclical. The hardware map is being redrawn in real time, and the financial press is mostly still writing about training clusters.

The right framing of where we are right now is not that AI is hitting a wall. The claim a year ago that scaling was hitting a wall was the most expensive bad take of the cycle. The right framing is that AI got dramatically cheaper, dramatically more capable, and dramatically more useful, and the cost of running it at the new equilibrium of demand is much higher than the cost at the old equilibrium of demand, because the new equilibrium is enormous.

A meaningful share of what we actually do at Token Factory, day to day, is help customers stop their bills from running away from them. KV-cache management. Speculative decoding. Quantization. Routing. The kind of vertical integration that, eighteen months ago, every product team was happy to leave abstracted away behind a closed API. The reason this stack matters now is the same reason this whole essay matters: at the new equilibrium of inference demand, the cost of treating compute as a commodity is no longer survivable. The companies that figure out the layer beneath the API are the ones who keep their margins.

Cheaper tokens. More tokens.
Same coal as 1865.

laughing_bit retweeted

J. A. Guerrero-Saade @juanandres_gs

about 2 months ago

Master @vkamluk is on stage at #BlackhatAsia solving a legendary mystery of CTI: fast16, a unique cyber sabotage operation from 2005, preceding Stuxnet by ~2-3 years.

laughing_bit retweeted

quarkslab @quarkslab

3 months ago

One bit flip to corrupt it all: Exploitation of an old Linux kernel vulnerability using PageJack, a modern technique to create Use After Free bugs. Here @AzazheI shows you how https://t.co/MLKX0pykhe

laughing_bit retweeted

Tom Hegel @TomHegel

4 months ago

Coruna iOS Exploit kit is one of those stories where the more you dig the weirder it gets. I love it. Started as surveillance vendor tooling, ended up in mass Chinese crypto scams, and this week someone registered Iran war-themed dropper domains. Full timeline thread. 🧵

laughing_bit retweeted

Lau @notselwyn

5 months ago

Hi everybody! I did a talk today about pagetable exploitation techniques on x64 Linux. I uploaded the slides to GitHub. I loved meeting everybody :) https://t.co/xCb4BrPrNt

laughing_bit retweeted

ExaTrack @ExaTrack

6 months ago

🎯 Happy 2026, Threat Hunters! We compiled 12 battle-tested tips to hunt the unknown, no IOCs, just anomalies. 🔍 Windows/Linux 💥 Scaled to 10K+ endpoints 👉 https://t.co/PJkHLMdiGR #ThreatHunting #Cybersecurity #DFIR #UnknownThreats

laughing_bit retweeted

Phrack Zine @phrack

7 months ago

Phrack #72 PUZZLE CHALLENGE >>> WALKTHROUGH <<< is OUT. Everyone who did not find the hidden secrets in the hardcopy release: This is your chance. ♥️ Stay curious and live forever ♥️ https://t.co/Orvz2M4HAm

laughing_bit retweeted

eversinc33 🤍🔪⋆｡˚ ⋆ @eversinc33

8 months ago

My writeup for #flareon12 challenge 6 xd

laughing_bit retweeted

Exalyze @Exalyze_io

10 months ago

Exalyze 1.0 is out 🥳 What's new in it?
- Analysis pipeline rebuilt for transparent updates
- YARA generation (opcodes) has been improved
- Pivots added for IPs/domains to @virustotal @shodanhq @censysio @onyphe @fofabot
See you on https://t.co/OUYhNuNZLa

laughing_bit retweeted

Andrey Konovalov @andreyknvl

11 months ago

Documented instructions for setting up KGDB on Pixel 8, including getting the kernel log over UART via USB-Cereal, building/flashing a custom kernel, breaking into KGDB via /proc/sysrq-trigger or by sending SysRq-G over serial, dealing with watchdogs, etc. https://t.co/vb4mgLDJrl

laughing_bit retweeted

Geluchat @Geluchat

12 months ago

Today was my last day as a pentester at Bsecure, and it feels a bit surreal. After a three-year journey of hunting on the side, I’m finally ready to go all-in as a full-time bug bounty hunter. To celebrate this milestone, I've written an article sharing the full story. It’s a transparent look at the path that got me here: the wins, the lessons, the real financial numbers, and my honest advice for anyone considering this adventure. You can read all about my journey from pentester to full-time hunter here: https://t.co/g7yRBJDs1Y

laughing_bit retweeted

XBOW @Xbow

12 months ago

For the first time in history, the #1 hacker in the US is an AI. (1/8)

laughing_bit retweeted

V4bel @v4bel

about 1 year ago

@_qwerty_po and I exploited a VSock 1-day in Google kernelCTF back in *February*, securing $71,337 🥳 (CVE-2025-21756, exp237/exp249) And I’ve just published the write-up: https://t.co/PLX5PnshLH A kernel developer reviewing a patch for a separate VSock bug I submitted accidentally discovered this vulnerability, and we were the first to exploit it. PoC 💻: root on Ubuntu 24.04

laughing_bit retweeted

Crusaders of Rust @cor_ctf

about 1 year ago

🚨🚨🚨We just broke everyone’s favorite CTF PoW🚨🚨🚨 Our teammate managed to achieve a 20x SPEEDUP on kctf pow through AVX512 on Zen 5. Full details here: https://t.co/aCIU220IBf The Sloth VDF is dead😵 This is why kernelCTF no longer has PoW!
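For context on what was sped up: a Sloth-style VDF makes the solver compute a chain of modular square roots, which is slow and inherently sequential, while the verifier only has to square, which is cheap. The toy sketch below assumes kCTF-like parameters (modulus p = 2^1279 - 1, square roots taken as x^((p+1)/4), and a flip of bit 0 between steps). It illustrates the asymmetry only; it is not the kCTF implementation or the AVX-512 solver described in the tweet.

```python
MODULUS = 2**1279 - 1  # assumed Mersenne prime modulus; p % 4 == 3, so a square root is one exponentiation


def sloth_root(x: int, difficulty: int, p: int = MODULUS) -> int:
    """Solver side: `difficulty` sequential square roots, flipping bit 0 after each."""
    exponent = (p + 1) // 4
    for _ in range(difficulty):
        # Each step costs ~1279 modular squarings and depends on the previous step's output.
        x = pow(x, exponent, p) ^ 1
    return x


def sloth_verify(x: int, y: int, difficulty: int, p: int = MODULUS) -> bool:
    """Verifier side: undo each step with one bit flip and one modular squaring."""
    for _ in range(difficulty):
        y = pow(y ^ 1, 2, p)
    # Each root squares back to +x or -x mod p. Since p is all ones in binary, a sign
    # error does not survive the next flip-and-square, so only the final sign can differ.
    return y == x % p or y == (p - x) % p


challenge = 0x1234_5678_9ABC_DEF0
solution = sloth_root(challenge, difficulty=50)
print(sloth_verify(challenge, solution, difficulty=50))  # True
```

The solver pays roughly a thousand modular squarings per step, the verifier pays one, so anything that makes big-integer squaring faster (such as wide SIMD arithmetic) attacks the solver's cost directly.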

laughing_bit retweeted

Sean Heelan @seanhn

about 1 year ago

I wrote up how I used o3 to find CVE-2025-37899, a remote zeroday vulnerability in the Linux kernel’s SMB implementation. Link to the blog post below 👇

laughing_bit retweeted

Crusaders of Rust @cor_ctf

about 1 year ago

We are back😎 Say hello to our kernelCTF submission for CVE-2025-37752🩸 Who would have thought you could pwn a kernel with just a 0x0000 written 262636 bytes out of bounds? Read the full writeup at: https://t.co/GkpCjamlaZ 👀

laughing_bit retweeted

Xeno Kovah @XenoKovah

over 1 year ago

I’ve posted a detailed explanation of why the claimed ESP32 Bluetooth chip “backdoor” is not a backdoor. It’s just a poor security practice which is found in other Bluetooth chips by vendors like Broadcom, Cypress, and Texas Instruments too. https://t.co/Z2cgi8v0ne

laughing_bit retweeted

Ivan Fratric 💙💛 @ifsecure

over 1 year ago

Can you spot the bug? The goal is to leak the secret.
