Elie Bursztein

@elie

AI Cybersecurity @Google & @DeepMind. Help advance AI cybersecurity capabilities and make AI safe & secure for all. @EtteillaOrg Art Foundation founder.

Mountain View

Joined July 2009

133 Following

60.8K Followers

3.9K Posts

Elie Bursztein

@elie

27 days ago

[Weekend Read] ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks? 📄 Read here: https://t.co/fq7pg5w0CC

In our latest joint research with academia and other frontier labs, we tested the ability of models to turn vulnerabilities into working exploits across different attack surfaces and mitigation conditions.

Beyond the benchmark numbers, here is what this means for the industry:

-🛡️ Blue Teams: Speeding up patch development and deployment is no longer optional. Integrating AI directly into CI/CD workflows should be your top priority.

-🔬 Researchers: Current mitigation techniques reduce success rates, but they aren't a silver bullet. We need to step up our game: where do we focus next?

-⚔️ Offensive Security: As models get better at finding bugs and writing exploits, we have to rethink disclosure timelines entirely. What does the future of bug bounties look like in this new era?

I'd love to hear how your teams are preparing for this shift. Let me know.


Elie Bursztein

@elie

about 1 month ago

[Weekend Read] BankerToolBench: Evaluating AI Agents in End-to-End Investment Banking Workflows: https://t.co/Odo8tGviHx -> New benchmark that looks at real-world investment banking tasks.

Models are not yet ready to replace investment bankers. As expected, models still don't perform very well on novel tasks, as they continue to have generalization issues, which might not be fixable with current LLM architectures/training processes.

The task breakdown is interesting, as it shows different frontier models performing better across different categories, highlighting distinct strengths and weaknesses, so the great convergence has yet to come.

#LLM #AI #Agent #finance #defi


Elie Bursztein

@elie

about 1 month ago

How to secure agentic workflows? How to deal with AI agent identities? We explore those burning questions in the latest episode of the AI Security Podcast https://t.co/PG02gYmkCT #agent #AI #LLM #cybersecurity


elie retweeted

Vaishnavi

@_vmlops

about 2 months ago

GOOGLE BUILT A SECRET WEAPON FOR FILE DETECTION

they ran it internally for years, gmail, drive, safe browsing, hundreds of billions of files every week

then they open sourced it

it's called magika and it exposes what files really are, not what they pretend to be

rename malware to "resume.pdf"? magika sees through it

disguise a script as an image? magika sees through it

any trick attackers use with file extensions? magika sees through all of it

ai trained on 100 million files. 200+ content types. 99% accuracy. 5ms per file

one command `pip install magika`

the same tool protecting google's billion users is now protecting yours https://t.co/Jr3LjmQobq



Elie Bursztein

@elie

about 2 months ago

[Weekend Read] The “AI Vulnerability Storm”: Building a “Mythos-ready” Security Program https://t.co/xX76OiJTfj Collective paper on how to get ready to withstand the deluge of vulnerabilities that the next generation of models, including Mythos from Anthropic, is going to unleash. #LLM #claude #AI #cybersecurity


Elie Bursztein

@elie

2 months ago

[Weekend Read] TurboQuant: Redefining AI efficiency with extreme compression - https://t.co/5bh761BcCj This research got a lot of attention because TurboQuant helps reduce LLM memory usage (6x) and improve generation speed (8x on an H100).

A technical note: there seems to be some confusion floating around about how TurboQuant applies to LLMs. TurboQuant is NOT used to compress model weights, which is the usual quantization target; it is used to compress the model KV cache. This distinction matters because token generation is fundamentally memory-bandwidth bound; at larger context lengths the KV cache footprint starts to eclipse the model weights, creating a bottleneck that previous quantization methods couldn't address due to accuracy loss or dequantization latency.
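To make the point about context length concrete, here is a rough back-of-the-envelope sketch. The model dimensions are hypothetical (a generic 7B-parameter decoder in fp16 without grouped-query attention) and are not taken from the TurboQuant work; dividing the cache by 6 simply applies the memory figure quoted above as an illustration, not a measured result.

```python
# Back-of-the-envelope KV cache sizing. All model dimensions below are
# hypothetical (a generic 7B-parameter fp16 decoder without grouped-query
# attention); none of them come from the TurboQuant work.

def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Bytes needed to cache keys and values for seq_len tokens."""
    # Factor 2: one key vector and one value vector per head, per layer, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


def weight_bytes(n_params, bytes_per_param=2):
    """Bytes needed to hold the model weights at the given precision."""
    return n_params * bytes_per_param


LAYERS, KV_HEADS, HEAD_DIM = 32, 32, 128   # assumed architecture
PARAMS = 7_000_000_000                      # assumed parameter count
ASSUMED_COMPRESSION = 6                     # illustrative, from the 6x figure above

weights = weight_bytes(PARAMS)
for seq_len in (4_096, 32_768, 131_072):
    cache = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len)
    compressed = cache / ASSUMED_COMPRESSION
    print(f"{seq_len:>7} tokens: cache {cache / 2**30:6.1f} GiB, "
          f"compressed {compressed / 2**30:5.1f} GiB, "
          f"weights {weights / 2**30:5.1f} GiB")
```

Under these assumptions the cache costs 0.5 MiB per token, so a single 32K-token sequence already needs more memory than the weights themselves, which is why compressing the cache rather than the weights matters at long context.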



Elie Bursztein

@elie

4 months ago

[Weekend Read] CL-bench: A Benchmark for Context Learning https://t.co/l3PdXqOE0C

Context learning, the ability of models to learn from data stored in their context via tools, skills, and previous interactions, has recently gained traction as a promising research direction. This paper presents a novel benchmark designed to evaluate whether models are truly capable of utilizing this context effectively. The results are a reality check: recent frontier models barely reach a 15% to 23% success rate.

Improving in-context learning is essential if we want agents that can reliably execute complex, many-step workflows.

#research #LLM #AI #weekend


Elie Bursztein

@elie

4 months ago

[Weekend Read] How Healthy is the Android Crypto-Ecosystem? We analyzed 1.5 trillion cryptographic samples from 600 million devices to find out - https://t.co/rLbRJupp7F The good news? Overall baseline encryption error rates are incredibly low across the board, showing the ecosystem is performing as intended 👍 Additionally, the massive scale of this study allowed us to uncover several hard-to-detect failure patterns, including weak entropy and timing side channels, that specifically impact a few chipsets and device models. #cryptography #android #research


Elie Bursztein

@elie

5 months ago

FastMCP v3 is out - https://t.co/MmYsmkgPOh Key changes include support for skills, tool versioning, and robust authentication that allows exposing tools to specific users or sessions. #LLM #AI


Elie Bursztein

@elie

5 months ago

[Weekend Read] Anamnesis: LLM Exploit Generation Evaluation - https://t.co/idUGdUpfgP Deep dive by Sean Heelan evaluating frontier models' ability to write 0-day exploits (vulnerabilities not in training data) against modern mitigations like ASLR, CFI, and Seccomp sandboxing. Using a real QuickJS zero-day across 6 scenarios, GPT-5.2 solved all tasks while Claude Opus 4.5 solved 4/6—producing 40+ distinct working exploits. #research #cybersecurity #AI #LLM


Elie Bursztein

@elie

6 months ago

[Weekend Read] LLMs Can Get "Brain Rot" - https://t.co/vlQkcAV66S Fine-tuning LLMs on junk data leads to lower performance on reasoning benchmarks and negative personality shifts. In AI, as always: garbage data in, garbage model out. #AI #LLM #research


Elie Bursztein

@elie

7 months ago

[Weekend Read] Beyond Outlining: Heterogeneous Recursive Planning for Adaptive Long-form Writing with Language Models - https://t.co/aHYuxwt9sX

The paper shows how to decompose complex tasks into recursive agents. Beyond the examples they provide, the approach feels very general and a strong foundation for meta-agents, as demonstrated by ROMA (https://t.co/OJSXU6VUpi), which extends these ideas into a robust meta-agent framework.

I actually recommend starting with ROMA, since the paper is somewhat abstract and can be harder to grok on first pass.

#AI #LLM #AICommunity #artificial_intelligence


Elie Bursztein

@elie

7 months ago

I'm pleased to share that Magika 1.0, our AI-powered file type detection tool, is now officially released. Building on the incredible community adoption of over 1 million monthly downloads, this first stable version delivers key upgrades:

• Expanded support to 200+ file types

• A completely new, high-performance engine rewritten in Rust

• A native Rust command-line client for enhanced speed and security

Learn more about what's new in our blogpost: https://t.co/SnTSnGavAL

#Magika #OpenSource #AI #MachineLearning #Rust


Elie Bursztein

@elie

8 months ago

[Weekend Read] Don’t Look Up: There Are Sensitive Internal Links in the Clear on GEO Satellites https://t.co/wkf4hUFepP Remarkable work on satellite security that uncovered that 50% of the Geosynchronous (GEO) satellite US links studied have encryption issues. Unencrypted traffic includes calls, SMS, utility infrastructure control system messages, military asset tracking, and in-flight Wi-Fi. #cybersecurity #research #satellites


Elie Bursztein

@elie

8 months ago

[Weekend Read] Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models - https://t.co/7aFiHdg7kU Latest iteration on having a context that is dynamically modified by the agent as it iterates through the problem. Benchmarking shows that this type of approach is only useful in some cases, so mileage may vary. #AI #Research #agent


Elie Bursztein

@elie

8 months ago

[Weekend Read] A Treatise on Bitcoin Seed Backup Device Design https://t.co/NdhJT1bSQ5 Best piece I have read on how to have an indestructible recovery option. Considering doing this for my key accounts as well, including email. #research #cybersecurity #crypto #cryptocurrency #BTC


Elie Bursztein

@elie

9 months ago

Excited to share that the GenSec CTF we ran at DEF CON 33 with Airbus to let the community explore how human-AI collaboration can speed up cybersecurity was a success. Overall:

• Nearly 500 participants completed initial challenges

• 85% found it useful for learning AI security workflows

• 23% were using AI for cybersecurity for the very first time

More details: https://t.co/1vQasNRgWs

#Cybersecurity #AI #DEFCON


Elie Bursztein

@elie

9 months ago

[Weekend Read] On the Theoretical Limitations of Embedding-Based Retrieval - https://t.co/pbUGBGCq8h Shows the harsh limits of AI vector search (aka semantic search) and how older techniques such as BM25 likely scale better for many retrieval tasks. Yet another strong piece of evidence that hybrid search is needed for RAG solutions despite the hype around pure vector search solutions. Full research note: https://t.co/glmCnPcmN1 #AI #embeddings #search #IR
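As a rough illustration of what hybrid search can look like in practice, the sketch below merges a lexical ranking (for example from BM25) with a vector-search ranking using reciprocal rank fusion. This is one common fusion technique chosen here for illustration; it is not taken from the paper or the research note, and the document IDs and the `k=60` constant are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    k dampens the influence of any single top-ranked hit.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the same query from two retrievers.
bm25_ranking = ["a", "b", "c"]
vector_ranking = ["c", "a", "d"]

fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

Documents that both retrievers rank highly rise to the top, while a document found by only one retriever is still kept, which is the practical appeal of combining lexical and embedding-based retrieval.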

