Quesma @quesmaorg - Twitter Profile

Pinned Tweet

4 months ago

Recently we built OTelBench – a benchmark to test how well LLMs handle OpenTelemetry instrumentation. We tested 14 models. The best (Claude Opus 4.5) hit only 29%. These weren't trick questions, just a small subset of typical SRE tasks. Link here: https://t.co/t8t0Hsf8wa

0

3

0

954

QuesmaOrg retweeted

Piotr Migdal @pmigdal

3 months ago

AI + Ghidra by NSA = reverse-engineering fun. I am speaking at @AITinkerers Warsaw, 4th Mar 2026. One of my favorite event series - by and for the creators community. Vibe-resurrecting an old game from binaries 👾 and vibe-hardware-ing an LED backpack 🎒🌈.

1

7

2

1

277

QuesmaOrg retweeted

Piotr Migdal @pmigdal

4 months ago

Claude can code, but can it read machine code? We gave AI agents access to Ghidra (a decompiler by the NSA) and tasked them with finding hidden backdoors in servers - working solely from binaries, without any access to source code. See our BinaryAudit: https://t.co/VPNk5ChPfH

75

1K

179

936

232K

QuesmaOrg retweeted

Ryan Marten

@ryanmart3n

4 months ago

Great to see the community releasing benchmarks in @harborframework now. These are invaluable resources for collectively building the most useful agents.

1

9

1

2K

QuesmaOrg retweeted

Jacek Migdal

@jakozaur

5 months ago

I used to cite Gartner, now I quote @GergelyOrosz and his Pragmatic Engineer. Enjoy our new blog post: https://t.co/9JOVKOFn5h

0

3

1

0

231

Quesma @QuesmaOrg

6 months ago

Finally, an AI that can draw a map without getting lost. Nano Banana Pro uses tools to create factually correct infographics - and it's a game-changer. https://t.co/j17V5Rxu5p

0

1

2

0

242

QuesmaOrg retweeted

Jacek Migdal

@jakozaur

7 months ago

Postmortems are painful to write, especially this one. Sharing my startup Quesma journey so far. https://t.co/Wzrm6myRqM

2

18

3

1

2K

Quesma @QuesmaOrg

7 months ago

Interesting use case for AWS Lambda that we explored: sandboxing AI-generated code. We tried WebAssembly first but hit a wall. So, we scrapped our experiment for AWS Lambda with Docker containers in an isolated VPC. Full writeup from @pmigdal: https://t.co/nBxW6PtuMS

Tobias Schmidt @tpschmidt_

7 months ago

Lambda has tons of use cases, but one I've missed: using it as some kind of sandbox for running AI-generated code. Lambda's isolation and scaling are a solid fit for this problem.

1

6

2

3

1K

0

1

0

167

QuesmaOrg retweeted

AISecHub

@AISecHub

8 months ago

The security paradox of local LLMs - https://t.co/nOtVOULgd9 by @jakozaur at @QuesmaOrg. If you’re running a local LLM for privacy and security, you need to read this. Our research on gpt-oss-20b (for OpenAI’s Red‑Teaming Challenge) shows they are much more prone to being tricked than frontier models. When attackers prompt them to include vulnerabilities, local models comply with up to 95% success rate. These local models are smaller and less capable of recognizing when someone is trying to trick them. #AISecurity #LLMSecurity #LocalLLM #GenAI #MLOps #ModelRisk #DataPrivacy #AIPrivacy #PromptInjection #AIThreats #AIGovernance #EdgeAI

0

8

4

1

343

Quesma @QuesmaOrg

9 months ago

See the full ranking and every run (logs, commands, binaries), methodology & code: ▶️ https://t.co/nLrxMUQw0a 💻 https://t.co/JZGKouDeYa 📃 https://t.co/QXPKpVDApa

0

3

0

1

95

Quesma @QuesmaOrg

9 months ago

Can AI compile 22-year-old code? We built CompileBench to find out. We know that LLMs can vibe-code or even win IOI, but what about dependency hell or legacy build systems? (image based on XKCD 2347)

1

4

0

2

186

Quesma @QuesmaOrg

9 months ago

Cost-efficiency crown: @OpenAI. Across difficulties, OpenAI models dominate the Pareto frontier of cost. GPT-5-mini (high reasoning) is a great price/perf pick; GPT-4.1 is the fastest with solid wins.

1

2

0

3

130

Quesma @QuesmaOrg

9 months ago

Our blog post is second on Hacker News. Enjoy!

1

10

2

1

3K

Quesma @QuesmaOrg

about 1 year ago

Our new blog post about Apache Iceberg limitations: https://t.co/JvRet5dtgF

0

2

0

144

Quesma @QuesmaOrg

about 1 year ago

https://t.co/GCAfnQBaxM

0

1

0

86

Quesma @QuesmaOrg

about 1 year ago

At #IcebergSummit 2025, Ryan Blue unveiled Iceberg beyond Java, plus the path to Table Spec V3 & forward to V4. Przemysław Delewski’s new blog covers Fokko Driesprong on PyIceberg, Matt Topol on Go, Julien Le Dem on modular DBs. Essential read for next-gen data platforms. Link👇

1

3

0

193

Quesma @QuesmaOrg

about 1 year ago

Iceberg Table V3 is coming: https://t.co/Vfp25ElFHm

0

1

0

111

QuesmaOrg retweeted

Piotr Migdal @pmigdal

about 1 year ago

Everything is better when Kawaii 🌸🌸🌸: Titanic survival rates with freshly-released Quesma Charts. https://t.co/YCxi3UedHN At @DataCouncilAI conference in Oakland with Jacek Migdał. #dataViz @QuesmaOrg @jakozaur

0

4

2

0

309
