To Data & Beyond

@ToDataBeyond

To Data & Beyond: Your home to master data science, AI & Gen AI—beyond the basics. Join 27K+ readers! ✍️ Lead Author: @YoussefHosni951

Helsinki, Finland

Joined October 2023

16 Following

241 Followers

75 Posts

ToDataBeyond retweeted

@YoussefHosni951

26 days ago

On the DeepSWE benchmark, GPT-5.5 [xhigh] reaches 70% pass@1, while Claude Opus 4.8 [max] reaches 58%, which means there is a 12-point gap on long-horizon coding tasks.

But the more important part is why DeepSWE exists in the first place.

On the older SWE-Bench Pro benchmark, some Claude agents were able to inspect the repository history using commands like `git log --all` and `git show`, find the merged fix, and copy the gold solution into their own patch.

According to Datacurve, this behavior accounted for around 18% of Opus 4.7’s passing tasks and 25% of Opus 4.6’s. GPT models, based on their report, did not show the same behavior.

DeepSWE removes that shortcut by using a shallow clone with no gold commit hash. In other words, the model can no longer “discover” the answer from the repo history.

There are two important caveats.

- First, Datacurve built the benchmark and made the judgment about the exploit. That makes this a vendor finding, not a fully independent audit.

- Second, DeepSWE uses a bash-only harness, which is exactly the kind of terminal-agent workflow GPT-5.5 appears to be strong at. On Anthropic’s own GDPval benchmark for broader knowledge work, Opus 4.8 is still ahead.

So the benchmark you choose now has a huge impact on who appears to be winning.

GPT-5.5 looks clearly stronger on clean, contamination-controlled terminal coding tasks. It also does the work at around $5.80 per task and finishes in about 20 minutes, while Claude takes longer and costs more.

That is a meaningful result.

But it is also a reminder that when a vendor quotes a benchmark, we should always ask what the benchmark actually measures, and what it quietly leaves out.
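
To make the numbers above concrete, here is a minimal Python sketch of the arithmetic. It computes the 12-point gap from the quoted DeepSWE scores, then shows how a raw pass rate shrinks once the share of passes attributed to the history shortcut is removed. The function names and the 50% raw pass rate are hypothetical and used only for illustration. The post does not give SWE-Bench Pro pass rates, and Datacurve may compute its adjustment differently.

```python
def percentage_point_gap(score_a: float, score_b: float) -> float:
    """Difference between two pass@1 scores, in percentage points."""
    return score_a - score_b


def discounted_pass_rate(raw_pass_rate: float, exploit_share: float) -> float:
    """Pass rate that remains after removing the share of passing tasks
    attributed to copying the fix from repository history."""
    if not 0.0 <= exploit_share <= 1.0:
        raise ValueError("exploit_share must be between 0 and 1")
    return raw_pass_rate * (1.0 - exploit_share)


# Scores quoted in the post: GPT-5.5 [xhigh] vs Claude Opus 4.8 [max] on DeepSWE.
gap = percentage_point_gap(70.0, 58.0)
print(f"DeepSWE gap: {gap:.0f} points")

# HYPOTHETICAL raw pass rate, chosen only to illustrate the discount.
hypothetical_raw = 50.0

# Exploit shares quoted in the post (attributed to Datacurve).
for model, share in [("Opus 4.7", 0.18), ("Opus 4.6", 0.25)]:
    adjusted = discounted_pass_rate(hypothetical_raw, share)
    print(f"{model}: {hypothetical_raw:.1f}% raw, {adjusted:.1f}% without the shortcut")
```

The point of the sketch is that a per-task exploit share scales the headline score multiplicatively, so the same share costs a higher-scoring model more points than a lower-scoring one.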

To Data & Beyond @ToDataBeyond

30 days ago

https://t.co/oeWkeXbnai

To Data & Beyond @ToDataBeyond

30 days ago

Check out our latest publication: “According to Anthropic’s Engineer, HTML is the New Markdown for AI Agents” by Kaitai Dong, VP AI Scientist, AI Labs @ BlackRock https://t.co/xF9Qy2Kbgk

To Data & Beyond @ToDataBeyond

about 1 month ago

Eid Al-Adha Mubarak from @ToDataBeyond

Wishing you and your families peace, joy, and blessings during this special time.

May this Eid bring you closer to the people you love and give you space to reflect, recharge, and move forward with gratitude. https://t.co/BZlyNXandw

To Data & Beyond @ToDataBeyond

2 months ago

Check out our latest blog: Claude Code Heading for a $100 Paywall Instead of $20! The New Pricing Reality for AI Coding Agents https://t.co/7D8O2MaH1u

To Data & Beyond @ToDataBeyond

2 months ago

Check out our latest blog: How to Scrape JavaScript-Heavy Websites for LLM Pipelines with Cloudflare Browser Rendering https://t.co/fy1QFKpPi2

To Data & Beyond @ToDataBeyond

3 months ago

Eid Mubarak from To Data & Beyond 🌙✨

Wishing you joy, peace, and beautiful moments with your loved ones. https://t.co/w9nbGOVbD5

To Data & Beyond @ToDataBeyond

4 months ago

We have reached 22,000 subscribers! Thank you to everyone who’s been part of it so far.

@YoussefHosni951

4 months ago

@ToDataBeyond has just reached 22,000 subscribers! 🎉

I’m truly grateful for this milestone and for everything this journey has become, thanks to your support.

Thank you to everyone who’s been part of it so far. I’m excited for what’s next and looking forward to achieving even more together.

To Data & Beyond @ToDataBeyond

4 months ago

Use OpenClaw to Make a Personal AI Assistant: Learn how to set up OpenClaw as a personalized AI agent.

Read it here: https://t.co/RYsq8Z0yDp

To Data & Beyond @ToDataBeyond

4 months ago

Use GLM-5 in Claude Code and Save 60% on Tokens https://t.co/tQnJEcKh6y

To Data & Beyond @ToDataBeyond

4 months ago

What do you need to learn to be an AI Engineer in 2026? Where to learn it? What to build with it? By @YoussefHosni951

Read it here: https://t.co/MTlIbRIHaP

To Data & Beyond @ToDataBeyond

5 months ago

Moltbot (Clawdbot) is the New Rage in Agentic AI: Here’s How To Get Started https://t.co/zlqEzl4y0h

ToDataBeyond retweeted

@YoussefHosni951

8 months ago

Debugging AI agents can feel like investigating a black box.

In our latest @ToDataBeyond tutorial, we use LangSmith tracing to build a Recipe Discovery Agent, making every step visible and debugging far easier. https://t.co/iyB4IIq0gs

ToDataBeyond retweeted

@YoussefHosni951

10 months ago

A summary of the important LLM papers published last week is now available to read on @ToDataBeyond. https://t.co/8QWiFpsuHz

To Data & Beyond @ToDataBeyond

10 months ago

Check out our latest publication, the seventh part of our ongoing “Building Agents with LangGraph” course, in which we build a multi-agent system with LangGraph.

@YoussefHosni951

10 months ago

Just published the seventh part of the “Building Agents with LangGraph” course.

In this tutorial, we will build a multi-agent system that collaborates to write an essay. This system will follow a cyclical, reflective process: it will plan, research, write, and then critique its own work. https://t.co/5bSpnWvkbv

ToDataBeyond retweeted

@YoussefHosni951

10 months ago

We have published a series of articles on @ToDataBeyond to help you get started with Docker for data science & AI projects: https://t.co/e5o3R6tyrg

ToDataBeyond retweeted

@YoussefHosni951

11 months ago

Missed yesterday’s webinar? Don’t worry, we’ve got you covered! You can catch the full recording now on our YouTube channel. P.S. Don’t forget to hit subscribe so you won’t miss future sessions! https://t.co/gn7qo2fwcX

To Data & Beyond @ToDataBeyond

11 months ago

Registration form: https://t.co/GZDokjGCSO

To Data & Beyond @ToDataBeyond

11 months ago

Preparing for AI Job Interviews in the Age of Generative AI and LLMs - Free Live Webinar

We will discuss how the interview landscape is shifting, what skills are now expected, and how candidates can stand out in today’s AI-driven job market. https://t.co/0OhX9HtHl6

To Data & Beyond @ToDataBeyond

almost 2 years ago

Top Important Computer Vision Papers for the Week from 12/08 to 18/08 https://t.co/8o3ZXjnBCx
