Datacurve @datacurve - Twitter Profile

Pinned Tweet

4 days ago

Opus 4.8 is now on DeepSWE. On the default high thinking effort, it scores 6% higher than Opus 4.7 xhigh, while also lowering average cost per task.

76

2K

121

379

879K

Datacurve @datacurve

4 days ago

@winkey_h 🫡🫡

0

5

0

1K

Datacurve @datacurve

4 days ago

@bmptrsn @winkey_h W animated graph 📊

1

13

0

512

Datacurve @datacurve

4 days ago

Full deep dive coming soon. Check out the full benchmark here → https://t.co/tqvBb5vzG7

2

97

4

37

21K

Datacurve @datacurve

4 days ago

Opus 4.8 delivers efficiency gains by solving tasks in fewer steps, directly reducing the total number of input tokens required per task.

Photo: https://t.co/LYY77qf8Ao

8

193

9

31

115K

datacurve retweeted

Matthew Berman

@MatthewBerman

7 days ago

DeepSWE reflects what I’m hearing from engineers better than any other benchmark. They took the hard path to build a good one.

20

182

14

57

37K

Datacurve @datacurve

8 days ago

@cbovolo @Neesh774 ok. https://t.co/V3LIf3Wt6G

2

3

0

62

Datacurve @datacurve

8 days ago

@Neesh774 lots of love put in here! @bmptrsn @shiqyy @LeonardMainnet & albert 🩵 who said data has to be boring

1

5

0

1

376

datacurve retweeted

Garry Tan

@garrytan

8 days ago

This is the new standard for engineering evals

32

859

63

505

114K

datacurve retweeted

Serena Ge (Datacurve)

@serenaa_ge

9 days ago

Today we’re releasing DeepSWE, a new standard for agentic coding benchmarks. On public leaderboards, top models often look relatively close in capability. DeepSWE shows where they actually diverge, reflecting the realistic experience of developers in their day-to-day work.

Photo: https://t.co/HCDcjNuTFK

507

6K

751

3K

2M

datacurve retweeted

Serena Ge (Datacurve)

@serenaa_ge

about 2 years ago

I presented today at Demo Day (Day 2), and @TechCrunch featured us @datacurve! I've been reading TC and listening to TC Daily Crunch since high school mornings... a surreal feeling to see us on it. Also, post-demo sadness cuz now YC is coming to an end

Photo: https://t.co/BfCz7Z8RoA

9

150

5

24

30K
