Dev Chheda

3 months ago

>80% of prod commits are now made by Devin internally @cognition. We are at liftoff!

4

91

4

8

8K

devmchheda retweeted

about 7 hours ago

Claude Fable 5 is now available in Devin. Fable 5 earns the #1 spot on FrontierCode, our benchmark for real-world engineering tasks that grades mergeability and quality:

35

928

60

105

72K

devmchheda retweeted

Jared Zoneraich

@imjaredz

1 day ago

SWE-bench makes it seem like we already basically solved coding (everyone 50+%)

FrontierCode shows how much room we still have left (nobody beats 13.4%) https://t.co/rBhvYC035W

12

179

8

15

21K

devmchheda retweeted

Morgante

@morgantepell

1 day ago

FrontierCode is the best attempt I've seen to measure the gap between what AI can produce from a single shot prompt and what's actually needed to produce maintainable code. Use it to stop the slop.

1

37

3

2

2K

1 day ago

I spend most of my time looking at production agent traces, and FrontierCode is the first benchmark that actually comes close to reflecting real software engineering work and success criteria. Code correctness is only one component of whether an agent's work is useful - the code also has to be maintainable, well-scoped, and well-tested.

1 day ago

Introducing FrontierCode: a coding eval that raises the bar for difficulty & quality. Each task took 40+ hrs of work by leading open-source maintainers. Models write sloppy code that works but isn’t maintainable. Our eval is first to measure: would you actually merge this code?

224

4K

291

2K

2M

5

31

1

3

2K

devmchheda retweeted

Seth Fenster

@sethtjf

5 days ago

@DevinAI VMs are so wildly impressive what a feat of engineering

3

40

4

5

3K

devmchheda retweeted

Pratik Gandhi

@pratikgx

6 days ago

The ECDSA quantum challenge leaderboard is heating up 🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥🔥

We're now 19.6% ahead of Google’s classified circuit for quantum computing.

Models solvers used include:

- @DevinAI
- @deepseek_ai
- @grok Grok 4.3
- @Zai_org GLM-5.1
- @ChatGPTapp Codex 5, 5.5
- @MiniMax_AI MiniMax M3
- @claudeai Opus 4.6, 4.7, 4.8, Sonnet 4.6

Powered by @eigenlabs

2

18

4

1

866

devmchheda retweeted

Kevin Madura

@kmad

5 days ago

With agents everything becomes a hill to climb

0

2

1

0

770

devmchheda retweeted

Scott Wu

@ScottWu46

5 days ago

Measuring someone's productivity by their token usage is a horrible idea. Giving everyone the same fixed token budget isn't much better. So what's the right way to roll out AI across your org?

We built a system to measure how many productive engineering hours every Devin task is worth, validated against a dataset of real engineers’ time estimates. The goal is to answer the fundamental question that companies are grappling with: how much real value are you getting from each of your agent sessions?

On top of that, we're giving an AI productivity guarantee! Now if Devin delivers less engineering value than you're paying for, we fund your usage until it does.

The whole industry needs to move from measuring activity to measuring output. We hope to see more AI companies taking this approach.

59

891

58

321

157K

5 days ago

@benhylak Congrats on the launch!

1

3

0

282

devmchheda retweeted

5 days ago

AI should earn its keep. Introducing the AI Productivity Guarantee.

If Devin delivers less engineering value than you’re paying for, Cognition will fund your usage until it does, up to $10 million.

It’s time for the AI industry to stop maximizing tokens and start maximizing productive output.

73

1K

96

424

417K

devmchheda retweeted

Gajesh

@gajesh

6 days ago

CTO of Cognition (@stevenkplus1) just joined the leaderboard...

and Devin flipped Claude over 24 hour period!!!

join the benchmax proof benchmark : ecdsa(.)fail https://t.co/rFnAkRgYQ7

5

63

7

5

7K

devmchheda retweeted

Gajesh

@gajesh

6 days ago

Devin (@DevinAI) just beat Claude over 6 hour period on the quantum benchmark challenge!!

4

38

5

7

3K

6 days ago

@bbuddha_xyz @DevinAI @dabit3 @ScottWu46 @jhanikhil @stevenkplus1 🤯

0

3

0

223

devmchheda retweeted

gautham

@bbuddha_xyz

6 days ago

Looks like @DevinAI the best research agent out there?

Nikhil from Devin is moving the frontier forward by reducing the number of qubits our circuit requires. https://t.co/Y5GhaMqQEC

8

60

7

13

31K

6 days ago

devin is contributing to quantum computing research at https://t.co/w0NokXwWoB!

1

42

3

6

5K

devmchheda retweeted

Scott Wu

@ScottWu46

7 days ago

Standalone IDEs have about 6 months left to live. An interface for manually editing and refactoring doesn’t need to exist if you're not manually editing and refactoring anymore. So what's the right interface for a dev to be working in for 8h / day?

Some parts are obvious: you want to be able to spin up agents (either local or cloud agents) and to have a clean interface to keep up with all of your parallel running agents. Then you want to be able to get into the weeds whenever needed for last-mile fixes and review.

But as software engineering continues to evolve we will see more and more of the lifecycle get reinvented. How do you build a single surface that allows you to plan, spec, prototype, debug, review, QA?

Bringing Devin and Windsurf together has been our vision ever since the acquisition. Devin Desktop is our first shot at what this looks like. Excited to make this a reality today!

70

809

51

445

248K

7 days ago

@cognition Devin all the way down

0

5

0

615

7 days ago

The power of Devin now in your IDE! Devin Desktop is incredibly powerful for managing dozens of cloud agents in parallel.