ChatPPT

@ChatPPT

Stanford

Joined December 2022

2 Following

20 Followers

28 Posts

ChatPPT retweeted

Cognition @cognition

14 days ago

AI should earn its keep. Introducing the AI Productivity Guarantee.

If Devin delivers less engineering value than you’re paying for, Cognition will fund your usage until it does, up to $10 million.

It’s time for the AI industry to stop maximizing tokens and start maximizing productive output.

73

1K

100

433

428K

ChatPPT retweeted

14 days ago

Noticing a shift in the industry: 6 months ago everybody was tokenmaxxing. Supposedly unlimited token budgets & every CTO was afraid their teams weren’t using enough tokens. This is no longer the case: Our typical customer saw agent usage grow 1000% in the last few months. Suddenly the spend is becoming significant. Of course, it is good to spend as much as possible on tokens if it delivers real productivity gains. However, a lot of tokens are wasted, e.g. through wasteful experiments and inefficient prompting. Devin is built to ensure you have fine-grained control over your ROI. If you work with Cognition, we’re confident you will get the highest "return on tokens" (we should maybe coin a term for it: ROT?) out of any product in the industry. We’re so confident that we’re putting our money where our mouth is: we’ll cover up to $10 million in Devin usage if we don’t deliver positive ROI. It’s kind of insane for us to do this and we’re taking a big bet here. We’re in the position to do this only because of @ryanbai1412’s research on automated measurement of AI agent productivity. Read our technical blog (linked in the thread below) on how we built a system to predict time savings which we can convert into dollars saved. What we found: Cognition customers are seeing real & provable ROI on Devin. With our coaching & enterprise-specific Devin features this ROI can be increased further. A lot more to come on this!

3

79

5

18

6K

ChatPPT retweeted

15 days ago

Very excited for this! The best research content requires beautiful design. There’s no better example of this than @3blue1brown’s video content, which has incredible craft. We’re looking forward to hosting him at our office and celebrating what design & research can do together.

1

45

3

6

3K

ChatPPT @ChatPPT

2 months ago

@silasalberti 👀

0

2

0

0

165

ChatPPT retweeted

2 months ago

With SWE-1.6 we've made significant progress on "intelligence per token". We post-trained the model from scratch (same pre-trained model) with a similar recipe as SWE-1.6 Preview. Our latest algorithm achieves similar intelligence at ~40% fewer assistant turns.

We also shipped further infra improvements, so our latest training run was 1.6x faster end-to-end for the same amount of FLOPs. In our next training run, we're aiming to 5x the FLOPs.

9

142

16

22

39K

ChatPPT retweeted

4 months ago

Over the last few months we started building our research team at Cognition and we've come a long way!

It's been exciting to figure out what it takes to build a large-scale post-training stack from scratch and push towards the frontier. My personal take is it's been easier than expected, e.g. we were surprised to match Opus 4.5, which seemed so far away just 3 months ago.

We definitely still got lots to figure out but the slope is high and this model is just the beginning.

16

228

21

57

37K

ChatPPT retweeted

4 months ago

There’s an underestimated axis in coding models: "delight". Windsurf arena mode probably indexes 60% on delight & 40% on capabilities.

Observations:
- Anthropic is crushing it on delight
- GPT-5.2 & Gemini 3 underperform in delight relative to IQ
- SWE-1.5 outperforms GPT-5.2 here! (I didn’t expect this)

2

35

3

9

6K

ChatPPT retweeted

6 months ago

There was a moment in mid Nov when all labs (GPT-5.1, Sonnet 4.5, Gemini 3 Pro) were roughly similar on SWE-Bench in the 76-77% range.

Then Opus charged ahead and broke 80%. GPT-5.2 is now the second model to break 80% (+ new SOTA on SWE-Bench Pro). In day-to-day usage still seeing people prefer Opus but it's early to tell

16

218

11

35

35K

ChatPPT retweeted

7 months ago

So Google just forked the Windsurf codebase and they even forgot to remove the Cascade branding in some places https://t.co/zRf5boMd0O

105

3K

159

544

746K

ChatPPT retweeted

8 months ago

The era of fast coding models has begun:

Cognition SWE-1.5: ~950 tok/s
Cursor Composer-1: ~250 tok/s
Haiku 4.5: ~140 tok/s

27

600

31

107

105K

ChatPPT retweeted

8 months ago

super excited to finally release SWE-1.5

- a frontier-scale model (~hundreds of billions of params) running at insane speeds (up to 950 tok/s)
- outperforms GPT-5-High on SWE-Bench-Pro
- 13x faster than Sonnet 4.5, 6x faster than Haiku 4.5
- more than double the benchmark perf of SWE-1

it's been fun to scale up RL to large models on our cluster of thousands of GB200 chips. (this might be the first public model release trained on the new GB200 NVL72 generation 🤔)

more details on the training in 🧵

10

210

15

42

34K

ChatPPT retweeted

8 months ago

super excited to release SWE-grep and SWE-grep-mini!

SWE-grep-mini achieves extreme inference speeds of >2,800 TPS:
20x faster than Haiku 4.5 while beating Sonnet 4.5, Opus 4.1 & GPT-5 on our CodeSearch eval

our vision: make agentic search as fast as embedding search. the SWE-grep models can perform multi-turn agentic search in <3 seconds compared to 20-60 seconds for other frontier models.
https://t.co/ymd7tYFAaZ

24

356

20

98

80K

ChatPPT retweeted

Cognition @cognition

8 months ago

Introducing SWE-grep and SWE-grep-mini: Cognition’s model family for fast agentic search at >2,800 TPS. Surface the right files to your coding agent 20x faster. Now rolling out gradually to Windsurf users via the Fast Context subagent – or try it in our new playground!

72

1K

128

517

663K

ChatPPT retweeted

9 months ago

cognition has been ripping while maintaining the highest talent bar: since 1 year ago, revenue has grown more than 100x – but our core engineering team is still only around 30 people

we're hiring exceptional people in any role but I wanted to highlight two roles in particular:
- product design: cognition is known for many things but not (yet!) for its design. we're looking for a world-class designer to change that
- research & RL infra: we have a small & excellent research team and lots of compute. looking for researchers who can own large scope across product x models x infra

8

316

24

110

145K

ChatPPT retweeted

11 months ago

DeepWiki just had its biggest week ever.

It's now at >2.5 million lifetime users and also recently crossed 1 million MAUs (up 70% compared to the previous month).

What should we build next?

(If you're interested in working at the intersection of product & research, DMs open) https://t.co/SNGICDFbK6

16

250

32

85

37K

ChatPPT retweeted

Cognition @cognition

12 months ago

We needed instant VM snapshots for Devin but EC2 took 30+ minutes. So, @silasalberti built blockdiff, a new file format that makes snapshots 200x faster.

Today, we’re open-sourcing blockdiff & sharing how it works 🔗👇 https://t.co/uSN12cf0QW

11

506

56

311

155K

ChatPPT retweeted

about 1 year ago

Thanks to @karpathy for featuring DeepWiki at his startup school talk today!

Context is everything. Agents are good at reading dozens of files dumped into context – but high-level codebase structure is missing. DeepWiki is focused on solving the high-level understanding. https://t.co/ffJGgLGc5W

2

127

15

51

24K

ChatPPT retweeted

about 1 year ago

We built an official MCP server for DeepWiki. Free & no auth required!

Thank you to @OpenAI for featuring it as the main example in their docs for Remote MCP!

DeepWiki is becoming the go-to place for documentation & questions on open-source repos – for humans and agents. https://t.co/eFLBuY0I5S

2

126

14

65

17K

ChatPPT retweeted

about 1 year ago

Devin & friends are finally a product category ❤️

We've been working on this problem for over 1 year now – and many didn't believe in it. Great to see that the world is realizing!

The evolving landscape of coding tools:
- 2020: Autocomplete (Copilot, TabNine) -- AI suggests your next line of code
- 2023: AI IDEs (Cursor, Windsurf, Zed) -- your editor augments you
- 2024: App Builders (Lovable, Bolt, v0) -- from idea to prototype in minutes
- 2025: Cloud Agents (Devin, Codex, Jules) -- team of AI software engineers working in parallel

Since we released Devin publicly in December 2024, a big user confusion was: what category does this belong to?

We believe all tools have their place: IDEs augment yourself, Devins multiply yourself. App Builders take you from 0 -> 1, Devins take you from 1 -> 100.

Glad to see Devin & friends finally taking off in 2025!

12

191

20

82

42K

ChatPPT retweeted

about 1 year ago

DeepWiki is now natively powering Devin

0

21

3

5

1K
