Sudeep Das

about 19 hours ago

https://t.co/SuQe5bi1lY

datamusing retweeted

about 22 hours ago

We've spent over a decade building an app that puts everything in your city at your fingertips. The average person has 800k+ menu items and grocery products available to them on @DoorDash, but more options shouldn’t mean more work to find what you want. Now the app works harder so you don’t have to: https://t.co/FKxVzWUe93 [3/3]

datamusing retweeted

about 22 hours ago

Today we’re launching Ask DoorDash — a new conversational way to search the app in your own words through chat, voice, a recipe link or photo. Ask DoorDash can build you a grocery cart ~5x faster than doing it manually. It takes a single prompt to complete your cart in under 2 minutes. In early testing, nearly half of all restaurant orders made with Ask DoorDash were from a place the customer had never ordered from before, and grocery baskets built with Ask DoorDash were over 35% larger than those without. [1/3]

datamusing retweeted

3 days ago

Check out our latest Engineering Blog for more on how we built the memory platform: https://t.co/PtjncsI5FT [4/4]

datamusing retweeted

3 days ago

Traditional ML models excel at learning what consumers do: what they order, search, skip, or substitute. But they don’t capture the why in a way LLMs can reason over. That’s why we built a unified consumer memory platform: converting behavioral signals into structured, versioned semantic memory blocks that both ML models and LLMs can use. As the graphic illustrates:
• Without memory: Generic carousel pool like “Popular Snacks,” reranked for each consumer.
• With memory: LLMs use rich memory blocks (dietary preferences, brand affinities, store preferences, and more) to generate truly personalized carousels, like “Peanut-Free Snacks to Keep Stocked.”

[1/4]

datamusing retweeted

3 days ago

We benchmarked Claude Fable on DashBench PR Review (our internal code-review benchmark).

Claude Fable had the strongest overall performance:
• 62.7% recall (almost double that of Opus 4.8)
• 88.7% precision
• 73.4% F1

GPT-5.5 had the highest precision at 91.7%.
Also, Fable was ~2.5x more expensive per PR.

Code review is one of our biggest blockers at DoorDash - excited to see the rapid model progress in making agentic coding superhuman!

AIatDoorDash's tweet photo.
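
The F1 figure in the post can be sanity-checked from the quoted precision and recall. This is a minimal sketch that assumes DashBench reports the standard F1 (the harmonic mean of precision and recall); the post does not state the formula, and the function name `f1_score` is illustrative. Because the inputs are already rounded to one decimal place, the recomputed value can differ from the reported 73.4% in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1, assumed here)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded figures quoted in the post for Claude Fable.
precision = 0.887
recall = 0.627

f1 = f1_score(precision, recall)
print(f"F1 recomputed from rounded inputs: {f1:.3f}")
```

The result lands within a tenth of a percentage point of the reported F1, which is consistent with rounding in the published precision and recall.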

datamusing retweeted

7 days ago

Yes, DoorDash does in fact do hardcore autonomy tech 😎

Sudeep Das

@datamusing

11 days ago

One mission: "help me do my grocery run." Behind it: agentic intent routing, consumer memory, semantic IDs, LLM-supervised search. The agent IS the stack. I'm breaking it down in the Conference Auditorium at #QConAI in Boston today, 3:40 PM. 🤖 @DoorDash AI is pushing the edge!

datamusing's tweet photo. https://t.co/13gzcShre8

datamusing retweeted

Lossfunk

@lossfunk

3 months ago

🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%. Presenting EsoLang-Bench. Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵

datamusing retweeted