We've spent over a decade building an app that puts everything in your city at your fingertips. The average person has 800k+ menu items and grocery products available to them on @DoorDash, but more options shouldn’t mean more work to find what you want.
Now the app works harder so you don’t have to: https://t.co/FKxVzWUe93 [3/3]
Today we’re launching Ask DoorDash: a new conversational way to search the app in your own words through chat, voice, a recipe link or photo. Ask DoorDash can build you a grocery cart ~5x faster than doing it manually. A single prompt can complete your cart in under 2 minutes.
In early testing, nearly half of all restaurant orders made with Ask DoorDash were from a place the customer had never ordered from before, and grocery baskets built with Ask DoorDash were over 35% larger than those without. [1/3]
Traditional ML models excel at learning what consumers do: what they order, search, skip, or substitute. But they don’t capture the why in a way LLMs can reason over.
That’s why we built a unified consumer memory platform: converting behavioral signals into structured, versioned semantic memory blocks that both ML models and LLMs can use. As the graphic illustrates:
• Without memory: Generic carousel pool like “Popular Snacks,” reranked for each consumer.
• With memory: LLMs use rich memory blocks (dietary preferences, brand affinities, store preferences, and more) to generate truly personalized carousels, like “Peanut-Free Snacks to Keep Stocked.”
[1/4]
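The post does not describe the schema, so here is a minimal sketch of what a "structured, versioned semantic memory block" could look like. It assumes each block has a kind (such as dietary preferences or brand affinities), a version number, and a list of short natural-language facts. The names `MemoryBlock`, `ConsumerMemory`, and `brand_affinities` are hypothetical, and the simple counting rule stands in for whatever summarization DoorDash actually uses to turn behavioral signals into memory.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class MemoryBlock:
    """One structured, versioned unit of consumer memory (hypothetical schema)."""
    kind: str                # e.g. "dietary_preferences", "brand_affinities"
    version: int             # incremented each time the block is regenerated
    facts: Tuple[str, ...]   # short statements an LLM prompt (or feature pipeline) can read


@dataclass
class ConsumerMemory:
    consumer_id: str
    history: Dict[str, List[MemoryBlock]] = field(default_factory=dict)

    def upsert(self, kind: str, facts: List[str]) -> MemoryBlock:
        # Append a new version rather than overwriting, so older versions stay auditable.
        versions = self.history.setdefault(kind, [])
        block = MemoryBlock(kind=kind, version=len(versions) + 1, facts=tuple(facts))
        versions.append(block)
        return block

    def latest(self, kind: str) -> Optional[MemoryBlock]:
        versions = self.history.get(kind)
        return versions[-1] if versions else None

    def to_prompt(self) -> str:
        # Render the newest version of every block as plain text for an LLM prompt.
        lines: List[str] = []
        for kind in sorted(self.history):
            block = self.history[kind][-1]
            lines.append(f"[{kind} v{block.version}]")
            lines.extend(f"- {fact}" for fact in block.facts)
        return "\n".join(lines)


def brand_affinities(orders: List[List[str]], min_count: int = 2) -> List[str]:
    # Hypothetical rule: a brand counts as an affinity if it appears in at least
    # `min_count` distinct past orders. Ties are broken alphabetically for stable output.
    counts = Counter(brand for order in orders for brand in set(order))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"Often buys {brand} ({n} orders)" for brand, n in ranked if n >= min_count]


# Usage: turn a few raw order signals into memory blocks and render them for an LLM.
mem = ConsumerMemory("consumer-123")
mem.upsert("dietary_preferences", ["Avoids peanuts"])
mem.upsert(
    "brand_affinities",
    brand_affinities([["Brand A", "Brand B"], ["Brand A"], ["Brand B", "Brand A"]]),
)
print(mem.to_prompt())
```

Keeping every version instead of overwriting in place is one way to make blocks auditable and reproducible; a production platform would likely also track timestamps and the source signals behind each fact. A rendered block like `Avoids peanuts` is the kind of input that could let an LLM propose a carousel such as "Peanut-Free Snacks to Keep Stocked."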
We benchmarked Claude Fable on DashBench PR Review (our internal code-review benchmark).
Claude Fable had the strongest overall performance:
• 62.7% recall (almost double that of Opus 4.8)
• 88.7% precision
• 73.4% F1
GPT-5.5 had the highest precision at 91.7%.
Fable was also ~2.5x more expensive per PR.
Code review is one of our biggest blockers at DoorDash. Excited to see the rapid model progress toward making agentic coding superhuman!
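As a quick check on how the three Fable numbers relate: F1 is the harmonic mean of precision and recall. The sketch below assumes DashBench computes F1 from the aggregate precision and recall (the post does not say how it averages across PRs) and recomputes it from the reported, rounded figures, bounding the result by the rounding interval of each input.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported (rounded) figures for Claude Fable from the post.
p, r = 0.887, 0.627
print(f"F1 from rounded inputs: {f1(p, r):.4f}")

# The reported values are rounded to 0.1 percentage point, so the true inputs could
# lie within +/-0.0005. F1 increases in both arguments, so the extremes bound it.
low = f1(p - 0.0005, r - 0.0005)
high = f1(p + 0.0005, r + 0.0005)
print(f"Plausible F1 range: {low:.4f} to {high:.4f}")
```

Under this assumption the recomputed value lands between roughly 73.4% and 73.5%, so the reported 73.4% F1 is consistent with the reported precision and recall once rounding is taken into account.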
One mission: "help me do my grocery run." Behind it: agentic intent routing, consumer memory, semantic IDs, LLM-supervised search. The agent IS the stack. I'm breaking it down in the Conference Auditorium at #QConAI in Boston today, 3:40 PM. 🤖 @DoorDash AI is pushing the edge!
🚨 Shocking: Frontier LLMs score 85-95% on standard coding benchmarks. We gave them equivalent problems in languages they couldn't have memorized. They collapsed to 0-11%.
Presenting EsoLang-Bench.
Accepted to the Logical Reasoning and ICBINB workshops at ICLR 2026 🧵
Today we are welcoming the Metis team to DoorDash as part of DoorDash AI Research.
For the past six months, DoorDash has partnered with Metis to build AI agents together, and we have been consistently impressed by their team. By joining forces, we aim to accelerate our plans for building agentic commerce and pushing the frontier of physical intelligence. Excited to share more on that soon.
It’s still early innings with how AI will transform local commerce, and we’re looking forward to exploring those possibilities together with Aryan, Aayush, Marcus and the Metis team!
Proud to have been a part of this effort! PLUS shows that optimized user summaries outperform vector embeddings and ICL for personalization and better align LLMs to real user diversity. This is one of the core ideas we’re exploring at DoorDash to drive AI-powered personalization.
Excited to share that "Learning to summarize user information for personalized reinforcement learning from human feedback" is accepted to ICLR.
TL;DR We can train a conversation summarizer with RL to capture diverse user preferences for pluralistic LLM alignment.
w/ @natashajaques @mickel_liu @yanming_wan @PeterAhnnDD
arxiv: https://t.co/ikGvQMVJpl
website: https://t.co/wes5PIep4E
code: https://t.co/erH09waJfU
@airindiain After you assured me that my flight had been rescheduled to 11 am (two days after it was cancelled), I made connecting arrangements. This morning it was changed to 4 pm. I need a refund, as the new time no longer works for me. AI 180 SFO-BOM.