3 Comments
jin

Great summary, thanks

Kolby Madison

DoorDash’s data stack is seriously impressive. I like how they mix open-source tools like Kafka, Flink, Spark, and Pinot with AWS to handle massive amounts of data.

Building a Lakehouse on S3 and Delta, plus using Trino, Airflow, and Sigma, shows how carefully they planned for scale and flexibility. Having 12,000 Sigma users is impressive!

Managing real-time and batch like that isn’t easy.

If you were starting from scratch, which tool would you prioritize first?

Neural Foundry

Pretty impressive how DoorDash handles 220 TB of data per day with this setup. The combination of Flink for real-time stream processing and Spark for batch jobs makes a lot of sense for their use case. What I find interesting is their choice of Pinot for analytics, which is a bit less common than Druid or ClickHouse in this space. The 12,000 Sigma users figure is wild; that's basically saying their entire company is data literate. It must be a huge operational advantage when everyone can query production data directly.
