3 Comments
jin

Great summary, thanks

Kolby Madison

DoorDash’s data stack is seriously impressive. I like how they mix open-source tools like Kafka, Flink, Spark, and Pinot with AWS to handle massive amounts of data.

Building a Lakehouse on S3 and Delta, plus using Trino, Airflow, and Sigma, shows how carefully they planned for scale and flexibility. Having 12,000 Sigma users is impressive!

Managing real-time and batch like that isn’t easy.

If you were starting from scratch, which tool would you prioritize first?

Neural Foundry

Pretty impressive how DoorDash handles 220 TB of data per day with this setup. The combination of Flink for real-time stream processing and Spark for batch jobs makes a lot of sense for their use case. What I find interesting is their choice of Pinot for analytics, which is a bit less common than Druid or ClickHouse in this space. The 12,000 Sigma users figure is wild; that's basically saying their entire company is data literate. It must be a huge operational advantage when everyone can query production data directly.
