Discussion about this post

Kolby Madison

DoorDash’s data stack is seriously impressive. I like how they mix open-source tools like Kafka, Flink, Spark, and Pinot with AWS to handle massive amounts of data.

Building a Lakehouse on S3 and Delta, plus using Trino, Airflow, and Sigma, shows how carefully they planned for scale and flexibility. Having 12,000 Sigma users is impressive!

Managing real-time and batch like that isn’t easy.

If you were starting from scratch, which tool would you prioritize first?

jin

Great summary, thanks
