DoorDash’s data stack is seriously impressive. I like how they mix open-source tools like Kafka, Flink, Spark, and Pinot with AWS to handle massive amounts of data.
Building a Lakehouse on S3 and Delta, plus using Trino, Airflow, and Sigma, shows how carefully they planned for scale and flexibility. Reaching 12,000 Sigma users is no small feat!
Managing real-time and batch like that isn’t easy.
If you were starting from scratch, which tool would you prioritize first?
Great summary, thanks