Coinbase Data Tech Stack
See what Coinbase uses in the backend to handle big data, processing billions of events every day for its 120+ million users.
Explore how Coinbase ingests billions of events daily to power trading, custody, and compliance for one of the world’s largest cryptocurrency platforms. This article dives into the essential tools, architectures, and innovations Coinbase employs for data ingestion, processing, storage, and analytics.
Metrics
120+ million verified users worldwide, source.
8.7+ million monthly transacting users (MTU), source.
$400+ billion in assets under custody, source.
Billions of events processed daily across user activity, blockchain data, and market feeds.
30 Kafka brokers with ~17TB storage per broker.
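As a rough back-of-the-envelope check on the broker figures above, the total cluster storage can be estimated as follows. The replication factor of 3 is an assumption for illustration (a common Kafka setting), not a number Coinbase has published.

```python
# Rough capacity estimate from the published broker figures.
BROKERS = 30
STORAGE_PER_BROKER_TB = 17  # approximate, per the metric above

# Assumption: replication factor 3 is a common Kafka setting,
# not a figure Coinbase has published.
ASSUMED_REPLICATION_FACTOR = 3

raw_capacity_tb = BROKERS * STORAGE_PER_BROKER_TB
unique_data_tb = raw_capacity_tb / ASSUMED_REPLICATION_FACTOR

print(f"Raw capacity: ~{raw_capacity_tb} TB")
print(f"Unique data at RF={ASSUMED_REPLICATION_FACTOR}: ~{unique_data_tb:.0f} TB")
```

Under these assumptions the cluster holds roughly 510 TB of raw storage, of which about a third would be unique data.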
Content is based on multiple sources, including the Coinbase Blog, the AWS Blog, and other public articles. You will find references for deeper reading throughout.
Platform
AWS
Coinbase leverages several AWS cloud services to solve its complex, large-scale challenges. The company also partnered with AWS to modernize and optimize its cloud infrastructure, migrating legacy workloads to Amazon EC2 instances powered by AWS Graviton processors and adopting Amazon EKS for automated scaling and resource management. As a result of this partnership, Coinbase reduced costs by roughly 62% and infrastructure scaling time by 50%.
📖 Read More: Coinbase Boosts Efficiency and Accelerates Development by Collaborating with AWS
Messaging System
Kafka
For its centralized messaging service, Coinbase uses Kafka through the AWS managed offering, Amazon Managed Streaming for Apache Kafka (MSK). Kafka ingests billions of events every day from user actions, applications, crypto feeds, and database change data capture (CDC).
With MSK, Coinbase reduced operational burden, achieved very low end-to-end latency (<10ms) for many pipelines (versus ~200ms with previous systems), improved reliability across AZs, and made scaling more seamless.
It takes only a few steps to provision a new MSK cluster.
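To make the ingestion side more concrete, here is a minimal sketch of how a CDC event might be shaped into a keyed Kafka record. The `CdcEvent` class, its field names, and the `pick_partition` helper are hypothetical and are not taken from Coinbase's code; the partition function uses CRC32 as a stand-in for Kafka's own key hashing, purely to show that equal keys land on the same partition.

```python
import json
import zlib
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CdcEvent:
    """Hypothetical change-data-capture event; field names are illustrative."""
    table: str
    primary_key: str
    op: str        # "insert", "update", or "delete"
    payload: dict
    ts_ms: int


def to_kafka_record(event: CdcEvent) -> tuple[bytes, bytes]:
    """Return the (key, value) byte pair a producer would send."""
    key = f"{event.table}:{event.primary_key}".encode("utf-8")
    value = json.dumps(asdict(event), sort_keys=True).encode("utf-8")
    return key, value


def pick_partition(key: bytes, num_partitions: int) -> int:
    """Stable key-to-partition mapping.

    Assumption: CRC32 stands in for Kafka's default partitioner
    (the Java client hashes keys with murmur2). The point is only
    that equal keys always map to the same partition.
    """
    return zlib.crc32(key) % num_partitions


event = CdcEvent("accounts", "42", "update", {"status": "verified"}, 1_700_000_000_000)
key, value = to_kafka_record(event)
partition = pick_partition(key, num_partitions=12)
```

Keying each record by table and primary key is one common design choice: all changes to the same row go to the same partition, so consumers see them in order.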

📖 Recommended Reading: How we scaled data streaming at Coinbase using AWS MSK
Processing
Spark (SOON)
Coinbase built SOON (Spark cOntinuOus iNgestion) on Databricks to replace slow, siloed Airflow <> Kafka <> Snowflake ETLs with a unified, low-latency streaming framework. Using Spark Structured Streaming and Delta Lake, SOON supports both append-only and merge (upsert/delete) ingestion, enabling scalable real-time data processing.
They also use Spark outside the SOON framework for batch processing.
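The difference between the append-only and merge ingestion modes mentioned above can be illustrated with a small sketch. This is plain Python that mimics the semantics only; it is not SOON's code and does not use the Spark or Delta Lake APIs. The field names `id`, `seq`, `op`, and `data` are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def apply_append(table: list[Row], batch: list[Row]) -> None:
    """Append-only ingestion: every incoming event becomes a new row."""
    table.extend(batch)


def apply_merge(table: dict[str, Row], batch: list[Row]) -> None:
    """Merge ingestion: upsert or delete rows by primary key.

    Within a batch, only the latest change per key (highest `seq`) wins.
    """
    latest: dict[str, Row] = {}
    for change in batch:
        key = change["id"]
        if key not in latest or change["seq"] > latest[key]["seq"]:
            latest[key] = change

    for key, change in latest.items():
        if change["op"] == "delete":
            table.pop(key, None)
        else:  # "insert" and "update" are both treated as upserts
            table[key] = change["data"]


batch = [
    {"id": "a", "seq": 1, "op": "insert", "data": {"balance": 10}},
    {"id": "a", "seq": 2, "op": "update", "data": {"balance": 25}},
    {"id": "b", "seq": 3, "op": "insert", "data": {"balance": 5}},
    {"id": "b", "seq": 4, "op": "delete", "data": None},
]

event_log: list[Row] = []
apply_append(event_log, batch)      # keeps all four events

current_state: dict[str, Row] = {}
apply_merge(current_state, batch)   # keeps only the latest state per key
```

Append-only mode suits immutable event streams such as user activity, while merge mode suits CDC feeds where the target table should mirror the current state of a source database.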
📖 More on SOON: Spark cOntinuOus iNgestion for near real-time data
Orchestrator
Airflow
Coinbase adopted Airflow in 2017, when it was still gaining popularity, and made it the centralized orchestrator for data pipelines used by hundreds of data engineers and scientists.
With their adoption of Databricks, they are most likely also leveraging Databricks Workflows; however, no public information confirms this.
📖 Recommended Reading: Revamping the Apache Airflow
Warehouse
Snowflake
Coinbase is also a Snowflake customer. They have migrated their real-time pipelines to Databricks, but other workflows still rely heavily on Snowflake, as per this source. Furthermore, their BI team uses Looker, which is most likely fetching data from Snowflake.
I could not find much public information beyond one article from an ex-Coinbase leader.
Lakehouse
Delta Lake
Delta Lake is used through Databricks as their open table format. One use case is the streaming pipeline built using SOON on Databricks; see the image in the Spark section.
S3
S3 is the object storage underneath Delta, although it is managed through Databricks. It is also used to store full data snapshots from various databases such as PostgreSQL and DynamoDB.
Data Store
StarRocks
Coinbase uses StarRocks via CelerData to enable real-time analytics directly on their data lakehouse, avoiding complex ETL. This setup delivers sub-second query latency, supports high concurrency, and scales with growing data volumes, improving performance for analytics workloads.
Dashboard
Looker
As per this source from 2022, they onboarded Looker as their Business Intelligence (BI) platform, mainly due to its technical capabilities.
💬 Coinbase is a modern tech company that relies heavily on commercial and cloud offerings such as MSK, Databricks, Snowflake, and CelerData, reflecting a tech culture that prefers buying solutions over building them in-house.





