Deep Dive
How Delta UniForm works
Learn how Delta UniForm enables read and write interoperability from Delta tables to Iceberg and Hudi formats.
Mar 7 • Junaid Effendi
Solving Spark’s Small File Problem for 100x Faster Reads
Understand Spark's common small file problem and learn how to solve it in modern open table formats through offline and online optimizations.
Dec 6, 2025 • Junaid Effendi
How Delta Lake Works
Understand how Delta Lake handles reads and writes using the transaction log, ensuring ACID guarantees through snapshot isolation and optimistic…
Sep 6, 2025 • Junaid Effendi
Benchmarking Spark - Open Source vs EMRs
Diving into four approaches from Spark Operator to EMR (EKS, EC2, and Serverless), sharing benchmarking results and key insights to help you choose the…
Jul 5, 2025 • Junaid Effendi
Data Governance in Lakehouse Using Open Source Tools
Discover how to build a complete data governance ecosystem in a Lakehouse architecture using leading open-source tools. Explore access control, metadata…
May 10, 2025 • Junaid Effendi
Securely Share and Automate File Transfers with AWS Transfer Family & Terraform
Learn to deploy a secure, automated SFTP server with AWS Transfer Family & Terraform. Set up restricted users, enforce SSH & MFA, and leverage workflows…
Mar 22, 2025 • Junaid Effendi
Six Effective Ways to Reduce Compute Costs
Let's look into the top six ways to reduce your compute costs on AWS.
Feb 1, 2025 • Junaid Effendi
Terraform vs Asset Bundles for Databricks Workflows
Sharing the experience of moving from Terraform to Asset Bundles for Databricks workflow deployment, covering the challenges and benefits.
Oct 28, 2024 • Junaid Effendi
Challenges: From Databricks to Open Source Spark & Delta
Sharing the challenges of migrating from Databricks to open source, to help you save hours on your own migration.
Sep 25, 2024 • Junaid Effendi
Data Modelling Using Complex Data Types
Complex data types like struct, array, and map in modern warehouses are a game changer; learn their most useful aspects from a Data Engineer's perspective.
Jul 6, 2024 • Junaid Effendi
Messaging Systems: Queue Based vs Log Based
Learn the key differences and important properties of queue and log based messaging systems.
Jun 15, 2024 • Junaid Effendi
Handling Duplicates In Streaming Pipeline
Three ways to handle duplicate data in streaming pipelines. Learn the benefits, use cases, and more in this article.
May 22, 2024 • Junaid Effendi