Deep Dive
How Delta Lake Works
Understand how Delta Lake handles reads and writes using the transaction log, ensuring ACID guarantees through snapshot isolation and optimistic…
Sep 6 • Junaid Effendi
Benchmarking Spark - Open Source vs EMRs
Diving into four approaches from Spark Operator to EMR (EKS, EC2, and Serverless), sharing benchmarking results and key insights to help you choose the…
Jul 5 • Junaid Effendi
Data Governance in Lakehouse Using Open Source Tools
Discover how to build a complete data governance ecosystem in a Lakehouse architecture using leading open-source tools. Explore access control, metadata…
May 10 • Junaid Effendi
Securely Share and Automate File Transfers with AWS Transfer Family & Terraform
Learn to deploy a secure, automated SFTP server with AWS Transfer Family & Terraform. Set up restricted users, enforce SSH & MFA, and leverage workflows…
Mar 22 • Junaid Effendi
Six Effective Ways to Reduce Compute Costs
Let's look at the top six ways to reduce your compute costs on AWS.
Feb 1 • Junaid Effendi
Terraform vs Asset Bundles for Databricks Workflows
Sharing the experience of moving from Terraform to Asset Bundles for Databricks workflow deployment, covering the challenges and benefits.
Oct 28, 2024 • Junaid Effendi
Challenges: From Databricks to Open Source Spark & Delta
Sharing the challenges of migrating from Databricks to open source, to help you save hours.
Sep 25, 2024 • Junaid Effendi
Data Modelling Using Complex Data Types
Complex data types like struct, array, and map in modern warehouses are a game changer; learn their most useful aspects from a Data Engineer's perspective.
Jul 6, 2024 • Junaid Effendi
Messaging Systems: Queue Based vs Log Based
Learn the key differences and important properties of queue and log based messaging systems.
Jun 15, 2024 • Junaid Effendi
Handling Duplicates in Streaming Pipelines
Three ways to handle duplicate data in streaming pipelines. Learn the benefits, use cases and more in this article.
May 22, 2024 • Junaid Effendi
Data Pipeline - Incremental vs Full Load
Learn the pros, cons, and use cases of full load and incremental, two data pipeline design patterns commonly used across the industry.
Apr 13, 2024 • Junaid Effendi
Data Processing in the 21st Century
A timeline of data processing technologies, from MapReduce to Polars.
Mar 20, 2024 • Junaid Effendi