Azure Synapse Analytics | Spark pool | Delta Lake - Part 1
Arshad Ali - Aas Trailblazers

Published on Nov 21, 2021

Delta Lake is an open-source storage layer (a Linux Foundation project) that sits on top of Azure Data Lake Storage when you use it within a Spark pool of Azure Synapse Analytics. Put another way, you can think of Delta Lake as an optimized Spark table that brings data reliability and performance optimization at scale. It makes your data lake faster and more reliable, and accelerates the pace of innovation.

This video focuses on Delta Lake (what it is and why you would use it) and how you can use it in a Spark pool of Azure Synapse Analytics.

00:00:00 Introduction - Delta Lake
00:00:28 Delta Lake - What and Why
00:05:07 Spark executor configuration (fixed vs dynamic)
00:13:55 Convert Parquet to Delta Lake
00:30:33 Working with dataframe and Delta Lake
00:38:27 Data Merge / Upsert with Delta Lake
00:47:33 Time Travel with Delta Lake
00:54:08 Vacuum operation to manage history of changes
00:59:20 File compaction
01:05:22 Convert Delta Lake to Parquet
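
For a quick feel of a few of the operations listed above, here is a minimal sketch (not the script used in the video) that builds the Delta Lake SQL statements for converting a Parquet folder, viewing table history, and vacuuming old files. The helper names and the table path are hypothetical. In a Synapse notebook you would pass each statement to spark.sql(...).

```python
# Hypothetical helpers that build Delta Lake SQL statements as plain strings.
# They assume the path itself contains no backtick characters.

def delta_ref(path: str) -> str:
    """Return a path-based Delta table reference, e.g. delta.`/some/path`."""
    return f"delta.`{path}`"


def convert_parquet_to_delta_sql(path: str) -> str:
    """Statement that converts an existing Parquet folder to Delta in place."""
    return f"CONVERT TO DELTA parquet.`{path}`"


def describe_history_sql(path: str) -> str:
    """Statement that lists the table's versions (useful before time travel)."""
    return f"DESCRIBE HISTORY {delta_ref(path)}"


def vacuum_sql(path: str, retain_hours: int = 168) -> str:
    """Statement that removes unreferenced files older than the retention window.

    168 hours (7 days) matches Delta Lake's default retention period.
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return f"VACUUM {delta_ref(path)} RETAIN {retain_hours} HOURS"


if __name__ == "__main__":
    table_path = "/hypothetical/path/to/table"  # placeholder, not from the video
    for statement in (
        convert_parquet_to_delta_sql(table_path),
        describe_history_sql(table_path),
        vacuum_sql(table_path),
    ):
        print(statement)  # in a notebook: spark.sql(statement)
```

Keep in mind that vacuuming limits how far back time travel can go, because the older data files are deleted.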

Thank you once again for watching. Please like, subscribe, and let me know your feedback or any specific topic you would like me to cover next.

GitHub Repo to download deck and script used in the video:
https://github.com/AasTrailblazers/Az...

Delta Lake official documentation
https://docs.delta.io/latest/delta-in...

What is Delta Lake
https://docs.microsoft.com/en-us/azur...

Create a serverless Apache Spark pool with Autoscaling
https://docs.microsoft.com/en-us/azur...
