Azure Synapse Analytics | Data Distribution Strategy and Best Practices
Arshad Ali - Aas Trailblazers

Published on May 18, 2021

In any distributed system, efficient parallel processing and good performance depend on two things: a data distribution strategy that stores data evenly across nodes, and colocation of related data on the same node.

In this video, I walk you through data distribution in the Azure Synapse Analytics dedicated SQL pool: what distributions are, the round-robin and hash distribution strategies, and finally replicated tables. I also cover best practices, with prescriptive guidelines on when to use which option and what to consider for better performance.
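To make the difference between round-robin and hash distribution concrete, here is a minimal Python sketch. It is not taken from the video or its scripts, and it only simulates the idea. The one fact it relies on is that a dedicated SQL pool spreads every table over 60 distributions. Everything else is an assumption for illustration: the `orders` data is made up, the function names are hypothetical, and `zlib.crc32` is a stand-in for the internal hash function, which works differently.

```python
from collections import Counter
import zlib

# A dedicated SQL pool spreads each table over 60 distributions.
NUM_DISTRIBUTIONS = 60


def round_robin(rows):
    """Assign rows to distributions in turn, ignoring row content."""
    return [i % NUM_DISTRIBUTIONS for i, _ in enumerate(rows)]


def hash_distribute(rows, key):
    """Assign each row by hashing its distribution column.

    crc32 is only a stand-in here, not the hash Synapse uses.
    """
    return [
        zlib.crc32(str(row[key]).encode()) % NUM_DISTRIBUTIONS
        for row in rows
    ]


# Hypothetical data: 6000 orders, half of them placed by customer 1.
orders = [
    {"order_id": i, "customer_id": 1 if i % 2 == 0 else i}
    for i in range(6000)
]

rr = Counter(round_robin(orders))
by_customer = Counter(hash_distribute(orders, "customer_id"))
by_order = Counter(hash_distribute(orders, "order_id"))

# Compare the smallest and largest distribution for each strategy.
for name, counts in [
    ("round robin", rr),
    ("hash(customer_id)", by_customer),
    ("hash(order_id)", by_order),
]:
    print(name, "min:", min(counts.values()), "max:", max(counts.values()))
```

Round robin gives every distribution the same number of rows, but rows with the same customer_id end up scattered, so a join on that column has to move data first. Hashing on customer_id keeps each customer's rows together, but the one heavy customer overloads a single distribution, which is the data skew problem. A column with many distinct, evenly used values, such as order_id in this made-up data, is less likely to skew.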

0:00 Introduction of distributed system and data distribution
7:14 Table types in SQL pools
8:40 Round Robin Distribution - Introduction
13:18 Hash Distribution - Introduction
16:42 Concept of distribution and how it maps to compute nodes
22:48 Round Robin Vs Hash - Example and performance differences
35:51 Round Robin Vs Hash - Analyze execution plans
42:52 Round Robin Vs Hash - Join Compatibility
49:30 Hash Distribution - Data skewness
51:51 Round Robin - Best Practices and Guidelines
53:58 Hash Distribution - Best Practices and Guidelines
59:17 Replicated Table - Introduction, Best Practices and Guidelines
1:03:24 Replicated Table - Example

Thank you for watching. In my next video, I am going to talk in detail about the columnstore index and how it helps improve performance for analytical queries. Stay tuned.

GitHub repo to download the deck and script used in the video:
https://github.com/AasTrailblazers/Az...

Sample Databases
https://docs.microsoft.com/en-us/azur...

Table Design
https://docs.microsoft.com/en-us/azur...
https://docs.microsoft.com/en-us/azur...
https://docs.microsoft.com/en-us/azur...

Memory and concurrency limits
https://docs.microsoft.com/en-us/azur...
