Azure Synapse Analytics | Data Distribution Strategy and Best Practices

Arshad Ali - Aas Trailblazers

13K views


Published: Sep 16, 2024

Comments • 46

@orxanbabashov 7 months ago
This is the first time I have ever subscribed to a channel as well. Huge thanks!
@VirtusRex48 1 year ago
One of the best Synapse videos out there; highly recommend!!!
@hoanglieuit 1 year ago
This is the first time I have ever subscribed to a channel.
@VK-ln9vk 1 year ago
I wish there were 100000 LIKE buttons. THE BEST VIDEO on Azure Synapse distribution. I clearly understood the distributions from the demo. Thank you so much 🙏
@abc_987 28 days ago
JUST GOLD
@goelnikhils 1 year ago
What hard work went into creating this video. Very good content.
@vinayak6685 2 years ago ⁺¹
Really happy to find this video. Loved the practical demo on how the distributions happened. Subscribed (500th subscriber 😁). Waiting for more such awesome content 🤩
@Farisito 11 months ago
Thanks a lot, Ali. Very useful in my case.
@husnabanu4370 1 year ago
Wow, such a detailed explanation. The visuals and query examples make it so easy to understand...
@Zaf567 2 years ago
Have watched many videos related to this but yours is awesome.
@donanuradha2162 3 years ago ⁺¹
Very well explained how data is distributed in Synapse SQL DW
@ArshadAliAasTrailblazers 3 years ago
Thanks Anuradha, I am happy it was helpful for you!
@julianromero3359 1 year ago
Amazing explanation, thanks. The concepts are very clear and practical to understand. I hope to find more content from you. 🤗
@vaibhavvaidya1442 3 years ago
Never saw an explanation like this on Azure Synapse. Amazing :)
@ArshadAliAasTrailblazers 2 years ago
Thanks Vaibhav for your kind words, glad it was helpful!
@gvgnaidu6526 2 years ago
Amazing explanation and nice representation of all the aspects. Thank you so much Arshad
@danielveraec 2 years ago
Thanks for sharing this knowledge. Really helpful!!
@jubershikalgar4205 2 years ago
Thank you very much for this video.
It was very helpful and I learnt a lot about Synapse.
@peaceneeded 2 years ago
Simply Amazing Explanation !
@SQLTalk 2 years ago
This is a very well done and helpful video. Thank you for making it.
@Ali-q4d4c 1 year ago ⁺¹
👍🏻👍🏻👍🏻
@MohammedKhan-np7dn 3 years ago
Very Good session to understand the concepts in Synapse Analytics
@ArshadAliAasTrailblazers 2 years ago
Thanks Mohammed for your kind words, glad it was helpful!
@vivekvishal2500 2 years ago ⁺¹
Great Sir 👌
@MohammedKhan-np7dn 3 years ago
Thank you for explaining the concepts in detail.
@ArshadAliAasTrailblazers 2 years ago
You are welcome!
@upendarjakkula2561 2 years ago
Extraordinary 👌
@Ali-q4d4c 1 year ago ⁺¹
👍👍👍👍
@MohammedKhan-np7dn 3 years ago
Looking forward to the next session.
@ArshadAliAasTrailblazers 2 years ago
Thanks Mohammed, I just posted a video on CI/CD and am planning to post a few more in the next couple of weeks.
@kuldeepgawande9550 3 years ago
Excellent explanation. Thank you.
@ArshadAliAasTrailblazers 2 years ago
You are welcome!
@amittyagi9171 2 years ago
Thank you so much. You are amazing.
@HGoIchetan09 3 years ago
Excellent explanation.. Thanks..
@ArshadAliAasTrailblazers 2 years ago
You are welcome
@shuaibpantnagar 2 years ago
Very nicely explained Azure Synapse, especially the SQL pool. I have a question: both Synapse and Azure Databricks have a Spark engine. How would I choose between them for my project work?
@TiffanyMorris123 3 years ago ⁺¹
Thanks for this video! A question: you touched briefly on creating statistics in Synapse before running queries, based on the query patterns. In my case I have a large group of users, from admins to analysts to developers, and I cannot predict the types of queries they will run. Is there a best practice that I can pass on to the users for creating the stats before they run their queries? Do you plan future tutorials on this topic? Thanks!
@ArshadAliAasTrailblazers 2 years ago
Thanks Tiffany! While creating stats in advance is a proactive way to optimize performance, the engine also learns from first-time query submissions and uses that to optimize future submissions when the AUTO_CREATE_STATISTICS setting is ON (it is ON by default). You can find more details about it here: docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-statistics
To shorten statistics maintenance time, be selective about which columns have statistics and which need the most frequent updating. For example, you might want to update date columns where new values may be added daily. Focus on having statistics for columns involved in joins, columns used in the WHERE clause, and columns found in GROUP BY. docs.microsoft.com/en-us/azure/synapse-analytics/sql/best-practices-dedicated-sql-pool#maintain-statistics
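As an illustration of the advice above, here is a minimal Python sketch that turns a list of columns, grouped by how queries use them (joins, WHERE, GROUP BY), into one `CREATE STATISTICS` statement per distinct column. The table name `dbo.FactSales`, the column names other than `ProductKey`, and the `stats_<column>` naming scheme are assumptions made for the example; they do not come from the video.

```python
def statistics_statements(table, columns_by_usage):
    """Build one CREATE STATISTICS statement per distinct column.

    columns_by_usage maps a usage kind ("join", "where", "group_by")
    to the columns that queries use in that way.
    """
    ordered = []
    for usage in ("join", "where", "group_by"):
        for column in columns_by_usage.get(usage, []):
            # A column used in several clauses still needs only one statistic.
            if column not in ordered:
                ordered.append(column)
    return [
        f"CREATE STATISTICS stats_{column} ON {table} ({column});"
        for column in ordered
    ]


# Hypothetical table and columns, for illustration only.
statements = statistics_statements(
    "dbo.FactSales",
    {
        "join": ["ProductKey", "CustomerKey"],
        "where": ["OrderDateKey"],
        "group_by": ["ProductKey"],
    },
)
for statement in statements:
    print(statement)
```

The generated statements would still have to be run against the dedicated SQL pool; the sketch only shows how to stay selective about which columns get statistics.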
@Mohammad.aarif_222 5 months ago
Where do I need to store the files in Blob Storage?
@sumitrauniyar7347 2 years ago
How does replicate distribution work when we have 1 compute node?
@samuelrocha9079 2 years ago
Thank you for the video, one of the best I have ever watched for learning about data.
Just a quick question: for a round-robin table, you said the data is shuffled when you run a query that groups by ProductKey, and the distribution is organized by that field. What if I then run the same query but group by a different field? Will the shuffle happen again, and will the data be distributed by the other field I am grouping on?
@SushilChauhan 1 year ago
Yes.
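The exchange above can be sketched with a small Python simulation. It counts how many rows sit on a different distribution than the one their grouping key maps to, which is a rough proxy for the rows a shuffle would move. The `crc32` hash is a stand-in chosen for the example (the hash function the service actually uses is not described in this thread), and the sample rows and the `CustomerKey` column are made up.

```python
import zlib

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def target_distribution(value):
    # Stand-in hash (assumption): crc32 is NOT the function Synapse uses.
    return zlib.crc32(str(value).encode()) % DISTRIBUTIONS


def rows_to_move(placed_rows, group_column):
    """Count rows that are not already on the distribution their group key maps to."""
    return sum(
        1
        for current, row in placed_rows
        if target_distribution(row[group_column]) != current
    )


# Hypothetical sample data: 660 rows with two candidate grouping columns.
rows = [{"ProductKey": i % 7, "CustomerKey": i % 11} for i in range(660)]

# Round-robin table: row i lands on distribution i % 60, regardless of content.
round_robin = [(i % DISTRIBUTIONS, row) for i, row in enumerate(rows)]

# Hash-distributed table on ProductKey: placement follows the key.
hashed_on_product = [(target_distribution(row["ProductKey"]), row) for row in rows]

print("round-robin, GROUP BY ProductKey:", rows_to_move(round_robin, "ProductKey"))
print("round-robin, GROUP BY CustomerKey:", rows_to_move(round_robin, "CustomerKey"))
print("hash(ProductKey), GROUP BY ProductKey:", rows_to_move(hashed_on_product, "ProductKey"))
```

In this model the round-robin table needs movement for either grouping column, while the table hashed on `ProductKey` needs none when grouped by `ProductKey`.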
@Mohammad.aarif_222 5 months ago
How do I make an external table?
@SSingh-lr2ue 3 years ago
Thank you for the clear explanation. However, I am not clear about where the 60 buckets or 60 distributions get stored. Is it in Azure Storage? In short, I don't understand the purpose of, or the difference between, Azure Storage and the SQL Database instance attached to a compute node. Could you please explain more about it?
@ArshadAliAasTrailblazers 2 years ago
For developers, I think the important thing to consider is how it scales out. For example, if you have 2 nodes, each node has 30 distributions attached to it; likewise, if you have 4 nodes, each node has 15 distributions. By scaling out from 2 to 4 nodes, each node now holds roughly half of the data (assuming there is no data skew) and takes roughly half the time to complete processing. docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits#service-levels
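The arithmetic in this reply can be written as a tiny Python sketch. It assumes the number of compute nodes divides 60 evenly, as in the 2-node and 4-node examples above; the function name is made up for the illustration.

```python
TOTAL_DISTRIBUTIONS = 60


def distributions_per_node(compute_nodes):
    """Return how many of the 60 distributions each compute node serves."""
    # Assumption: the node count splits the 60 distributions evenly.
    if compute_nodes < 1 or TOTAL_DISTRIBUTIONS % compute_nodes != 0:
        raise ValueError("node count must divide 60 evenly")
    return TOTAL_DISTRIBUTIONS // compute_nodes


for nodes in (1, 2, 4, 60):
    print(nodes, "node(s):", distributions_per_node(nodes), "distributions each")
# 2 nodes give 30 distributions each; 4 nodes give 15 each.
```

Doubling the node count halves the distributions per node, which is where the "roughly half the time" estimate comes from when the data is not skewed.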
@BabatundeAdeleye-mw5ce 10 months ago
The 60 distributions are stored in the SQL database instance in the SQL pool. Data from Azure Storage is spread across the distributions in different patterns, depending on the distribution type defined on the SQL pool table when it was created. The SQL engine then reads the data from the distributions as your query instructs, which may or may not require moving data around before it executes the aggregate function and sends the output to the control node, which in turn sends it to the user for viewing.
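To make the "different patterns" concrete, here is a minimal Python sketch that places the same rows on 60 distributions in two ways: round-robin (rows assigned in turn) and hash (placement decided by a column value). The `crc32` hash and the sample rows are assumptions for the example; the real hash function is internal to the service.

```python
from collections import Counter
from itertools import cycle
import zlib

DISTRIBUTIONS = 60


def round_robin_placement(rows):
    """Assign rows to distributions in turn, ignoring row content."""
    slots = cycle(range(DISTRIBUTIONS))
    return [next(slots) for _ in rows]


def hash_placement(rows, key):
    """Assign each row to a distribution derived from the value in `key`."""
    # Stand-in hash (assumption): crc32 is NOT the function Synapse uses.
    return [zlib.crc32(str(row[key]).encode()) % DISTRIBUTIONS for row in rows]


# Hypothetical sample: 600 rows but only 5 distinct ProductKey values.
rows = [{"ProductKey": i % 5, "Amount": i} for i in range(600)]

rr_counts = Counter(round_robin_placement(rows))
hash_counts = Counter(hash_placement(rows, "ProductKey"))

print("round-robin: distributions used =", len(rr_counts),
      "| rows on fullest =", max(rr_counts.values()))
print("hash(ProductKey): distributions used =", len(hash_counts),
      "| rows on fullest =", max(hash_counts.values()))
```

Round-robin spreads the 600 rows evenly, 10 per distribution. Hashing on a column with only 5 distinct values can fill at most 5 distributions, which is the kind of data skew a low-cardinality distribution column produces.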
