What is Apache Spark? Learn Apache Spark in 15 Minutes
- Published: 26 Nov 2024
- #apachespark #databricks #sparkteam #dataengineering #pyspark #architecture
In this video, I have covered the most important topic in Data Engineering: "Apache Spark". In particular, I have walked through the complete end-to-end architecture of Spark, covering all the individual components below:
1. Driver Program
2. Worker Node
3. Cluster Manager
4. Spark Context or Spark Session
5. DAG
6. RDD
7. Lazy Evaluation
8. Stages and
9. Tasks
Watch the full video to get a complete understanding of Apache Spark. (For how these pieces fit together in code, see the sketch below.)
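As a rough companion to the list above, here is a minimal PySpark sketch of how these components show up in code. It is only an illustration; the file path "sales.csv" and the column "amount" are hypothetical, not from the video.

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# The SparkSession (wrapping the SparkContext) is the entry point;
# the driver program uses it to talk to the cluster manager.
spark = SparkSession.builder.appName("spark-in-15-minutes").getOrCreate()

# Reads and filters are lazy transformations: nothing executes yet,
# Spark only records them in a DAG over the underlying RDDs.
df = spark.read.csv("sales.csv", header=True, inferSchema=True)
big_sales = df.filter(F.col("amount") > 100)

# count() is an action: Spark splits the DAG into stages, each stage
# into tasks (one per partition), and runs the tasks on worker nodes.
print(big_sales.count())

spark.stop()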
- - - Book a Private One on One Meeting with me (1 Hour) - - -
www.buymeacoff...
- - - Express your encouragement by brewing up a cup of support for me - - -
www.buymeacoff...
- - - Other useful playlist: - - -
1. Microsoft Fabric Playlist: • Microsoft Fabric Tutor...
2. Azure General Topics Playlist: • Azure Beginner Tutorials
3. Azure Data Factory Playlist: • Azure Data Factory Tut...
4. Databricks CICD Playlist: • CI/CD (Continuous Inte...
5. Azure Databricks Playlist: • Azure Databricks Tutor...
6. Azure End to End Project Playlist: • End to End Azure Data ...
7. End to End Azure Data Engineering Project: • An End to End Azure Da...
- - - Let’s Connect: - - -
Email: mrktalkstech@gmail.com
Instagram: mrk_talkstech
- - - About me: - - -
Mr. K is a passionate teacher who created this channel with only one goal: "TO HELP PEOPLE LEARN ABOUT THE MODERN DATA PLATFORM SOLUTIONS USING CLOUD TECHNOLOGIES"
I will be creating playlists covering the topics below (with demos):
1. Azure Beginner Tutorials
2. Azure Data Factory
3. Azure Synapse Analytics
4. Azure Databricks
5. Microsoft Power BI
6. Azure Data Lake Gen2
7. Azure DevOps
8. GitHub (and several other topics)
After creating some basic foundational videos, I will be creating videos with real-time scenarios / use cases specific to the three common data fields:
1. Data Engineer
2. Data Analyst
3. Data Scientist
Can't wait to help people with my videos.
- - - Support me: - - -
Please Subscribe: / @mr.ktalkstech
OMG, I even tried many UDEMY courses to understand this, and none of the tutors explained it this clearly... I am loving it... Sir, please start a full Databricks course to help us. Please.. 🙏
Thank you so much :) Sure, will do :)
this is what i was looking for well explained. thank you
Thank you so much :)
Simply great explanation of the Spark architecture and how everything is connected, step by step; it connects all the dots in Spark.
Thank you so much :)
@@mr.ktalkstech Looking forward to more Spark concepts; it would be great if you made a full course.
Simple and brilliant analogy Mr K
Clear and well explained
Thank you so much :)
Simplest and excellent explanation Mr K.
Thank you so much :)
I appreciate your explanation; it has clarified the topic for me. Thank you. 🙏🏼
However, I have one question: if the CSV file is split into two, how will one worker determine whether there are any duplicates in the other worker's portion?
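(A hedged sketch of why this matters: a plain count() never compares rows across workers, so duplicates are only removed if you explicitly ask for it, which forces a shuffle so identical rows meet on the same worker. The toy data below is made up for illustration.)

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("dedup-demo").getOrCreate()

# Three rows, one of them a duplicate, spread over two partitions
# to mimic the two-worker split from the video.
df = spark.createDataFrame(
    [(1, "a"), (2, "b"), (1, "a")], ["order_id", "item"]
).repartition(2)

# count() just sums per-partition counts on the driver: 3.
print(df.count())

# distinct() shuffles identical rows into the same partition before
# de-duplicating, so duplicates across workers are caught: 2.
print(df.distinct().count())

spark.stop()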
Excellent! Thank you for explaining this.
Thank you so much :)
Great explanation. Do you have a full pyspark tutorial?
This is such a simple and clear explanation that I had to share it with my friends.
Keep making videos; your efforts are making a great impact on our lives.
Thank you so much :)
Very well explained!!!
Thank you so much :)
Awesome explanation bro.
Thank you so much :)
Useful presentation
Thank you so much :)
This is my understanding:
- Apache Spark falls under the compute category.
- It's related to MapReduce but is faster due to in-memory processing.
- Spark can read large datasets from object stores like S3 or Azure Blob Storage.
- It dynamically scales compute resources, similar to autoscaling and Kubernetes orchestration.
- It processes the data to deliver analytics, ML models, or other results efficiently.
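To make the object-store point concrete, a hedged sketch (the bucket, container, and account names are made up, and the right connector package and credentials must already be on the cluster):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("object-store-read").getOrCreate()

# The URI scheme picks the storage connector; the paths are hypothetical.
s3_df = spark.read.parquet("s3a://my-bucket/events/")            # Amazon S3
abfs_df = spark.read.parquet(
    "abfss://container@account.dfs.core.windows.net/events/"     # ADLS Gen2
)

# cache() keeps the data in executor memory across actions, which is
# the in-memory advantage Spark has over disk-based MapReduce.
s3_df.cache()
print(s3_df.count())

spark.stop()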
Really good
Wow! Just mind blowing brother💥💥!! Looking for more DE fundamentals videos ✨♥️👌
Thank you so much :)
thanks for this
Hey man, it may be worth checking out LakeSail's PySail, built on Rust. Supposedly 4x faster with 90% less hardware cost according to their latest benchmarks, and it can migrate existing Python code. Might be cool to make a vid on it!
love ur content!
Great primer @Mr. K! Thanks. Quick question - How does the driver program create task partitions for the plan? For example, if there are duplicates across two worker nodes, wouldn't the count be misrepresented if it simply adds 4500 and 5500? Does this get auto-handled or do we have to control the partitioning logic?
Tasks are created according to the number of partitions of the files. You can also control the number of tasks produced after each wide transformation by configuring the shuffle partition limit with the code below:
spark.conf.set("spark.sql.shuffle.partitions", num_partitions)
The number of tasks always depends on the number of partitions.
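A self-contained sketch of that setting in action (toy data; note that adaptive query execution may coalesce small shuffle partitions even further):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("shuffle-partitions").getOrCreate()

# Cap the number of partitions (and hence tasks) produced by wide
# transformations such as groupBy.
spark.conf.set("spark.sql.shuffle.partitions", 8)

df = spark.range(1_000_000).withColumn("bucket", F.col("id") % 10)
agg = df.groupBy("bucket").count()

# After the shuffle, the result has at most 8 partitions.
print(agg.rdd.getNumPartitions())

spark.stop()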
Your question is that each worker node may have duplicates, and the count operation will just sum the results, right?
Ans: after getting the result from each worker node, the driver program will aggregate them again and then give the final result.
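You can even watch those per-partition partial results yourself; a small sketch with the RDD API, using two partitions to stand in for the two workers in the video:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partial-counts").getOrCreate()

rdd = spark.sparkContext.parallelize(range(10_000), numSlices=2)

# Each worker counts only its own partition...
partials = rdd.mapPartitions(lambda it: [sum(1 for _ in it)]).collect()
print(partials)       # e.g. [5000, 5000]

# ...and the driver aggregates the partial results into the final count,
# which is exactly what count() does under the hood.
print(sum(partials))  # 10000
print(rdd.count())    # 10000

spark.stop()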
Excellent!
Thank you so much :)
waiting for your pyspark playlist:)
Very soon :)
Hi, what tools did you use to create this type of video? Please help.
Final cut pro, CapCut, PowerPoint and After effects.
@@mr.ktalkstech thank you for the info
Will this same topic be covered on the other channel (Mr.K Talks Tech Tamil)?
No brother :)
Respect++
You did not talk about RDDs.