Thank you for watching! If you found Part 1 valuable and want to dive deeper, the full tutorial is available on Udemy.
▶ Get the Full Course on Udemy -> www.udemy.com/course/end-to-end-azure-data-engineering-real-time-project/?referralCode=626B44A4C9AA848ACB53
Thank you for supporting my work, and I’m excited to help you continue your learning journey!
Why did you hide your content, sir? I thought you were the only teacher who helps poor students like us by providing the best content for free.
I sincerely apologize for this situation. Unfortunately, due to Udemy's policies, I had to remove the content. Thank you so much for your understanding and continued support.
Great playlist for someone who has zero knowledge of ETL/Azure. Good for clearing up the fundamentals of Azure resources.
Thank you so much :)
This is the cleanest explanation I have ever come across on azure.
I just started doing projects on data engineering and, to be honest, this series needs to be in the top results. Very useful content and easily understandable for newbies. Eagerly waiting for new projects using new tools and cloud services.
Thank you so much :)
@@mr.ktalkstech Hello Sir, can I have your LinkedIn ID?
Do you know if they are all open source?
this helped me to find my job. Thank you
which company and role?
Wow, what an explanation. Huge respect.
Keep going, you will have a good following soon.
Thank you so much :)
Amazing content. Congrats and thanks!
Please, please bring more. You teach very well.
Thank you so much :) sure, will do
Hi Mr.K your lessons give the view of the roles of Data Engineering. I really appreciate your videos and would like to thank you sir. May God bless you and your family.
Thank you so much :)
I am a beginner in DE. Your illustration is awesome, and I have subscribed to the page for more updates.
Thank you so much :)
Amazing content, thank you for sharing this video series.
Excellent, superb tutorial. Fantastic explanation in a nutshell.
Thank you so much :)
This video is a savior for new aspirants.
This video series is a game changer for me.
You are really amazing. Seriously waiting for more such projects.
Thank you so much :)
Why do we use Databricks? Azure Synapse Analytics does ETL.
Perhaps he wants to expose you to as many tools as possible.
Awesome, some key concepts finally clicked in my brain. Great breakdown!
Thank you so much :)
Hi, great tutorial, and indeed good learning for beginners like me. Can you also please make an end-to-end Azure data engineering real-time project with a continuous data stream and readily available big data (so that we can download it directly from your link)? It would be of great help to us.
Please carry on with more videos; your knowledge sharing is helping us a lot 🙏
Thank you so much :)
we are using the Medallion architecture at my job now.
Thank you for explaining the concept simply with a presentation.
Thank you so much :)
Excellent Explanation...🔥🔥🔥🔥
Thank you so much :)
Excellent content brother
Thank you so much :)
Such a good explanation. Great work. Please post about the complex challenges faced by data engineers and their solutions.
Thank you so much :)
In the real world, with a similar setup, what would be the reason for using Databricks instead of Data Factory to transform the data between the layers?
Although we can use Data Flows in ADF for transformation, it is easier to do the transformations in PySpark with DataFrames, and PySpark is fast as well. Also, ADF is mostly used for orchestration.
Very good session. Thanks for bringing this project. I subscribed to the channel after seeing the content.
Thank you so much :)
Amazing, pls add AKS too
I am not able to find the dataset for this project.
Please refer other comments before commenting
Can you make an end-to-end project using Microsoft Fabric? And please make more end-to-end projects like this.
Hi, sure, I am already looking into Fabric. You can expect the video in the near future. Thanks for understanding :)
Thank you for these videos, really appreciate the time and efforts.
Thank you so much :)
Thank you so much for this content. Can you also please make a video on ADF to Snowflake?
Great, informative video! Quick question: why is Synapse Analytics needed? Can't Power BI read directly from the gold layer in the data lake?
Thank you so much :) We can connect directly to the Data Lake as well, but it's always recommended to use a structured database as a serving layer for reporting, since it is scalable and makes security simpler to handle :)
The concept you just showed in 11 minutes is worth more than others' whole playlists 😂. Good to be your subscriber, man ❤ Please keep making videos and helping students.
Thank you so much for the biggest compliment :)
@@mr.ktalkstech Even I feel the same. Bhai, your concepts are very clear. Awesome videos.
Awesome explanation
Hi Mr.K,
I have also worked on a similar migration project, in which we migrated data from an on-prem SQL Server to Azure Data Lake Gen2. We had already transformed the data in SQL Server as per the business requirements, and then copied it to Data Lake Gen2 using activities in ADF.
In this video you explained the lakehouse architecture, which I was not aware of when I worked on that project.
So I have a small doubt:
Since we had already transformed the data in SQL Server as per the client's requirements before migrating it to Azure, which layer of the lakehouse architecture (bronze, silver, or gold) would we have copied it into when loading it to Azure Data Lake? And is it possible to copy data directly to the gold layer? It was my first project, so I couldn't pay attention to the finer details. Could you please help me understand this?
Thanks in advance!
Thanks for reaching out :) If the data is already transformed and it doesn't require any further transformation at all- then we can load directly to the Gold layer.
@@mr.ktalkstech : Thanks for clearing the doubt.
Damn i did an exact project like this in my internship at Amazon
Can you please bring more videos like this? Also DP-203 certification guide videos.
Sure!
Hi,
Can you make a video covering Azure and Spark related interview questions and answers for real-time scenarios, focusing on the optimizations done specifically for the use case rather than general methodologies?
These are the questions I was asked recently.
1) How do you recover a corrupt Parquet data file?
2) You have millions of records in the bronze layer, and after transformations you have 50 million records in the gold layer.
You find that there are corrupt files in only one partition at the gold layer.
How will you recover the files of that particular partition without rerunning the entire pipeline, given that there are millions of rows in both layers?
3) What actual optimizations did you do in the project to achieve a) execution time optimization b) join level optimization?
The interviewer did not want generic answers that we already know or would have read in theory.
He wanted to know specifically how I implemented them in the project.
Please do a video with such tricky questions.
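For question 2, one possible approach, sketched here under assumptions that are not stated anywhere in this thread, is to recompute only the damaged partition from the upstream layer and overwrite just that slice of the gold table. The sketch assumes the gold table is stored in Delta format and is partitioned by a single column, and it relies on Delta Lake's `replaceWhere` write option. The function names, the `order_date` column, and the path in the usage note are all hypothetical.

```python
def partition_predicate(column: str, value: str) -> str:
    """Build a SQL predicate that selects one partition, e.g. order_date = '2024-01-01'."""
    # Hypothetical helper: it refuses quotes and backslashes instead of trying to escape them.
    if "'" in value or "\\" in value:
        raise ValueError("partition value must not contain quotes or backslashes")
    return f"{column} = '{value}'"


def rewrite_partition(recomputed_df, gold_path: str, column: str, value: str) -> str:
    """Overwrite a single partition of a Delta table and leave the others untouched.

    `recomputed_df` is assumed to be a Spark DataFrame rebuilt from the upstream
    (silver) layer with the same transformations the pipeline normally applies.
    """
    predicate = partition_predicate(column, value)
    (
        recomputed_df.filter(predicate)      # keep only rows of the damaged partition
        .write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", predicate)   # Delta replaces only the rows matching this predicate
        .save(gold_path)
    )
    return predicate
```

It could be called as `rewrite_partition(transformed_silver_df, "/mnt/gold/sales", "order_date", "2024-01-01")`, where every argument is a placeholder. If the gold table is a Delta table with history retained, restoring an earlier table version may be another option worth checking before recomputing anything.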
How do we integrate data from sources like Salesforce, AWS, Azure Data Lake, Genesys, and SAP?
Please share the dataset to complete this project. Really amazing videos.
Hi Sir, thank you for the video. Can you also do an 'End to End (Snowflake + Azure) Data Engineering Project'?
Good explanation
Thank you so much :)
Sir, this use case doesn't make sense. They would want to retire the on-prem data warehouse and move to the Azure environment, so why would we keep connecting to it? For a one-time bulk load it is understandable, but for daily refreshes shouldn't the source be an OLTP system?
Still, thanks for making this playlist. It is really helpful for understanding important Azure services.
How do we load data from the gold layer into Synapse? Using ADF, or Databricks?
Thank you so much for explaining the architecture. Wonderful content 😊
I have a question though: what is the use of Azure Synapse Analytics when we already have the gold layer with clean data? Why can't we connect the BI tool directly to the gold layer?
Can you please let me know sir?
+1
thanks for providing an amazing video... please provide the link to the dataset so we can practice.. thanks in advance
It's the open source AdventureWorks database. Follow the link below to import the database using SSMS (I used the lightweight version).
learn.microsoft.com/en-us/sql/samples/adventureworks-install-configure?view=sql-server-ver16&tabs=ssms
@@mr.ktalkstech bro, what is the project name?
Hello sir, I have a problem with the transformation in Part 6 (data transformation). I keep getting the error
AnalysisException: Found duplicate column(s) in the data to save: ship__to__address__id, sub__total, credit__card__approval__code, ship__method, ship__date, purchase__order__number, account__number, modified__date, order__date, revision__number, tax__amt, customer__id, due__date, sales__order__number, online__order__flag, bill__to__address__id, total__due, sales__order__id
and however much I try to redefine the logic, it still gives the same error.
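A note on the error above, offered as a guess rather than a diagnosis: the doubled underscores in names like `ship__to__address__id` suggest that the renaming step may have been applied to column names that already contained underscores, for example by re-running the notebook cell against data that was already renamed. Spark also compares column names case-insensitively by default, so two names that differ only in case count as duplicates when saving. The sketch below is plain Python, not the code from the video, and `to_snake_case` and `find_duplicate_columns` are hypothetical helper names. It shows a rename rule that is safe to apply more than once (it lowercases the result, which is a choice made here), plus a check that can be run on a list of column names before writing.

```python
import re
from collections import Counter


def to_snake_case(name: str) -> str:
    """Convert a PascalCase column name to snake_case; safe to apply repeatedly."""
    # Insert "_" between a lowercase letter or digit and a following uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    # Split an acronym from a following capitalized word, e.g. "IDNumber" -> "ID_Number".
    s = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", "_", s)
    # Collapse runs of underscores so already-renamed names are not changed again.
    s = re.sub(r"_+", "_", s).strip("_")
    return s.lower()


def find_duplicate_columns(columns):
    """Return the names that collide when compared case-insensitively."""
    counts = Counter(c.lower() for c in columns)
    return sorted(name for name, n in counts.items() if n > 1)


if __name__ == "__main__":
    for original in ["ShipToAddressID", "Ship__To__Address__ID", "rowguid"]:
        print(original, "->", to_snake_case(original))
    print(find_duplicate_columns(["ShipDate", "shipdate", "TaxAmt"]))
```

If `find_duplicate_columns(df.columns)` returns a non-empty list just before the write, the rename logic is producing colliding names, and printing the old and new names side by side usually shows where.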
Please bring more projects using semi-structured and unstructured data.
Sure, I'll do that :)
The project is amazing. Can I get the database with the tables you used in this project?
Why do we need Synapse if Power BI can read from any Gen2 storage at the gold level?
Hi Sir, can you please advise how much it will cost to use Azure for learning this project once the free tier is over?
Dear sir, really great video. I have a couple of questions on this architecture. a. What kinds of challenges do we face if we connect Power BI to Databricks directly to prepare dashboards? b. We can do the transformations in Synapse as well, so how do we connect the gold layer to Synapse to prepare the data before connecting to the Power BI dashboards? c. What challenges do we face if we connect on-premises SQL Server data to Power BI directly to prepare the dashboards? Kindly help me with this.
Previously, Databricks didn't have a serverless DB (I guess they added it recently). Having a serverless DB to Power BI integration would be better, since we wouldn't need to wait for the cluster to turn on; it would be readily available to query the tables.
@@mr.ktalkstech I had one more query: suppose the client does not want to move to the cloud. c. What challenges do we face if we connect on-premises SQL Server data to Power BI directly to prepare the dashboards?
Some challenges could be related to scalability
thanks for this!!
Why do you need Databricks AND Synapse? Synapse does data transformation/loading as well. Seems duplicative to me. Can you please explain? Thanks
Yes, you are right, Synapse does both. In most cases Databricks is preferred for the data transformation, as it works really well for big data workloads and streaming data.
But the main idea of using Databricks in this project is to cover as many different resources as possible in the architecture, so that people can understand how the resources work together. Hope that makes sense :)
Is the project OS independent? Can anyone using Mac or Linux (Ubuntu) try it out, or is Azure only for Microsoft?
It's a very useful video. Can you please let me know if you have any Hadoop data migration project, from HDFS to Azure SQL Server? If yes, kindly share the link.
Thank you. Do you have a community ? I wanna join please.
👍
Bro, why are the rest of the videos hidden now?
Sir, can you please upload the dataset? I am unable to do the project.
Can you please share the title of this project?
Hi, where can I get the on-prem data? Can you share the link? It would be helpful.
Did you use PySpark in Databricks, sir?
Yes :)
Hi Mr.K,
Kindly help me with how to put this project on our resume. What's the best way to present this project on a resume so that we can explain everything we used in it?
Kindly tell me
Please advise.
I have not watched the entire video, but you can put something like "migrated an on-premises SQL database to Azure".
I think we can transfer this data using a data migration service, right, if it's just a one-time load?
Hi, yes, that's right :)
@@mr.ktalkstech where is the dataset?
Please explain the use case for the project.
Is this project available on Udemy?
Hey bro, I have already supported you. I was charged, and I tried to copy the link, but by the time I tried to access it, it was gone. I want access to this project.
Do we need any subscription to build this project at any stage?
Same question
Why did you disable the other parts, brother? I was following the tutorial :(
Can I also do this project alongside this video? I mean without paying anything for using Azure.
Hey. thanks for reaching out. You can create a free azure account which will give you free credit of 200 dollars for 1 Year period, and you can use it to do the project if you would like to :)
azure.microsoft.com/en-us/free/
@@mr.ktalkstech It's 30 days, I think.
@@mr.ktalkstech Hi is 200 dollars enough to complete the whole project?
Could you please make this same project using AWS services?
Sure, will do in the future :)
thanks bro
Hello Sir,
What should I mention on my resume for the project name,
description,
and roles and responsibilities?
I am new to this field, so please help, sir.
Please provide the dataset you used in this project.
Can the entire project be done by using the free subscription of Azure?
Sir, can you bring something on a healthcare project?
Sure, will do that in the future :)
Can I ask is Microsoft Fabric basically using these same services?
Fabric contains Data Factory + Synapse + Data Lake (It does not have other services used in this Project)
Can you please tell me the project name?
Can this project be done with free resources?
Can you please suggest a good institute for an Azure data engineer course?
I am not sure about that, Sorry :)
Buy your own subscription. Learn one module at a time from open sources like YouTube and documentation.
sir datasets
unsub...
Hi Kishore, you explain things simply and superbly. How can I contact you?
Thank you :) email: mrktalkstech@gmail.com
🤌🤌🤌🤌 this is perfect
Thank you so much :)
Amazing content, thanks for this video!!
Thank you so much :)
Very nicely explained.
Thank you so much :)