Part 3- Data Ingestion (1) | End to End Azure Data Engineering Project
HTML-код
- Опубликовано: 28 сен 2024
- #azuredataengineer #endtoendproject #azuredataengineeringproject #azureintamil #azuredatafactory #azuredatabricks #azuresynapseanalytics #azuredatalake #datalake #powerbi #keyvault
This is a long awaited video of mine- Lets build a complete End to End Azure Data Engineering Project. In this project we are going to create an end to end data platform right from Data Ingestion, Data Transformation, Data Loading and Reporting.
The tools that are covered in this project are,
1. Azure Data Factory
2. Azure Data Lake Storage Gen2
3. Azure Databricks
4. Azure Synapse Analytics
5. Azure Key vault
6. Azure Active Directory (AAD) and
7. Microsoft Power BI
The use case for this project is building an end to end solution by ingesting the tables from on-premise SQL Server database using Azure Data Factory and then store the data in Azure Data Lake. Then Azure databricks is used to transform the RAW data to the most cleanest form of data and then we are using Azure Synapse Analytics to load the clean data and finally using Microsoft Power BI to integrate with Azure synapse analytics to build an interactive dashboard. Also, we are using Azure Active Directory (AAD) and Azure Key Vault for the monitoring and governance purpose.
Part 4 will be uploaded soon. Stay tuned.
- - - Book a Private One on One Meeting with me (1 Hour) - - -
www.buymeacoff...
- - - Express your encouragement by brewing up a cup of support for me - - -
www.buymeacoff...
- - - Other useful playlist: - - -
Azure Data Factory Playlist: • Azure Data Factory Tut...
Azure General Topics Playlist: • Azure Beginner Tutorials
Microsoft Fabric Playlist: • Microsoft Fabric Tutor...
Azure Databricks Playlist: • Azure Databricks Tutor...
Azure End to End Project Playlist: • End to End Azure Data ...
Databricks CICD Playlist: • CI/CD (Continuous Inte...
End to End Azure Data Engineering Project: • An End to End Azure Da...
- - - Let’s Connect: - - -
Email: mrktalkstech@gmail.com
Instagram: mrk_talkstech
- - - Tools & Equipment (Gears I use): - - -
Disclaimer: Links included in this description might be affiliate links. If you purchase a product or service with the links that I provide, I may receive a small commission. There is no additional charge to you! Thank you for supporting me so I can continue to provide you with free content each week!
DJI Mic: amzn.to/3sNpDv8
Dell XPS 13 Plus 13.4" 3.5K : amzn.to/45KqH1c
Rode VideoMicro Vlogger Kit: amzn.to/3sVFW8Y
DJI Osmos Action 3: amzn.to/44KYV3x
DJI Mini 3 PRO: amzn.to/3PwRwAr
- - - About me: - - -
Mr. K is a passionate teacher created this channel for only one goal "TO HELP PEOPLE LEARN ABOUT THE MODERN DATA PLATFORM SOLUTIONS USING CLOUD TECHNOLOGIES"
I will be creating playlist which covers the below topics (with DEMO)
1. Azure Beginner Tutorials
2. Azure Data Factory
3. Azure Synapse Analytics
4. Azure Databricks
5. Microsoft Power BI
6. Azure Data Lake Gen2
7. Azure DevOps
8. GitHub (and several other topics)
After creating some basic foundational videos, I will be creating some of the videos with the real time scenarios / use case specific to the three common Data Fields,
1. Data Engineer
2. Data Analyst
3. Data Scientist
Can't wait to help people with my videos.
- - - Support me: - - -
Please Subscribe: / @mr.ktalkstech
if you get this error "Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.
A connection was successfully established with the server, but then an error occurred during the login process." then follow these steps in SSMS
Server (Right Click) -> Properties -> Security -> Server Authentication -> SQL Server and Windows Authentication Mode
Restart the server and test the connection
Hi Sir, Excellent Video. Clearly explained. Please upload all other videos. Waiting. Thanks
Thank you, the next part will be uploaded this week.
I have just uploaded the next part- cheers :)
@@mr.ktalkstech thank you so much
Best ADF demo, thanks for sharing.
Thank you :)
Having problems with the copy activity. The column 'geospatial' in the address table cannot be copied, due to its datatype 'geography'. I don't know how to handle this can anyone help?
Excellent work... please do more videos on real time projects... waiting for another Azure E2E pipeline
Thank you so much :)
excellent content, clarifying everything about Azure DE
Thank you so much :)
Excellent videos, will that possible to share the Database file for practice?
Hi
I have an issue in access policies in 12:47 . There are no any access policies avilable there. Can you please help to fix it. I almost try everything
grant the 'proprietary' access in the key-vault-demo for the current user what u using
Go to the access configuration and change it to vault policy
I did everything when connecting to the parquet and in click the debug button.
My status turned out failed with this pipeline run ID:47840b35-b608-463f-8ae4-2468cb2a7d65
What could be the problem?
I am getting error like "
"Schema import failed: Please select a table"" after connecting ssms and selecting table. Could you help me in resolving this.
Hello THANKS for your pretty videos. I have a question , if i have 3 environments ( DEV, TEST,PROD) , I need to create new pipeline for each environment ?
Thats where devops comes in picture , either we use ADO or git based pipelines using terraform
Hello bro, Thank u for replying to all my issues, I got one more error, "Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.
Login failed for user '.,"
Please help me on this, Thanks in Advance
Hello,
Sure.
1. Please first check if the username and the password that you have in the Key vault is correct or not. Also check the server name and database name.
2. If correct, please use the same credentials to connect to the server/db using the SSMS.
3. If connected successfully, right click on the sever name in SSMS and click the properties option, under the properties menu go to the connection and see if the "Allow remote connections to the sever" checkbox is enabled or not. If not, please enable it and then give it a try.
I also assuming that, you have given the db_owner access to the user login as per the Part 2 video (environment setup)
Did you solve this? I have the same issue
@@mr.ktalkstech in Part 2 video you gave db_datareader access and not db_owner access. I still selected both the steps and I am not able to solve this error. Below is my error: Cannot connect to SQL Database. Please contact SQL server team for further support. Server: 'localhost', Database: 'AdventureWorksLT2017', User: . Check the linked service configuration is correct, and make sure the SQL Database firewall allows the integration runtime to access.
Login failed for user ''., SqlErrorNumber=18456,Class=14,State=1,
The solution to this issue is to restart the SQL server once we give access to the user created. To restart the server, right-click the server name and select restart.
@@rohanthekanath5901 Restarting SQL Server did not help.
million thanks
please post on steaming data and challenges
problems with linking :
Open MSSMS
Select Security.
Then, navigate to Logins and search for your user.
Right-click on the user and select Properties.
Reset the password
Click on the Server Roles tab.
Choose Public and Sysadmin roles.
Click OK to apply the changes.
This process should resolve the linking issues.
Your comment save my time. Tq very much
hi im unable to connect on prem sql server error make sure sql db firewall alows integration to acess
please help wasted whole day no solution
i encountered this. I checked the trust certificate button
I have get many clairification about azure data engineering in this channel. Thax sir 😀
Hello bro, its a really good explanation. I'm stuck at 15:13 it says "cannot connect to SQL server Database. Please conatact SQL server team further support. Make sure SQL database firewall allows the integration runtime to access". Where can I see if the SQL database can give access to integration runtime. Please help me in resolving this. TIA
Were you able to solve this? I can't find a solution..
I get the same error, how do you resolve it?
are you using sql authentication or windows authentication?
@@seusbeats534
@@josel2719 did you guys got the solution. If yes please share! Thanks in advance
did you guys got the solution. If yes please share! Thanks in advance
Hi Mate, I am getting the below error while installing the SHIR, Could you please help
Error:- Detail Error: {{Error [1]. }}Service '[2]' ([3]) failed to start. Verify that you have sufficient privileges to start system services. 1920 Integration Runtime Service DIAHostService Suggestion: Please use the manual setup to install the Integration Runtime (Self-hosted).
Where we can get data base script to use that in our laptops? I'm doing this project along side you.
Million Thanks
Hi all, where to get the data from, kindly help
Ironically, if i have never use MS SQL server before or any type of SQL database. i wouldn't really know much about how you can fix this. He should really cover on how to install and add adventure database since it can be challenging if it is your first time touching on these subject.
My connection fails everytime i do connection test for onprem server its shows this error ( Cannot connect to SQL Database. Please contact SQL server team for further support. Server: ) please help sir!
Hello! I really appreciate this excellent tutorial. However, when I tried to test the connection for the on-prem SQL server as per at 15:15 of the video, I encountered an error with the code "failed to connect." Could you kindly assist me with this issue? Thank you in advance for your help!
Please use the same credentials in the SSMS and see if you can connect to the DB successfully or not- If you can do it, please check the secret values in the Key Vault, there might be something wrong there.
i faced the same probelm, my linked service is not connecting
did you solve the issue?
hey did you solve the issue?
@@myahya968
nice video, i m having trouble when i create newlinked service and tested but failed. i m not sure why is failing I trying windown authentication and sql authentication as well and both failed.
I'm facing the same issue
does we need to have sql server configure in azure before we connect in azure data factory please help
wow loved the explanation , to the point .Thanks for sharing the knowledge .
Thank you so much :)
where is datasets@@mr.ktalkstech
not able to access access policies please help
fantastic, thank you we're testing Azure DF potential for risk management. Thank you very much
I use Mac OS, this works there ?
i'm sorry to say this, but mac is not integrated with azure services
@@entertainmentandcricket4294 thanks for the reply and clarification
Where can I find the database?
I am planing to go for Data Engineer jobs, In this sample project I dont see any coding. Is it requred to be in this field ? can you please clarify this thanks in advance.
Coding is used in the Data Transformation part :)
I am using mac , is there any way to use Self hosted Integration Runtime ?as it shows .exe file if i download
It's not supported in mac os I believe, you may need to create a Azure VM and install the SHIR in the VM
Excellent! Thank you.
My pipe line is getting failed due to JRE missing . I tried installing JRE and when i use sink as parquet ,it is repeating the same error. Could you please help me
could you find a solution?
Add path in environment variables
HI SIR CAN U BRING MORE VIDEOS ON PROJECT
Sure, ll do :)
Not able to connect to linked service SQL on prem. Tried everything both SQL and Microsoft authentication. The user was created as you did in part 2 - mrk with db_reader access and created key vault secrets for username and password. I also gave ADF the right to use these secrets. I passed in the Databasename and Servername. Not able to connect throught SSMS also with the user mrk and the password created.
i solved this issue, by using microsoft authentication. I opened my commandprompt and wrote 'whoami' this gave me the user name for my current local user. I also wrote the password i use to login to my pc and that worked
@@MultiKalleman Can you tell the steps you performed precisely?
@@TambeAjinkya When creating the linked service in data factory I used Microsoft authentication. To find the username I opened my cmd (search for this in windows search bar) and wrote whoami. For the password I used my login to my pc user
Check Server Authentication mode on your SQL Server instance and change it to SQL Server and Windows Authentication mode.
Hi I got below error for parquet. It was successful for csv. Please give solution.
ErrorCode=JreNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Java Runtime Environment cannot be found on the Self-hosted Integration Runtime machine. It is required for parsing or writing to Parquet/ORC files. Make sure Java Runtime Environment has been installed on the Self-hosted Integration Runtime machine.,Source=Microsoft.DataTransfer.Common,''Type=System.DllNotFoundException,Message=Unable to load DLL 'jvm.dll': The specified module could not be found. (Exception from HRESULT: 0x8007007E),Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'
Did you find any solution?
Add jre path to environment variable
Install Java Runtime Environment (JRE) .
Set Up Environment Variables:
JAVA_HOME: Set this environment variable to the root directory of your JRE installation. For example, if you installed JRE in C:\Program Files\Java\jre1.8.0_291, set JAVA_HOME to this path.
PATH: Add the bin directory of your JRE installation to the PATH environment variable. This will typically be C:\Program Files\Java\jre1.8.0_291\bin.
Click on the Environment Variables button.
In the System variables section, click New to create a new environment variable for JAVA_HOME and set it to your JRE installation path.
Find the Path variable in the System variables section, select it, and click Edit. Add a new entry with the path to the bin directory of your JRE installation.
my test connection @ 15:19 to my on prem sql server is failing, i have created the key vault correctly. any idea why? please help
Same for me. Did u get any solution?
@@Akash_Lonkar yes after many trial and error I bumped into solution that should have been mentioned in the video. It was just the authentication type. Use sql authentication instead of default windows authentication to login on ssm. Hope that helps
Yes it works for me. After changing to sql authentication, i had to setup the user name and password in SSMS also. Thanks@@mosesakwagiobe4438
Hello, how do I make the connection as I am using windows authentication. I have been stuck here. Thanks
Hope you sorted this out as you are into other parts now?
@@mr.ktalkstech Yes, thanks. I used Windows authentication option alonside my windows password
@@drizzle1866 How did you do it? what do you mean? did you change the sql script password to you windows password?
Hi, I'm unable to connect to database. I wrote my login, password and so on. Only difference was that I didn't login with key vault. What might be also the issue?
What's the error you are getting? please use the same credentials in SSMS to connect to the database, if its working in SSMS then it should work in Azure as well. Otherwise, create a new user and try again by following the steps mentioned in the video. Hope it works for you :)
@@mr.ktalkstech thank you sir. I coped with this problem. The issue was to change windows authentication to SQL server authentication. I'm currently following your project further. Thank for your feed back
Sorry to bother you again. After I run a pipeline I get error:
ErrorCode=JreNotFound,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Java Runtime Environment cannot be found on the Self-hosted Integration Runtime machine. It is required for parsing or writing to Parquet/ORC files. Make sure Java Runtime Environment has been installed on the Self-hosted Integration Runtime what should I do in this situation?
This happens when the Java runtime is not installed in the machine. (happens only when you copy the data in parquet/orc format, if you try to copy in CSV or others it would work)
If you want to use the parquet format, then try to install the java runtime using the below link to your local machine and then give it a try after restarting your system.
www.java.com/en/download/manual.jsp
@@mr.ktalkstech thank you very much, it worked. I can't wait to get to another episodes!
Hello bro, Thank u for the amazing content. When i try to create access policy. I have this information displayed on the screen,"Access policies not available.
The access configuration for this key vault is set to role-based access control. To add or manage your access policies, go to the Access control (IAM) page."
Please help me on this bro. Thanks in advance
Did you create a Key Vault from scratch or using an existing one? I guess the key vault is configured as azure role based control.
Please go the Key Vault -- Access configuration (under settings) -- Permission model, and see if the vault access policy is enabled or not, I guess for you "azure role based" should be selected. If that's the case, then please try to switch it to vault access policy- then it should work.
Otherwise, I think you can still use the role based control, instead of giving the access policy (secret access) to the ADF- just give the same "Key Vault Admin" access to the ADF managed identity using IAM.
@@mr.ktalkstech Thank u bro, It worked. Than u so much.🙏
Hello bro, after changing from Azure RBAC to Vault based, I have lost the secrets. I am clueless now. Please help me out bro.Thank u
Seems weird, can you create the secret again?
or use the role based approach as I mentioned above, switch back from vault based to role based and after that using IAM assign the "Key Vault Admin" access to the ADF managed identity
@@mr.ktalkstech I did bro, It worked.
Useful Channel 😍
Thank you :)
I am stuck on 15:22 minute where the test connection fails, with code saying “Lofing failed for user ‘user2’. SqlErrorNumber=18456, Class=14, State=1
I tried to use chat got to look for some solutions but I can hardly understand anything (I am using SQL Server 2017)
Please help
I have checked and the the details provided are all correct but can’t understand the error solution
Did you fix it?
I M GETTING THIS ERROR DURING CREATION OF LINK SERVICE DO ANYBODY HAVE IDEA OF THIS
"Input Payload is invalid, validation result - '["Connector or activity name: SqlServer_LINK_S, connector or activity type: SqlServer, error: Calling partner RP EvaluatePolicyAsync returned an invalid status code 'Conflict', ReasonPhrase Conflict"]'
"
SqlErrorNumber=229,Class=14,State=5
:i got this error when loading LT.address under pipline-source ,resolve this by executing query in local sql server
USE [master]
EXEC master..sp_addsrvrolemember @loginame = N'sai', @rolename = N'sysadmin'
GO
:in login name (sai) give your usename ,
it is for updating user role to system admin