Sharp, straight, quick and to the point. No blah blah... thank you
Thank you!
0:26 Intro to Data Factory
2:50 Working with Azure Data Factory
7:33 Create Resources
21:16 Data Factory Navigation
28:15 Data Factory Resources
35:38 Create Linked Service
47:37 Copy Activity Wizard
1:05:29 Get Metadata Activity
1:09:28 Break
1:24:26 Break ends
🎯 Key points for quick navigation:
00:00:00 *🖥️ Introduction and Setup*
- Introduction to the course and explanation of technical issues,
- Overview of what will be covered in the tutorial: Azure Data Factory, related services, and resources.
00:01:25 *📅 Agenda Overview*
- Breakdown of topics to be covered, including Azure Data Factory basics, resource hierarchy, integration runtimes, and potential coverage of Synapse pipelines.
00:02:51 *🎓 Introduction to Azure Data Factory*
- Explanation of Azure Data Factory as a resource in the Azure portal,
- Overview of capabilities such as pipelines and data flows for ETL processes,
00:05:23 *🏗️ Creating Azure Resources*
- Detailed steps on creating a resource group and its role as an organizational container,
- Overview of creating a storage account and enabling the hierarchical namespace for data lake storage.
00:10:22 *💾 Storage Account Configuration*
- Explanation of creating containers within the storage account,
- Step-by-step guide on uploading files into the data lake for use in the tutorial.
00:14:01 *🛢️ SQL Database Setup*
- Instructions on creating a SQL server and SQL database within Azure,
- Mention of configuration options and cost considerations, such as using the Standard S0 tier.
00:17:27 *🔧 Provisioning Azure Data Factory*
- Steps to create and configure an Azure Data Factory instance,
- Discussion on the relationship between Azure Data Factory and SSIS (SQL Server Integration Services),
00:19:59 *🛠️ Initial Configuration and Setup in Azure Data Factory*
- Discussing the initial setup process for Azure Data Factory, including network configuration options,
- Mention of self-hosted integration runtime and its role in connecting on-premises and cloud resources.
00:21:10 *🔄 Azure Data Factory Interface Overview*
- Introduction to the Data Factory interface, highlighting its similarity to Synapse Analytics,
- Explanation of the different hubs (Home, Author, Monitor, Manage) and their functions within the tool.
00:28:29 *⚙️ Integration Runtimes and Their Importance*
- Detailed discussion on the role of integration runtimes in managing compute resources,
- Explanation of the different types of integration runtimes: Azure, self-hosted, and Azure-SSIS (for running SSIS packages),
00:33:05 *🔗 Creating Linked Services in Azure Data Factory*
- Steps to create linked services that define connections to data stores,
- Explanation of how to configure linked services for Azure SQL Database and Data Lake,
00:40:36 *🔑 Configuring Authentication and Connectivity*
- Explanation of how to authenticate and connect to Azure SQL Database,
- Discussion on different authentication options, including managed identities and SQL authentication,
00:47:11 *🌐 Creating and Using Linked Services*
- Steps to create a linked service for Azure Data Lake Storage,
- Explanation of linking storage accounts and how to navigate directories within Azure Data Lake,
00:52:39 *🧱 Building a Simple Pipeline with the Copy Activity Wizard*
- Step-by-step guide on selecting a source (SQL Database) and a destination (Data Lake),
00:58:18 *🛠️ Customizing Data Sets in the Copy Activity*
- Customizing and renaming data sets created by the Copy Activity Wizard for better clarity,
- Importance of meaningful naming conventions for reusing data sets in other pipelines,
01:00:53 *📝 Naming Conventions and Pipeline Creation*
- Importance of meaningful naming conventions for datasets to improve clarity and reuse,
- Example of a pipeline created and executed through the Copy Activity Wizard, showing the process from creation to final output in Azure Data Lake.
01:05:24 *🔍 Exploring Azure Data Factory Activities*
- Overview of various activities available in Azure Data Factory, including running Databricks notebooks and executing stored procedures,
- Introduction to the Get Metadata activity for retrieving file and folder details from a data lake.
01:07:45 *📊 Using Get Metadata Activity*
- Step-by-step process of using the Get Metadata activity to extract file details like name, size, and last modified date from Azure Data Lake,
- Explanation of configuring the dataset, previewing data, and selecting metadata fields for retrieval.
01:13:01 *🚀 Running and Debugging Pipelines*
- Explanation of how to run and debug a pipeline within the Azure Data Factory interface,
- Viewing the outputs of a pipeline after running the Get Metadata activity to verify the retrieved metadata from the file in the data lake.
01:00:53 *📁 Creating a New Dataset for Metadata Extraction*
01:05:24 *📝 Configuring the Get Metadata Activity*
- Implementing the Get Metadata activity to extract specific file properties,
- Selecting desired metadata fields such as last modified date, size, and item name,
01:09:22 *📊 Utilizing Retrieved Metadata Outputs*
- Accessing and interpreting the output values from the Get Metadata activity,
- Understanding how to reference output parameters for use in subsequent activities,
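A rough sketch of how those outputs can be referenced from a later activity in ADF's expression language, assuming the activity kept the default name `Get Metadata1` (substitute your own activity name):
```
@activity('Get Metadata1').output.itemName
@activity('Get Metadata1').output.size
@activity('Get Metadata1').output.lastModified
```
Each property is only present in the output if the corresponding field was selected in the activity's field list.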
01:12:07 *🔄 Integrating Stored Procedures with Pipelines*
- Introducing the use of stored procedures to process retrieved metadata,
- Configuring the Execute Stored Procedure activity within the pipeline,
01:14:39 *⏱️ Scheduling and Executing Pipelines*
- Discussing options for running pipelines manually or on a schedule,
- Exploring the impact of pipeline activities on execution time and performance,
01:33:58 *📄 Reviewing Metadata Extraction Results*
- Checking the output from the Get Metadata activity, including last modified date, file size, and item name,
- Explanation of how times in Azure Data Factory are in UTC, requiring potential adjustments.
01:35:09 *🗂️ Storing Metadata in a SQL Table*
- Creating a SQL table and stored procedure to log metadata,
- Introduction to using Azure Data Factory's expression language for dynamic data manipulation,
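A minimal T-SQL sketch of what such a logging table and stored procedure could look like; the table, procedure, and parameter names here are illustrative, not necessarily the ones used in the video:
```sql
-- Table that receives one row per pipeline run with the file's metadata
CREATE TABLE dbo.FileMetadataLog (
    ItemName      NVARCHAR(200),
    FileSizeBytes BIGINT,
    LastModified  DATETIME2,
    LoggedAtUtc   DATETIME2 DEFAULT SYSUTCDATETIME()
);
GO
-- Procedure called from the Stored Procedure activity in the pipeline
CREATE PROCEDURE dbo.usp_LogFileMetadata
    @ItemName      NVARCHAR(200),
    @FileSizeBytes BIGINT,
    @LastModified  DATETIME2
AS
BEGIN
    INSERT INTO dbo.FileMetadataLog (ItemName, FileSizeBytes, LastModified)
    VALUES (@ItemName, @FileSizeBytes, @LastModified);
END
```
In the Stored Procedure activity, each parameter is then mapped to an expression such as `@activity('Get Metadata1').output.itemName`.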
01:39:12 *🛠️ Debugging and Executing the Stored Procedure Activity*
- Running and debugging the pipeline with the stored procedure activity,
- Discussion on the importance of correct formatting for expressions and handling errors in the pipeline,
01:47:17 *🔍 Introducing the Lookup Activity*
- Transitioning from using a stored procedure to a lookup activity to fetch data from a control table,
- Explanation of how the Lookup activity works and its role in comparing metadata,
01:53:35 *🛠️ Setting Up the Control Table and Stored Procedure*
- Creating a control table in SQL Server to store execution details,
- Adding a stored procedure to retrieve the last execution date,
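A hedged T-SQL sketch of a control table and retrieval procedure of this kind; the names and columns are assumptions for illustration:
```sql
-- One row per pipeline, recording when it last ran successfully
CREATE TABLE dbo.PipelineControl (
    PipelineName      NVARCHAR(200) NOT NULL PRIMARY KEY,
    LastExecutionDate DATETIME2     NOT NULL
);
GO
-- Returns the last execution date so a Lookup activity can read it
CREATE PROCEDURE dbo.usp_GetLastExecutionDate
    @PipelineName NVARCHAR(200)
AS
BEGIN
    SELECT LastExecutionDate
    FROM dbo.PipelineControl
    WHERE PipelineName = @PipelineName;
END
```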
01:58:10 *🔄 Implementing the Lookup Activity*
- Adding a Lookup activity to retrieve the last execution date from the control table,
- Utilizing the output from the Get Metadata activity as a parameter for the Lookup activity,
02:00:28 *🤔 Handling Time Zone Discrepancies*
- Identifying the issue of comparing UTC time from Get Metadata with local time from the control table,
- Planning to adjust the time in the Get Metadata output to match the local time format for accurate comparison.
02:03:08 *🧠 Creating the If Condition Activity*
- Introducing the If Condition activity to compare the last modified date and the last execution date,
- Writing the expression logic to handle time zone conversions and evaluate the condition correctly,
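One possible way to express that comparison in the If Condition, assuming a Get Metadata activity named `Get Metadata1`, a Lookup activity named `LookupLastExecution` that returns a `LastExecutionDate` column, and a local time zone of 'W. Europe Standard Time' (adjust all three to your setup):
```
@greater(
    ticks(convertFromUtc(activity('Get Metadata1').output.lastModified, 'W. Europe Standard Time')),
    ticks(activity('LookupLastExecution').output.firstRow.LastExecutionDate)
)
```
Comparing `ticks()` values avoids string comparisons between differently formatted timestamps.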
02:09:13 *📝 Updating the Control Table After Execution*
- Implementing a stored procedure to update the control table with the current execution time,
- Discussing the importance of updating the control table to ensure accurate future comparisons,
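Continuing the illustrative control table sketched above, the update procedure could be as simple as:
```sql
-- Called at the end of the pipeline to record the current run's execution time
CREATE PROCEDURE dbo.usp_UpdateLastExecutionDate
    @PipelineName  NVARCHAR(200),
    @ExecutionDate DATETIME2
AS
BEGIN
    UPDATE dbo.PipelineControl
    SET LastExecutionDate = @ExecutionDate
    WHERE PipelineName = @PipelineName;
END
```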
02:14:54 *⏳ Managing Timing Issues in Pipeline Execution*
- Addressing potential issues with UTC time and pipeline execution timing,
- Exploring the use of `pipeline().TriggerTime` as an alternative to `utcNow()`, as in the sketch below,
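The two expressions being contrasted look like this; `@pipeline().TriggerTime` is fixed at the moment the run starts, whereas `@utcNow()` is re-evaluated each time it is used:
```
@utcNow()
@pipeline().TriggerTime
```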
02:18:12 *✅ Validating Pipeline Execution Logic*
- Running the pipeline to validate if the execution follows the correct logical path (true/false) based on time comparison,
- Successfully testing the `if condition` to ensure correct flow of the pipeline when conditions are met.
02:20:04 *🏗️ Introducing Data Flow and Its Debugging Features*
- Overview of creating a data flow in Azure Data Factory,
02:23:52 *🔄 Setting Up Data Sources and Schema Adjustments*
- Uploading and setting up data sources for the data flow,
02:35:39 *🧹 Cleaning Up and Adjusting Data After a Lookup*
- Addressing duplication issues after performing a lookup,
02:38:10 *🗑️ Filtering Out Unnecessary Data*
- Implementing a filter transform to remove rows with no list price,
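As a sketch, the condition in the Filter transformation is written in the data flow expression language; assuming the column is called `ListPrice`, keeping only rows that actually have a list price could look like:
```
!isNull(ListPrice)
```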
02:41:14 *📁 Writing Data to a Destination*
- Creating a new dataset on the fly to write the transformed data to a destination,
02:43:08 *🛠️ Executing a Data Flow via Pipeline*
- Explaining how to run a data flow using a pipeline since direct execution isn't possible from the data flow UI,
02:45:01 *⏲️ Scheduling and Triggering Pipelines*
- Exploring the different scheduling options available for triggering pipelines, including schedule, tumbling window, storage events, and custom events.
Excellent intro, thank you. A great place to start if you've spent a lot of time doing SSIS and want to get into ADF.
Glad you enjoyed the video Jamie, thank you for learning with us!
Thank you it was awesome explanation and saved a lot of time to me to get onboarded with ADF
Great to hear!
Fantastic . It is gonna help me a lot on getting my DP-203
amazing content!!! Made me feel confident on ADF
Excelent intro to Data Factory. Thanks
Our pleasure! Thanks!
That was an awesome session. Best for Data Factory.
Thank you. Very good examples, that showed me a short way how to start with ADF.
thanks Manuel.
nice explanation...
nice
Glad it was helpful!
Beautiful, Thanks
Our pleasure!
This is a very good tutorial. Thank You!
Keep on sharing helpful content
Thank you! We sure will!
Great Video..Thanks
Thank you, Manuel.
I really liked this intro as it was clear and full of the useful content required for a beginner.
The only thing you skipped over was the difference between the Lookup activity and a Join.
I went ahead and found the answer in the documentation really quick though, so thank you!
Great Stuff. Thank you very much!
Glad it was helpful!
Should we use Azure Synapse or Data Factory these days?
Great content! Thanks!
Glad you liked it! Thanks!
you are great ....Thanks
Thank you very much. As a beginner, this is an awesome video.
You are welcome 😊
fantastic! thank you!
Hi, when we talk about the data transformation part, this looks like an inferior experience compared to Power Query. I've got the impression that Power Query is much more user friendly, and that you can more easily build complex transformations with it.
Can we use Power Query in Azure Data Factory?
What are the limitations?
Power Query is still the best; within ADF, though, Data Flow has more support than Power Query.
Hi, great video for learning purposes, that's for sure... Only during the creation of the storage account, Azure came back with a message that said "Something went wrong", pointing at the name I want to create and the location of the server (Europe). Even after picking different names for the storage account, it doesn't accept them. Could this have something to do with the subscription? Please let me know so I can finish this class 👍
Storage account names need to be 3-24 lowercase alphanumeric characters, with no spaces, dashes, or underscores.
You might also be experiencing issues if a specific region is disallowing the creation of a certain resource.
good video thanks
Thank you!
Very helpful.
1:43:00 bm 1
Hi Manuel, thank you very much for your time and your knowledge, it really is a very good workshop. I have a question: at the end of the tutorial, when recording the results to the destination path, it writes the output in parts and does not keep the .csv name; it also writes a _success file. The copy wizard didn't have this problem. Maybe you have an idea about it?
You are good.
Will you be covering how to recover information loaded in Google Chrome that I accidentally deleted?
awesome
Thanks for watching!
The resource material link doesn't work. Please check
Should be good now!
Great stuff. How did you find the username and password to create the linked service? I found the username but can't find the password. I don't remember creating one either.
Please, I would like to start learning Microsoft Azure. Is it okay if I start with this course as an absolute beginner? If not, could you tell me which course I should start with? Thanks
How did you get the username and password?
1:41:31 bookmark
48:55
Hi,
How do I handle increasing complexity or growth in data volume?
Kindly guide or advise me on the queries mentioned below (links, documents, websites).
1- Develops and maintains scalable data pipelines and builds out new API integrations to support continuing increases in data volume and complexity.
2- Collaborates with analytics and business teams to improve data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organisation.
3- Implements processes and systems to monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
4- Writes unit/integration tests, contributes to engineering wiki, and documents work.
5- Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues.
6- Works closely with a team of front end and back end engineers, product managers, and analysts.
7- Designs data integrations and data quality framework.
8- Designs and evaluates open source and vendor tools for data lineage.
9- Works closely with all business units and engineering teams to develop strategy for long term data platform architecture.
Best Regards,
MA@Pakistan
Go foresttttt