Azure Data Factory Beginner to Pro Tutorial [Full Course]

  • Published: 2 Oct 2024

Comments • 55

  • @andreikhveras169
    @andreikhveras169 2 years ago +3

    Thank you, Manuel.
    I really liked this intro as it was clear and full of the useful content required for a beginner.
    The only thing you suddenly jumped over was the difference between the Lookup activity and Join.
    I went ahead and found the answer in the documentation really quick though, so thank you!

  • @cloudnineballoonsuk4059
    @cloudnineballoonsuk4059 1 year ago +1

    Keep on sharing helpful content

  • @maselitics8184
    @maselitics8184 1 year ago +18

    0:26 Intro to Data Factory
    2:50 Working with Azure Data Factory
    7:33 Create Resources
    21:16 Data Factory Navigation
    28:15 Data Factory Resources
    35:38 Create Linked Service
    47:37 Copy Activity Wizard
    1:05:29 Get Metadata Activity
    1:09:28 Break
    1:24:26 Break ends

  • @dataisfun4964
    @dataisfun4964 1 year ago +1

    Beautiful, Thanks

  • @sivaiyer4017
    @sivaiyer4017 2 years ago +9

    Sharp, straight, quick and to the point. No blah blah... thank you

  • @thomasbernier655
    @thomasbernier655 1 year ago

    Great content! Thanks!

  • @kingsadmin
    @kingsadmin 1 year ago

    fantastic! thank you!

  • @timamet
    @timamet 2 years ago +6

    Thank you, it was an awesome explanation and it saved me a lot of time getting onboarded with ADF.

  • @prasenjitpaul3105
    @prasenjitpaul3105 1 year ago +1

    good video thanks

  • @rajbh22
    @rajbh22 3 months ago +1

    That was an awesome session. Best for Data Factory.

  • @zobiakhan7531
    @zobiakhan7531 5 months ago +1

    1:43:00 bookmark 1

  • @sebastiendebosscher
    @sebastiendebosscher 2 years ago +6

    Hi, when we talk about the data transformation part, this looks like an inferior experience compared to Power Query. I've got the impression that Power Query is much more user-friendly, and you can more easily build more complex transformations with it.
    Can we use Power Query in Azure Data Factory?
    What are the limitations?

    • @dataisfun4964
      @dataisfun4964 1 year ago

      Power Query is still the best; for ADF, though, the Data Flow has more support than PQ.

  • @kd6613
    @kd6613 1 month ago

    🎯 Key points for quick navigation:
    00:00:00 *🖥️ Introduction and Setup*
    - Introduction to the course and explanation of technical issues,
    - Overview of what will be covered in the tutorial: Azure Data Factory, related services, and resources.
    00:01:25 *📅 Agenda Overview*
    - Breakdown of topics to be covered, including Azure Data Factory basics, resource hierarchy, integration runtimes, and potential coverage of Synapse pipelines.
    00:02:51 *🎓 Introduction to Azure Data Factory*
    - Explanation of Azure Data Factory as a resource in the Azure portal,
    - Overview of capabilities such as pipelines and data flows for ETL processes,
    00:05:23 *🏗️ Creating Azure Resources*
    - Detailed steps on creating a resource group and its role as an organizational container,
    - Overview of creating a storage account and enabling hierarchical name spacing for data lake storage.
    00:10:22 *💾 Storage Account Configuration*
    - Explanation of creating containers within the storage account,
    - Step-by-step guide on uploading files into the data lake for use in the tutorial.
    00:14:01 *🛢️ SQL Database Setup*
    - Instructions on creating a SQL server and SQL database within Azure,
    - Mention of configuration options and cost considerations, such as using the standard S0 provisioning.
    00:17:27 *🔧 Provisioning Azure Data Factory*
    - Steps to create and configure an Azure Data Factory instance,
    - Discussion on the relationship between Azure Data Factory and SSIS (SQL Server Integration Services),
    00:19:59 *🛠️ Initial Configuration and Setup in Azure Data Factory*
    - Discussing the initial setup process for Azure Data Factory, including network configuration options,
    - Mention of self-hosted integration runtime and its role in connecting on-premises and cloud resources.
    00:21:10 *🔄 Azure Data Factory Interface Overview*
    - Introduction to the Data Factory interface, highlighting its similarity to Synapse Analytics,
    - Explanation of the different hubs (Home, Author, Monitor, Manage) and their functions within the tool.
    00:28:29 *⚙️ Integration Runtimes and Their Importance*
    - Detailed discussion on the role of integration runtimes in managing compute resources,
    - Explanation of different types of integration runtimes: Azure, self-hosted, and SSIS package execution,
    00:33:05 *🔗 Creating Linked Services in Azure Data Factory*
    - Steps to create linked services that define connections to data stores,
    - Explanation of how to configure linked services for Azure SQL Database and Data Lake,
    00:40:36 *🔑 Configuring Authentication and Connectivity*
    - Explanation of how to authenticate and connect to Azure SQL Database,
    - Discussion on different authentication options, including managed identities and SQL authentication,
    00:47:11 *🌐 Creating and Using Linked Services*
    - Steps to create a linked service for Azure Data Lake Storage,
    - Explanation of linking storage accounts and how to navigate directories within Azure Data Lake,
    00:52:39 *🧱 Building a Simple Pipeline with the Copy Activity Wizard*
    - Step-by-step guide on selecting a source (SQL Database) and a destination (Data Lake),
    00:58:18 *🛠️ Customizing Data Sets in the Copy Activity*
    - Customizing and renaming data sets created by the Copy Activity Wizard for better clarity,
    - Importance of meaningful naming conventions for reusing data sets in other pipelines,
    01:00:53 *📝 Naming Conventions and Pipeline Creation*
    - Importance of meaningful naming conventions for datasets to improve clarity and reuse,
    - Example of a pipeline created and executed through the Copy Activity Wizard, showing the process from creation to final output in Azure Data Lake.
    01:05:24 *🔍 Exploring Azure Data Factory Activities*
    - Overview of various activities available in Azure Data Factory, including running Databricks notebooks and executing stored procedures,
    - Introduction to the Get Metadata activity for retrieving file and folder details from a data lake.
    01:07:45 *📊 Using Get Metadata Activity*
    - Step-by-step process of using the Get Metadata activity to extract file details like name, size, and last modified date from Azure Data Lake,
    - Explanation of configuring the dataset, previewing data, and selecting metadata fields for retrieval.
    01:13:01 *🚀 Running and Debugging Pipelines*
    - Explanation of how to run and debug a pipeline within the Azure Data Factory interface,
    - Viewing the outputs of a pipeline after running the Get Metadata activity to verify the retrieved metadata from the file in the data lake.
    01:00:53 *📁 Creating a New Dataset for Metadata Extraction*
    01:05:24 *📝 Configuring the Get Metadata Activity*
    - Implementing the Get Metadata activity to extract specific file properties,
    - Selecting desired metadata fields such as last modified date, size, and item name,
    01:09:22 *📊 Utilizing Retrieved Metadata Outputs*
    - Accessing and interpreting the output values from the Get Metadata activity,
    - Understanding how to reference output parameters for use in subsequent activities,
    01:12:07 *🔄 Integrating Stored Procedures with Pipelines*
    - Introducing the use of stored procedures to process retrieved metadata,
    - Configuring the Execute Stored Procedure activity within the pipeline,
    01:14:39 *⏱️ Scheduling and Executing Pipelines*
    - Discussing options for running pipelines manually or on a schedule,
    - Exploring the impact of pipeline activities on execution time and performance,
    01:33:58 *📄 Reviewing Metadata Extraction Results*
    - Checking the output from the Get Metadata activity, including last modified date, file size, and item name,
    - Explanation of how times in Azure Data Factory are in UTC, requiring potential adjustments.
    01:35:09 *🗂️ Storing Metadata in a SQL Table*
    - Creating a SQL table and stored procedure to log metadata,
    - Introduction to using Azure Data Factory's expression language for dynamic data manipulation,
    01:39:12 *🛠️ Debugging and Executing the Stored Procedure Activity*
    - Running and debugging the pipeline with the stored procedure activity,
    - Discussion on the importance of correct formatting for expressions and handling errors in the pipeline,
    01:47:17 *🔍 Introducing the Lookup Activity*
    - Transitioning from using a stored procedure to a lookup activity to fetch data from a control table,
    - Explanation of how the Lookup activity works and its role in comparing metadata,
    01:53:35 *🛠️ Setting Up the Control Table and Stored Procedure*
    - Creating a control table in SQL Server to store execution details,
    - Adding a stored procedure to retrieve the last execution date,
    01:58:10 *🔄 Implementing the Lookup Activity*
    - Adding a Lookup activity to retrieve the last execution date from the control table,
    - Utilizing the output from the Get Metadata activity as a parameter for the Lookup activity,
    02:00:28 *🤔 Handling Time Zone Discrepancies*
    - Identifying the issue of comparing UTC time from Get Metadata with local time from the control table,
    - Planning to adjust the time in the Get Metadata output to match the local time format for accurate comparison.
    02:03:08 *🧠 Creating the If Condition Activity*
    - Introducing the If Condition activity to compare the last modified date and the last execution date,
    - Writing the expression logic to handle time zone conversions and evaluate the condition correctly (see the expression sketch after this list),
    02:09:13 *📝 Updating the Control Table After Execution*
    - Implementing a stored procedure to update the control table with the current execution time,
    - Discussing the importance of updating the control table to ensure accurate future comparisons,
    02:14:54 *⏳ Managing Timing Issues in Pipeline Execution*
    - Addressing potential issues with UTC time and pipeline execution timing,
    - Exploring the use of `pipeline trigger time` as an alternative to UTC now,
    02:18:12 *✅ Validating Pipeline Execution Logic*
    - Running the pipeline to validate if the execution follows the correct logical path (true/false) based on time comparison,
    - Successfully testing the `if condition` to ensure correct flow of the pipeline when conditions are met.
    02:20:04 *🏗️ Introducing Data Flow and Its Debugging Features*
    - Overview of creating a data flow in Azure Data Factory,
    02:23:52 *🔄 Setting Up Data Sources and Schema Adjustments*
    - Uploading and setting up data sources for the data flow,
    02:35:39 *🧹 Cleaning Up and Adjusting Data After a Lookup*
    - Addressing duplication issues after performing a lookup,
    02:38:10 *🗑️ Filtering Out Unnecessary Data*
    - Implementing a filter transform to remove rows with no list price,
    02:41:14 *📁 Writing Data to a Destination*
    - Creating a new dataset on the fly to write the transformed data to a destination,
    02:43:08 *🛠️ Executing a Data Flow via Pipeline*
    - Explaining how to run a data flow using a pipeline since direct execution isn't possible from the data flow UI,
    02:45:01 *⏲️ Scheduling and Triggering Pipelines*
    - Exploring the different scheduling options available for triggering pipelines, including schedule, tumbling windows, storage events, and custom events (a trigger sketch also follows this list),
    Made with HARPA AI
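
    For reference, the change-detection logic summarized between 02:00:28 and 02:09:13 comes down to a single pipeline expression; below is a minimal Python sketch that simply holds the ADF expressions as strings. The activity names ("Get File Metadata", "Lookup Last Execution"), the time zone, and the LastExecutionDate column are assumptions for illustration, not necessarily the exact names used in the video.

        # ADF pipeline expressions for the If Condition pattern, held as Python strings.
        # Activity names, the time zone, and the control-table column are assumed.

        # Fields typically requested from the Get Metadata activity in this pattern.
        get_metadata_field_list = ["itemName", "size", "lastModified"]

        # If Condition: has the file changed since the last recorded execution?
        # Get Metadata returns lastModified in UTC, so convert it before comparing
        # it with the locally stored execution date returned by the Lookup activity.
        if_condition_expression = (
            "@greater("
            "convertFromUtc(activity('Get File Metadata').output.lastModified, "
            "'Central Standard Time'), "
            "activity('Lookup Last Execution').output.firstRow.LastExecutionDate)"
        )

        # When updating the control table afterwards, the pipeline trigger time can
        # be passed to the stored procedure instead of utcnow(), as discussed at 02:14:54.
        control_table_update_parameter = "@pipeline().TriggerTime"

        print(if_condition_expression)
        print(control_table_update_parameter)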
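
    The trigger types listed at 02:45:01 correspond to trigger definitions in Data Factory; here is a rough sketch of a schedule trigger as a Python dict mirroring the trigger JSON. The trigger name, referenced pipeline, start time, and recurrence values are illustrative assumptions. Tumbling window, storage event, and custom event triggers follow the same overall shape with different type properties.

        # Rough sketch of an ADF schedule-trigger definition (JSON as a Python dict).
        # Name, referenced pipeline, start time, and recurrence are assumed values.
        schedule_trigger = {
            "name": "DailyLoadTrigger",
            "properties": {
                "type": "ScheduleTrigger",
                "typeProperties": {
                    "recurrence": {
                        "frequency": "Day",   # Minute, Hour, Day, Week, or Month
                        "interval": 1,
                        "startTime": "2024-10-02T06:00:00Z",
                        "timeZone": "UTC",
                    }
                },
                "pipelines": [
                    {
                        "pipelineReference": {
                            "referenceName": "CopyCustomersPipeline",
                            "type": "PipelineReference",
                        }
                    }
                ],
            },
        }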

  • @emmagoodonephilip2203
    @emmagoodonephilip2203 1 year ago +1

    Please, I would like to start learning Microsoft Azure. Is it okay for an absolute beginner to start with this course? If not, could you tell me which course I should start with? Thanks

  • @jodida10
    @jodida10 10 days ago

    How did you get the username and password?

  • @ChicoDavison
    @ChicoDavison 2 years ago +2

    Hola Manuel, thank you very much for your time and your knowledge, it really is a very good workshop. I have a question: at the end of the tutorial, when writing the results to the destination path, the output is written in parts and does not use the .csv name, and it also writes a _SUCCESS file. In the Copy wizard there was not this problem; maybe you have an idea about it?

  • @dirceudr65
    @dirceudr65 2 years ago +3

    Fantastic. It's going to help me a lot in getting my DP-203.

  • @rafael6693
    @rafael6693 2 years ago +2

    Should we use Azure Synapse or Data Factory these days?

  • @emmanuelokohe
    @emmanuelokohe 7 months ago

    You are good.

  • @sidd3586
    @sidd3586 3 months ago

    The resource material link doesn't work. Please check

  • @shourovnath9377
    @shourovnath9377 5 months ago +1

    You are great... Thanks!

  • @jamieashton660
    @jamieashton660 10 months ago +1

    Excellent intro, thank you. A great place to start if you've spent a lot of time doing SSIS and want to get into ADF.

    • @PragmaticWorks
      @PragmaticWorks 10 months ago

      Glad you enjoyed the video Jamie, thank you for learning with us!

  • @rogerboessen7773
    @rogerboessen7773 8 months ago

    Hi, great video for learning purposes, that's for sure... Only during the creation of the storage account, Azure came back with a message that said "Something went wrong", pointing at the name I want to create and the location of the server (Europe). Even after picking different names for the storage account, it doesn't accept it. Has this something to do with the subscription maybe? Please let me know so I can finish this class 👍

    • @austinlibal
      @austinlibal 6 months ago

      Storage account names need to be 3-24 characters, lowercase letters and numbers only, with no spaces, dashes, or underscores.
      You might also be experiencing issues if a specific region is disallowing the creation of a certain resource.
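
      For anyone hitting the same error, here is a quick way to sanity-check a candidate name before retrying in the portal; a small Python sketch of the rules above (it does not check global uniqueness, which only Azure can verify):

        import re

        # Storage account names: 3-24 characters, lowercase letters and digits only.
        NAME_PATTERN = re.compile(r"^[a-z0-9]{3,24}$")

        def is_valid_storage_account_name(name: str) -> bool:
            """Basic naming-rule check; global uniqueness is verified only by Azure."""
            return bool(NAME_PATTERN.match(name))

        print(is_valid_storage_account_name("adftutorialdatalake01"))  # True
        print(is_valid_storage_account_name("My-Storage_Account"))     # False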

  • @carlrodriguez1323
    @carlrodriguez1323 1 year ago

    Great stuff. How did you find the username and password to create the linked service? I found the username but can't find the password. I don't remember creating one either.

  • @tonytilmon1525
    @tonytilmon1525 1 year ago

    Will you be covering how to recover information loaded in Google Chrome that I accidentally deleted?

  • @ahmedroberts4883
    @ahmedroberts4883 2 years ago +1

    Great Stuff. Thank you very much!

  • @samyboulos4642
    @samyboulos4642 1 year ago +1

    Excellent intro to Data Factory. Thanks.

  • @ZobiaKhan-mc1fs
    @ZobiaKhan-mc1fs 5 months ago

    1:41:31 bookmark

  • @abhinitkumar8274
    @abhinitkumar8274 2 years ago +2

    awesome

  • @beaddy101
    @beaddy101 2 years ago +1

    48:55

  • @quotesoflife7717
    @quotesoflife7717 1 year ago +1

    thanks Manuel.
    nice explanation...

  • @przemeklelewski7608
    @przemeklelewski7608 1 year ago

    Thank you. Very good examples that showed me a quick way to get started with ADF.

  • @m.fatihsirac844
    @m.fatihsirac844 8 months ago

    Amazing content!!! Made me feel confident with ADF.

  • @USA2023HAAS
    @USA2023HAAS 2 years ago

    Thank you very much. As a beginner, this is an awesome video.

  • @michaels1813
    @michaels1813 1 year ago

    This is a very good tutorial. Thank You!

  • @edwardstorey5525
    @edwardstorey5525 1 year ago

    Go foresttttt

  • @bhavikapadidala7756
    @bhavikapadidala7756 1 year ago

    Great video... Thanks!

  • @temesgenalemu5621
    @temesgenalemu5621 1 year ago

    Very helpful.

  • @muhammadazam8422
    @muhammadazam8422 1 month ago

    Hi,
    How do I handle growing complexity and increases in data volume?
    Kindly guide or advise me on the queries mentioned below (links, documents, websites).
    1- Develops and maintains scalable data pipelines and builds out new API integrations to support continuing increases in data volume and complexity.
    2- Collaborates with analytics and business teams to improve data models that feed business intelligence tools, increasing data accessibility and fostering data-driven decision making across the organisation.
    3- Implements processes and systems to monitor data quality, ensuring production data is always accurate and available for key stakeholders and business processes that depend on it.
    4- Writes unit/integration tests, contributes to engineering wiki, and documents work.
    5- Performs data analysis required to troubleshoot data related issues and assist in the resolution of data issues.
    6- Works closely with a team of front end and back end engineers, product managers, and analysts.
    7- Designs data integrations and data quality framework.
    8- Designs and evaluates open source and vendor tools for data lineage.
    9- Works closely with all business units and engineering teams to develop strategy for long term data platform architecture.
    Best Regards,
    MA@Pakistan