Azure Data Factory Mapping Data Flows Tutorial | Build ETL visual way!
- Published: 28 May 2024
- With Azure Data Factory Mapping Data Flow, you can create fast and scalable on-demand transformations by using a visual user interface. In just minutes you can leverage the power of Spark without writing a single line of code.
In this episode I give you an introduction to what Mapping Data Flow for Data Factory is and how it can solve your day-to-day ETL challenges. In a short demo I will consume data from blob storage, transform movie data, aggregate it, and save multiple outputs back to blob storage.
Sample code and data: github.com/MarczakIO/azure4ev...
Next steps for you after watching the video
1. Azure Data Factory introduction video
- • Azure Data Factory Tut...
2. Check mapping data flow documentation
- docs.microsoft.com/en-us/azur...
3. Helpful tips and samples
- github.com/kromerm/adfdataflo...
Want to connect?
- Blog marczak.io/
- Twitter / marczakio
- Facebook / marczakio
- LinkedIn / adam-marczak
- Site azure4everyone.com
Adam, I have been watching many of your videos. As someone new to Azure, I find your videos immensely valuable. Keep up the great work, I really appreciate it!
Awesome, thank you!
You have a real talent for explaining data flows in a simple way. Thank you so much, Mr. Adam.
Just discovered the channel. Your material is high quality. It's excellent work. I will go watch more. Thank you Adam!
Thank you. This means a lot :)
Hello Adam, please let me know how to connect to Dynamics CRM. Please send details to pradysg@gmail.com
I just found your videos while searching for ADF tutorials on YouTube. The materials are fantastic and are really helping me learn. Thank you so much!!
Happy to help! :)
I love you, Adam!
I have been struggling with using expression builder in Data Flow. I can't seem to figure out how to write the code. This video just made it look less complex. I'll be devoting more time to it.
Thank you so much Adam. I was able to crack an interview with the help of your videos. I prepared notes according to your explanations, and 3 hours before the interview I watched your videos again. It helped me a lot.
Fantastic!
-1979 and ,12
This is why complex logic is needed. Nice tutorial :)
Great video! Most videos seem to focus mostly on the advertisement material straight from Azure. At best they show you the very dumb step of copying data from a file to a DB.
This is the first video I saw where you actually show how you can do something useful with the data, close to a real-life scenario.
Thank you.
Excellent presentation of ADF Data Mapping... we'd love to see more Data Factory ETL videos. Thank you Adam!
Thanks, feel free to check my ADF playlist.
Outstanding! You just made Azure easy to learn. Thank you.
Awesome, thank you!
ADF is but just one part of about 100 significant tools and actions in Azure. :-(
Hi Adam, is it possible to create these pipelines as code as well? Or somehow create them from my actual Azure pipeline? It would be sheer insanity (but it is a Microsoft product) to require and maintain two pipelines: one that's your Azure pipeline for CI and CD and one for ADF. I would really want the Azure pipeline to be able to fill/create the ADF pipeline, but I haven't found anything yet.
Adam, thanks for this excellent video. You explained almost every feature available in data flows. Looking forward to a video on Azure SQL DWH. I know it will be great to learn about it from you.
Glad it was helpful! I'm just waiting for the new UI to come to public preview, then the video will be done :)
Another awesome video. The best part of Mapping Data Flow was the Optimization...where we could do Partitioning.
Thank you! Glad you like it :)
Thank you so much for this. I subscribed immediately, very informative and straightforward azure info. Will definitely recommend your channel. Keep up the great work!
Awesome, thank you!
Awesome video. I've seen a lot of sites and videos and they are so complicated, but all of yours are crystal clear and anyone can understand them.
Thanks Omar :)
Very crisp and clear information, I watched many videos but Adam's contents are awesome!! Thanks dear!! All the best for future good work!!
Thank you so much 🙂
The best video about Azure Data Flows I can find. Thank you Adam!
Wow, thanks! :)
I am new to data and ETL stuff but your videos are really good. Excellent examples and very clear explanations, so anyone can understand. Thanks very much.
Thank you, always happy to help!
Very well explained and demonstrated. Really helpful to get started with Data flows.
Your way of explaining is outstanding. After watching, it feels like Azure is very easy to learn. Kindly keep sharing good videos. Thank you.
Thanks a ton :)
It must be very challenging to do all this thing in English for you I imagine, Adam! Congratulations for pushing through despite the difficulty. 🙂
Impeccable to know reg Mapping Data Flow, Thanks Adam!
My pleasure!
@ Work I'm having to build out a Data Mart with no training on my own. You are literally saving my hide with your videos. THANK YOU!
Glad to help! :)
Very good explaining the Data Flow. Thanks Mr.Adam.
Thank you so much Adam! This was a very clear and great video, and a big help for my interview and knowledge.
Very welcome! Thanks for stopping by :)
Nice one Adam. Cool one. Keep doing fabulous videos always fella.
Many thanks.
Hello Adam, I just finished this video. Very well done indeed. Thanks and regards. Bharat
Thanks Bharat :)
The features are very interesting. I want to try the different partitioning techniques. Thank you for sharing such amazing stuff.
My pleasure! Thanks! :)
These videos are great. Helping me so much! Thanks Adam
Glad you like them!
Your videos are very informative and practically oriented. Keep it up.
Thank you, I will!
Awesome videos Adam, your videos are great help to learn Azure. Keep it up :)
Thanks, will do!
As usual, another awesome video Adam!!! Excellent. It was to the POINT!!! Keep up the good work you have been doing for plenty of users like me. Eagerly waiting for more videos like this from you!!!
Can you please make some videos on Azure Search...
Thank you so very much :) Azure Search is on the list but there is so much news coming from Ignite that I might need to change the order. Let's see all the news :).
Your channel is totally underrated, man
Thank you Adam Dzienkuje, this is a great tutorial.
Your video content is awesome!!! Your videos are very useful for understanding Azure concepts, especially for me as someone who just started the Azure journey.
I would like to have one video where we can see how to deploy code from Dev to QA to Prod, and how to handle connection strings, parameters etc. during deployment.
Thanks again for the wonderful video content.
ADF CI/CD is definitely on the list. It's a fairly complex topic to get right so it might take time to prepare proper content around this. Thanks for watching and suggesting ;)
Nice video Adam. Professional as always
Wow, thanks!
very well done on explaining principles of mapping data flows!!!
Thanks a lot!
Your videos are really great and helped me understand lot of concepts of Azure. Can you please make one using SSIS package and show how to use that within Azure Data Factory
So helpful! Thank you very much Adam!
This was explained very well. thank you.
You're very welcome!
excellent explanation with simple scenario. Thank you.
Glad it was helpful!
Thank you, Adam. As always, you rock.
I would say this is the best content I've seen so far!! Thank you so much for making it Adam!
Just wondering, is there a Ctrl+Z or Ctrl+Y command in case we made some changes in the data flow and want to restore it to a previous version?
Awesome, thanks! Unfortunately not, but you can use versioning in the data factory, which will allow you to revert to a previous version in case you broke something. Highly recommended. Unfortunately there are no reverts for specific actions.
@@AdamMarczakYT Excellent!! Thank you so much for your reply!
@@549srikanth I publish each time I create a significant new step in the pipeline and I use data preview before moving on to the next step. Also, I think you can export the code version of the entire pipeline. Presumably you can then paste that into a new pipeline to resurrect your previous version.
Hi Adam, Thank for helping us in learning new technologies. You are awesome 👌🏻👌🏻👌🏻👏👏.
My pleasure!
Great video! Thanks Adam!
My pleasure!
I really like your tutorials. I have been looking for a "table partition switching" tutorial but haven't found any good ones. Maybe you could do one for us? I am sure it'll be very popular as there aren't any good ones out there and it is an important topic in certifications :-)
Wow ! Fantastic explanation.
Glad you liked it!
that was actually not so hard. thanks man, you're awesome.
No! you are awesome! :)
👍 It's amazing, a practical implementation of Data Flow.
Great! You are the best Adam.
Adam, great tutorial! Kudos!
Glad you liked it!
Thanks Adam!! Very informative video. Liked it a lot.
Thanks and you are most welcome! Glad to hear it.
Brilliant way of explanation
Subscribed to your channel
Thank you, appreciated 🙏
Amazing Video, we want other parts !
Adam, your content is always easy to grasp, excellent work mate. Could you please explain how to create a pipeline which has a copy activity followed by a mapping data flow activity?
Thanks, just drag and drop the copy activity and data flow blocks onto the pipeline and drag a line from the copy activity to the data flow activity.
Hello Adam, thanks a bunch for this excellent video. The tutorial was very thorough and anyone new can easily follow. I do have a question though. I am trying to replicate an SQL query into the Data Flow, however, I have had no luck so far.
The query is as follows:
Select ZipCode, State
From table
Where State in ('AZ', 'AL', 'AK', 'AR', 'CO', 'CA', 'CT'...... LIST OF 50 STATES);
I tried using Filter, Conditional Split and Exists transforms, but could not achieve the desired result. Being new to the Cloud Platform, I am having a bit of trouble.
Might I request you please cover topics like Data Subsetting/Filtering (WHERE and IN Clauses etc.) in your tutorials.
Appreciate your time and help in putting together these practical implementations.
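The WHERE ... IN part of the query above amounts to a set-membership test on each row, which is the condition a filtering step would have to express. A minimal Python sketch of that logic follows; it is not data flow syntax, and the sample rows and the shortened state list are made up for illustration.

```python
# Hypothetical sample rows; in the question these come from a table.
rows = [
    {"ZipCode": "85001", "State": "AZ"},
    {"ZipCode": "00601", "State": "PR"},
    {"ZipCode": "99501", "State": "AK"},
]

# Shortened, illustrative list; the original query lists 50 states.
allowed_states = {"AZ", "AL", "AK", "AR", "CO", "CA", "CT"}


def filter_by_state(rows, allowed):
    """Keep ZipCode and State for rows whose State is in the allowed set.

    Mirrors: SELECT ZipCode, State FROM table WHERE State IN (...).
    """
    return [
        {"ZipCode": r["ZipCode"], "State": r["State"]}
        for r in rows
        if r["State"] in allowed
    ]


selected = filter_by_state(rows, allowed_states)
```

Using a set for the allowed values keeps each membership check cheap, and the same true/false condition per row is what any filter step needs.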
Wow, I like your video. I tried it today and got a good result. Thanks for your good explanation.
Great job! Thanks!
very good explanation Adam. keep it up.
Thanks, will do!
@@AdamMarczakYT Adam, is there a trial version of Azure for learning purposes?
best video on azure I have ever seen❤❤
Thanks buddy ...Great work
My pleasure
Very useful. Thank you so much.
Glad it was helpful!
Great job! Thanks for all
Thank you too!
Excellent tutorials
Thank you Adam.
Appreciate your content. Thanks.
My pleasure! :)
Adam, excellent presentation of ADF concepts. I find all your videos really helpful in understanding ADF. One question regarding the sink dataset in a data flow: how can I create a dynamic folder in my blob storage based on the year, month and day when the data flow was triggered?
It depends on what you want to achieve. You can either set partitioning by a date column, which will split the data by date, or, if you want to put the entire dataset in one folder named by date, use a formatDateTime expression such as formatDateTime(utcNow(), 'yyyy/MM/dd') as the path.
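To show what a yyyy/MM/dd folder path looks like, here is a minimal Python sketch of the same idea. It is not an ADF expression; the `date_folder` helper and the `output` base folder are hypothetical names used only for illustration.

```python
from datetime import datetime, timezone


def date_folder(run_time: datetime, base: str = "output") -> str:
    """Build a year/month/day folder path under a base folder."""
    return f"{base}/{run_time.strftime('%Y/%m/%d')}"


# Fixed timestamp so the result is reproducible; a real run would use
# datetime.now(timezone.utc) at trigger time.
example = date_folder(datetime(2024, 5, 28, 13, 45, tzinfo=timezone.utc))
```

Zero-padded month and day keep the folders sorted chronologically when listed by name.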
thanks for the great content!! you are the man :)
I appreciate that!
Thanks for the informative and detailed video Adam, 😊👌. Your content is practical. Can you make a video on how to load data from an Oracle table with Azure Data Factory? It would be helpful for your audience.
Thanks! That's the plan :)
best tutorial ever... 💪🏻💪🏻💪🏻
Very nice tutorial 👍
Thank you! Cheers!
The features are very interesting and thanks for your clear explanation. Could you explain a bit more about reusable data flows? I mean using the same data flow for multiple tables/files, like reusable pipelines.
Thanks for the suggestion, I will try to make a video on data flow parametrization in the future. Thanks for watching :)
Love these videos so easy to understand, do you have a video on new XML connector
Great, thanks! Not yet, maybe in near future :)
Good explanation there.
Wow..lucid explanation..
Glad you think so!
Thanks for such good video
nice & detailed video.
Thank you!
Amazing videos.
Glad you think so! :)
Hi Adam, that's a great tutorial, many thanks for it. I have a question that can we write the transformation functions in different language like Python or R instead of Scala? If yes can you please share some details on it?
Unfortunately not right now :( If you need those then use Azure Databricks instead.
Or a Python/R script in a batch process, right? Databricks would be the better option if you need Spark, since it's also more expensive than batch.
Adam, great video. I'm new to Data Flow and I have one doubt: I want to implement file-level checks in Data Flow but am not able to do it. All transformations perform data-level checks, like Exists or Conditional Split. Is it possible to implement a file-level check, such as whether a file exists in the storage account?
For anyone wondering how to make the year check (or any check) in the second step more robust, you can replace the expressions with the 'case' versions below, which say: if this condition evaluates to true, do this; otherwise do something else.
Worth noting here that in the first expression only a true branch is provided, while the second expression has both true and false branches. As per the documentation on the 'case' expression: "If the number of inputs are even, the other is defaulted to NULL for last condition."
/* Year column expression */
/* If the title contains a year, extract the year, else set to Null */
case(regexMatch(title, '([0-9]{4})'),toInteger(trim(right(title, 6), '()')))
/* title column expression*/
/* If the title contains a year, strip the year from the title, else leave the title alone */
case(regexMatch(title, '([0-9]{4})'),toString(left(title, length(title)-7)), title)
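For readers who want to test the same rule outside the data flow, here is a rough Python sketch of the logic. It is slightly stricter than the expressions above: the pattern is anchored to a trailing "(YYYY)", so a title such as "2001: A Space Odyssey", which contains four digits but no year suffix, is left untouched. The `split_title` name is hypothetical.

```python
import re

# Title ending in a four-digit year in parentheses, e.g. "Toy Story (1995)".
YEAR_SUFFIX = re.compile(r"^(?P<title>.*?)\s*\((?P<year>[0-9]{4})\)\s*$")


def split_title(raw):
    """Return (title, year); year is None when there is no trailing (YYYY)."""
    match = YEAR_SUFFIX.match(raw)
    if match is None:
        # Fallback branch: keep the title as-is and leave the year empty.
        return raw, None
    return match.group("title"), int(match.group("year"))
```

Returning None for the year plays the same role as the NULL default of the two-argument 'case' call.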
Thanks Paul :) I used as simple an example as possible for people who aren't fluent in Scala, but of course you always need to cover all possible scenarios. Sometimes I like to fail the transformation rather than continue with fallback logic, as I expect some values to be present.
@@AdamMarczakYT Of course, I just wanted to see if I could take it a step further to align more closely with what would be needed in a production data engineering scenario and thought others may have the same idea. Thanks for the content! :)
Thanks, I bet people will appreciate this :)
Excellent content!! I just have one doubt: where can I find documentation about the Scala functions? I do not know anything about it. Just subscribed to your channel! Thanks a lot!!
Thank you Erick! I think the list of available functions is the standard one for Spark. I never checked whether they all match, but you can find them here spark.apache.org/docs/2.4.4/api/sql/
Adam, for using transformations do I need to learn Scala? Or can I just refer to the documentation you specified for the Scala functions and write the transformation?
Documentation should be enough. MDF is targeting simple transformations so in most cases documentation alone will suffice.
Great video.
Question: Under "New Datasets", is there a capability to drop data into Snowflake? I see S3, Redshift, etc.
I appreciate the video and feedback!
Lovely bro!!
Thanks 🔥
Thanks for the nice video. Do you know if there is any way to connect to Dynamics F&O or D365/CE yet, to use as my Data Flow source?
Hey, thanks for watching! Not that I'm aware of. I would assume this is a case for a Logic App to fetch data to Blob Storage and then trigger ADF processing.
docs.microsoft.com/en-us/azure/connectors/connectors-create-api-crmonline
Very good explanation. Keep going.
Thanks. Will do.
The video is excellent. I want to know the problem statement which Data Flow is solving.
Please also explain how to use data analytics in a pipeline flow.
This is the best content. Thank you so much.
Thank you. As to your question, can you elaborate on the data analytics part? What exactly would you like to see?
Anything like how to make a function or procedure and how to execute it in a pipeline. A basic pipeline flow using analytics.
Hey, I do plan to have implementation videos like this in the future. The pipeline of videos is long, though, so I can't promise anything right now. I added this to the list of potential topics :) thanks!
okay, Thanks
Great videos, thank you! Just a question: what is the difference between a storage account and a data lake? Costs and type of saved data?
Hi, thanks! You might be interested in checking out my video on Data lake ruclips.net/video/2uSkjBEwwq0/видео.html it goes into great detail on all the differences.
Great video thank you
Thanks again! :)
Hi Adam, so glad I found your channel. Your videos were a big help for achieving the AZ900 certificate. Now I am studying a lot to uplift my knowledge and get the Azure data engineer certificate. However, I have an important question! Data flows are expensive, sometimes clients don’t want to use this, are there alternatives to achieve the same result in azure data factory? Thank you very much!
Well you can't have the cookie and eat the cookie :) In my opinion it's not that expensive compared to other available tools.
@@AdamMarczakYT True! I am currently struggling with CSV files that sometimes have extra spaces after the words in the header, which causes an error when doing a copy activity to Azure SQL Database. Do you have any idea how to make my flow a bit more flexible so that it can deal with this? It needs some trimming in the header.
I thought of doing a SELECT in a data flow to change to the correct header titles, but for this I need to know where the spaces will be in the future. So that's also not flexible.
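One way to avoid predicting where the spaces will appear is to trim every header name generically. The Python sketch below does that as a pre-processing step outside ADF; it is an illustration of the idea rather than an ADF feature, and the `trim_header` helper and the sample data are hypothetical.

```python
import csv
import io


def trim_header(csv_text: str) -> str:
    """Return the CSV text with whitespace stripped from each header name."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text
    # Only the first row (the header) is changed; data rows stay as they are.
    rows[0] = [name.strip() for name in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


sample = "id ,name  ,city\n1,Ann,Oslo\n"
cleaned = trim_header(sample)
```

Because every column name is stripped the same way, the step keeps working when the stray spaces move to a different column.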
Do you plan to make a video introducing each transformation component? Thanks
Hi Adam, thanks for making these videos, very clear and concise. I have a question (sorry, not related to this video) regarding Conditional Split: can the output stream activities run in parallel?
They typically run in parallel as it's Apache Spark behind the scenes.
@@AdamMarczakYT Thank you !
Awesome again.
Glad you think so!
Outstanding.
Thanks Mike ;)
Hello Adam. Your video is impressive, as always, but I'm concerned about the source dataset. Question: does the Data Flow activity only work if the data sources are connected to Azure SQL?
I tried using a previous dataset connected to the local server, but this dataset does not appear under Source settings / Source options / Source dataset in the Data Flow activity. I tried the New option and it only allows selecting Azure datasets. All other database options are disabled, so I couldn't create a dataset for SQL Server either.
Hey, mapping data flows currently support 6 data services for both source and sink.
docs.microsoft.com/en-us/azure/data-factory/data-flow-source#supported-source-connectors-in-mapping-data-flow
I'd check if you can trick data flows by using the Azure SQL connector to connect to an on-premises SQL Server, but I have never personally tried.
Hi Adam, just watched two of your videos on Azure Data Factory, nice work. Any chance you can do one on ADF using REST API as a data source with a JSON output, then store in a SQL Server sink?
Great suggestion! I'll add it to the list of potential topics. :) thanks for watching ;)
@@AdamMarczakYT Please, a tutorial on this would be amazing!
Adam, is there a way to preserve the filename and just have it change the extension? For instance, I'm adding a column with datetime, but at the end I would like it to have the same file name, just parquet. Is there a way to do that?
Use expressions :) That's what they are for.
@@AdamMarczakYT Sorry if it was a dumb question, I'm still new to ADF. Ignore if it's too inane, but is filename in the @pipeline parameter? I found one online but couldn't get it to parse.
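The string manipulation being asked about is small: keep the folder and base name and swap only the final extension. A minimal Python sketch of that transformation follows; it is not ADF expression syntax, and the `to_parquet_name` helper and the sample path are hypothetical.

```python
from pathlib import PurePosixPath


def to_parquet_name(path: str) -> str:
    """Keep folder and base name, replace the final extension with .parquet."""
    return str(PurePosixPath(path).with_suffix(".parquet"))


renamed = to_parquet_name("input/movies.csv")
```

In an expression language the same result usually comes from cutting the name at the last dot and appending the new extension.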
Thanks Adam!
My pleasure!
Thanks!
Great Video. Can you use data from a REST Api as a source for a Mapping Data Flow or does the source have to be a dataset on Azure?
Here is the list of supported data sources for MDF docs.microsoft.com/en-us/azure/data-factory/data-flow-source?WT.mc_id=AZ-MVP-5003556 . Just copy data from REST API to Blob and then start MDF pipeline using that blob path as a parameter.