I don't usually comment on youtube, but Adam I will make an exception for you. Your videos are easy to follow and educational, but most of all, straight to the point. The amount of time allocated to the video is perfect. Best wishes and please continue making training videos.
Wow! Thank you Fernando for such a heart-warming feedback. More videos are coming!
Adam you are the best! I like everything in your content (tone, pace, crisp, concise, no nonsense, straight to the point, good coverage, well thought through, the list goes on...). I'm one of those who rarely post a public comment; here you go... doing it for you.
I don't know how I understand everything you teach without even repeating the video twice. It's so clear and to the point, especially the demo part: from the background to the practical, everything you present is just wow. God bless you !!
I have been watching a lot of Azure videos. This one is the best and I will study more of your catalog. Thanks!
Awesome, thank you!
This was crisp and you have covered everything that the beginner should know. Thanks a lot!
I spent days scouring the web for documentation and I was left out cold, then I thought I'd give RUclips a try and I found yours up at the top, and I couldn't be happier.. thanks so so sooo much Adam!! Keep up the excellent work.. very easy to follow, simple, rich content.. loved it!!!
Awesome, thank you!
Each and every second you took is informative.
Great learning from you.
I appreciate that!
This is the first time I am commenting on a RUclips video. I have seen thousands, maybe.
"The best videos and so easy to understand". Keep it up Adam
Your demos are fantastic. Love that you go step by step and don't skip things. Much appreciated.
Thanks so much! :)
This is excellent. I'm preparing for DP-200 and 201, and your videos have a lot of information, concentrated, summarized and explained very simply. Thanks!
Glad it was helpful!
Excellent video. Taught with simplicity and clarity, without any noise.
Glad it was helpful!
Great video as per usual, your channel is my go-to for preparing for Azure Certification Exams!
I do have a question regarding the ADLS Gen 2 hierarchy, as one of my friends is preparing for the DP-900 exam (regarding a practice question).
The question asks you to match 2 of the following 3 terms ([Azure Storage Account], [File share], [Container]) in the following hierarchy (only one answer is allowed):
Azure Resource Group - [TERM 1] - [TERM 2] - Folders - Files
The suggested correct answer is:
Azure Resource Group - [Azure Storage Account] - [File Share] - Folders - Files
But I don't see any reason why the following answer would not also be correct (besides maybe because containers being called file systems in ADLS Gen 2?):
Azure Resource Group - [Azure Storage Account] - [Container] - Folders - Files
What is your take on this? Thanks for taking your time!
Thanks to Adam's wonderful video, it's really easy to understand ADLS!
Glad it was helpful!
This is such a great tutorial, especially that you share all the differences with a normal storage account. It was extremely helpful for me
I came to this video not expecting to learn much. I was wrong. Very useful.
Great explanation of many things together and also explaining the differences and linkage between ADL, ADB, PBI, etc. Thank you very much Adam for this one.
My pleasure! Thanks!
Your videos are very easy to follow. Many thanks for your effort to create all the azure videos.
Thanks :)
Hey Adam !! You are becoming one of the BEST AZURE AUTHORITIES / SME on the Internet. Keep up the good work. Thanks for sharing your knowledge in such a simple way. Kudos !!
Wow, thanks, that's a nice thing to say. I wouldn't say so since I'm just a trainer, but I love your enthusiasm and appreciation. Thank you kindly my friend :)
Adam you are a rock star. Your videos are extremelly well done. Thanks and keep up the good work!
Glad you like this! Thanks!
Your videos are really worth watching Adam, really thanks for the beautiful content 😁 Want many more videos from your side. Thanks in advance.
Awesome! Cheers!
this is insanely complete! wow.
Coooool! Thank you! :)
Wow such a clear and simple explanation about Data lakes. Absolutely awesome thank you Adam for your great efforts for the community... More power to you...👍👍
My pleasure! Thanks for watching ;)
Hey Adam!
That was very informative and clear explanation about data lake👏 .Thank u a lot
You are amazing Adam !! How can one know all these things, Azure, Power BI, ADF, Data Lake. You are genius. Thanks for knowledge sharing !
Haha! Thanks :D
Never seen this kind of KT videos made public. Thanks Adam for the spoon-feeding video to improve Azure knowledge.
My pleasure!
Unbelievable tutorial. Thank you so much for helping me to find everything I look for at one place.
excluding Power BI of course. I am a Tableau fan =D
Awesome, thanks!! :D
Hehe, thanks alright, we all have our preferences :)
you make very simple and easy explained videos well done Adam!
I appreciate that!
Your content is gold, thanks a lot for making these videos
Your videos are wonderful, sir! would love to see an in-depth one on Azure Monitor, perhaps how services such as these (storage/blob/data lake) can tie into it. I find the variety of monitoring options a bit overwhelming without knowing which are worthwhile. Have a great day! (please let me know if I just missed an Azure Monitor video somewhere in here)
Great suggestion, I surely have Monitor on the list! Thanks for watching :)
Thanks for the lesson, your videos are very helpful to me.
Hi Adam... Bravo. Excellent work. I recently watched a few of your videos and they are absolutely fabulous... Thanks
Awesome, thank you!
What a great explanation with a practical demo.. you are a star
Thank you! :)
Very nice video, explained the concept clearly, thank you so much Mr. Adam🙏.
Glad it was helpful!
Thanks for the contribution, Adam!
Thanks! It's my pleasure!
Awesome demonstration of how to create and connect ADLS and run Scala code with Databricks
Many thanks!
Please note that since the release of the video there were some changes made to the service. For instance an immutable storage feature is now in preview for ADLS :) azure.microsoft.com/fr-ca/updates/immutable-storage-for-azure-data-lake-storage-now-in-public-preview/?WT.mc_id=AZ-MVP-5003556
Fabulous work. Just one thing: always try to make your videos from a production point of view.
And it would be great to upload a few new videos on "Mapping Data Flows", on Delta Lake, and on Databricks features such as mounting, caching and streaming operations
I'm working on improving my workflow. By the end of 2020 I want to have a new streaming PC with a better setup which would allow me to create videos more freely and reduce the time required to make them. When this happens I will be able to make more videos faster, and MDF in ADF is surely a big topic of interest to me. :) Thanks for tuning in!
@@AdamMarczakYT Bro, why did you stop making videos?
Great demo. Thank you Adam
What a great video. Thanks Adam!
Great video... can you please also make a video on how we can move Microsoft Navision data to the data lake
Thanks! Unfortunately Microsoft Dynamics is not my field of expertise. Nav is a pretty old system too, so it's hard to find any useful examples :(
This tut was a blast! Thank you
Very good and comprehensive tutorial, thank you!
You are fantastic !!! Thanks for sharing valuable content.
I appreciate that!
I really really thank you!! Your video made my week!
You are so welcome!
Amazing Demo!! Many Thanks!!
Hi Adam! Great video! One question: I have created issue reporting, inspection and ideas apps for one of my teams in Microsoft Teams. How do I export that data into Azure Data Lake?
Hi Adam, thank you for the video, it was great. I have just one question about Hadoop-compatible access: does this mean that it can be connected with Hadoop, or does it use Hadoop every time there is some action inside the Data Lake? Thanks a lot once again
Thank you for the videos! I am starting with Databricks and it's super clear! Do you have some videos on Delta Lake in Databricks, like merge operations? It would be awesome to learn more about it!
Glad to hear that! :)
Very good tutorial, very helpful. Thank you.
Glad you enjoyed it!
It's an excellent video on connecting ADLS to Databricks and Power BI
Thank you mate :)
Great video. Thanks Adam!
Fabulous tutorial. Wish to see more like these. Informative content ✌️. I'll be using this knowledge in my project. Much needed video. Thanks
Awesome, thank you!
Excellent and very practical tutorial, thank you...
Hi Adam, thanks for your valuable time creating this video. I faced a problem while performing the Add Role Assignment step: I saw that Azure has removed Azure AD from the "assign access to" drop-down list. Please suggest any other approaches to mount the data lake. Appreciate your efforts. Thanks
Fantastic and useful video. Thanks!
Adam your tutorials are amazing! Is it possible to copy metadata and files from SharePoint and ingest them into the data lake using ADF?
It should be possible using the REST API, but I would advise against it. This is what Logic Apps were created for. Thanks for watching!
For Databricks mounting: please note that the Azure portal as of today will copy the secret ID and not the secret value itself if you hit copy at the end of the line just like Adam does. Copying the secret value seems only possible immediately after the creation of the secret, via the copy button that appears right next to it. Took me some time to figure this out..
Hehe! Good catch! Microsoft updated the UI and now new keys have two columns, Value and ID. Both have a copy button. Just make sure to use the copy button in the Value column :) Thanks!
An interesting contextual overview of how to manage the data lake we have today, and how we can transform it into information and prepare it for artificial intelligence on top of it.
Thanks
A quick correction, you should use select.write.csv(...) at 23:20, otherwise you would write all columns from original csv to the new csv file.
Ah a good eye indeed. Coincidentally I noticed this as well yesterday as I was conducting training on this very topic. Cheers 😀
Excellent video, exteremly clear and concise. Thank you!
Glad it was helpful!
4:52 ADLS Gen2 supports soft delete for blobs. When enabled, deleted blobs are retained for a specified period before permanent deletion.
However, soft delete for containers is not supported during the upgrade process.
Very well explained 👍🏻
Amazing tutorial, you explain so well, thanks
Great Video. Your explanation is very nice and easy to understand. Thanks very much.
Glad to hear that, thanks! :)
Thank you! Extremely helpful video, and very informative. :)
Glad it was helpful!
Wonderful demo. Can you please give us a demo on Data Factory and APIs as well please?
Thanks. Actually I already have a few Data Factory videos (4) using Blob storage and SQL, but Blob and ADLS are so similar that if you watched those and changed the connector to ADLS you wouldn't notice the difference. For the API, what do you mean?
@@AdamMarczakYT Hello Friend. Something on PAAS services. Also plz plz plz plz give full demo on Azure Site Recovery. Migrating OnPrem Infra to Azure. Please.
Super Adam!...Good for Analytical usecases
Awesome!
When creating containers, how do I know exactly how many containers I should make? For example, if I'm creating 5 apps that are completely independent of each other and the apps save pictures that the users take to the storage account, should I have 5 containers (1 for each app)? Or 1 container to support all apps?
Hey Justin. There aren’t any specific limits scoped around containers so this is a design decision. There aren’t any specific guidelines so you should match what feels right for your organization and use case. But there are storage account level limits so those could be a deciding factor between one and many storage accounts, check those out in here docs.microsoft.com/en-us/azure/azure-resource-manager/management/azure-subscription-service-limits?WT.mc_id=AZ-MVP-5003556
Great work, as always!
Thanks Michał!
Very good introduction video, thank you. A quick question: why use an access key to mount Azure Blob Storage, but a service principal to mount Data Lake? How do I use an access key to mount Data Lake?
Good question! There is pretty much no difference, I just wanted to show both approaches, service principal is recommended but access key will work too.
Check out how to use access key for data lake in the docs: docs.databricks.com/data/data-sources/azure/azure-datalake-gen2.html
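For completeness, the linked docs boil down to setting the account key in the Spark session configuration instead of creating a mount. A sketch of that approach as run in a Databricks notebook (where spark and dbutils are predefined); the storage account name, secret scope and key name below are placeholders, not values from the video:

```python
# Access-key auth for ADLS Gen2 in a Databricks notebook (no mount needed).
# "mydatalake", "my-scope" and "account-key" are hypothetical names.
spark.conf.set(
    "fs.azure.account.key.mydatalake.dfs.core.windows.net",
    dbutils.secrets.get(scope="my-scope", key="account-key"))

# After that, read directly via an abfss:// URI:
df = spark.read.csv("abfss://container@mydatalake.dfs.core.windows.net/input.csv")
```

Keeping the key in a secret scope rather than pasting it into the notebook is the usual practice, since notebook source is visible to collaborators.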
@@AdamMarczakYT Thank you. Another favor: will you be able to give a quick demo on Databricks? My impression of Databricks is that it's all about in-memory processing, a good candidate for data streaming. Do you have a demo about streaming from Event Hubs or Service Bus to Databricks?
Check this out docs.microsoft.com/en-us/azure/azure-databricks/databricks-stream-from-eventhubs
I have only an intro video on Databricks as of now.
Really nice and informative video. Can you provide me some context on metadata storage as well? Like, if I store 1 TB of data in my ADLS, how much metadata will be generated and stored? I am looking more at the cost, as Blob storage doesn't charge you for metadata. Looking forward to your reply
It shouldn't be much, unless you use Delta Tables, which contain the full history of changes for your tables. Thanks for watching.
At 10:30, I understand read and write, but how does execute work here? What is the execute permission in ADLS?
Love it, many thanks Adam!
My pleasure!
Great video, just subbed.
Can you make more videos on Azure Databricks: calling multiple notebooks, making RDBMS calls, logging etc.?
I do want to make a series on Databricks in the future. :)
Well Explained
Thanks!
Thank you for information.
You are welcome
Really awesome !
Can u make a video on service principals? With a sample demo
maybe :)
@@AdamMarczakYT thanks. Also ACL :-)
It looks like we don't necessarily need to use Databricks, because ADF now supports "Data Flows," which are a kind of no-code data transformation process. What are your thoughts on that? Is ADF a good substitute for Databricks (it's actually using Databricks under the hood) for more advanced data transformation jobs?
Hey Shawn, very good question. Since Microsoft removed the Data Flow step which allowed you to input your own script blocks, I'd say for advanced scenarios I would use Databricks, since I would want to have full control. Microsoft also removed the ability to provide your own linked service for Data Flow, which in turn means that if you want to connect to data sources within your private networks, the public integration runtimes will not be able to connect (it can, however, connect to firewall/VNet-protected resources using managed identity), nor will you be able to add custom libraries to your Data Flow (again, you don't own the cluster so you can't control this), hence again narrowing down some scenarios. Net net, my opinion is that the general direction of Data Flow is toward simple cloud transformation scenarios at this point in time.
Great video. One query. When you wrote back to the lake at the end of the demo, it was in partitions. How can we write back in a single file without partitions?
Good question! Since Databricks is based on Spark, and Spark uses the Hadoop file system, it is normal behavior for files to be split into partitions. You can force a single partition by using the repartition or coalesce function with a parameter value of '1'. If you want to get rid of the entire folder with all the Hadoop part files, you can google for some Scala/Python scripts to do it. In general partitioning is good practice, so merging is not recommended for bigger files, as they will need to be loaded into memory, which will cause issues with bigger data sets. Most other technologies like SQL DW (Synapse) and Data Factory are able to read from partitioned data sets just fine.
Adding to Adam's answer: you can write it into a single partition using coalesce/repartition, then using os.path delete the files that don't match the *.csv/*.parquet pattern and rename the remaining file.
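The cleanup step described in this thread can be sketched in plain Python. This assumes Spark has already written a single-partition output folder (e.g. via df.coalesce(1).write.csv(...)); the function and file names here are illustrative, not from the video:

```python
import glob
import os
import shutil

def promote_single_part(output_dir: str, target_path: str) -> None:
    """After a coalesce(1) write, move the lone part-*.csv out of the
    Spark output folder to a clean file name, then drop the folder
    (which still holds _SUCCESS and other Hadoop bookkeeping files)."""
    parts = glob.glob(os.path.join(output_dir, "part-*.csv"))
    if len(parts) != 1:
        raise ValueError(f"expected exactly one part file, found {len(parts)}")
    shutil.move(parts[0], target_path)  # keep the data file
    shutil.rmtree(output_dir)           # discard the rest of the folder
```

As Adam notes, this only makes sense for small outputs; merging large data sets into one file defeats the purpose of partitioning.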
Love your videos Adam!
I appreciate that! Thanks!
Nice explanation... I have one question... if I upload a file into a folder created in Data Lake Gen2, will that file follow the hierarchical file system or a flat namespace system?
Thanks. Everything in ADLS is handled under a hierarchical structure.
Good tutorial. Easy to understand.
Glad it was helpful! 😀
I'm really happy with your explanation and presentations.. it helps me a lot
Glad to hear that :)
Very clear explanation. Thanks!
Thank you so much :)
Hi Adam
Please provide an ADF series from basic to advanced level, it would be helpful for me
Fantastic video
Excellent Adam!
Thanks as always, you are very active :) nice to see that.
@@AdamMarczakYT Azure is Interesting + your videos are great as it has clear explanation with Demo.
Hey Adam, do you have input on the below 3 questions, on the ADLS + CDM topic? Best, Daniel
Question #1: am I right that CDM does not have restrictions on how the CDM folders in ADLS should be organized (folders, subfolders, data grouping, data isolation)? As I understand it, you can have any folder structure, as long as the manifest / cdm / model JSONs are placed in it. I saw examples on docs.microsoft, but can different logic be implemented as well?
Question #2: access (read / write ...) can be defined by using Azure Active Directory and Access Control Lists; there is no additional feature if a folder is a CDM folder, so there is no difference in the possibilities whether an ADLS folder is a CDM folder or not?
Question #3: is it possible to automate the entity.cdm.json file generation?
CDM as in a Common Data Model?
@@AdamMarczakYT yes yes
Hey Adam, that's a great demo. I want to know how you can programmatically put files in a folder based on the date of the file, if I have a year-->month-->day subdirectory structure, and then use a search pattern to only choose files of a particular month of a year during processing of data within the data lake. Any ideas on how?
Hey, thanks. What technology? Scala in Databricks?
@@AdamMarczakYT I'm able to do that with ADF, but I guess it won't be bad if you have a way of doing that in Scala also.
With databricks you can simply run something like
val year = 2019
val month = 12
df.write.json(s"/mnt/data/$year/$month/myfile.json")
and to get the files
val files = dbutils.fs.ls(s"/mnt/data/$year/$month")
note I wrote this on my phone so there might be typos, but you get the general principle
@@AdamMarczakYT Right on. Thanks Adam!!
When you say date of the file, do you mean the last modified date of the file or a date column in the file? If the date is in the file name, you can use the input_file_name() function to capture the source file path in a new column of the dataframe, derive year, month & day as new columns from it, and finally when you write back just use partitionBy() with year, month & day.
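One note on this thread: Spark's input_file_name() returns the source file path, not a timestamp, so in practice you parse the date out of the file name and then partition by it. A pure-Python sketch of that parsing; the file-naming pattern (a YYYY-MM-DD date embedded in the name) and the /mnt/data base path are assumptions for illustration:

```python
import re

def partition_path_for(filename: str, base: str = "/mnt/data") -> str:
    """Derive a year/month/day partition path from a file name that
    embeds a date, e.g. 'sales_2019-12-03.csv' (naming is hypothetical)."""
    m = re.search(r"(\d{4})-(\d{2})-(\d{2})", filename)
    if not m:
        raise ValueError(f"no date found in {filename!r}")
    year, month, day = m.groups()
    return f"{base}/{year}/{month}/{day}/{filename}"

def month_glob(base: str, year: int, month: int) -> str:
    """Glob pattern selecting all files for one month in that layout."""
    return f"{base}/{year}/{month:02d}/*/*"
```

The glob pattern from month_glob can then be fed to a reader such as spark.read to process only one month's files.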
Thank you, very nice video
Very good for beginners.. thanks to you.
Thanks and welcome! :)
Thanks for the video. I have two ADLS instances, dev and prod. The data is sourced from various systems to prod and then migrated to dev instance as well. Is there any service or tool available to compare all the folders and files between these two containers on dev and prod.
Not that I know of. You would probably need to write a PowerShell script yourself for this and compare their MD5 hashes.
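Adam suggests a PowerShell script; the same idea sketched in Python for illustration, walking two local directory trees and comparing MD5 hashes (against ADLS you would first download or enumerate the files, e.g. with azcopy or an SDK; the paths here are placeholders):

```python
import hashlib
import os

def md5_of(path: str) -> str:
    """MD5 hex digest of one file, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def tree_digest(root: str) -> dict:
    """Map each relative file path under root to its MD5."""
    digests = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            digests[os.path.relpath(full, root)] = md5_of(full)
    return digests

def diff_trees(dev: str, prod: str) -> dict:
    """Report files missing on either side or with mismatched content."""
    a, b = tree_digest(dev), tree_digest(prod)
    return {
        "only_in_dev": sorted(set(a) - set(b)),
        "only_in_prod": sorted(set(b) - set(a)),
        "mismatched": sorted(k for k in set(a) & set(b) if a[k] != b[k]),
    }
```

diff_trees returns the relative paths that exist on only one side plus those whose content differs, which covers the dev/prod comparison asked about above.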
Good hands-on intro, thanks!
Thank you! :) Glad you enjoyed it Terry.
Excellent presentation.
Thanks :)
If I want to use Blob storage to store some files with low costs, and Data Lake storage to store other files in a structured directory, do I need to create two separate storage accounts?
Excellent. Thanks!
Glad it was helpful!
Can you use data lakes with Azure Sentinel?
I see other SIEMs boasting their data lake backends..
Can you explain what kind of scenario we are talking about? I didn't have exposure to many SIEM systems.
Another way to put it is: can you use an Azure Log Analytics workspace to connect to a data lake and search it using the Kusto Query Language, or are these 2 different animals?
Don't think that's possible. Here is the list of supported data sources for sentinel docs.microsoft.com/en-us/azure/sentinel/connect-data-sources
Adam, please, what's the difference between Blob and Data Lake (Blob Storage vs Data Lake)?
This is covered in the video. Is it not clear? Do you have any specific questions?
Excellent video. Can you upload a video about Azure SQL Data Warehouse?
Thanks Omar! I do plan to do one, but SQL DW changed to Synapse Analytics, so I'm waiting until they roll out the 'unified experience' update so the video won't get outdated in a month or two.
@@AdamMarczakYT Thanks for the info, then I will be waiting for the 'unified experience' update. :)