I paused in between once in a while to sip some water for you , Austin :D Great session! Cheers! :)
😅
Good job, definitely enjoying, thanks a lot for sharing. blessings.
this is really commendable, thank you
Thank you
You're welcome!
Hope you enjoyed!
Awesome.
I really like the part where you show how to build a dynamic pipeline with parameters!!
I find that many tutorials only show basic functions but not in-depth or scalable solutions, which is what is required in a real-world environment.
Thanks! Glad you enjoyed!
More questions from the live session that were not answered in the session.
Q) Is there any Microsoft certification available that covers Microsoft Fabric?
A) A new one has just been announced called DP-600: Implementing Analytics Solutions Using Microsoft Fabric! Stay tuned for content specifically around that in the future when it is generally available!
A really fantastic session. Thank you. I have a question, and I don't know if it can be answered here. Is it possible to define a container to store my objects (files, Delta tables, and others)? I understand that when you create your lakehouse, Microsoft provides you with a specific location, but I would like to define my own area to store my objects. To be honest, I would like to use the concept of Medallion Architecture with landing zone, Bronze, Silver, and Gold spaces. Thank you
You can absolutely define where you want to land your data in a Lakehouse. It's just a managed data lake, so you still have the ability to create directories and subdirectories within the "Files" folder. I believe the best practice for leveraging medallion architecture in Fabric would be to have a separate workspace for each of the three layers. I would point you to Microsoft documentation for confirmation and their recommendations.
I was wondering if we could create a shortcut to a folder in the Gen2 data lake instead of using the full Data pipeline, given that our data doesn't need any transformation before it goes into the data lake? Also, how can I access data frames defined in the notebook for Power BI visuals or reports, or are notebooks merely another data pipeline to ingest (aggregated or transformed) data into the Lakehouse?
Yes. You could have made a shortcut to the folder in the external data lake, but the data would still only be in its native format as stored in the data lake. If you want to use the Delta format to truly build a Lakehouse with ACID properties, you would need to write the data to the Lakehouse Tables. Notebooks can be used as another ETL tool in Fabric and can also be scheduled with a Data Factory pipeline as part of a larger orchestration. While you can visualize data in a Notebook, I do not believe there is a way to access Spark Notebooks created in Fabric from Power BI reports at this time.
Are you able to use autoloader within your Fabric notebook to connect to a OneLake folder?
My company has some data on Azure SQL that refreshes once every 24 hours.
I can write a dataflow to bring selected rows into Power BI.
However, I want to create some reusable summary tables, for which I think Python would be great.
Is it possible that in Lakehouse,
1 - I create a Dataflow that brings data from Azure SQL into a table
2 - Create a scheduled pipeline that runs every 24 hours and runs the Dataflow that overwrites my table
3 - Use PySpark or something similar to create the summary table
4 - Write that table to Lakehouse
I am not sure how steps 3 and 4 would be automated on a schedule. And I am not sure if the above is possible at all?
Can you please help?
Thanks for the question!
There could be several ways to handle what you want to accomplish with steps 3 and 4. As you stated, you could use PySpark to write a summary table to the Lakehouse, or to a different Lakehouse that is used as a serving layer for analytics, aggregations, and summaries. You could also consider using T-SQL DDL/DML in a Fabric Warehouse to copy data from the Lakehouse into the same type of serving layer in the Warehouse. The Lakehouse's SQL endpoint is read-only for tables, so T-SQL DML is not available there, but the Warehouse supports it. A SQL view in the Lakehouse might serve a similar purpose, though.
As far as scheduling and automating this, you could use a Fabric Notebook inside a pipeline as part of your orchestration. You could also run a Script activity or Stored Procedure activity to automate the SQL portion, either dropping the old table and repopulating it or just updating it with new data.
Hopefully this gives you some direction on where and what you need to look for. If you have further questions, let me know!
Hi, I have another question regarding running pipelines. Can we run a pipeline based on file modification or new files that arrived in our file folder? I want to run my pipeline if we have a new file or update an existing one. In this case, it is essential to run the pipeline just for new or updated files. Thank you
This is totally possible by using the metadata properties of the files. For example, you can check whether the size of a file has increased or whether its last modified date has been updated. There are many different ways to handle this, depending on how your data is ingested into your source.
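As a rough illustration of the last-modified approach, the selection logic could look like the sketch below. It is plain Python rather than pipeline configuration, and the `FileInfo` type, the `select_changed_files` function, and the idea of keeping a timestamp from the previous run are all assumptions made for illustration, not something shown in the session.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical stand-in for the metadata returned for one file."""
    name: str
    last_modified: datetime


def select_changed_files(files, last_run):
    """Return the files that are new or were modified after the previous run."""
    return [f for f in files if f.last_modified > last_run]


# Example: only the file touched after the previous run is selected.
last_run = datetime(2024, 1, 2, tzinfo=timezone.utc)
files = [
    FileInfo("old.csv", datetime(2024, 1, 1, tzinfo=timezone.utc)),
    FileInfo("new.csv", datetime(2024, 1, 3, tzinfo=timezone.utc)),
]
for f in select_changed_files(files, last_run):
    print(f.name)
```

After a successful run, the newest `last_modified` value that was processed would be stored and used as `last_run` the next time, so each file is picked up only once per change.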
Hi @austinlibal, I don't know if you are still answering questions here. I have followed the video, doing every step with you. But when I came to the step where you go from get metadata -> filter -> to foreach1 and then invoke pipeline, when I click on run, I get this error:
Activity failed because an inner activity failed; Inner activity name: Invoke pipeline1, Error: Operation on target Copy data1 failed: Lakehouse table name should only contain letters, numbers, and underscores. The name must also be no more than 256 characters long.
I can't find any answers on how to solve this issue.
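Most likely the problem is in the parameterized table name on the Copy data activity within the child pipeline; see 1:07:19. For example, if the value passed in still contains a period, a space, or a hyphen, it breaks the naming rule quoted in the error.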
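To illustrate the rule quoted in the error message, here is a small Python sketch that turns a file name into a string that satisfies it. The function name `to_table_name`, the choice to drop the extension, and the choice to replace every other character with an underscore are assumptions; it also conservatively allows only ASCII letters. In the pipeline itself, the equivalent cleanup would have to happen in the expression that feeds the table name parameter.

```python
import re
from pathlib import PurePosixPath

MAX_TABLE_NAME_LENGTH = 256  # limit quoted in the error message


def to_table_name(file_name: str) -> str:
    """Derive a name containing only letters, numbers, and underscores."""
    stem = PurePosixPath(file_name).stem  # drop any folder and the extension
    cleaned = re.sub(r"[^A-Za-z0-9_]", "_", stem)  # replace disallowed characters
    if not cleaned:
        raise ValueError(f"cannot derive a table name from {file_name!r}")
    return cleaned[:MAX_TABLE_NAME_LENGTH]


print(to_table_name("sales-2023 Q4.csv"))
```

Comparing the raw parameter value with the cleaned one is a quick way to see which character triggered the failure.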
I want to learn MS Fabric. Am I supposed to learn SQL and Power BI before getting into MS Fabric? I just want to check, as I am coming from a non-technical background. If possible, can you guide me with a roadmap?
If you want to get certified in Fabric (DP-600), you need to know quite a bit about Power BI including DAX, basic PySpark, and SQL.
The following are NOT part of DP-600: Data Science and Real-Time Analytics.
You speak very fast. Not everyone is fluent in English.
Thanks for the feedback! I get excited when I talk data!
Take advantage of the ability to slow the YouTube video down to 0.75 speed and turn on closed captioning; it might be a better experience for you!