Finally, someone who has the good sense to present things simply and without jargon.
I wonder, why does Microsoft give different names to the same things?
On the one hand, they seek to simplify and integrate everything; on the other hand, it seems they are happy to confuse the user with 3 or 4 different words for the same thing.
Thank you so much.
You’re welcome João, thanks for leaving a comment. Microsoft are notorious for poor naming choices and renaming things or picking clashing names for features and products.
Post Production Note: my comment re Merging Tables - as long as the "helper" tables aren't loaded then this is fine, it's only when merging tables (entities) that are also loaded in their own right that you need Premium.
Thanks! It's straightforward to understand and follow.
I appreciate you taking the time to let me know you found it useful
Refresh data flow and then dataset is a helpful prompt. Thanks for the content.
You’re welcome
Thanks a lot. My best video so far on Dataflow. Subscribed sharp sharp😅
Thank you. Welcome to the channel
Soon as I saw your sharp sharp comment, I knew you was Nigerian lol. I love to see it!
"You've gotta refresh the dataflow, then refresh the dataset", looks sideways, "Hope you found that useful".
Reminds me of Michael Scott speaking to the TV crew 😅
Really great intro that avoids all the jargon. I really don't understand why Microsoft has bad naming convention for arguably the same products/features.
Thanks, naming things is not Microsoft’s strong point. At least they renamed dataflow entities to tables 😀
Awesome video Wyn, can you do a video on the limitations for us Pro license folks 😂 Source being SharePoint as well
Cheers. The only real limitation is no linked dataflows. So you can’t load a table and also connect to it with another query / dataflow.
Thank you very much for the good video!
You’re welcome
that is great, Wyn. thanks for sharing. I will use it in my job.
Great, thanks for letting me know David
GOD bless you !
Are there times where dataflows are not the best approach? Also in terms of speed, is this a slower process compared to running power query directly within the power bi file? Last question can Excel connect to a dataflow table? Thank you
They’re not always necessary. If you don’t need to re-use a table on multiple reports or your refresh against the data source isn’t slow then no real need. You’d just be adding an extra process and refresh step into your report update.
Yes Excel can connect to dataflows.
Diagram View looks great
Yeah, it will hopefully make it to desktop eventually
thank you
You’re welcome Ahmad
awesome and thank you for sharing
You’re welcome SivaKumar
Hi Wyn, are there limitations to how many files you can use in a dataflow? What I mean is, I am saving numerous Excel (xlsx format) files to SharePoint and the aim is to do my mashup or ETL process in the dataflow area and then merge the files into one table. We use Power BI Premium.
There's no technical limit; the refresh just gets slower the more files you add. One approach is to have one dataflow that simply consolidates the files, and then another that links to it and does any additional complex ETL.
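For anyone wondering what that first "consolidation" dataflow might look like, here's a minimal Power Query (M) sketch. The site URL, folder filter and "first sheet" assumption are all placeholders, not details from the video:

```
let
    // Connect to the SharePoint document library (placeholder URL)
    Source = SharePoint.Files("https://contoso.sharepoint.com/sites/Finance", [ApiVersion = 15]),
    // Keep only the xlsx files
    Filtered = Table.SelectRows(Source, each [Extension] = ".xlsx"),
    // Pull the first object out of each workbook and promote headers
    AddData = Table.AddColumn(Filtered, "Data",
        each Table.PromoteHeaders(Excel.Workbook([Content]){0}[Data])),
    // Stack all the files into one table
    Combined = Table.Combine(AddData[Data])
in
    Combined
```

The second dataflow would then link to this combined table and do the heavier transformations, keeping the slow "read lots of files" step isolated.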
@@AccessAnalytic many thanks CC
Hello, as I use dataflows more, questions arise. If I have created a dataflow and then I import some tables into a Power BI report, do I have duplication of this data? Thank you in advance!
Yes, you have a centralised table in the dataflow and then one or more datasets will pull copies in.
How can I avoid the duplication? Creating a composite model I suppose, but what are the restrictions?
@eleftheriakoniari3392 I don’t see a need to avoid the duplication. Think of the data model as an in-memory cache.
@@AccessAnalytic What I meant is whether we have duplication of data in the workspace
Having data in a dataflow and in datasets is the normal best-practice approach. The duplication is not a problem.
Thanks Wyn, great video. But wondering, is it possible to create a MySQL dataflow? It doesn't want to work for me. I'm trying to get a scheduled refresh set up that isn't dependent on a personal gateway, i.e. works without my computer needing to be switched on.
Is the data source online? If it's on a network server or computer then a gateway is always needed.
Hello! Thank you for this video! How many dataflows can we have per workspace, if the workspace is backed by a premium capacity?
Hi, I'm not aware of any limitations.
@@AccessAnalytic Thank you so much for your prompt reply!
I want to ask one thing.
Let's say I have an existing dataflow that's using a SharePoint folder to grab data files.
Now I want to replicate this dataflow to grab files from another folder.
The files in the new folder have slightly different columns, so I want to adjust the query.
And then finally, I want an existing Power BI file to use the new dataflow.
I already have my old visuals.
Can these visuals get updated with the new dataflow?
Wherever there is a column error I can change it easily, but will this approach work, or would I need to start developing all the visuals from scratch?
Thanks
Edit your existing Power Query to point to the new dataflow. Your visuals & measures, however, will break if you rename the columns that are used in them, so you'd then need to go into each visual to fix them up.
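For reference, a query that reads from a dataflow uses the PowerBI.Dataflows connector, so repointing is usually just a matter of editing the navigation steps. A rough sketch of what those steps look like (the workspace/dataflow/table names here are placeholders; the auto-generated code typically navigates by IDs instead):

```
let
    Source = PowerBI.Dataflows(null),
    // Navigate workspace -> dataflow -> table; swap these names to repoint
    Workspace = Source{[workspaceName = "Sales Reporting"]}[Data],
    Dataflow = Workspace{[dataflowName = "Cleaned Sales Data"]}[Data],
    SalesTable = Dataflow{[entity = "Sales"]}[Data]
in
    SalesTable
```

As long as the final table keeps the same column names, everything downstream (visuals, measures, relationships) should carry on working.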
@@AccessAnalytic Thanks a ton! You are a godsend. One final question: the relationship model will not break, will it? I can edit the visuals to refer to new columns, but the relationships should not break? There is a very minor difference between the new and the old dataflow: just a few columns not present in the new source file, while a few new columns come in. That's about it. And as we want to keep both dataflows running, we don't want to mess up the existing one for the new, and will also create a new one.
@@remuslupinhp As long as the relationship columns and table names don't change, it should be fine.
@@AccessAnalytic Super Thanks!
I still struggle to see why this is better or gives you anything different from a shared dataset. Well presented video though. Thanks
Thanks tadstar. To build a dataset you need clean tables of data, you may have multiple datasets that use the same cleaned up table, so rather than doing the clean up multiple times you can do it once in a dataflow.
Another reason is you may have a slow source system, and you have multiple datasets feeding off that system. Building a dataflow that pulls the tables you need and then building datasets off those rather than direct off the source can speed up your dataset refreshes and take the load off your source systems.
Hope that helps a bit
@@AccessAnalytic many thanks, that really does clear it up for me, I didn't consider shared clean tables .. brilliant explanation... 🙏
No worries
How is it different from doing it in Power Query, then publishing it to the service so everyone can use the dataset?
Is it like anyone can modify it?
It’s clean table(s) that can be re-used in multiple reports and edited like any other data source on import.
Good if you want to centralise some tables to be re-used, or you have a slow data source that you want to pull from once / occasionally and then your data model refreshes will be quicker.
@@AccessAnalytic "It’s clean table(s) that can be re-used in multiple reports and edited like any other data source on import." For this part, I could have done it in Power Query in Power BI Desktop then published it for everyone to use, right?
Sharing the data model is generally for people to build visualisations in “thin” reports.
Sharing tables allows people to build data models from a common source.
Nice, many thanks, great video 🎉
Is it possible to use Python in the dataflow to transform data? Thanks
I don’t think so. You might like to look into Fabric (currently in preview) where you can write Python to cleanse data ready for Power BI to consume: learn.microsoft.com/en-us/fabric/data-science/tutorial-data-science-explore-notebook?WT.mc_id=M365-MVP-5002589
@@AccessAnalytic Ok, thanks! The problem with Fabric is the price...
@tiago5a - yep, I think it should eventually be around USD $200 per month for the cheapest version when it comes out of preview.
When you refresh a dataset that uses a dataflow as a source does it run queries against the database?
No, the dataflow stores the data (in CSV files in the background), so you need to refresh the dataflow in addition to the dataset.
@@AccessAnalytic I just did a test. No impact on database when dataset is refreshed which is what I wanted. Great.
Hi Wyn - I'm late to the party on this one. I don't have a Pro licence, is that the reason why I cannot create a new dataflow?
Yep sadly true
@@AccessAnalytic thanks Wyn for the clarification
No worries
Hello Sir. I wish to connect an Excel file to a dataflow. Please help.
Not possible yet.
@@AccessAnalytic If we have a OneDrive business account, can we still not add Excel via the upload file (preview) option?
Hi Mukesh, I’m not sure what you mean sorry.
@@Mukeshkumar-cr3yc Yes, it is possible. When creating the Dataflow, select Excel Workbook as the option. Only concern I can foresee is scheduled refreshes when the user is available (maybe leaves the company)
It wasn’t shown that the whole point is that a single dataset can use multiple dataflows, so essentially the model will consist of dataflows only (or mostly).
That’s definitely a common scenario Daniel.
How would this work if the Power Query pieces are built in Excel instead of Power BI? Or is this a reason to use Power BI over Excel even if you don't use the reports/create dashboards (if only using it to clean data/create worklists/export that data via VBA)?
This would reduce the need to either repeat myself in multiple tools or export cleaned data for other tools to read, as it would refresh all the data on a schedule?
@McIlravyInc - Excel can connect to dataflows. So yes, centralise and re-use in your Excel and Power BI reports.
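In Excel the connector is slightly different to the Power BI one: Data > Get Data > From Power Platform > From Dataflows generates navigation steps roughly like the sketch below (the names/IDs here are placeholders; the real generated code uses your workspace and dataflow GUIDs):

```
let
    Source = PowerPlatform.Dataflows(null),
    // Drill into the workspace list, then a specific workspace and dataflow
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceName = "Sales Reporting"]}[Data],
    Dataflow = Workspace{[dataflowName = "Cleaned Sales Data"]}[Data],
    SalesTable = Dataflow{[entity = "Sales"]}[Data]
in
    SalesTable
```

From there the table lands in Excel like any other Power Query source, so the same centralised dataflow can feed both your Excel workbooks and your Power BI models.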