It's all about query folding. Sometimes you want to create a key in your tables in order to create a relationship. In dimension tables it's OK to do this in PQ with the Merge Columns operation, even when query folding breaks, because of the small number of rows. But when you do that in a fact table with 8,000,000 rows, it goes really slow in PQ. So I prefer doing that kind of merged column in PQ for dimension tables and in DAX for fact tables.
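For illustration, a minimal sketch of that fact-table key as a DAX calculated column; the Sales table and its ProductID and StoreID columns are hypothetical names:

Fact Key =
// Concatenate the two columns with a delimiter to build the relationship key.
// COMBINEVALUES is built for composite keys like this one.
COMBINEVALUES ( "|", Sales[ProductID], Sales[StoreID] )

The Power Query equivalent would be the Merge Columns step mentioned above.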
Thanks for the post. I have been using Qlik for over 10 years and am now learning Power BI, so I'm trying to understand both DAX and Power Query. What you say makes a lot of sense to me: always push as much of the calculation burden back into the load stage of the process so that you have nice, clean, complete data to work with when you start creating your reports and dashboards. Thanks again, Barnaby.
I'm barely three weeks into learning Power BI, but I do have a real project that is already live in another tool, and I'm duplicating it in Power BI for learning purposes.
Based on my limited experience, I do all transformations, added columns, and calculations in Power Query.
I do the others in DAX, and when I say others, I mean the simple ones that I just have to do to achieve the visualization format I want, like groupings and sorting. So basically, in my case, visualization requirements drive my DAX usage.
Great topic, Ruth. This comment is not PQ/M-v-DAX per se, but a lesson learned (learning 🙂) about PQ/M. Early this year, knowing that I needed to learn PQ (before knowing there was M!) & Power Pivot (also unaware of DAX) & Power BI, I set my learning path as PQ in Excel, then PP in Excel, then PBI. I dove deep into PQ & took time to learn M as well. Glad I did, but I'm learning that I try to do too much in PQ/M. I built a monster Excel workbook. (In this case, "monster" means big & complex, but not bad.) It's pretty awesome; it reduced 2-1/2 days of data analysis on a weekly cycle to ~30 minutes. (That's total time, including human decision-making, not just crunching data.) But I feel so sorry for my workbook b/c it works so hard. (The refresh often hangs & must be restarted.) I'm now adopting a new general philosophy to restrict use of PQ/M to ETL, i.e., use PQ/M to get the data from external sources into Excel, then use Excel, VBA, & PP/DAX to manipulate the data from there. We'll see if that philosophy survives as I continue this learning journey, but I expect to feel less sorry for my workbooks with this approach. 🙂
Hi Jim,
First of all, thanks for sharing your experience; it will help others!
Now, I agree 100% with you: all ETL should be done in PQ (if possible) and all calculations in DAX (again, when possible).
It is as bad to do a split in DAX as it is to do a rolling sum in Power Query.
Now, I hope your Excel file feels better soon, and, most importantly, well done!!
/Ruth
I absolutely agree. If you can do it with PQ, then do it with PQ, but that's also because I'm more comfortable with PQ than with DAX ;). Thanks Ruth
Me too, but thankfully in this case it is a good thing ;)
/Ruth
I agree with your points Ruth, but here are some reasons why I use DAX columns:
Writing time: when writing a column in PQ, it wants to refresh the whole table (yes, I could write it using just a sample dev set).
Auditing: having all the code in one place makes it easier. Moreover, it is easier to see the formula of the column in DAX than 'hidden' in the PQ script.
Refresh time: in Excel (as mentioned in another comment), query folding doesn't happen in a merge, so the refresh time is massively increased.
Another bonus of PQ columns is that the code can easily be copied to another report.
Thanks for sharing Gray!!
And enjoy the rest of the weekend;)
/Ruth
Happy anti-DAX Friday, Ruth! 😋 I spend more time in PQ now that I have learned that proper ETL makes data modeling MUCH more simple and efficient. Resurrection of (M)agic (M)ondays? 😯
anti-DAX on a DAX Fridays series 😂😂😂, I simply lost my mind!!
Every Monday I do Power Query/M videos. I should probably call them Magic Mondays again ;)
/Ruth
Great video. Thanks Ruth! 😊
Just one query... what if we had to choose between DAX LOOKUPVALUE and a Power Query Merge to add a calculated column? Should we still stick to Power Query or opt for DAX?
Merges in Power Query are tough... if you have performance problems in M, use DAX.
@CurbalEN Thanks! 😊👍
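For context, a minimal sketch of the LOOKUPVALUE option discussed above, assuming hypothetical Sales and Product tables:

Product Name =
// Return the ProductName from the Product row whose ProductID
// matches the ProductID on the current Sales row.
LOOKUPVALUE ( Product[ProductName], Product[ProductID], Sales[ProductID] )

The Power Query alternative would merge the two queries on ProductID and expand the wanted column instead.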
There is one important scenario where calculated tables (not columns) are the only way, compared to PQ tables. I'm talking about waterfalls and incremental refresh policies. For waterfalls you need to append/union fact tables, and it simply does not work to append an incrementally refreshed table to another one using PQ. DAX UNION is a lifesaver here.
True, DAX is the only way to fake incremental refresh.
Unsure what you mean by waterfalls?
/Ruth
@CurbalEN I'm talking about the Waterfall visual. Let's assume you have Budget and Actual data in separate tables: the first uploaded from Excel at a higher granularity, the second actual transactional data from a database. I didn't find a way to create a Budget vs Actual waterfall visual without appending the tables, whether in M (Append) or DAX (Union).
Oh, I see. I have never tried to do a waterfall chart from budget and actuals, so that explains it, thanks!
/Ruth
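A minimal sketch of that DAX append as a calculated table, assuming hypothetical Budget and Actuals tables with Date and Amount columns:

Budget vs Actuals =
// UNION needs the same number of columns in the same order, so
// SELECTCOLUMNS aligns each source and tags the rows with a scenario label.
UNION (
    SELECTCOLUMNS ( Budget, "Date", Budget[Date], "Amount", Budget[Amount], "Scenario", "Budget" ),
    SELECTCOLUMNS ( Actuals, "Date", Actuals[Date], "Amount", Actuals[Amount], "Scenario", "Actual" )
)

Because a calculated table is rebuilt from tables already in the model, it does not conflict with an incremental refresh policy on Actuals the way a PQ append would.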
If you have a large number of unique values in the new column, you should consider a calculated column rather than a Power Query computed column (it may increase the overall size, since VertiPaq will not be able to compress much because of the very high number of unique values).
And if you are using a DirectQuery connection, a computed column in Power Query and a calculated column will perform about the same.
I'm not sure I am with you on the first one, but on the second one: if you have a DirectQuery connection, you can't use Power Query. Either you do it in DAX or at the source, the source being the preferred option.
Happy Friday!
/Ruth
@CurbalEN Hey Ruth, what I meant on the second one was not actually the computed column in the metadata, but the actual SQL used explicitly, like (column*2 AS double), in the advanced section after clicking on DirectQuery.
Yes, I read my comment again and it was my mistake in that flow.
I am your student.
Thanks
We are all students! ;)
/Ruth
@CurbalEN I took it as a reference from www.sqlbi.com/articles/comparing-dax-calculated-columns-with-power-query-computed-columns/
Calculated columns consume more processing time and space in machine memory. DAX measures are always dynamic and reusable in a Power BI report.
True, calculated columns are just for ICE cases ;)
/Ruth
In the Power Query editor, do all the operations, like adding a new calculated column, cleaning, trimming, splitting... happen via DAX?
I am new to Power BI.
I also want to know when to add a measure via the Home tab vs the Modeling tab...
Thank you very much
Hi,
Power Query and DAX have different computational engines.
You can add a measure from both tabs, with no performance difference in either case.
Happy Friday!
/Ruth
No, DAX is not used if you add a column in the Power Query editor, just M. Measures in modeling are DAX. DAX calculated columns work at a row level, as do M columns. However, DAX measures work off a set of data, and you can use aggregations and joined data.
@CurbalEN Thank you for your reply. Could you please clarify if there is any difference when we add a column/measure from the Home ribbon vs the Modeling ribbon?
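To make the column-vs-measure distinction above concrete, a minimal sketch with hypothetical Sales columns:

// Calculated column: evaluated once per row and stored in the model.
Line Amount = Sales[Quantity] * Sales[UnitPrice]

// Measure: evaluated at query time over whatever rows the visual has filtered.
Total Amount = SUMX ( Sales, Sales[Quantity] * Sales[UnitPrice] )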
Hi Ruth. Big fan. I have followed your posts and tips forever and love them. Great job. Sorry, but shouldn't this post be DAX Fridays #153? I am sorry again. This must be my OCD... Irrelevant comment, but I am kind of crazy about details heheheheh. You are the best.
Right, sorry!! And done 😁
/Ruth
If the Power Query "formatting" is done, do you still need to worry about performance in Power BI?
Yes, you do.
@CurbalEN So the results in Power Query are not "hard-coded", correct? In that case it would make more sense to hard-code the data beforehand, if possible; then no performance work would be needed later.
Yes, they are, but the engine that runs Power BI (VertiPaq) has its requirements too, and depending on how you write your DAX queries, you will need to do performance analysis as well.
Perhaps because I've not been much involved with very large data sets, I had until now believed that DAX was the way to go for calculated columns. If you say Power Query is, then until I start handling PBI well with large data sets...
It is a good habit to do all cleaning with Power query and all calculations with DAX.
Have a nice weekend!
/Ruth
Ayobami Ologun Also, if using partitions, calculated columns are recalculated for the entire table and not just for the new data in the updated partitions.
Hi Curbal! I had heard that DAX consumes CPU and calculated columns in Power Query consume RAM. Is that correct?
Hi Luis, not sure what you mean, would you mind rephrasing the question?
/Ruth
As far as I know, calculated columns, whether made in DAX or Power Query, use RAM for storage... The more cardinality a column has, the more RAM it will consume... On the other hand, measures are calculated on the fly every time you view a visual, so when you change a filter or a tab, the measures are recalculated again. DAX also uses a cache for visuals... but it is very small compared to RAM.
@danielbalandra457 Hi there, and Ruth. That's the point: calculated columns are not compressed and thus consume memory and result in reduced query performance, depending on the size of the fact tables. Also, because calculated columns are not present in the source system, they can result in additional effort to maintain/support the model. Long story short: whenever possible, push the development of calculated columns to the source. Have a great weekend, all!
There's an article by Marco about this issue: if you have to create a column within the same table, then Power Query is best, but if you have to create a column using multiple tables, then DAX is best. That's what I understand from the article; if I'm wrong, let me know.
I haven’t read it yet, will do soon!
/Ruth
DIGITAL COOKING: you can do a Merge with another table to create the RELATED equivalent.
@Milhouse77BS Yes, but I think you will have more RAM consumption; first read Marco Russo's article about this topic.
@DIGITAL_COOKING That's what I was saying in another comment, where I also gave that article link. There is another issue if there is a large number of unique values (it increases the overall size) when done in Power Query.
www.sqlbi.com/articles/comparing-dax-calculated-columns-with-power-query-computed-columns/
"Power Query all day long." Ok, I hear that.
😂
It is more of an "it depends on what you are doing" kind of thing ;)
It depends on what kind of operations you do, of course. I've seen pivoting millions of rows take almost half an hour in Power Query but just seconds with DAX; that was amazing...
Yes, so many variables to consider when talking about performance...
/Ruth
How can anyone feel more comfortable with DAX compared to Power Query? What sort of human being is that? Are they half-human half-some other species?
M syntax is more difficult than DAX.
Once you clearly understand evaluation context, you will love not only scalar functions but also DAX queries with EVALUATE in DAX Studio.
@pravinupadhyay2046 I wasn't serious. I was only trying to be 'light-hearted'. Apologies if I offended anyone. Of course I like DAX... it was meant to be a joke, but it didn't work!
😂😂😂 I agree with you!! #notfromthisplanet ;)
Happy Friday!!
/Ruth
Agreed. I prefer to put calcs in M for better storage, and I find it more manageable. My rule is SQL (if a DB source), then M, then DAX. However, I recently had to move some calcs to DAX. The model had folders of CSV files and conditional columns based on M query merges, and the refresh was very slow with lots of files. I switched the merges to relationships and added calc columns with RELATED() in DAX, and the refresh was much faster.
Yes, joins in M are not super efficient... but apart from that, M is your best friend for cleaning data!
/Ruth
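A minimal sketch of that last approach, assuming a relationship from a hypothetical Sales table to a Product dimension:

Category Flag =
// RELATED follows the existing many-to-one relationship at refresh time,
// instead of re-running an M merge over every file.
IF ( RELATED ( Product[Category] ) = "Hardware", 1, 0 )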