Combine Files from a Folder with Power Query the RIGHT WAY!
HTML-код
- Опубликовано: 26 июн 2024
- Most people combine files from a folder with Power Query all wrong. In this video I show you why it’s wrong and the way you should be doing it.
When Power Query combines files in a folder it automatically creates a bunch of queries for you. There are two queries that are important, the sample file query and the final query.
Most people use the wrong query to make their transformations and wonder why they’re struggling.
Check out the video to see examples of when and why you should use the sample file query vs the final query.
📑FILE DOWNLAD & STEP BY STEP written instructions here: www.myonlinetraininghub.com/p...
🎓 LEARN MORE in my Excel courses: www.myonlinetraininghub.com/
🔔 SUBSCRIBE if you’d like more tips and tutorials like this.
📢 Please leave me a COMMENT. I read them all!
🎯 CONNECT with me on LinkedIn: / myndatreacy
🎁 SHARE this video and spread the Excel love.
Or if you’re short of time, please click the 👍
💬 EXCEL QUESTIONS: Get help on our Excel Forum: www.myonlinetraininghub.com/e...
⏲ TIMESTAMPS
0:00 How to Combine Multiple Excel Files From a Folder
0:20 The Data & Where to Save it
1:08 Getting the Data From a Folder
3:01 The Important Queries
3:57 When to use the Sample Query
7:36 The Important Stuff
7:53 Loading the Data
8:51 Getting New Data - Наука
Rather than manually refreshing, go to Data Ribbon > Queries > Properties and set it to refresh whenever the file opens. This ensures you dont have to remember to refresh the data, and potentially have an error in your output.
Yes, great idea. 👍
Where do i get this option? I am not able to find Queries & Properties option. Can you please elaborate?
Love how you don't mince words, and keep it clear by comparing and contrasting, and telling WHY to do a thing as opposed just explaining WHAT to do, thanks!!
I appreciate that! 🙏😊
Great video, I always overlooked the sample file. Now I know it's worth. I also loved your example with transposing the column headers - I am sure I will be using that sometime in the future!
Great to hear!
Hi Mynda,
Your channel is the best Excel-related resource I've ever seen in my life.
Keep up!
Wow! Thanks so much for your kind words. Please share it with your friends and co-workers.
Before combining do it in sample / After combining do it in either sample query or in final query !!! Well said this is what i want !! Searched in so many channels regarding this difference finally got from yours channel 👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏 thanks 👍
Glad I could help. This is a fundamental point many people don’t realise.
Hi Mynda
You truly are a genius, I get a lot out of watching your uploads.
Use PQ almost every day, and never stumble over this, but instead use filtering out and renaming columns.
So grateful, thanks
So wonderful to hear 🙏
Excellent ! Very clear explanations as always.
Just used them and 💥... my work has been produced in half the time I'd previously planned.
Thanks you very much !
Wow! That's awesome to hear 👏
I'm still trying to get a handle on PQ, and you are helping immensely! I had a couple of 'light bulb' moments watching this video - thank you!!
That’s great to hear!
Great video Mynda. As always a clear explanation!
Thank you!
Another great video on how to make use of the great functionality in Excel!
Glad it was helpful!
Loved the way of promoting two rows as a header I've always ended up isolating the first two rows ,
Table to columns, Table column names , zipping and renaming. But thanks for the whole thing,
I still find brining in files form a folder with different names and structure very confusing and there's a lot of trial and error.
Glad you liked it 🙏 different column names is a tricky one because there are many scenarios, so it's difficult to have a one size fits all approach.
Another great video, thanks Mynda!
Cheers, Chris 🙏
Great video. I would add that you can combine pdfs as well from folder. I did this for point of sale reports in order to automate a journal entry go uploading. Dropped all weekly sales for each store into a discrete weekly folder. Grabbed the data from the folder. Repoint each week to current weekly folder. Power query has allowed me to save my clients time and money while improving accuracy.
Nice!
Absolutely awesome! Thank you for sharing this.
Our pleasure!
Thank you. Yours is the only video I was able to find that explained the need to apply transformations to the sample file to cause them to be applied to all files in the specified folder before combining the data. My issue is that my weekly CSV source data is formatted is such a way that there are data in multiple discontiguous tables arranged vertically throughout the file which need to be collected and then combined into one table. The only way I've been able to accomplish this is creating multiple duplicates of the original sample file, each of which take different data from different areas of the source file. I use the original sample file to get the first set of necessary data, the first duplicate to get the next set, and so forth. The resulting data sets from each query do not have matching columns but do need to be put together to create a complete table. So I insert an index column in each query's data set and then merge them using the original sample file. However, the final query table only shows the first set of data from the original sample file for each respective source file. All other data for each of the duplicate sample files is just repeated down the column for every respective source file. I imagine this has something to do with the fact that only the original sample file is respected in this manner, but I don't know if/how I can fix this or if I'm taking the wrong approach to this altogether. Any help would be appreciated.
Any transformation steps applied in the sample query are applied to ALL files before appending them. However, if your sample file has different column names to the other files, then the other files may not receive all transformations because Power Query will be looking for column names that don't exist in those files. If you have different column names, you might find this tutorial helpful: www.myonlinetraininghub.com/combine-files-with-different-column-names-in-power-query
always love your tutorials....excellent
Thanks so much!
Thanks for this video !! Help me a lot .
Glad to hear that!
Great for share me the tips , edit Transform sample files query if we want to clean and transform before combine.
Awesome to hear!
I just wanted to say a huge thank you for all the amazing tutorials you've shared! Your tips and tricks have completely transformed the way I use Excel. Every video is packed with useful information, and your clear, step-by-step instructions make everything so easy to understand. I've learned so much from you and now feel much more confident with my Excel skills. Keep up the fantastic work!
Thank you so much! 🥰 I'm so pleased I can help. Keep learning and practicing 💪
The best thing i notice besides the video is that you responded each and every comment. ❤
Always! 😊Thanks for watching.
absolutely outstanding video - superbly presented - detailed yet simple and clear and easy to follow - outstanding! Thank you so much.
Glad it was helpful! 🙏😊
Outstanding - solved a real problem I was facing in terms of pulling data into excel and added immense value to me - cannot thank enough!@@MyOnlineTrainingHub
This is very good explained and important. In the real world you almost NEVER get clean data, no matter the source. Transforming the sample file is such an important step. I had to learn it the hard way.
Thank you! Glad it was helpful.
Awesome, your are the best teacher and I can't wait to enrolled for your paid sessions.
Awesome, thank you! I look forward to teaching you more of the amazing things Power Query can do 😊
Very good one. Thank you.
Glad you liked it 🙏
The transpose trick is very useful to say the least. Thank you!
Glad it was helpful!
I use PQ at my workplace and end up doing many things manually... This video was extremely insightful... Thanks Mynda
Great to hear!
Thanks! This was useful. Your tip on naming the sheets exactly the same helped me resolve the error "[Expression.Error] The key didn't match any rows in the table." I kept getting in loading excel files.
Glad it helped! 😊
Thx a lot.
This is indeed a very video n explaination how its work.😊
Glad you liked it 🙏
Thanks, that will definitely help
Great to hear!
Many thanks,this is awesome.
Glad you liked it!
It's really useful !
Glad to hear that!
Your new look is a winning one and the content as usual outstanding!
Thanks so much 😊
I have always used databases instead of spreadsheets. Unfortunately I don't have access to a database server, so I have to keep it on my local machine, which does no one but me any good. I'm looking into putting some of the data I have into Excel instead (even though I personally consider this going backwards) so that other people have access to the dashboard and so the files are properly backed up. I greatly appreciate your videos, they are really helpful. I am very fortunate in that queries already make sense to me since I use them all the time in the applications I've created using a database, but I have found that there are actually some things that Excel handles better. I was surprised, shocked actually, at just how functional Excel can be. It can come close to the abilities of a real database. Thank you for your videos, they are appreciated.
Awesome to hear! I guess you could always use Access if you really wanted/needed a database. You can then connect to Access via Power Query to get the data into Excel for your reports.
@@MyOnlineTrainingHub Have done that some, I'll probably start doing it again.
You're a Master!
Thanks so much!
Thank you. I love you.
Glad my video was helpful.
Brilliant! Thx!
My pleasure!
Thanks for sharing.
My pleasure 😊
Thank u for this nice Video 📹
Glad you liked it 🙏
This might be an example I use when promoting ETLT in the "ETL vs ELT" debate. Thanks
😊ETLT! Yes.
Just missing a super cape! :) Awesome as always!
😁 thanks so much!
Great stuff. It would be great if you could also show how to manage data from folders which are availiable through sharepoint. There might be some authorization errors occurring.
This video covers getting data from SharePoint: ruclips.net/video/rcYRcsDjPMI/видео.html
Good tip!
Cheers 🙏
Excellent!
Thank you!
Power query is game changing I use it for document control and working out what is held on relevant company systems so it can actually be found. I do like how if data is not in a table in the source due to being in a form with merged cells, you can run a few queries with results next to each other which create a bigger table that all works together.
So pleased to hear you're making use of Power Query 😊
Very useful video
Glad you think so!
Wonderful !
Glad you like it! 🙏
I love your channel 💚💚
Thanks so much!
Thank you Mynda! I've made a few of those mistakes and this comes in handy for me. I really appreciate you.
Glad it was helpful 🙏
thanks a lot !!!!
You're welcome!
A great video... would be good if you could add the step to externalise the folder location.. ie have the location in a worksheet cell rather than coded within the power query.
Thanks! I cover that in my Power Query course: www.myonlinetraininghub.com/excel-power-query-course
I had no idea we can make transformations on the sample file.... Will check the improveness on load time on a huge report.
Hope it helps.
Thanks love you
Welcome 😊
What in case if we have different no of columns & data in different files !
Ideally the files should have the same structure. However, as long as the first file/sample file has all possible columns, it will work.
Excelent video, can we get the files to practice? I didn't found them in the article. Thanks.
It's there now: www.myonlinetraininghub.com/power-query-get-files-from-a-folder
Thanks for this, very helpful. To add a new scenario to the mix, each workbook containes 12 tabs (one each month) and there are 4 years (ie 4 workbooks), the end result, the same, one big table with everything on it. How do you go about this? Thanks in advance.
Here is a tutorial on getting multiple files containing multiple sheets with Power Query: www.myonlinetraininghub.com/import-multiple-files-containing-multiple-sheets-with-power-query
Hi Mynda, thanks for sharing. My question is : If there's a change in the raw data columns, like additional column in between the existing column or additional columns added after the existing column, how to make the PQ continue to work?
You can edit the query to allow for the new columns. If it's an ongoing thing, then it's more complicated to automate this and not something I can cover in the comments here, sorry.
Love your teaching style. When you are merging sources to update main sheet, is there a way to prevent "incoming" duplicates rows from overwriting what I already have? My goal is to be able to update an excel sheet with sources of data that sometimes have additional headers or not in the same order.
It’s not ideal to modify the table output of a query for this reason. It would depend on what you’re changing as to the best approach. You’re welcome to post your question and sample Excel file on our forum where someone can help you further: www.myonlinetraininghub.com/excel-forum
Excellent video. A basic question: What could be examples of transformations before the files are combined as opposed to after the files are combined? I am not clear about this difference. Thanks much.
Thanks! I give an example in the video of transformations that need to be done in the sample file.
Very informative and practical. Kudos to you.
Would you mind sharing the data files, thanks in advance.
The file download link is in the video description 😉
I am afraid,but file download link is not there in the description. Please have a look again, thanks
I learn a lot from your channel, thank you.
Great to hear 🙏
Great!!!!!!
Thank you!
Thanks a lot Madam 🎉🎉🎉...could you also do an example of combine or merge pdf bank statements??? please... thanks 😊
Glad it was helpful. Every bank statement is different. If they're in CSV format, you can use these techniques. If they're in PDF format, check out this video: ruclips.net/video/Xkew2GrXu9c/видео.html
Thanks for your input. I have pasted close to 400 lst files in a folder and used the power query. The power query is only showing the file name and not the data into it.
However If I extract one indvidual file. I am able extract.
Not sure what a lst file is, but you're welcome to post your question on our forum where someone can help you further: www.myonlinetraininghub.com/excel-forum
Wowow... I'm wait
I like this video. I have edited the sample file query before, and it is good to see it demonstrated.
All that stuff that is generated on the left-hand Queries pane in PQ looks daunting. It is always good to know I can ignore most of it!
Thinking about the section of the video at 7:37, if a transformation can be done before or after files are combined, my instinct is to put it after. I doubt it makes a noticeable performance difference, if any at all. But it seems like I am being helpful to PQ 😊, by making it do the transformation only once, rather doing it for each file in turn.
Glad it was helpful. Good point about the efficiency of performing transformations in the final query vs the sample file. I haven't got a dataset big enough to test it on to see if it has an impact. It would be unnoticeable on small datasets.
First question - how to get bank files sent from the bank periodically!? That would be so good as in Australia this data is never consistent from banks, but this is great Mynda thanks solves my need to collate all csv's in 1 automated place!
You’d have to ask your bank how you can get the data in a csv file.
Thank you very much for this content.
I wonder if there is some kind of limit in how many files can Power BI can fetch and consolidate from a folder.
I may have a scenario where the client uploads like 40 different files a day to be used in a report.
I'm not aware of a specific limit, but you might find the queries become very slow.
@@MyOnlineTrainingHub Thanks! Happy new Year!
Hi, I have transformed the data in transform file and now it is not getting refresh in main query. I have also checked the function sample query and it is linked to transform sample file. I have also observed that whenver I am updating steps in transform sample file, steps are not getting reflected file in function file.
Hi Mynda,
Very helpful video. Is it possible to do this if not all files are the same template? For example, I have an additional column of data in some sheets but not all sheets I am trying to run a query on. Thanks!
Yes. Make sure the file you choose as the sample contains all possible columns.
@@MyOnlineTrainingHub I suppose I phrased that incorrectly. I actually have an extra row in some tables in some sheets, but not all of them. Even when modifying the sample, I cannot get it to work with those sheets that do not have that additional row of data. Thank you so much for your prompt response
Very nice! If they decide to add a new column to source system and all my new CSVs have a new column I want to include, do I just set that in the sample file? For some reason, just dropping them in, the new column wasn't identified. Hopefully there's another video you can point me too. :)
Go to the 'Transform Sample' query and edit the Source step in the formula bar, removing the argument for 'Columns =n,". Removing the hard keyed number of columns will allow the query to get all columns. Also, make sure the file with the new/extra columns is the sample file.
@@MyOnlineTrainingHub Your better than chatGPT!
Great tutorial. If I am using this to combine bank transaction files, is there a way to clean up the description column? The description includeds receipt numbers, dates and purchase type that aren't required and make the column too wide. I am currently using Text to Columns and delimiter it using the dash -, then I delete the additional columns
Thank you! Yes, you can use Power Query to split the text by delimiters and delete the columns, that way you only need to set this up once and then Power Query will apply it upon refresh to each new file you add to the folder.
Mynda, if we have different header names and we have to combine all files with different headers? What will we do?
Great question. See this video: ruclips.net/video/tpK_xklbDf0/видео.html
Great video! What if the datasheets within the folder are cumulative rather than containing exclusive data? In other words, the datasheets in the folder each have the same data but the newest one has added the latest rows. Is there a different process for this?
In that case, I would only get that one file, rather than all of them. You can simplify the process and use the Get data > From Excel File connector.
I'm hoping this works for SharePoint folders too?
Dear Mynda,
Thank you for all the great work..I have been following you on LinkedIn as well
But I am in trouble..I want to develop a Training and Development Dashboard without Power BI etc..how can I? I am unable to find something easy and relevant
Even I face problem in updating a dashboard over and over again with Pivot tables
I don't have an example of a training and development dashboard, however in my Excel Dashboard course (www.myonlinetraininghub.com/excel-dashboard-course) I teach you the skills to enable you to build any dashboard. In terms of PivotTables updating, as long as they share the same source data/query data, when you refresh all (via the data tab) they will all update.
Hi Mynda,
I found this video very helpful, and actually made me work differently. But I have discovered a problem.
I am using your example but is there a way to pass the folder name and path in as a parameter?
Any help would be appreciated as it is driving me mad.
Yes, you can have a dynamic folder path. I don't have a RUclips video on it but I cover it in my Power Query course: www.myonlinetraininghub.com/excel-power-query-course
Very Nice, thanks! On MacBook Pro, I don't get the option to choose "From Folder". Thanks
Thank you! Power Query on Mac is still under development.
@@MyOnlineTrainingHub Thank You!
Great info thanks for sharing!
What if I have a new file being added to the folder but each new file includes the older info? is there a way to just add the new information? without having to manually remove the older file from the folder? Otherwise, my data keeps accumulating
You could add a filter in the query to ignore data today's date minus n days old.
@@MyOnlineTrainingHub Thanks so much for the suggestion! Any resource where I can see an example?
No, but if you get stuck you can post your question and sample Excel file on our forum where someone can help you further: www.myonlinetraininghub.com/excel-forum
@@MyOnlineTrainingHub I tried signing up but i'm getting an error saying my username is not registered
Sounds like you tried to login, because upon signing up you choose your own username.
Please try again, and reach out via email if you still have trouble: website at MyOnlineTrainingHub.com
what happens if your excels are invoices and the data is in not a pretty pivot table spreadsheet but the information is in different columns and rows? how do you get it to pull the data in specific cells and label them with a name, is that possible?
Yes, it'll be possible. Please post your question and sample Excel file on our forum where someone can help you further: www.myonlinetraininghub.com/excel-forum
When you press refresh the query will do all the previous years again as well, correct? is there a way only data from new source file copied to the data folder gets appended to the table created previously by power query?
Correct. In Excel there is no way to do an incremental refresh. Only Power BI has this feature.
@@MyOnlineTrainingHub do you have already a video how this works in power BI?
Hi Mynda,
I have added a new data to my folder and refreshed my power query in excel 365, but it isn't updating the new record. under "Applied steps" double clicked source & I can see its added to the source. But it isn't getting added to the final output of the power query table. Appreciate your feedback on this? I don't understand why refreshing isn't working.
Thank you!
Please post your question and sample Excel file on our forum where someone can help you further: www.myonlinetraininghub.com/excel-forum
Thank you so much for this tips! SHould the number of rows too the same in all files? Or just the columns? Also, after uploading a new file on the same folder, and clicking all refresh button, the new data from the new file won't show. What could be the problem? Appreciate your response Mam. :)
Just the columns.
Thank you! Also, after uploading a new file on the same folder, and clicking all refresh button, the new data from the new file won't show. What could be the problem? Appreciate your response Mam. :)
@@MyOnlineTrainingHub
Please check the filtering in the first step.....maybe you have filtered out some file and the new file is also filtered out later in that step 🤔
thank you SIR!
@@priyeshsanghvi8424
could the power query combine the protected password workbooks?
No. It can't get data from password protected workbooks.
I am right that this can only be done with local folders? i.e. not folders on SharePoint? I struggle with this, since the spreadsheet can then only be used by the person who created it. Or, as we do, you need to modify the 'Source'-step in all the queries. Any thoughts on this?
Oh and btw: Great video as always 🙂
Here is a tutorial on how to get files from a SharePoint folder: www.myonlinetraininghub.com/get-data-from-onedrive-or-sharepoint-with-power-query#folder
Does this method works for 3 different data sources in a single folder?
Yes, assuming those data sources are the same structure. It wouldn't make sense to append files that contain different types of data.
What happen if we combine live data? Example, we want to capture the student's height in 4 clases. Notice that there's new student register during the day which in class 1. Can the teacher update/add new line for the new student? And can the master data capture the new line item?
Yep. Just add the data to the source file and save it. Then refresh the query to pick it up.
Hi, I run Excel from Mac and it does not show the from Folder option. Is there a work around for Mac users? Thank you.
Not really. The workaround for Mac users is to install Parallels and then install Excel for Windows.
Love this but on a mac. can't find "folder" option to select data. ugh. also no combine option probably only because it allows me to select one worksheet at a time.
Yeah, unfortunately the Mac version doesn’t have the full functionality.
My Query with appended csv files from a folder have multiple rows of same column names (one from each file)
What's the best way to keep only one row?
Thanks. Subscriber :)
In the sample file query you can promote the first row as the header row. This will repeat the process for every file in the folder so they do not appear throughout the dataset once combined. You see this in the video.
@@MyOnlineTrainingHub Thanks!
Hi, can I do this, but each file has its own worksheet instead of combing all of the data on one worksheet?
You have to create a separate query for each sheet you want as an output. There's no automated way to create separate queries.
I have 1 m record on which I want to apply group by to get max value of each category and then same value should reflect against each category in separate custom column.....I tried it but it will take almost 2 hours and eventually system hanged....any suggestions pls
Maybe load it to the data model and then do the grouping in a PivotTable instead.
@@MyOnlineTrainingHub...thanks and i will try and update you ....but is their any solution in power query
How to make the number of files in the folder dynamic, so PQ always only grabs the most recent 5 files? I tried to sort the files by dates and keep top 5. But when I tested by adding the 6th file in the folder, the PQ loaded it. Thanks!
You probably need an index number that numbers the top 5 based on date. I don't have a tutorial I can point you to, but you're welcome post your question and sample Excel file on our forum where someone can help you further: www.myonlinetraininghub.com/excel-forum
Gtk you have a forum
what if seeing “ “ symbols with numbers when load the dada. How to get ride of them?
Do a 'Replace' in Power Query that replaces " with nothing.
Bravooooooo
Glad you liked it 🙏
COMO PUEDO DESCARGAR EL ARHIVO PARA PRATICAR
It's available here now: www.myonlinetraininghub.com/get-started-with-power-query
I don't understand, is there a benefit in performance if I made the changes in the Transform file?
If you try to unpivot after you've appended all the files you'll end up with a mess. Try it with the sample files for this video (link in description) and you'll see the problem.
@@MyOnlineTrainingHub thanks, I always learn something new with your videos, I asked because a work around I do is just filter data "does not equal to" and removed those headers but good to know there's other option error free and I had no idea I could edit transform file🙂
I have two files types in my folder and there is one common column that does not have the same name in both file types. How can I force alignment of the varied column headings? Sample file has "Receiving Country ", while the other type has "Receiving Country ​​"
You have to rename the column before the append of the tables.
I don't understand the interaction between the Sample query (which loads data one specific file) and the Combined query, which only loads names, file extensions, file paths, etc.??? I can't get my Excel to put the two together. :(
The combined query should load all the data. If it's not, then maybe you still have a double arrow at the top of one of the columns (usually called Content) that you can expand to get the underlying data. If you're still stuck, please post your question on our forum where someone can help you further: www.myonlinetraininghub.com/excel-forum
3:55
Glad you found something useful!
What if you want to include the name of the file in a column?
When you connect to the folder, you can keep the file name column, which is called 'Source.Name' as shown in the written step by step here: www.myonlinetraininghub.com/power-query-get-files-from-a-folder