Download the file ⬇ - goodly.co.in/combine-data-from-multiple-pdf-files-excel
Tackle even the most challenging data-cleaning problems. Check out the M Language course and push beyond the user interface ↗ - rb.gy/a2zsnn
Thank you so much for such a great video! I was searching Google for a way to solve this problem but didn't find any good solution. I am happy I watched your video, which solved my problem.
Power Query Master. Deep explanations, going into detail of the matter. Outstanding explanation. Thanks Chandeep.
This is super helpful! Especially for those working in an auditing background! great content!
Thank you so much! I get PO copies in PDF format ... There are different sections on the PO like Supplier address/buyer address, PO # section / PO issued date / Section for LE name of the Business unit from where the PO was issued, and then the Milestone description with the amount that needs to be billed once milestone is completed.
How can I put them on a table from different sections?
Excellent, Sir. Please do more videos on List functions.
awesome! and with good practices
Really helpful!!! Thank you
Great job! A next step could be to clean up the column names, in case they are written a little differently or sometimes contain stray white spaces.
You can do that in the last step as well, without adding new Rename steps. As you can see in the hard-coded version, there are two lists: the first must match the column names in the data source, and the second holds the new names you want to rename them to.
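A minimal sketch of the two-list rename described in the reply above, not the exact code from the video. The sample data, the `Data` column, and the header names `"Item "` and `"amt"` are all hypothetical. The third argument of `Table.ExpandTableColumn` has to match the source column names, and the optional fourth argument supplies the new names. Trimming with `Table.TransformColumnNames` is one possible way to deal with stray spaces first.

```powerquery
let
    // Hypothetical sample data: one nested table per file, with untidy headers
    Source = Table.FromRecords({
        [File = "a.pdf", Data = #table({"Item ", "amt"}, {{"X", 1}})],
        [File = "b.pdf", Data = #table({"Item ", "amt"}, {{"Y", 2}})]
    }),
    // Optional: trim stray spaces from the headers of every nested table
    Trimmed = Table.TransformColumns(
        Source,
        {{"Data", each Table.TransformColumnNames(_, Text.Trim)}}
    ),
    // First list must match the (trimmed) source names; second list gives the new names
    Expanded = Table.ExpandTableColumn(
        Trimmed,
        "Data",
        {"Item", "amt"},
        {"Item", "Amount"}
    )
in
    Expanded
```

The result has the columns `File`, `Item`, and `Amount`, so no separate rename step is needed afterwards.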
Great video... Question though: what if Power Query does not read the PDF in a workable format? I have an invoice PDF where, when I import it into PQ, the columns get jumbled up and I am not able to clean the data for reconciliation. I have not used the method from this video yet, but I will. Any thoughts other than using third-party apps? Thank you!
Hard to say. Look for some kind of pattern that you can use to split text, replace values, etc.
Amazing Chandeep!!
Thank you brother. This was the only tutorial that worked for me
Thanks! Very useful and perfect presentation 👏👏👏
Thanks for the awesome video.
Same as your example, except that I just have to transpose each table before merging.
Hello, Chandeep!
I have a question: doesn't the use of the Table.Combine function (at 6:07) have the same effect as using the Table.ColumnNames + Table.ExpandTableColumn (that you showed next)? It seems like the same result and it would be simpler, but I don't know if I am missing something here.
Thanks! Your videos are great!
Thank you very much for the useful techniques.
Wow, very informative content, you explained them very well.
Just curious, is this applicable if your pdf file is a scanned doc / form?
Thanks.
Thanks as usual. But can you provide an example where we need to remove some rows of data from the PDF? Also, every page has the name and ID of each employee, and we need to add both of them as columns.
Great video. Thanks !
Can you create a video on Copilot, used like ChatGPT (e.g. "get sales for year X"), in Power BI?
Sandeep, your videos are really very deep, simple and practical, detailing all steps from zero to the last. Can you show how to combine PDF files that are password protected, where the password is known to the user? One way is to open those files, print them as PDF and then store them in the folder, which is cumbersome. Is there any shortcut?
Thanks Chandeep! Pls advise how to do the import such that each row of PDF becomes one cell in power query.
I have a PDF file with a total of 50 pages (page001 to page050). Each page has the same column structure with different records, but every page also contains a header and footer. I have to remove all those header and footer rows, plus some unwanted columns, before I can combine all 50 pages into one table.
I created a function in Power Query to repeat the cleaning process for all 50 pages, but on some pages it detected fewer or more columns than on the rest, even though they all have the same structure when viewed in a PDF reader. How do I deal with that issue?
Ahh....been facing a similar problem here, different number of columns detected even though they looked the same in the PDF reader app. Any solution would be appreciated
Excellent 🎉...
This is so useful thanks
How do I import the data cleanly if each PDF has a different number of pages and an additional table? Thank you.
Please make a video on how we can bring multiple bank statements with different formats into a single Power BI report.
This was super helpful, but I have recently also started to use extraktAI, it saves me a ton of time tbh 🔥 but either way a great video guys, keep up the good work!
I have a PDF which has tables side by side instead of one below the other. How can I combine them?
Eg: table 1, table 2, table 3
Table 4, table 5,
Fantastic
I have a PDF challan for TDS deposit. I am trying to combine multiple challans but I am not able to.
@goodly I have a PDF file in which the data sits right below the column headers, and there are 300 pages in that file. How am I going to get the data? Please make a video, or suggest another way.
Is there any possibility of getting data from multiple PDFs which are password protected? If yes, could you please make a video on this?
Bro, in previous videos, instead of adding new columns and deleting the old ones afterwards, you used Table.TransformColumns, which I liked a lot, and I now use this practice thanks to you. Is there a reason why you added new columns this time? (Is it faster, more effective, etc.?)
Just easier to explain 😉😜
@GoodlyChandeep :D sometimes we are all lazy :D haha
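For readers comparing the two styles discussed in this exchange, here is a minimal sketch. The sample data, the `Data` column, and the choice of `Table.PromoteHeaders` as the cleaning step are assumptions for illustration only, not the steps from the video.

```powerquery
let
    // Hypothetical sample: a nested table whose first row holds the real headers
    Source = Table.FromRecords({
        [File = "a.pdf", Data = #table({"Column1", "Column2"}, {{"Item", "Amount"}, {"X", 1}})]
    }),
    // Style 1: add a new column with the cleaned table, then remove the old one
    Added = Table.AddColumn(Source, "Clean", each Table.PromoteHeaders([Data])),
    ViaAddColumn = Table.RemoveColumns(Added, {"Data"}),
    // Style 2: transform the existing column in place, with no extra steps
    ViaTransform = Table.TransformColumns(Source, {{"Data", Table.PromoteHeaders}})
in
    ViaTransform
```

Both routes end with the same cleaned nested tables. The second keeps the original column name and needs one step instead of two, while the first is easier to follow step by step in the user interface.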
Sandeep, can you make a video on combining password-protected JSON files into Excel? It would be very helpful. Regards
What if only one PDF has column names and the others don't?
I have invoice PDFs, 30 of them each month, with multiple tables and scattered data. I tried a lot of manipulation but wasn't able to get the desired output.
I had nested tables after grouping data. All the tables had the same number of rows (4). I needed one of the columns to be replaced with a fixed list of names (departments for example). I just could not get it to work. I was able to add an Index column (1,2,3,4) into the tables, expand them and then did a Merge from the external list of department names in another query. So the process was:
1. Create departments in excel sheet and make it a table.
2. Import this into a Query.
3. Import the other data from a table in Excel, group the data into nested tables, add an Index into these tables and expand.
4. Then I created a third query to then merge 2 and 3.
5. I am sure I should be able to get that list into the nested query tables the same way as I inserted the Index column into there.
Please help.
Super!! Bro
Bravo 👍👍
wonderful 🌹🌹
How do I convert a bank statement PDF to Excel?
I've used PQ to import PDF data a lot. One thing I found is that the "Print to PDF" printer built into Chrome-based browsers is the easiest to work with. I have a full license for Acrobat, but the Adobe PDF printer produces some of the most difficult PDFs to work with. The Microsoft Print to PDF isn't much better.
If anyone knows of settings to adjust in these printers to make the PDFs easier to work with, please reply!
It doesn't work for me.
From 6:44, the whole thing becomes confusing.
Noice!
Permanent off-world relocation
Please, I need help. When I import data from a PDF, some non-English words in the table show up as "ϱΩϳϣΣϟΩϭόγϣΩϣΣϣϱΩϋ". How can I solve it?
Hello Goodly/All,
Would be wonderful if Someone confirms:
I was practicing along with Goodly. In the situation in the video, it seemed the following 2 are producing the same result.
✓ Table.ExpandTableColumn
&
✓ Table.Combine
Is My understanding correct?
Thank You!
No. Table.Combine keeps only the content of the nested tables, whereas Table.ExpandTableColumn also keeps all the other columns of the outer table. Just try it and you will see the difference.
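A minimal sketch of that difference, assuming a hypothetical query with a `File` column and the nested tables in a `Data` column (both names are made up for illustration):

```powerquery
let
    // Hypothetical sample: one row per file, with a nested table in [Data]
    Source = Table.FromRecords({
        [File = "a.pdf", Data = #table({"Item", "Amount"}, {{"X", 1}})],
        [File = "b.pdf", Data = #table({"Item", "Amount"}, {{"Y", 2}})]
    }),
    // Table.Combine stacks only the nested tables: columns Item, Amount
    Combined = Table.Combine(Source[Data]),
    // Table.ExpandTableColumn keeps the outer columns too: File, Item, Amount
    Expanded = Table.ExpandTableColumn(Source, "Data", Table.ColumnNames(Combined))
in
    Expanded
```

If the file name or other outer columns are not needed, `Combined` alone is enough. When each row has to stay traceable to its source file, `Expanded` is the one to use, and `Table.ColumnNames(Combined)` supplies the column list dynamically.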