Bulk Combine PDF files to Excel without losing formatting & NO 3rd party software

Поделиться
HTML-код
  • Опубликовано: 2 окт 2024

Комментарии • 376

  • @LeilaGharani
    @LeilaGharani  8 месяцев назад +2

    Get access to the complete Excel Power Query course here 👉 www.xelplus.com/course/excel-power-query/

  • @besthobbit
    @besthobbit Год назад +1

    Thank you SO SO much. People at work think it's a good idea to save their spreadsheets as MULTIPAGE PDFs and send them around the org instead of just sending the Excel file.... now I won't need to spend so much time making them an Excel file again.

  • @chh8860
    @chh8860 2 года назад

    At the 9:00 minute mark " ... let me know if you have ever imported files like this into Excel ...".
    No ... not yet. However, I have been thinking of doing something similar with many years of historical data. My concern is that the pdf line items and headings may change from year to year.
    Once again ... another intriguing lesson from Prof. Leila. Thank you!

    • @LeilaGharani
      @LeilaGharani  2 года назад

      It does have some limitations, but as long as there is a logic to the headers you can add PQ steps. You could for example add a mapping table for the headers or find a dynamic way to pinpoint the first row to import. If you do end up using this feature let me know how it went.

  • @ugabrew
    @ugabrew 2 года назад +68

    Wow! You have a way of designing these examples and walking us through them that deserves a Teacher-of-the-Year award. Possibly also a Medal of Freedom. We all owe you a big thanks!

  • @mr.c6674
    @mr.c6674 2 года назад +19

    This really depends on the layout of the PDF.

    • @BangaloreYoutube
      @BangaloreYoutube Год назад +1

      True I actually have some structured and unstructured PDFs I’m able to extract the table I want for structured using python but I’m unable to find something for unstructured (basically the ones you can’t copy text from) does anyone have any solution also if someone wants the code for the python extractor lmk.

    • @muhammed9971
      @muhammed9971 7 месяцев назад

      @@BangaloreRUclips link

  • @shivnarayan8532
    @shivnarayan8532 4 месяца назад

    This is extremely helpful especially when looking at multiple invoices

  • @rfdc
    @rfdc 2 года назад

    I need to try this. You know the countless hours I've lost over the years manually typing info from .pdfs to excel!

  • @MrSupernova111
    @MrSupernova111 2 года назад

    OMG! Every reporting period I have to go through a client PDF report with multiple pages that is a complete mess. I didn't know that I could import the pdf file to Excel using Power Query so I can manipulate as needed. This is a huge time saver!! Thank you!!

  • @villiamo3861
    @villiamo3861 2 года назад

    Excellent vid. The thing about filtering for the sample file was very good - not very obvious (to me, at least!) when you're looking at an error and thinking you've already filtered to get just pdfs!
    Thanks!

  • @doctorchem1945
    @doctorchem1945 2 года назад

    Hello Leila! I am a great fan of your videos, very understandable and knowledgeable.

  • @rajkumarbharathi3139
    @rajkumarbharathi3139 2 года назад +1

    Another way to do it is by making a function out of the transformation that is applied to one of the pdf, then apply the funcion to all the pdf, files. Then expand. In that way you do not have to worry about the sample data thing.

    • @KVNVY
      @KVNVY Год назад

      Can you please elaborate. Beacuse i have no Get Data option in my excel

  • @APG_Creativity
    @APG_Creativity 2 года назад

    Thanks Ma'am.....this option save my lot of time.... 😍😍😍😍😍👍🏻👍🏻👍🏻👍🏻👍🏻👍🏻👍🏻

  • @luisfernandocastillo4045
    @luisfernandocastillo4045 2 года назад

    Great video...!!! I would like to know how to manage error when source folder is empty.

  • @randomguy-jo1vq
    @randomguy-jo1vq 2 года назад +1

    leila, you didn't tell that power query gets super duper slow in case of millions of data. how can we make power query runs fast

  • @sourabhchaudhari
    @sourabhchaudhari 2 года назад

    Thanks a lot Leila! You made my big problem go away!

  • @traciesmobile683
    @traciesmobile683 2 года назад +1

    what if the PDFs are multiple pages but with consistent front page.

  • @rosy3078
    @rosy3078 Год назад +2

    This video is fantastic! Thank you so much for creating it. I have a question, my original PDF files have multiple pages, hence, PowerQuery created multiple tables in the "Combine File" dialogue. How do you select multiple tables to be included in the sample file, as if I don't, the combined file will only have the first page of each file. Thank you!

    • @alamshar
      @alamshar Год назад

      I have the same issue

  • @becometheexcelguy4197
    @becometheexcelguy4197 2 года назад +8

    Great video Leila! Just curious, why did you opt to use “ends with” instead of an exact match on the extension filtering?

  • @excelkiclass3424
    @excelkiclass3424 2 года назад

    This Video Really Grt For me that I always stuck with pdf to excel conversion. This is a grt learning video for me .. but there a issue comes when I have customer pic in this pdf file with different format and I also want to download in excel then there is ? Comes always how do I can convert into excel with picture .. where I can solve my query to get it done in minimum time ? How can I send the pdf format to you so that u can saw the data in pdf format which i want to Convert in excel? Please reply

  • @lorimiller623
    @lorimiller623 2 года назад +1

    I work for CPAs. I haven't had good results using Query to import pdfs because records client submit to us take up multiple lines and the pages have headers and footers. Even if Query sees a table, it sees different things on different pages.

    • @chispas4861
      @chispas4861 2 года назад +1

      I feel the same. These examples are so cherry-picked. No one has a series of pdfs that look like this. I have tried to replicate many of these videos with REAL pdfs and have never had any success. Too many other items on the reports to actually make them work. That is even considering additional query adjustments to filter-out unnecessary data. Looks great, but hardly ever useful.

    • @lorimiller623
      @lorimiller623 2 года назад

      @@chispas4861 Query (along with some fairly complex formulas) can work wonders combining Excel spreadsheets and emails. But for internal pdfs, couldn't you just ask for the data as a .csv file?

    • @LeilaGharani
      @LeilaGharani  2 года назад

      @chispas - I know what you mean. This is the way I felt when I watched tutorials during my corporate job. Here's the thing though with teaching through standalone RUclips videos. We have to keep it short and concentrate on one key takeaway. Maybe 2 in each video. In this video it was the functionality itself and potential errors with file types - specially the common trap many fall into with the "first file" as sample (this is the realistic part 😉)
      There are of course so many other traps - like column headers being different or you need a mapping table for column headers etc. Each of these are separate videos - that's why my Power Query course is over 18 hours long and I feel like I still haven't covered many things.
      I also agree with Lori that these tools have limitations and if there isn't a logic we can add to our Power Query steps, then it will not work the way we want and we'll have to find other ways to reach our goals.

  • @GarthShaner
    @GarthShaner 2 года назад +9

    Holy Moley! I am thinking about how much time this would have saved me on a project I did back in 2007. This is amazing! Thank you so much for sharing!

    • @rajkumarbharathi3139
      @rajkumarbharathi3139 2 года назад

      Same, i had to wite a vb sctipt that extarct invice data from some PDF's. It took me many months to do.

    • @dougmphilly
      @dougmphilly 2 года назад

      you telling me. i look at my work history of so much time wasted typing in data. arghh. well, at least i got time and a half once i hit 50 hours.

  • @johnmckown1267
    @johnmckown1267 2 года назад

    Wow. Very informative. But one case is not addressed. That's where an associate puts in a .pdf file which is not in the expected format. No, me being me, the associate would get a stern talking to. At about 150 decibels.

  • @Sylvester.vanWelij
    @Sylvester.vanWelij 2 года назад +3

    What to do if the information is split across several pages. For example I need to consolidate the second table on each page in each of the files, but the number of pages varies.

  • @otiliam2306
    @otiliam2306 Год назад

    Great ! I am looking for a bit more complex case: multiple pdf's with multiple pages having the same format, data to be extracted for each page. Would you make a vid on this case, thank you.

  • @raimundojs9547
    @raimundojs9547 Год назад

    Great! Thank you so much for sharing. Very useful!

  • @akash0413
    @akash0413 Год назад

    In my excel sheet I am not able to find get data from pdf please help me
    Your Video is very useful

  • @VaibhavShewale
    @VaibhavShewale 2 года назад +1

    damn it's useful!
    lol iknow some people who are charging for this XD

  • @ajbp6706
    @ajbp6706 9 месяцев назад

    Hi, this is very helpful but it does not seem to work for bank statements; your example to cut to spec: how about layers of complexities like the bank statement?

  • @bizholdburg1576
    @bizholdburg1576 Год назад +2

    Very nice explanation , clear and profesional. Thank You!

  • @yutheathou8384
    @yutheathou8384 2 года назад +1

    this method only work with all the pdf file have the same table format and properties. Can you try all the pdf file have differences table format and properties?

  • @aprashant1048
    @aprashant1048 2 года назад +3

    Thanks for sharing. Definitely handy, but a core assumption is that data needs to one level of rows and columns. If not, excel cannot handle multiple levels of rows and columns. For instance, and XML, which has n-depth, cannot be easily parsed by this built-in utility in excel. One needs to code it to handle multi levels

  • @nadermounir8228
    @nadermounir8228 2 года назад

    Thank you Leila for this great video 📹 very helpful 👍

  • @dougmphilly
    @dougmphilly 2 года назад +1

    i do this all of the time. very few if any of my colleagues know this. i convert pages of monthly brokerage investment statements into excel while everyone else is typing entry line by line.

  • @sigung01
    @sigung01 Год назад +1

    No such option in my excel, all I have is get data from text, or html

  • @tommacabulostoledo
    @tommacabulostoledo 2 года назад

    i love you, Leila! You are the best!

  • @sheilaababio5934
    @sheilaababio5934 Год назад

    Thanks, easy steps. However, all the text in column A combined together without any spaces between them. Is there a way to stop that from occurring? For example ,"Numberofinitiativescompletedfromtherecentin-housesatisfactionsurveyon HQfunctions"

  • @amazon5665
    @amazon5665 Год назад +1

    when I am trying to upload new pdf files, it is not sorting properly.

  • @AbsolutelyGeek
    @AbsolutelyGeek Год назад +2

    Awesome video, saves plenty of time, thank you!
    Is there a way that you know of to import multiple similar, but not exactly identical, pdf together? Like for example, they all have the same tables, but perhaps each has the same table in a different page\order, can you, is there a way filter the tables you want by header maybe?

  • @tha0304
    @tha0304 2 года назад +1

    It only work in Office 365 or older office?

  • @MK-ni6gs
    @MK-ni6gs 2 года назад

    Excellent video! Thank you!

  • @PIDLINK
    @PIDLINK 7 месяцев назад

    I have an issue and I am not sure if there is a solution: The table in the PDF has 4 columns; column A and C contains headers and columns B and D contains corresponding information for each of the headers. For example Column A-Row 1 is (NAME: ) and column B-Row 1 is (JOHN) ... etc. All of the PDFs have exact same format but I can't seem to be to import them to a table.

  • @vitaintrasferta5264
    @vitaintrasferta5264 2 года назад

    Really useful! Thanks a lot Leila!

  • @anil_madan
    @anil_madan 2 года назад

    Thanks for posting this video.
    Will this work even for pdf files that as password protected ?

  • @peterbebbington3435
    @peterbebbington3435 2 года назад +1

    I use this an awful lot but a problem I have are documents that are scanned invoice pdfs. The quality of the pdf isnt good enough so no data can be extracted. Any solutions?

  • @MrJumardayap
    @MrJumardayap 2 года назад

    Hi Leila, Hope you also have a solution for multi-page pdf file.

  • @MohAboAbdo
    @MohAboAbdo 2 года назад

    Thank you very much, my dear teacher

  • @Rogue-Bull
    @Rogue-Bull 2 года назад +1

    Good video but Wouldn't have watched the video if you disclosed the vital fact that its only for Excel 2021/365 at the start instead of end :/

  • @esymphonykea93
    @esymphonykea93 2 года назад +1

    What about when the column names aren't the same? I know how to fix this through SQL, but is there an excel solution?

    • @LeilaGharani
      @LeilaGharani  2 года назад +1

      Yes. If they have the same order you can demote the first row from being the header and then filter it out. If you need a mapping table to match the columns you‘ll need a bit on M code (covered in a separate video)

  • @thenotidevil
    @thenotidevil 2 года назад +2

    Super awesome as usual. Thanks a lot for sharing this with us all. it's really helpful

  • @leksdic
    @leksdic 2 года назад +3

    Leila I would like to suggest a feature I found on power query. It is the ability to anti-join two queries, in order to exclude items that exist in one of them from the other. This was brilliant for me cause I had a list with all my currently open orders and one list with the orders that got invoiced today and would get updated during the nightly system procedure. This would be simple if all entries in both lists were unique, but since the lists had orders with their products, they had the order number multiple times for each product. So appending the two lists and removing duplicates was a no no. Power query came to the rescue and not only it can exclude items that exist in one list from another, but now I can automate it. I can just provide the files with the entries and the managers, that need to watch the order fulfillment closely, have just to hit refresh.

  • @CarolinaGiraldo-l4w
    @CarolinaGiraldo-l4w 7 месяцев назад

    Hi, I have a question, Can this work, for Excel 2016?

  • @Ticky66MN
    @Ticky66MN 2 года назад +2

    Wow! That was amazing. Thank you so much Leila! I'll be applying this today.

  • @PlayHitsLive
    @PlayHitsLive 2 года назад

    जो भी इस वक़्त मेरा कमैंट् देख रहा है हम सब एक दूसरे के लिए अनजान है फिर भी मैं ईश्वर से प्रार्थना करता हु की आप के लाइफ में अगर कोई टेंशन चल रही हो तो वो दूर हो जाये | और आप हमेशा खुश रहे |☺️✨

  • @Jill_Liu
    @Jill_Liu Год назад

    Hi Leila Thanks so much for your video, i have a question, if the pdf has several pages and several tables, and want to just combine specific tables from different pages can this be done by power query? thanks so much.

  • @samsurimohdsalleh4793
    @samsurimohdsalleh4793 2 года назад

    Thank you Leila

  • @notmeitsyou6913
    @notmeitsyou6913 2 года назад +1

    Most mind-blowing thing learnt from this is excel file will be prioritised over pdf files as first file in PowerQuery.
    I would be doomed and might be furstrated to start the whole project over again.😱

  • @alwarhi
    @alwarhi 2 года назад

    Many thanks 🥰

  • @recipequeen7850
    @recipequeen7850 2 года назад

    Dear leila what if you wanted to import these pdf files in one workbook but in seperate sheets

  • @chrism9037
    @chrism9037 2 года назад +2

    Wow! I didn’t know this could be done. Thanks Leila!

  • @rush2004
    @rush2004 Год назад

    Thank You!

  • @JackDaniels-xq2hu
    @JackDaniels-xq2hu 5 месяцев назад

    @LeilaGharani great video , but the "Data Source Error corrupted file" is showing on an xfile that used to work fine , it is linked to a folder with pdf monthly reports, it reads the old pdf files perfectly but 2024 pdfs aren't linkable any more despite the can be opened in acrobat with no issues, anyone had this issue?

  • @excellover7172
    @excellover7172 2 года назад +3

    This is awesome.. A big thanks for your time and efforts. For me this task was impossible until I watched this video. May you live long in good health and peace so that you can enlighten us with your latest contents in excel. Always love to spend time to watch your videos. Thanks once again.

  • @darcyg852
    @darcyg852 6 месяцев назад

    This works great if the pdf is formatted in a table but if it's in the format of a PO with no format, it's rather messy.

  • @shobhikvyas7354
    @shobhikvyas7354 2 года назад

    Thanks a lot it is very helpful. Just wanted to know if it only picks up tabular data or it can copy exact the same format of text . I mean if pdf is in not in tables and there are text in that case it can copy paste in the same format as pdf?

  • @joshuablack3163
    @joshuablack3163 2 года назад +3

    This is awesome. I would normally combine the PDFs in Adobe Pro and then export them as an excel, but that's a one-time solution. Having the ability to refresh the file as more PDFs are added to the folder makes the process look much more streamlined.

  • @wildecorrea6477
    @wildecorrea6477 2 года назад +1

    🇧🇷👏🏻👏🏻👏🏻 Awesome! I love your videos!

  • @DevaGizmo
    @DevaGizmo Год назад

    But i have multiple pages in every single pdf then how should i get the combined data

  • @mohammadayubkhan
    @mohammadayubkhan 2 года назад +1

    Hi Leila ... The video got me excited as I was struggling to find a way to convert PDF data into excel. I am using Excel 2016 but in power query i saw options were limited to Cvs , text & Excel itself but no PDF , consequently I could not convert the data. Is there i way around this problem ?

    • @jeronimolegua6606
      @jeronimolegua6606 Год назад

      Hi! answers microft has a solution that worked for me: Power Query / Get & Transform -Get data from PDF missing

  • @obsinanchannel6499
    @obsinanchannel6499 4 месяца назад

    does it work for the pdf file which taken as a photo please

  • @fallofshadows2209
    @fallofshadows2209 6 месяцев назад

    Has anyone found a way to do this on Mac? Importing from PDF or from Folder isn't available.

  • @vashisht1
    @vashisht1 4 месяца назад

    I would use adobe pro version and convert all in excel and then combine using power query. The real world problems are way too complicated.

  • @carlrigefsky9193
    @carlrigefsky9193 Год назад

    I recently imported 2 PDFs into Power Query. I need to cleanup and sort each one separately. One has 3 pages the other has 10. Is there an easy way to use the same query since both files are in the same format? Thanks

  • @bill9912
    @bill9912 7 месяцев назад +1

    Thank you, Leila! You're the best! After watching several other videos on this topic that were too complicated, too wordy or just "too-something", I discovered your video. What a breath of fresh air! Concise and successful! Can't ask for anything more than that. Your video helped me to accomplish exactly what I needed it to do. Again, thank you!

  • @FirehawkVFX
    @FirehawkVFX 3 месяца назад

    What is the minimum version of excel needed to use this funciton?

  • @arineems6059
    @arineems6059 2 года назад +1

    This is therapeutic. I feel calm watching this video ❤️

  • @ronitpearl4872
    @ronitpearl4872 2 года назад

    Wow! Amazing

  • @builderific44
    @builderific44 3 месяца назад

    I have successfully used this several times but this time no luck. I hit the double down arrows to "Combine data". A window pops up that says "Evaluating Query", it disappears, and then nothing happens. Any idea why? Using the same pdf forms I have previously.

  • @timsulzberger6658
    @timsulzberger6658 Год назад +2

    Nicely done, Leila! Excellent explanation of a complicated problem. Thank you for explaining this in such a straightforward way.

  • @user-tm3mu3gg9d
    @user-tm3mu3gg9d 9 месяцев назад

    I was so happy to have found this tutorial video! Thank you! BUT I have a problem - I click on 'transform data' and then the 'double down arrows' and a window pops up saying 'we didn't recognize the format of your first file...' can you help me?

  • @tombrown1377
    @tombrown1377 10 месяцев назад

    Amazing! You just solved a major issue for us! Quick question... How do I filter out pdf files that don't contain "Table002" in the very beginning?

  • @jeetbains8888
    @jeetbains8888 Год назад

    Hi I was not getting option in my Excel for GET DATA for import any Pdf file. Can you Plz help on this

  • @raghav8888
    @raghav8888 2 года назад

    This is great. But i need help.
    I have set of word documents in SharePoint. I need a specific table (last one) from the document to be extracted into excel. But i'm unable to do as the document is originally in DOCX format with header and footer. Don't know how to exclude the same. I tried converting DOCX to PDF and then tried importing but looks like it doesn't consider the tables properly.

  • @lawrence5304
    @lawrence5304 Год назад

    hello Leila, how to import a folder with mutiple FDF files , which each PDF file has at least two tables into Excel? Thanks!

  • @fernandomartino4708
    @fernandomartino4708 Год назад

    Excelent video, but if you have 5 PDF and 1 table from 1 of those PDF gets messed and divides 1 column into 2 because of the PDF Formatting for example? in my case 1 PDF file have a table that the 2 column is divided in 2 when it's exported to the Power Query. and it messes up the hole table. do you know how can i correct it?

  • @yp4577
    @yp4577 Месяц назад

    Great explanation! However, I now need to import multiple PDF's wand they each have multiple tables. Power query only lets me select one of the tables in a given PDF file. Is there a way around this?

  • @mozramdonclips
    @mozramdonclips 7 месяцев назад

    Can you please help if there are multiple sheets on the same workbook? I want it to convert all tables on all sheets on 9 different pdf's (4500 ~pages) but it only pulls through the first of each. My computer is powerful enough for this.

  • @dilkashgazala831
    @dilkashgazala831 Год назад

    Hi Leila first of all a big thanks to you for this video however I am not able to solve my problem 100% as I have to extract tables having same structure but present in different pages in different pdfs. I am really struggling. Please help me how can I solve this problem🙏🙏🙏

  • @zaplaylist4674
    @zaplaylist4674 Год назад

    Hi Leila! I have a problem with my bulk pdf converted to excel. My additional pdf have 2 pages, however when refreshed only the first page was reflected to the powerquery sheet. Please help. Thank you

  • @DwightMacon
    @DwightMacon Год назад

    When I go to get data from file from folder, there is no from file or from folder? Is there a specific version of exhale that you need to do this?

  • @altheapereira2374
    @altheapereira2374 Год назад

    Does this work on scanned pdf with tables? I when click.on combine files it it gives me an error saying we didn't recognize the format of the first file please filter the list of files so itcontains only supported types (text, CSV,excel workbooks etc) and try again

  • @shilpidas1250
    @shilpidas1250 Год назад

    Hi, I have a 350 page pdf file which is scanned file basically. How to convert this in one Excel file without loosing cell format? The pdf has lot of merged cell. Pls help

  • @mattsimon931
    @mattsimon931 Год назад

    Any advice on what to do when a single pdf file splits each page into its own table, example - 79 page pdf file has 79 tables. Can I use Power Query to combine the pdf tables into 1 excel table?

  • @namratakasina1835
    @namratakasina1835 Год назад

    Hi! I was trying to convert the bulk pdf files to excel , the pdf which I have contains text and tables , I'm getting the error of "format not recognized ", kindly help on how to resolve this issue

  • @anandkk4952
    @anandkk4952 2 года назад

    Is there any alternative for sequence function. Excel 2019. Plz reply

  • @sharontang5173
    @sharontang5173 Год назад

    may I know what to do, if my Excel under Microsoft 2019 did not recognize PDF after I click the double down in 1:13?

  • @amishanarsingani2307
    @amishanarsingani2307 Год назад

    Doesn't work for me, says "the uploaded folder extension is not supported, provide supported extensions such as txt,csv,excel workbooks,etc "

  • @att7912
    @att7912 11 месяцев назад

    Very nice. I tried this and works very fine with single page pdf files or meaning excel extracts the information from one table in the page which one choses in the sample. I have been trying to bulk extract different types of table information from multiple page pdf files. Let us say in one page there are several tables in a multi page pdf and you have many pdf files which have the same table type inside. Is it possible to import all tables from all pages and from all pdf files at once? It would be very helpful if you do one video on with such examples 😉

  • @luckymunkondya1237
    @luckymunkondya1237 2 года назад

    Hi Leila Gharani I'm getting this error when I try to load a .pdf file - We encountered an error while trying to connect. Details: "Pdf failed to load with result: Corrupted"

  • @ArpanAnand-v7p
    @ArpanAnand-v7p 8 месяцев назад

    awesome! I am facing issue because some of my PDFs have 3-4 pages and it is only picking 1st page... what's the solution?

  • @MrRedcecil
    @MrRedcecil 6 месяцев назад

    How can i import choises input (from dropdown menu) in the pdf to excel?... btw... love your content...