Scrape Data from Multiple Web Pages with Power Query
- Published: 28 Jun 2024
- Automatically scrape data from multiple web pages with Power Query using a custom function. Download the file here: www.myonlinetraininghub.com/s...
View my comprehensive courses: www.myonlinetraininghub.com/
Connect with me on LinkedIn: / myndatreacy
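The pattern the video demonstrates, a custom function invoked once per web page, can be sketched in Power Query M like this; the URL, function name, and page range below are placeholders, not taken from the video:

```powerquery
let
    // Custom function: fetch one page and return its first HTML table
    fxGetPage = (PageNumber as number) as table =>
        let
            Source = Web.Page(Web.Contents("https://example.com/list?page=" & Number.ToText(PageNumber))),
            Data = Source{0}[Data]
        in
            Data,
    // Invoke the function once per page number and stack the results
    Pages = {1..5},
    Combined = Table.Combine(List.Transform(Pages, fxGetPage))
in
    Combined
```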
That is absolutely brilliant!!! I have spent the last two days trying to figure out how to do it in Python, and within 8 minutes you showed me a much easier, straightforward way.
:-) so pleased it was helpful, Robert!
@@MyOnlineTrainingHub can i download epaper into pdf without coding?
lol... literally me too.. I got quite far until Python was reading Arabic webpages in hex, and then I threw my laptop out the window!
Brilliantly framed and well communicated. Thank you again Mynda.
Thanks so much, Michael!
Thank you so much for this video. Very practical for my Data Analyst journey. I followed the steps and didn't run into any errors.
So pleased you found it helpful!
It's like you have read my mind because I was looking to scrape data from web like this currently. Thanks for the tutorial it's really helpful.
My pleasure, Abdul!
Thanks Mynda, there is no way that I would not like this video. It's awesome.
Thanks so much, Awesh! And thanks for sharing it on LinkedIn :-)
I watched this video on this teachers day, and I believe you are one of the best teacher could help me on web scraping... 🤗
Wow, thank you!
Wow...Easily used this tutorial to query printer settings from every Zebra printer on my LAN. Very helpful!
Awesome to hear, David!
To think I was doing this manually 🤦🏽♂️. Thank you, this is a huge time saver!
Great to hear, Geoffrey!
Wonderful tutorial! That was exactly what I was looking for. I was duplicating data sources for each week to scrape some web data. Thanks a lot!
So pleased it helped, Fabio!
This is an amazing way of working with web pages. I have seen people write lengthy macros and Python code for this.
Yes, Power Query is super easy to use. I wish more people knew of its powers ;-)
Great video and so clear with the explanation! My researching will be much easier now!
So pleased it was helpful 😊
Great video mam!!! I was doing this before with Python, then saving to CSV, then importing into PBI. Now I can do it with PBI directly 👏👏👏
Wow, that's fantastic to hear :-)
I wasn't even aware that M/Power Query can be used to such extent. Thank you for the great insight!
Glad you enjoyed it, James!
Wow you're amazing! Can't believe this information is free! Thank you so much!
You're most welcome, Carlton!
I slightly adjusted this to scrape data from a folder full of PDF files. Excellent thanks!
Glad it helped!
Thank you!
It is hard to study this in Korea because there is not much material about Power Query.
Thanks to this, I integrated several POST APIs into a single query.
Pleased I could help!
Wow amazing what you can do! Thanks a lot mynda I’m always learning from you!
So pleased to hear that, Maria!
A bit beyond me at this point Mynda, Power Query is on my "to learn" list. Well presented.
Thanks, Dave! Power Query is amazing...I'm confident you'll think so too :-)
You are a legend! Helped me solve the greyed-out "change data source" button.
Great to hear, Vincas!
That's remarkable; this is like the limit of most people's Python learning, and most co-workers would consider them "dangerous" with those Python abilities. (In the most professional and excellent way, of course!)
:-) Glad you liked it, Mark!
Thanks a lot for this tutorial! I could get multiple API calls into a single query, best solution ever!
Glad it’ll be useful!
Excellent tutorial, super easy to follow. That’s brilliant 👍
Glad it was helpful! 🙏
Hi Mynda! Great Tutorial, Just Learnt Something New So I Can Have More Fun With POWER BI.. Thank You :)
Great to hear you found it useful, Darryl!
Thank you very much. I'm from Turkey. Have a nice day.
Thank you! You too!
Very helpful; a lot fewer complicated Excel formulas in my life now. Shame that challenge has gone, but I had to think a lot about my queries.
Don't be sad that the challenges have gone...there are plenty of new challenges awaiting; M code, DAX, dynamic array functions :-)
Amazing how it is easy to scrape web pages. Thanks for this excellent tutorial.
Glad you like it, Marcel!
Exactly what I was looking for, thanks! Great video.
Glad you found it helpful 😊
Great video thanks this makes web scraping a lot easier. Thank you.
Great to hear!
Wooww thank you so much, it took me months to find this function. I will try it on a more complicated webpage. Thank you!
Glad you can make use of it! 😊
YOU ARE THE BEST!! Saved me so much work!
So pleased I could help 😊
Great info, easy to understand. TYVM! I'd love to learn how to do all of this in Google Sheets. Power Query sounds cool!
Glad you liked it, Stephen! Sheets doesn't have Power Query.
Well Done Mynda
All Your Videos Are Useful
Cheers, Shakira!
I knew this was possible, but ran into some errors while trying to do it on my own. Thank you very much for the great tutorial. Now to let Power Bi Spin!
Glad it helped!
spot on. I was only able to do the first page. This gives me the ability to do an entire site.
Wow, this is clever and exactly what I needed. My mind is blown !!
Awesome. Glad I could help 😊
This is simply awesome, now I have to practice this technique.
Enjoy!
Clear Voice, Beautifully Explained Super-woman.
Thank you so much 🙂
Brilliant! Many thanks, Mynda.
Cheers, Ian!
Marvelous ! You make it so easier, Thanks a lot
Thank you! Glad to hear that!
Awesome use of M for us tiptoeing into the M Script!
Glad you liked it!
Mind blown. This is awesome. Thank you.
Glad you liked it 😊
I have never tried this, but I frequently convert data from CSV files to HTML data tables. Thanks Mynda.
Hope you can make use of it, Reda!
Web scraping, oh I love it 😊
Great to hear :-)
Thanks a lot, it was terrific. I'm from México.
Glad you enjoyed it, Arturo!
Great Video... Thanks for the efforts and sharing it. this will be very useful for many tasks...
Great to hear!
As always, an excellent tutorial
Glad you liked it!
Super cool video, thanks Mynda
Cheers, Chris!
Nicely explained, loved it.
Thanks so much, Rakesh!
Beautiful. It's solved my actual problem. Thx. :)
Great to hear!
Very useful! Thank you!
Awesome to hear!
Thanks a lot, that was really, really useful. You solved my very big problem. 🙏🙏🙏🙏🙏🙏
Awesome to hear! 😊
Amazing, thank you 👌🏻👌🏻👌🏻
My pleasure, David!
Wow. Amazing video, thank you!
Thanks so much, Adam!
Excellent as usual...
Thank you so much 😀
First comment. thanks for the video
Winner :-) hope you found it useful, Anwar.
Thank you soooo much! You changed my life this weekend. Been struggling with Excel's limitations for years, and lost countless hours of my life sometimes without even accomplishing my goal. I only discovered the existence of Power Query last night with your video, and you blew my mind. A brilliantly well presented and comprehensive video on it too! It got me partway through my current problem, but now I'm stuck again if you can help?
I've created Query1 that gets multiple tables from each webpage, with 10 records each, and includes a record ID. But each record has a link to a details page with more info for that record. The record ID is used within the URL string to get those details. Can I create a single query that collects the list of records and uses the ID to also collect the details for each record, all in one go?
Also, with 30,000 records in total, it takes hours to refresh. However, as the historic records don't change and have a historic date of filing, is there any way for future updates to only get the latest records (those with a filing date after the last date of the previous dataset), remove any duplicates, and append them to the list?
Finally, it would be great if a timestamp could be added in an additional column to denote the date when that query was run, so that I can easily see which data has been added and when. Is any of this possible with PowerQuery?
So pleased that my video was helpful! Please post your questions and sample Excel file on our forum where we can help you further: www.myonlinetraininghub.com/excel-forum
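For reference, the detail-page lookup and the timestamp column asked about above could be sketched in M along these lines; Query1, the ID column, the detail URL, and the expanded field names are all assumptions:

```powerquery
let
    // Hypothetical function: fetch the detail page for one record ID
    fxGetDetails = (RecordID as text) as table =>
        Web.Page(Web.Contents("https://example.com/detail/" & RecordID)){0}[Data],
    // Query1 is the existing list query with an ID column (per the question)
    Source = Query1,
    // One detail-page call per record
    WithDetails = Table.AddColumn(Source, "Details", each fxGetDetails([ID])),
    Expanded = Table.ExpandTableColumn(WithDetails, "Details", {"Field1", "Field2"}),
    // Timestamp column recording when the query was refreshed
    Stamped = Table.AddColumn(Expanded, "RefreshedOn", each DateTime.LocalNow(), type datetime)
in
    Stamped
```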
AMAZING!!! THANK YOU SO MUCH!!!
My pleasure 😊
This is top-notch stuff, thanks.
Glad you enjoyed it :-)
Useful video. Thanks
Glad it was helpful, Ali!
This is AWESOME!!
Glad you found it helpful!
on point, awesome video
Glad you think so!
Thanks so much for all your playlist video tutorials.. awesome... please add a tutorial for doing network or system monitoring with Excel.. many thanks.. :-)
My pleasure :-) I don't have any data I can use on network or system monitoring, sorry.
@@MyOnlineTrainingHub tysm for your feedback, no problem, many thanks..
you are a genius👏
Thanks for your kind words, Djamel!
Gorgeous!
Thanks, Tammy!
I don't know you, but I love you. Thanks!
Glad it was helpful!
Hi Mynda.. another great example and technique. Thanks for sharing it :)) Thumbs up!!
PS - Any idea when the Add Table Using Examples feature will come to Power Query in Excel in Microsoft 365?
Thanks, Wayne! No idea when Excel will get Add Table Using Examples :-( it has been available in Power BI for quite a while now, but that doesn't seem to mean anything.
Amazing video and awesome ideas that I incorporated instantly! Quick question, how would you go about making each "page" into a separate query (each page a query on its own)?
Glad you liked it, Mohammed! To make each page a separate query, you'd have to create them one by one by pasting in the URL for each page, or copying the query and modifying the URL to point to a different page.
@@MyOnlineTrainingHub Thanks for your prompt reply!
I love you. You're a genius...
Glad you enjoyed it, Robert!
@@MyOnlineTrainingHub I spent a couple of weeks a year ago trying to do just this. Code, code, code was all I got!
I love Excel. I could give up my day job if I could and just do Excel....
Power query is amazing like you 😎
Aw, too kind, Arturo :-)
Thanks in a million.
My pleasure 😊
This is so cool!!!
Glad you enjoyed it :-)
Great example
Thank you!
Scraping with PowerBI, hopefully it'll be fully enabled in Excel!
Fingers crossed, Doug!
I still need to watch this video a few times. I bet our entire organization doesn't know this.
Glad it's helpful!
Hi! Your tutorial is very clear. However, what if the web page you are trying to access needs your credentials first? Do you know how I can go around that? Thank you!
Very good 👍🏻
Cheers, Graham :-)
This is amazing! Thank you so much! How do I enter more than one variable this way? The function that I created has multiple variables, but it only asks for the first variable. 😣
Please post your question on our Excel forum where we can help you further: www.myonlinetraininghub.com/excel-forum
Brilliant!!!!!
Thanks so much!
Wow, very interesting
Glad you think so!
Thanks hun!
My pleasure!
Amazing!
Glad you liked it!
Great video! Extremely useful
It works in my case, but only for the first 19 sheets out of 89.
Starting from the 20th sheet I get a blank page without any data, although I can see pages 20 to 89 in the browser.
I would appreciate it if you could show how many pages can be exported in your exact example.
Sounds like the web site is throttling the feed so you can't get the data. Not much you can do about this, other than try splitting the task into multiple queries and run them one at a time.
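If you want to try pacing the requests from within Power Query itself, M's built-in Function.InvokeAfter delays each call; in this sketch, fxGetPage is an assumed one-page function like the one from the video, and the 5-second delay is arbitrary:

```powerquery
let
    Pages = {1..89},
    // Wait 5 seconds before each page request to reduce throttling
    Results = List.Transform(
        Pages,
        (p) => Function.InvokeAfter(() => fxGetPage(p), #duration(0, 0, 0, 5))
    ),
    Combined = Table.Combine(Results)
in
    Combined
```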
@@MyOnlineTrainingHub
Good News 1 ..... Solving the throttle problem. When PQ and BI won't work, I use Instant Data Scraper. It's a free Chrome extension and works 95% of the time. It lets you set a time delay before going to the next page. I usually start at 12 seconds, then decrease the delay by 1 second every 100 pages or so, down to about 4 or 5 seconds. The most I've ever done is scrape more than 40,000 pages on a website.
It scrapes only when the webpage is active. So if you navigate to a different webpage tab, it pauses. To restart scraping, simply make that page active, i.e. displaying, and click Start Scraping. To prevent pausing, simply drag the webpage out to be standalone before starting IDS.
Good News 2 ..... it does something that PQ and BI don't do. It extracts embedded URLs. Say email addresses are embedded in people's names. PQ and BI will import the names (as plain text), but I've never figured out how to get them to extract the embedded email address. IDS does extract the embedded URL.
Bad News .... IDS doesn't connect to the website, so you can't "refresh" the query like you can with PQ and BI.
brilliant idea
Glad you think so!
Great 👌👌👌
Thanks 😊
Hi Mynda
Your videos are great. Could you make a video on using Power Query with Google Sheets data like this one?
I don't use Google Sheets ;-)
@@MyOnlineTrainingHub No worries
It's useful, thank you. I am looking for similar data scraper software. Would you mind showing how to work with Power BI in the case of a different website, please?
Hi, Professor, you have put a lot of effort into these videos, thank you. I have a question in Excel: if we have, for example, in cell A1 the number 10.00 m, how do we get this number with the same format in cell B1 using a TEXT function or some other function? Thank you very much.😃👍
Thanks, Mohamed! I'm presuming the value in A1 is a number, in which case you can use this formula:
=TEXT(A1,"##.00,,\M")
Good evening Professor, thank you for the answers that you sent me on the Internet. May God protect you. Suppose we have in cell A1 = "excel", in cell A2 = "is", in cell A3 = "fun", in cell B1 = 12.00m, in cell B2 = 10.00gr, in cell B3 = 15.00kg, and we use the VLOOKUP function: =VLOOKUP("is", $A$1:$B$3, 2, FALSE). The result is 10, which means that this function didn't give me the full format of the number in cell B2 (B2 = 10.00gr), but my goal is to get B2 = 10.00gr and not B2 = 10. Thank you so much.😃👍
Please post your question and sample Excel file on our forum where we can help you further: www.myonlinetraininghub.com/excel-forum
@@MyOnlineTrainingHub Good morning Professor, I have emailed you an excel file containing a question and comment on the question, thanks a lot for the help.
Thank you very much for bringing this brilliant video. I do have a question: if I am not used to using Power BI and still want to use Excel to extract web data like you do in this video, how do I do that?
Hi Linda, Power Query in Excel doesn't have 'from web by example'. Your best option is to use 'From Web', but that will require the data in the web page to be stored in an HTML table. You'll know if it is, because you'll be able to see the table in the preview in Power Query.
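Behind Excel's 'From Web' for an HTML page, Power Query generates M along these lines; each row of the navigation table is one HTML table found in the page (the URL here is a placeholder):

```powerquery
let
    // Web.Page parses the HTML and lists every <table> it finds
    Source = Web.Page(Web.Contents("https://example.com/report")),
    // Pick the first table; if none appear here, the page has no HTML tables
    FirstTable = Source{0}[Data]
in
    FirstTable
```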
@@MyOnlineTrainingHub Thank you very much for your prompt response.
like it thanks for sharing
My pleasure :-)
Wow. !
Glad you enjoyed it :-)
Excellent work. Just a question: when I try to refresh it in the service, it doesn't allow it and indicates "This dataset includes a dynamic data source. Since dynamic data sources aren't refreshed in the Power BI service, this dataset won't be refreshed". Any workaround?
Thanks!
Welcome!
Thank you, Madam, for this useful info! But I need to know "How can I scrape pages which have infinite scrolling using power bi?". looking forward to your suggestion!
Hi Hari, if the pages are loading the data on the fly and the data isn't in an html table or visible on the page, then I'm not aware of a way to do that, sorry.
What a great tutorial.. I've tried your tutorial and it works!
However, when I upload the app to the Power BI service and set the automatic refresh schedule, there is a failure notification for automatic refresh of dynamic data. Did you also experience this?
Currently scheduled refreshes for queries where the data source is part of a function aren't supported. You should be able to manually refresh though.
Awesome! I managed to import a table for one page from a URL. It is a list of books; unfortunately, the number of books per web page varies. Is there a way to handle the issue of generating each page number in this case? As a backup, is there a method of exporting all pages to a CSV file and using Load & Transform to bring the CSV back into PBI or PQ?
Glad it was useful, Charles. In terms of figuring out the number of items on a page, I'm not sure there's any way to do that in advance of accessing the pages. Whether there's a way to export the pages to a csv file would be down to that website and whether it offers that as an option. It's not something Power Query can do.
Many thanks for the video! What if I have two variable names? My URL includes both a year and a quarter. I created the two variable names but how do I invoke the function to take all quarters from every year?
Make a table containing the string made up of the quarter and year components and whatever other characters form that section of the URL, and feed that into a single variable.
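That suggestion, combining the year and quarter into one string fed to a single-parameter function, might look like this in M; the table contents, the separator, and fxGetPage (here assumed to accept the combined text) are all assumptions:

```powerquery
let
    // One row per period to fetch
    Periods = Table.FromRecords({
        [Year = 2023, Quarter = "Q1"],
        [Year = 2023, Quarter = "Q2"],
        [Year = 2023, Quarter = "Q3"]
    }),
    // Build the URL fragment from both variables
    WithSegment = Table.AddColumn(Periods, "UrlPart", each Number.ToText([Year]) & "-" & [Quarter]),
    // Invoke the single-parameter page function with the combined string
    WithData = Table.AddColumn(WithSegment, "Data", each fxGetPage([UrlPart]))
in
    WithData
```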
Thanks a lot!! I was wondering, if the web page is updated, would the loaded data in Power BI update too (so basically, is it real time or not)?
Only direct query datasets can refresh real time, however, you can schedule refreshes at set intervals.
I'm currently new to this stuff. I see that you might be able to customize queries for specific data scraping tasks. I'm interested in learning ethical data scraping techniques. How would it be possible to create custom scraping software? What computing language would you recommend learning in that case?
I can't answer that question, Louie. I know nothing about creating custom software.
@@MyOnlineTrainingHub Thanks Anyway
Incredible overview, thank you so much! Is it possible to do this if you have a site with multiple pages that uses the same URL? I'm trying to scrape data from a public site with multiple pages, but all of them use the same URL - there are no unique identifiers (e.g. page numbers). Any assistance would be greatly appreciated.
Glad it was helpful. Unfortunately, if the site's URL doesn't change, then you can't scrape the data with Power Query.
@@MyOnlineTrainingHub Understood. Could you recommend any other options that may be helpful? Thank you in advance.
Only to say that if you know JavaScript (I don't) you can write some code to change the 'page' displayed so you can get the data.
Thanks for this great video. What if you want to loop through dates?
You can use the same technique to loop through dates.
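For dates, M's List.Dates can generate the sequence to feed into the page function; everything below (start date, count, and the URL date format) is a placeholder:

```powerquery
let
    // 30 consecutive days starting 1 Jan 2024
    Dates = List.Dates(#date(2024, 1, 1), 30, #duration(1, 0, 0, 0)),
    // Turn each date into the URL fragment the site expects
    Urls = List.Transform(Dates, each Date.ToText(_, "yyyy-MM-dd"))
in
    Urls
```

Each entry in Urls would then be concatenated into the full page address inside the custom function, exactly as with page numbers.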
Thank you for the video, very informative...
How long does it take to scrape the data from the 21000 pages?
I never let it run to find out how long it takes. I'd expect the website would think it was a bot and throttle the query anyway.
@@MyOnlineTrainingHub - Hi, on this point only: I have a query where, when I tried doing this for 2,000 pages, data comes through only for a few pages and the rest show null after the Invoke function, whereas the data is there on the web for those pages.