How to Web Scrape Data in Google Sheets! (Import HTML)
HTML-код
- Опубликовано: 19 июн 2024
- Access Tickerdata and my Spreadsheets: tickerdata.com/
In this video, I take you step by step on how to web scrape data in google sheets. I focus on following three main web scraping strategies in this video:
1. Scraping full data sets
2. Scraping individual metrics from data sets
3. Automating the web scraping process in google sheets
The HTML code I use to inspect tables is the following:
var i = 1; [].forEach.call(document.querySelectorAll('table'), function(x) { console.log(i++, x); });
If you are trying to inspect a list instead of a table, use the following HTML code:
var i = 1; [].forEach.call(document.querySelectorAll('ul,ol'), function(x) { console.log(i++, x); });
The google sheets codes used in this video are below:
=IMPORTHTML("en.wikipedia.org/wiki/List_of...","Table","2")
=IMPORTHTML("finviz.com/quote.ashx?t=AAPL","Table","7")
=(index(IMPORTHTML("finviz.com/quote.ashx?t=AAPL","Table","7"),7,2))
=(index(IMPORTHTML("finviz.com/quote.ashx?t="&A3,"Table","7"),7,2))
I hope this tutorial was helpful! Let me know if you have any questions. Don't forget to like and subscribe to the channel!
Link to download my portfolio tracking spreadsheets: / dividendology
Get 50% off of Seeking Alpha Premium!
www.sahg6dtr.com/9D5QH2/R74QP/
The microphone I use: amzn.to/3ngqbVz
*UPDATE
Finviz likes to occasionally change the order that their data tables are listed. At the time of this video, you had to import table 7. It is now table 9 that should be imported. So the new formula will look something like this:
=importhtml("@t
I haven't used this sheet in a while but I'm trying to give my investments a renewed focus. When I came back to it, the finviz web scrapes weren't working. Your comment solved this. However, now the data appears like this *1.50* so that the other cells don't recognize it as a number to complete the other calculations. Not sure why this is as I'm not seeing the asterisks on the table on finviz. I've tried the Find and Replace function but that is not working.
They constantly change the last part of the url to mess up the scrapping formula
you are a life saver. I was trying to build an web app to analyze data . You just save hours of work. Thanks mate.
Glad I could help!
Incredibly useful technic, I shared it with some friends that would benefit from it. Thanks for the videos, I just subscribed to your Patreon, fully deserved.
Thank you so much! I’m very glad it helped. I hope my future videos will provide you with lots of value as well!
Been watching all your videos. Great content man
Even 2 years later this video was super helpful. Thank you!
Glad it helped!
This is a great video and tutorial. I made a Google Sheet and followed the steps to a "t". It was very concise and informative! Thank you!
Glad it was helpful!
Holy crap. This is such a good tutorial, I'm impressed by how easy it is to follow/implement, and it looks really useful for a indicator/tracking automation project I've been thinking about for a while. If you don't have another Sheets/Excel/Technical tutorial type channel, you definitely should.
Thanks! I’m glad you enjoyed. I definitely take advantage of web scraping in google sheets for a lot of my portfolio trackers.
This video needs way more than 946 views. This is a tremendously useful technique!!! Thank you so much for sharing this.
Thank you! I’m glad it helped.
Yo, best tutorial ever! Super handy Table code and straightforward solutions! I used to do these in desktop excel with ''Data Query from Web'', no idea it calls ''Web Scrape Data''. I bet more people will find your video if you put the ''Data Query from Web'' somewhere in the description. Thanks for the video!
Thank you! I’m glad it was helpful! And thanks for the tip! I will definitely add that in the description so more people can find this.
It took all day, but I got all this in a Google sheet today. Table 8 has changed to 9 now. I had to keep changing the number until I found out it is now table 9. Then on last tab, I put in the tickers for all my portfolio just to be sure. Thanks man. Now I can start the PortfolioDividend Tracker Video.
Excellent! Great job!
Thanks for the extremely useful video.
Great share./ Always struggle with this. Thank you for a great explanation. Stay connected.
Very helpful. This was my first web scraping and I'm no computer expert by any means.
As of Nov 2023 this is now table10.
Nice!
Great tips. Didn’t realize Google sheets had this capability. Will be helpful to rather data
Glad it was helpful!
This way great thank you so much!
You're very welcome!
Any attribute or option to mantain the original format when you import the data from Finviz Table? Thanks in advance!
Thank you sir, could you direct me on how i can store the changing Apple price per minute in a row whenever an automatic update takes place.
I'm looking for a way to validate data that I get from yahoo finance using importhtml. For example if you look at an ETF in Yahoo finance and say I'm looking for dividend yield. So there might be 2 tables where the yield in in second table. So if there a way to say if (importhtml for table 2, A1="Yield", then importhtml...A2 field, so it brings back the yield only if it's the correct line? Great videos btw, learning a lot from your videos.
Great video, how do you scrape something that you can not find within the console? Are images scraped the same way? Im trying to scrape the heatmap from finviz. Its tables 11-14, but whatever series of numbers I can not find my info. Can you help with this? Its the image to the right of the losers.
Great video and well paced, I learned alot. Any chance you have ideas on how to obtain data for International stocks from countries such as in Singapore and Hong Kong?
Thanks! I will look into how to do this.
This is useful, thanks
What about the links that have the ticker at half part through the link? How should we alter the formula? So it's like a specific word dash ticker and after that slash and some other words.
I’ll try to address this in a future video!
How do i adjust the formulas is the value from A2 is in the middle of the url
Very cool, its working. I got a problem, i cant' use the data i guet from the function "IMPORTHTML" because these are text data.
How can i transform them into numbers data ?
Can anyone recommend how to scrape macro trends for a ticker’s free cash flow table? The console function doesn’t work on their site.
Hii, i fetch option data from money control. But here is erroe that "imported content is empty". How to solve this?
Thanks for the video! Do you know how to remove the stars (asterisk) in the output data? Simply using the find and replace function doesn't work...
Yep! You can use the substitute formula!
Do you know how we could scrape data from Macrotrends? I tried to run the line in the description, but it seems like they have their data in divs, and not on tables.
I’m cases where the data isn’t in a table or list, you will typically have to use the =importantxml function. I used this formula to import industry data in many of my portfolio tracker tutorials. I don’t yet have an in depth video on this formula yet though.
I need Ex-dividend date and Annual payout data. How do I get that?
Do you have any recommendations for scraping website tables that are larger than what importhtml can handle?
There are some options, but I'll have to find what's best. Maybe I'll make a video on it in the future.
Hi, thanks for this great video.
Question: How can I scrape data from what appears to be an embedded Google Sheets spreadsheet that is on a web page? Thanks.
I havent run into that before. I'll have to look into it.
Currently finviz is telling me N/A after some time even though I have everything correct. What could it be please?
how do i import a table that extends in 7 pages?
Hello Dividendology, would you help me on how to get the eps of stocks from last year,(2021) these year estimates(2022) and the following year 2023 estimates?
I am running a project and I feel I just hit a wall with those metrics .
thank you
Also , I will like to extract data from multiples sources and have it in an automated mode. The idea is to create a google sheet with a list of indicators that update themselves as soon as new data is posted. Any advice on how to proceed? Thank you
lit
This was great. I know nothing. And was able to figure this out. Thanks so much.
The only thing is I have the * symbol before and after my data.
Any ideas?
you can use the substitute formula to get rid of this!
@@Dividendology I don't know what that is.
Thank you. Really helped me get what I want.
That’s great!
I am trying to extract the status of the open or closed market that shows up on Yahoo Finance page. Having Zero luck..
Im creating a custom way to do this! I'll release a video soon!
I think they changed the way the script works now, for I am getting errors about a parameter error. I am sure I typed the correct formula. Thoughts?
Hey Kevin, can you send me your formula so I can review it and help you?
Dividend:
It seems finviz has changed the way to pull dividends from their website. I have corrected the spreadsheet and it now works. Here is the updated formula:
=substitute(SUBSTITUTE(index(importhtml("finviz.com/quote.ashx?t="&A4,"table",7),7,2),"*",""),"-","0.00")
@@Dividendology Thank you for the updated formula! I've been searched for this for quite a while.
Hi, thanks for your work, it's been very helpful. The only problem that I have is that my website data are not refreshing. How can I fix that ?
Thanks for your return
Google sheets will typically auto update every 20 minutes or so. I will look into this and see if there are other options for a quicker update time.
@@Dividendology Thanks indeed I agree for 20 minutes :) it's enough for me use :)
Should be able to just refresh the browser and have it update. Otherwise try to - go to File | Settings - and under the Calculation tab, you can use the pulldown to set refresh to On Change and 1 minute or somesuch... that works with the TIME related functions - so somewhere on your page just throw in a cell with "=now()" and the whole sheet may update again. You might need to exit and open the sheet again after the calculation tab change.
Why do European stocks not work in sheets? Example ams:nn
Working on an update for that!
Is there a way to remove the asterisks?
Yes! Use the substitute formula.
What about multiple table
Has anyone tried to scrape from Robinhood? I think that may be an unscrapable site? Are there restrictions to what can and can’t be scrapped.
It's possible. Some websites have restrictions.
Finvest again changed something, dividend data isn’t working. I tried everything you suggested, nothing work. Pls check.
I’ve made the correction! You now have to pull table 9 from finviz.
@@Dividendology You missed the formula in pinned reply.
how about for divs ?
check out my importxml video!
Does not appear that the formula works. I tried dozen of different table and nothing seemed to work. This was great and would be exactly what I wanted but since this was created it appears that something changed and this no longer works. If you could post the new formula format that would be awesome.
Hey Joe, I posted a video on how to make a stock watchlist in google sheets a few days ago that has the new formulas!
it doesn't work with some websites
Anyone else have the issue where it just says [TABLE]?
Yes. It is now table 9. Not 8
i guess this is when data is inside a table, but in case it isnt ?
The main focus of this video was for tables, but if the data you are wanting to scrape is in a list, the only change that you have to make to your formula is to switch "table" to "list" in your google sheets formula.
There are also other ways to go about web scraping in google sheets that I may cover in future videos.
@@Dividendology can we extract a single piece of data that isnt contained on a liis nor a table... like in a div or maybe an H1 ?
That is possible! I use the importxml command for that typically. I may make a video on that in the future.