Web Scraping: HTML Tables with Python

  • Published: 6 Jan 2025

Comments • 122

  • @rverm1000
    @rverm1000 3 years ago +17

    Your web scraping abilities are much better than the Udemy course I'm taking.

    • @dennyswandia4462
      @dennyswandia4462 3 years ago

      Hello John. Would you please do a video on scraping and extracting URLs, then following those URLs to extract information from them? I tried following your video on false plants, but I was scraping Amazon. My code kinda returns an empty CSV.

    • @armanikayden3337
      @armanikayden3337 3 years ago

      I guess I'm asking in the wrong place, but does anyone know a way to get back into an Instagram account?
      I stupidly lost the account password. I would appreciate any tricks you can give me!

    • @dennyswandia4462
      @dennyswandia4462 3 years ago

      @armanikayden3337 Do you remember your email address? You could just click "forgot password" and enter your email address. A link to reset your password will be sent to you.

  • @DJMrTen
    @DJMrTen 3 years ago +6

    I've been trying to figure this out for way too long. This quick vid got me rolling in under 15 minutes! Thank you John.

  • @e.eiyayi2381
    @e.eiyayi2381 2 years ago +4

    You were able to transfer your knowledge spot-on, which unfortunately too many geeks fail at.

  • @bencole8301
    @bencole8301 3 years ago +2

    Hi John, I took a Udemy course in Python and got slightly off track (I didn't really want to say bored! but now I have, ooops!) about halfway through, but your videos are awesome. Feel like you're filling in all the gaps, and I'm enjoying the very straight-talking, no-messing approach you take to tutoring. Thanks for the videos.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      Thanks Ben! I’m glad you’re enjoying my videos

  • @ryan-zd4jm
    @ryan-zd4jm 3 years ago +1

    Thank you!! I love how you explain your mistakes in a way that actually makes sense.

  • @stonebanksyt
    @stonebanksyt 3 years ago +2

    Thanks for this tutorial. I was so confused earlier. I finished my project because of you. Thanks a lot.

  • @127bits7
    @127bits7 3 years ago +1

    dude this was the best video! thank you so much John!

  • @paulfearn7571
    @paulfearn7571 4 years ago +1

    good work john - very clear easy to follow video and voice over - thank you keep up the great work

  • @Guitarfreek27
    @Guitarfreek27 4 years ago +11

    Hello, first of all, great video. Question:
    The URL I am trying to scrape from keeps giving me "None" when I try to find the table class. Ever run into this problem? Any suggestions on what I can do to fix? Any help is much appreciated!
    My code is something like this:
    table = soup.find('table', class_ = 'class_in_question')
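
One way to debug a `None` result like this, sketched with a made-up HTML snippet (the class names `standings` and `wide` are placeholders, not taken from any real site): `find()` returns `None` when nothing matches, so it can help to list the classes of the tables the parser actually sees. If that list is empty, the table is probably added by JavaScript and is missing from the downloaded HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical page source; a real script would use the text of an HTTP response.
html = """
<table class="standings wide"><tr><td>Arsenal</td><td>50</td></tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns None when nothing matches, so check before using the result.
table = soup.find("table", class_="class_in_question")
if table is None:
    # Show which classes the parsed document really contains.
    available = [t.get("class", []) for t in soup.find_all("table")]
    print("No match. Table classes seen by the parser:", available)

# Matching on one of the classes that is actually present succeeds.
table = soup.find("table", class_="standings")
first_cell = table.find("td").text
```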

  • @areijkandi5424
    @areijkandi5424 2 years ago +1

    Perfect. This is what I was looking for. Thank you so much. You deserve 1,000,000 likes.

  • @alphagam3r933
    @alphagam3r933 3 years ago +1

    love you sir literally love you, this video is very useful in my college project, next level video

  • @dhiahmila9549
    @dhiahmila9549 2 years ago +2

    You could also use pandas instead of a loop: pd.read_html

  • @xedifice4421
    @xedifice4421 3 years ago +3

    Exactly what I needed, Thanks a lot!
    Go Gunners!

  • @cherico94
    @cherico94 4 years ago +2

    Thanks man. This really helped me out with what I was trying to figure out for days.

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      Thank you I’m happy it helped!

    • @mhm6
      @mhm6 4 years ago +1

      Same I spent a whole day not being able to extract data from websites with multiple tables because I didn’t know how to access specific classes. Now I know it’s class_ = “ “
      Thanks!

  • @felipelandim2881
    @felipelandim2881 3 years ago +1

    Finally a Web Scraping of Soccer Tables!

  • @drravindraboojhawon3832
    @drravindraboojhawon3832 3 years ago

    How do I scrape data (tables) from a webpage that has different tabs, which are activated and show their data only when you click them with the mouse? Thanks.

  • @paulohsgoes1959
    @paulohsgoes1959 4 years ago +1

    Excellent job, John. Congrats!

  • @aqibsunesara
    @aqibsunesara 2 years ago

    My html does not have td class. The tbody has multiple TRs. Each TR has a TH and TD and I want to extract each of those TDs.text. Can you help?
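
A minimal sketch for a layout like the one described above, where each `tr` holds one `th` and one `td` and no classes are available. The HTML and its values are made up for illustration. The idea is to walk the rows and read the cells by tag name instead of by class.

```python
from bs4 import BeautifulSoup

# Made-up table: each row pairs a <th> label with a <td> value, no classes.
html = """
<table>
  <tbody>
    <tr><th>Name</th><td>Widget</td></tr>
    <tr><th>Price</th><td>9.99</td></tr>
  </tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

pairs = {}
for row in soup.find("tbody").find_all("tr"):
    header = row.find("th")
    cell = row.find("td")
    if header is None or cell is None:
        continue  # skip rows that do not have both cells
    pairs[header.text.strip()] = cell.text.strip()

print(pairs)
```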

  • @jimmykarago2598
    @jimmykarago2598 3 years ago +1

    Thanks for this I was able to scrape all the data I needed

  • @nahakuu
    @nahakuu 4 years ago

    I wonder how to get an HTML table that sits behind a login. I am looking for a way to create a Python app to gather data from our database, as they do not want to give us SQL access for easier data processing. I would use Python to gather all the data into Excel.
    But I am failing to get to the table because I need to log in first.

  • @JamesTangGunner
    @JamesTangGunner 1 year ago +1

    CMYG
    We are top of the league! Say we are top of the league!
    Very helpful video. Love it

  • @melih.a
    @melih.a 3 years ago +1

    when you type in rows = team.find_all('tr') it doesn't register .find_all? I don't know what i'm doing wrong here

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      I think the website I used here has changed and won’t work with this method anymore, it’s not finding anything that matches your find_all

  • @geekboy77
    @geekboy77 3 years ago

    What shall I do if my table shows 50 rows per page and I want to extract all its data

  • @chamopediapedia4888
    @chamopediapedia4888 4 years ago +1

    How do you download a table and convert it into a CSV?
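
One possible approach to the CSV question, sketched with the standard-library `csv` module and a made-up table (team names and points are placeholders). The rows are written to an in-memory buffer so the example has no side effects; writing to `open("table.csv", "w", newline="")` instead would produce a file.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up table standing in for a downloaded page.
html = """<table>
<tr><th>Team</th><th>Points</th></tr>
<tr><td>Team A</td><td>50</td></tr>
<tr><td>Team B</td><td>47</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")

# Collect the text of every header and data cell, one list per table row.
rows = []
for tr in soup.find("table").find_all("tr"):
    cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
    rows.append(cells)

# Write the rows as CSV into an in-memory text buffer.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```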

  • @shacharbard1613
    @shacharbard1613 1 year ago

    great video John. thanks!
    regarding extracting the teams and their current points, I tried "pl_points = row.find_all('td')[9].text" and it also worked.
    is this because what matters here is the "td" index? and the reason to include the class name is to have code which is clearer?
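
A small sketch of why both versions can return the same cell. It assumes, as the class names quoted in the comments suggest, that every cell in a row carries the class `standing-table__cell`; the row below is simplified and made up. Under that assumption the class filter does not narrow the list, so only the index decides which cell is picked.

```python
from bs4 import BeautifulSoup

# Simplified, made-up row: every cell shares the class "standing-table__cell".
html = """<table><tr>
<td class="standing-table__cell">1</td>
<td class="standing-table__cell standing-table__cell--name">Team A</td>
<td class="standing-table__cell">50</td>
</tr></table>"""
soup = BeautifulSoup(html, "html.parser")
row = soup.find("tr")

# Both calls return the same three cells, so the same index gives the same text.
by_index = row.find_all("td")[2].text
by_class_and_index = row.find_all("td", class_="standing-table__cell")[2].text

# A class unique to one cell needs no index at all.
name = row.find("td", class_="standing-table__cell--name").text
```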

  • @kevj2001
    @kevj2001 3 years ago

    Hey, my soup is not able to find the required table, what to do :/ ?

  • @karmantan
    @karmantan 2 years ago +1

    Hi, what happens when an element is not visible in the page source, but only when you click on inspect?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      It’s loaded by something like Ajax via JavaScript - I have a few methods on my channel for this if you’d like to have a look, the newer ones are generally better

    • @karmantan
      @karmantan 2 years ago

      @JohnWatsonRooney Just watched the video! That was really helpful, thank you!

  • @gatorpika
    @gatorpika 3 years ago +1

    Thank you! This is just what I needed.

  • @gawd2891
    @gawd2891 2 years ago +1

    How can I access a td which doesn't have any classes? Same for the table tag.

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Use find all to get all the tables, then index the one you want, the same for the td tags

    • @gawd2891
      @gawd2891 2 years ago

      @JohnWatsonRooney Thank you so much John 👏

  • @jordanleo
    @jordanleo 4 years ago +2

    what if the text we are looking for is not in a class, rather it is just South Australia? what would i do then?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      Hi - does the table not have a name or an ID of some description? If not you could try just using find all for ‘table’ and see what you get. You can send me the url if you’d like some more specific help

    • @jordanleo
      @jordanleo 4 years ago +1

      John Watson Rooney thanks that worked! Great video!

  • @sadettinbacanli4895
    @sadettinbacanli4895 3 years ago

    Thanks a lot for this tutorial, but I have a problem in my project: tds comes back as an empty list. I have written the same code as you, but I get an empty list.

  • @LLFRA
    @LLFRA 3 years ago

    keep getting the error :AttributeError: ResultSet object has no attribute 'find'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?
    when looking for tbody?

  • @prastutnepal7137
    @prastutnepal7137 4 years ago

    Great video! I tried doing the same thing on the Fantasy Premier League website, but when I viewed the page's source code, all I saw was cryptic lines of code that looked nothing like what I saw in your video, and I couldn't search for the HTML elements. The page's source included numbers separated by commas and nothing else. Please help me out on this.

  • @francesboy2
    @francesboy2 3 years ago

    I'm really sorry to ask but I'm at wits end. The html I'm trying to scrape has two tables but they have the same class name, and using find() only returns information from the first table. I can't use find_all() as it's throwing the following error:
    "AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?"
    How do I search through both tables? I don't know if it's just the html that I'm trying to scrape or what, but it's the most infuriating process to what should be a really easy project.
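
A sketch of one way around the `ResultSet` error, using a made-up page with two tables that share a class (`stats` is a placeholder name). `find_all()` returns a list-like `ResultSet`, so `find()` and `find_all()` have to be called on each element inside it rather than on the result as a whole.

```python
from bs4 import BeautifulSoup

# Made-up page: two tables with the same class name.
html = """
<table class="stats"><tr><td>a1</td><td>a2</td></tr></table>
<table class="stats"><tr><td>b1</td><td>b2</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

tables = soup.find_all("table", class_="stats")  # a list-like ResultSet

# tables.find_all("tr") would raise AttributeError; call it on each table instead.
all_rows = []
for table in tables:
    for tr in table.find_all("tr"):
        all_rows.append([td.text for td in tr.find_all("td")])

# A single table can also be picked by position, here the second one.
second_table_cells = [td.text for td in tables[1].find_all("td")]
```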

  • @ollie_har
    @ollie_har 3 years ago +1

    Hi there, I am trying to scan in the second table for a website and it has the same class. How would I clarify to Python that I want the second table and not the first?
    I have tried to add the title into another .find() field but it returns 'none'.
    Thanks in advance!

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      hi! I think you need to either find the table element specifically by its class or ID, or find_all and index the table you want using [0] or [1]

    • @ollie_har
      @ollie_har 3 years ago

      @JohnWatsonRooney ahhh thank you for your help. Excellent video by the way, very clear and really well explained!

  • @sujithsaikalakonda4863
    @sujithsaikalakonda4863 1 year ago +1

    Great explanation.

  • @shortsgrower
    @shortsgrower 4 years ago

    Thanks for the video. I am able to scrape the data from the table, but I am unable to parse the table information. All the item tags seem to be similar. Can I get your help?

  • @robertocell3694
    @robertocell3694 3 years ago

    how can I extract information from this table
    362198

  • @ronitpithani2661
    @ronitpithani2661 4 years ago +1

    what if the html table is formatted like such
    2020-08-07
    10.09
    0.00

    • @johnteres2339
      @johnteres2339 4 years ago +1

      I have the same problem. Did you find an answer?

    • @Nonameman98
      @Nonameman98 3 years ago +1

      if td is None:
          continue
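
The check suggested in this reply could look like the sketch below. The three values come from the question in this thread; the header names and the table structure around them are assumptions, since the original markup did not survive. A header row has only `th` cells, so `row.find("td")` returns `None` for it and the row is skipped.

```python
from bs4 import BeautifulSoup

# Assumed structure: one header row of <th> cells, then data rows of <td> cells.
html = """<table>
<tr><th>Date</th><th>Close</th><th>Change</th></tr>
<tr><td>2020-08-07</td><td>10.09</td><td>0.00</td></tr>
</table>"""
soup = BeautifulSoup(html, "html.parser")

records = []
for row in soup.find_all("tr"):
    td = row.find("td")
    if td is None:  # header row: no <td> cells to read
        continue
    records.append([cell.text for cell in row.find_all("td")])

print(records)
```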

  • @revill0
    @revill0 3 years ago +1

    Very useful, you have helped me solve my issue, thanks!

  • @MingoDiMedici
    @MingoDiMedici 2 years ago +1

    What program are you editing in on the screen?

  • @tomcat9761
    @tomcat9761 3 years ago

    Great video! I subscribed!
    but what if I want to extract a specific range of columns? Like in 10 columns, I only want to extract from column 1 to column 7?
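
For a range of columns, one option is to slice the list of cells in each row. This is a sketch on a generated ten-column row with placeholder cell text; `cells[:7]` keeps columns 1 to 7.

```python
from bs4 import BeautifulSoup

# Build a made-up row with ten cells: c1 ... c10.
cells_html = "".join(f"<td>c{i}</td>" for i in range(1, 11))
html = f"<table><tr>{cells_html}</tr></table>"
soup = BeautifulSoup(html, "html.parser")

selected = []
for row in soup.find_all("tr"):
    cells = row.find_all("td")
    # Slicing works like on any list: the first seven cells of the row.
    selected.append([cell.text for cell in cells[:7]])

print(selected)
```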

  • @RocknRollDina
    @RocknRollDina 4 years ago +1

    I used the Python shell, and nothing ran unless I added [0] at the end of the league_table line. Do you know why?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      The [0] is an index which means you were returning a list back - maybe something has changed on the website since, but if it works that’s ok!

    • @RocknRollDina
      @RocknRollDina 4 years ago

      @JohnWatsonRooney thanks man

  • @dzeykop
    @dzeykop 3 years ago +1

    Thank you John, again great lecture

  • @haithinhtran5108
    @haithinhtran5108 2 years ago +1

    if no class then what do you do?

  • @skunkfog1333
    @skunkfog1333 3 years ago +1

    Thank you! This helped me a lot!

  • @TheSahil360
    @TheSahil360 4 years ago +1

    I got mine to work! How would you export the printout to a data table?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago

      Great work! I use pandas - I have a video on exporting script to csv on my channel that might help you

  • @sounakchatterjee9059
    @sounakchatterjee9059 4 years ago +1

    it was so easy to understand!! thanks!

  • @AmanpreetSinghCHD
    @AmanpreetSinghCHD 4 years ago +1

    Great, I was looking for something similar. I am having an issue exporting to a CSV. I am using csv.writer to export; is there a better way to do it?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +3

      Hi! Sure, I use Pandas. It’s a library used for data science, but its dataframes are really easy to export to CSV. I have a video on my channel explaining how I use it!

    • @AmanpreetSinghCHD
      @AmanpreetSinghCHD 4 years ago +1

      @JohnWatsonRooney Thanks will look into it :) cheers

  • @NarutoUzumaki-xn1pr
    @NarutoUzumaki-xn1pr 4 years ago

    Hi John,
    I faced below scenario:
    Ex: td has 'A' and it contains two values 1,2
    but when I tried to print both values using find. It is giving only 1 as output. Which means
    A 1.
    But I need output as
    A 1
    A 2
    Please let me know your thoughts
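
A possible explanation, sketched under an assumed structure (a label cell `A` followed by a cell holding the two values in `span` tags; the real markup is not shown in the comment): `find()` stops at the first match, while `find_all()` returns every match, so looping over `find_all()` yields both lines.

```python
from bs4 import BeautifulSoup

# Assumed markup: one label cell and one cell containing two values.
html = """<table><tr>
<td>A</td>
<td><span>1</span><span>2</span></td>
</tr></table>"""
soup = BeautifulSoup(html, "html.parser")

row = soup.find("tr")
label = row.find("td").text
value_cell = row.find_all("td")[1]

# find() returns only the first matching element.
first_only = value_cell.find("span").text

# find_all() returns all of them, giving one output line per value.
lines = [f"{label} {span.text}" for span in value_cell.find_all("span")]
for line in lines:
    print(line)
```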

  • @jatinyadav8960
    @jatinyadav8960 3 years ago

    I find your tutorial very helpful

  • @bngtnsnyndn8840
    @bngtnsnyndn8840 3 years ago +1

    This is the website I tried to get the table data from: www.sahibinden.com/opel-omega
    When I run the code in Anaconda it doesn't give any error, but it also doesn't show any output. Do you think it might be because of the website?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      hey! that site has a login, so your code will only see the login window and not the actual data

    • @bngtnsnyndn8840
      @bngtnsnyndn8840 3 years ago

      @JohnWatsonRooney i got it.. thank you soooo much

  • @ayswaryagovindaraju2679
    @ayswaryagovindaraju2679 2 years ago

    This video is helpful. I want to now save the above table into a dataframe. Do you have any video where data from HTML is made into a dataframe?

  • @KayiEdits
    @KayiEdits 4 years ago +1

    you legend

  • @vamsi4864
    @vamsi4864 4 years ago +1

    Which editor are you using?

    • @JohnWatsonRooney
      @JohnWatsonRooney 4 years ago +1

      VS Code - code.visualstudio.com
      It’s free and works on windows, Linux and Mac, the only extension I use is the Python one, it’s easy to find and setup

  • @Yeeeeeehaw
    @Yeeeeeehaw 2 years ago +1

    Thank you!

  • @penglipur_lara
    @penglipur_lara 4 years ago +1

    thanks man you really help me

  • @yogeshbane9647
    @yogeshbane9647 2 years ago +1

    what to do if td tag has no class

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Go back up the tree and find an element you can access easily then index or find the td tag
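
The advice in this reply might look like the following sketch. The container `div` with `id="results"` is hypothetical, as is the table content: the point is to start from the nearest ancestor that is easy to select and then index the class-less `td` tags inside it.

```python
from bs4 import BeautifulSoup

# Made-up page: the <td> tags have no class, but an ancestor has an id.
html = """
<div id="results">
  <table><tr><td>Team A</td><td>50</td></tr></table>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Select the ancestor that is easy to find, then search only inside it.
container = soup.find("div", id="results")
cells = container.find_all("td")

# With no classes to filter on, pick the cells by position.
team, points = cells[0].text, cells[1].text
```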

  • @hadish4529
    @hadish4529 3 years ago

    Hi,
    thanks for your teaching.
    I have a question: how can we scrape secure cookies from a website without using the selenium module?
    For example, on this website, tat.exirbroker.com/mobile/index.html, I can only get 2 cookies, and I need to scrape all the cookies with the requests module.
    Thanks for your help.

  • @victormartinsdias9427
    @victormartinsdias9427 2 years ago +1

    Very good

  • @buttert5091
    @buttert5091 3 years ago +1

    Thanks really helpful

  • @kodediego
    @kodediego 4 years ago

    great video.. so i did this to send straight to pandas table
    # assumes `import pandas as pd` and that `l` is the parsed league table element
    league_tb = []
    for team in l.find_all('tbody'):
        rows = team.find_all('tr')
        for row in rows:
            pl_team = row.find('td', class_='standing-table__cell standing-table__cell--name').text.strip()
            pl_points = row.find_all('td', class_='standing-table__cell')[9].text

            lister = {  # creating a dict with the scraped data
                'club': pl_team,
                'points': pl_points
            }
            league_tb.append(lister)  # adding it to the list
    table = pd.DataFrame(league_tb)
    table.head(5)

  • @ugurdev
    @ugurdev 3 years ago +1

    John, a fellow Arsenal fan too! Not good this season either! :(

  • @fleimbeck9384
    @fleimbeck9384 4 years ago +2

    Thanks man !

  • @domukelis
    @domukelis 3 years ago

    only gives data for 20th team norwich whats up with that

  • @ssr765
    @ssr765 3 years ago

    don't trust, doesn't work

  • @emmanuelolorunbogun772
    @emmanuelolorunbogun772 3 years ago

    Nice video John, I really love your approach and explanation. But how do I get the data from a particular row/column of a table if a class isn't defined under the td tag?
    For example: the first table in this link en.wikipedia.org/wiki/List_of_African_countries_by_area

    • @bdcash
      @bdcash 3 years ago

      You can do table scraping really easily with pandas - ruclips.net/video/ODNMNwgtehk/видео.html (and it works fine with wiki pages. You just index the results with [0] or [1] etc until you find the table you want)

  • @leonardoalvarado7632
    @leonardoalvarado7632 3 years ago +1

    Thank you your video was very helpful!

  • @pini5076
    @pini5076 1 year ago +1

    Thank you!