How to SCRAPE DYNAMIC websites with Selenium

  • Published: 19 Nov 2024

Comments • 321

  • @thewhiterabbit661 3 years ago +8

    Love how you kept this video short and concise; already spent 3 hours on tutorials scraping with requests and bs4, all to discover I need to scrape with Selenium for my particular site anyway

  • @anastaciakonstantynova3047 3 years ago +46

    Quick sharing of my story before I say a huge, enormous THANK YOU 😊 🙏 I started a new job a few months ago and I had to do web scraping and stuff, which was so new and terrifying to me. Thanks to your videos I managed to go for it and do tasks which used to be completely beyond my understanding. So, John, THANK YOU A MILLION TIMES for your efforts and work put into this channel

    • @JohnWatsonRooney 3 years ago +9

      Thank you! I’m glad I was able to help, and good luck with your new job!

  • @johannaaboyejii9590 4 years ago +22

    this is by far the best tutorial on Selenium and a cool tip on pandas!! Thanks @John

  • @GNMbg 3 years ago +6

    I am a total beginner in coding and I find your videos very helpful, thank you

  • @JohnWatsonRooney 3 years ago +16

    I made this video when I had 4 Subs. Last week rolled over 10k, thank you all!

  • @exploring_world4353 3 years ago +10

    Seriously, this is a fabulous video. You explained it very well. Please do a full series on web scraping. 😊

  • @startcode6096 2 years ago +8

    Thank you John for this, extremely helpful stuff. You explain everything so well, it makes me very excited to practice along. Also, please consider recording another video showing a more complicated use case of browser automation using Selenium. Cheers!!

  • @McMurdo-Station 2 years ago +2

    Hey John, I hardly ever comment on videos, but I wanted to let you know that this was exactly what I was looking for! And only 11 minutes? Great job, and thank you!

  • @richarddebose4557 3 years ago +6

    Agreed, this is the best, clearest, most practical and to-the-point tutorial out there. Thanks so much!

  • @aksontv 4 years ago +5

    Great explanation of each point. Sir, please make a full series on web scraping including some advanced stuff, thanks

  • @gerardocoronado5523 2 years ago +1

    Homeboy! With this simple video you helped me sort out almost every question I had after hours of useless content! 10 out of 10! You just gained a new sub

  • @swordartdesign 1 year ago +1

    Man, I was so happy when I found your channel, very clear explanations, this is a true gold mine for me!!! Thank you!

  • @iakobkv271 4 years ago +7

    Wonderful! Thanks man!
    I especially liked how you started from the video 'catalog' and then iterated inside it for each video.
    In the future, I would love to see more complicated examples, for example how to go to a window, collect data, close the window, and things like this...

    • @JohnWatsonRooney 4 years ago +3

      Thanks for the feedback! I agree a more advanced video would be a great idea

  • @bisratgetachew8373 3 years ago +2

    Great content to learn web scraping. Not discovered by many yet; it will be a huge hit.
    Thank you

  • @AbrahamMbaja 3 years ago +1

    The best selenium tutorial I've seen online

    • @AbrahamMbaja 3 years ago

      Can I make a private enquiry?

  • @klimpaparazzi 2 years ago +1

    What a short and quick explanation of scraping with Selenium. You rock! Found this by searching, and just hit the subscribe button. Thank you!

  • @himalmevada7989 1 year ago +2

    You gave us a very clear idea of how we can use Selenium.
    Thank you brother

  • @milofer96 9 months ago

    Your videos are so easy to follow and learn from, thank you a lot for them. I followed along with the Selenium and Scrapy series and that was just the jump start I needed to start scraping all that I needed. Again, thanks

  • @xilllllix 3 years ago +1

    that period before the xpath is such a good tip!

  • @martpagente7587 4 years ago +4

    Great content as always. Short, precise and clear.

  • @kalyanishekatkar8337 3 years ago +2

    So easy to understand, BEST Selenium video!

  • @kevinsteptoe 2 years ago +2

    Thank you John - exactly what I needed and well explained in this short format. Excellent!

  • @Neil4Speed 4 years ago +2

    Great Video as always John, appreciate you taking the time

  • @lixuannlx1983 2 years ago +1

    You're amazing, I spent 3 hours trying to figure out web scraping but you helped me solve the issue with just a short video... I'M SUPER THANKFUL

  • @jacobjulag-ay5639 3 years ago +4

    Super helpful! Allowed me to finish my project for work! Thank you!

  • @SauravdasDas 4 months ago +1

    This type of content is very rare, sir... thank you sir

  • @harshitsharma1334 3 years ago +2

    Sir, your videos are amazing. Really helped me clear many, many doubts in scraping. Thank you so much. May God bless you!

    • @JohnWatsonRooney 3 years ago +1

      Thank you!

    • @harshitsharma1334 3 years ago

      @JohnWatsonRooney sir, if you can make a video on how to scrape a web page with infinite scroll as we move down... it'll be really helpful!

  • @brunosotelo9007 1 year ago +1

    Thank you for sharing this video! I'm able to start my own scraping project from a website that has grid results like in your example!

  • @omidasadi2264 2 years ago

    Thanks John for sharing these videos and managing your time to do them; these are really helpful. I'm looking for all the Selenium videos you created, but found just 2 of them. I wish you'd help us again and create some new ones about Selenium at a higher level... thanks again

  • @SunDevilThor 3 years ago +14

    Rookie mistake: when creating the CSV file from the pandas dataframe, I accidentally put .py instead of .csv and the results ended up overwriting the entire script since it had the same name hahahaha. Luckily I was able to Command-Z it to undo.
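
    For reference, the DataFrame-to-CSV step being described looks roughly like this; videos_list is a stand-in name for whatever list of rows the scrape loop built:

        import pandas as pd

        df = pd.DataFrame(videos_list)        # videos_list: hypothetical list of dicts
        df.to_csv('videos.csv', index=False)  # mind the extension: 'videos.py' would overwrite a script of that name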

  • @drewmartinez4453 4 years ago +4

    This video made me love Python even more... subscribed!

  • @pahehepaa4182 3 years ago +2

    Damn! From 4 to 2.44k subscribers. Really good work!

  • @wangdanny178 2 years ago

    Thank you again for not making Selenium intimidating.

  • @SkySesshomaru 3 years ago +1

    Simple, effective, direct.
    Amazing job.

  • @theinstigatorr 3 years ago +1

    You only had 4 subscribers? Nice to see your progression

  • @philallen6777 3 years ago +1

    Went through this today. Well explained again. Using VSCode. When the browser (Firefox) opens there is a Google account dialog box, which caused the script to time out before the YouTube page loaded.
    To overcome this I did import time and added time.sleep(5) after driver.get(url).
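
    A minimal sketch of that workaround (url here is a stand-in for whatever page is being scraped):

        import time
        from selenium import webdriver

        url = 'https://example.com'  # placeholder target page
        driver = webdriver.Firefox()
        driver.get(url)
        time.sleep(5)  # crude fixed pause so the dialog and dynamic content can settle

    A fixed sleep works, though an explicit wait (WebDriverWait with an expected condition) is usually the more robust choice when load times vary.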

  • @cyber_chrisp 2 years ago +1

    This was so helpful man, THANK YOU. I didn't need the " . " in front; maybe they have changed that since this was posted. I did have to remove " /span " to stop a printout of my selected item printing for each option though... basically it printed the same name for each name in the list with " /span " still attached at the end... I watched maybe 6 different videos. Thanks again! UPDATE: I had to put a break after the for loop to stop the repeated printout. Still learning 🙌

    • @JohnWatsonRooney 2 years ago +1

      Hey thanks! I’m glad you got it sorted

    • @MrHi114 2 years ago

      Could you help me out? I copied the script exactly as in the video but I only get one result. However, if I write print(video.text) in the loop, all the information and more comes out. I tried to delete the "span" you mentioned but it doesn't work for me :( - Beginner as well here...

    • @cyber_chrisp 2 years ago +1

      @MrHi114 are you able to share the section of the code?

    • @MrHi114 2 years ago

      @cyber_chrisp Sure, the loop is:

          videos = driver.find_elements(By.CLASS_NAME, 'style-scope ytd-grid-renderer')
          for video in videos:
              title = video.find_element(By.XPATH, '//*[@id="video-title"]')
              views = video.find_element(By.XPATH, '//*[@id="metadata-line"]/span[1]')
              when = video.find_element(By.XPATH, '//*[@id="metadata-line"]/span[2]')
              print(title.text, views.text, when.text)

      If I run this loop, it gives me only the result of one video. But if I replace the print(title...) with print(video.text), it will give me all the raw data
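
      The likely culprit in the loop above is the XPath: without a leading dot the query searches the whole document rather than the current video element, so every iteration returns the first match on the page (this is the "period before the xpath" tip from the video). A sketch of the fix, keeping the poster's locators otherwise unchanged:

          from selenium.webdriver.common.by import By

          videos = driver.find_elements(By.CLASS_NAME, 'style-scope ytd-grid-renderer')
          for video in videos:
              # the leading "." scopes each query to this element only
              title = video.find_element(By.XPATH, './/*[@id="video-title"]')
              views = video.find_element(By.XPATH, './/*[@id="metadata-line"]/span[1]')
              when = video.find_element(By.XPATH, './/*[@id="metadata-line"]/span[2]')
              print(title.text, views.text, when.text)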

  • @bosonsupremacy4530 3 years ago +1

    Thank you so much, sir. That pandas dataframe technique is really helpful. I will share this video with my friends.

  • @Alfakillen 3 years ago +2

    Good work.
    Perfect level for a beginner. Easy to follow and understand. Learned a lot, thanks.

  • @Nolgath 1 year ago +1

    Mine is exactly the same, and I get no results, nothing, 0. I even did import time and added some delay so the page would open, because it is a data-heavy page, and it did not work. Seems like this just won't work for me

  • @sameernarkar5993 4 years ago +7

    This is really helpful, thank you so much.

  •  2 years ago +1

    Thank you so much John. Appreciate it a lot 🙏🙏🙏💜

  • @jingboli5494 2 years ago +12

    Awesome video! One note: the find_elements_by_* methods are deprecated in Selenium. The new method is find_element(), with an additional argument specifying the locator type (e.g. 'xpath', 'class name', etc.); see the sketch after this thread. Additionally, I can't seem to scrape more than 30 elements with this method, is there a reason why?

    • @vrushabhjinde683 2 years ago

      Facing the same issue

    • @Chaminox 1 year ago

      same

    • @_hindu_warriors 1 year ago

      Hi bro, I'm getting an empty dataframe while scraping Amazon reviews using this code. How can I resolve this?
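
    On the deprecation note above: a minimal before/after sketch of the Selenium 4 locator API, reusing the relative XPath from the video (video is an element from an earlier find_elements call):

        from selenium.webdriver.common.by import By

        # Selenium 3 style (deprecated, later removed):
        # title = video.find_element_by_xpath('.//*[@id="video-title"]')

        # Selenium 4 style: one method plus a By locator type
        title = video.find_element(By.XPATH, './/*[@id="video-title"]')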

  • @sheikh4awais 4 years ago +3

    Great tutorial. Can you also make a video about following the links and extracting data from inside the link?

  • @OmeL961 3 years ago +2

    That's exaaaaaactly what I was looking for. Thank you mate!

  • @vishalverma5280 3 years ago +2

    Beautiful, you got me fully revised in a matter of minutes. Thanks John. Wish you could make a video on crawling through a large list of URLs supplied by an Excel sheet.

  • @yazanrizeq7537 3 years ago +7

    Hello John! Thank you for all these videos. I've been working on some web scraping projects and have literally been on your channel all day! I will recommend all my friends subscribe!
    Quick question: for a website like Glassdoor, what would you use for the XPath for the company title and position? I can't seem to figure it out.

  • @serageibraheem2386 3 years ago +1

    Brother, you deserve 1 million subs

  • @nathannagle6277 4 years ago +5

    When you right-clicked and copied the path it was like watching the person who discovered fire. #gamechanger 👏👏👏

  • @khawajamoosa8994 4 years ago +3

    Love this! ❤ Please make more videos on the latest Python modules

  • @marylynmunster8989 2 years ago +1

    Thanks for all of your videos!

  • @harmandipsingh7617 3 years ago +1

    When I set the variables in the for loop (e.g. title = video.find_element_by_xpath('.//*[@id="video-title"]').text) it only returns the first video. If I run the for loop and just say print(video.find_element_by_xpath('.//*[@id="video-title"]').text) I get all the videos. What's messing up when I set them to the variable title? I copied yours word for word

    • @0991nad 2 years ago

      I’m having the same issue, did you manage to find a fix?

  • @Dipanajan 3 years ago +1

    Hi, I am getting this error: AttributeError: 'list' object has no attribute 'find_element_by_xpath'
    How do I solve this?

  • @brooksa.2982 3 years ago +1

    Hi John, I am attempting to use this to scrape prices and titles of Target products, but when I attempt to do so, I get the following error message:

        Traceback (most recent call last):
          File "<stdin>", line 2, in <module>
            title = games.find_elements_by_xpath('.//*[@id="mainContainer"]/div[4]/div[2]/div/div[2]/div[3]/div[2]/ul/li[1]/div/div[2]/div/div/div/div[1]/div[1]').text
        AttributeError: 'list' object has no attribute 'find_elements_by_xpath'

    Could you please point me in the right direction?
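
    For what it's worth, that error usually means the object being searched is a plain Python list: find_elements (plural) returns a list, and a list has no find_* methods or .text. A hedged sketch of the usual fix, with a shortened stand-in XPath in place of the long Target one:

        # games came from driver.find_elements(...), so it is a list; search inside each element
        for game in games:
            title = game.find_element_by_xpath('.//div[1]')  # './/div[1]' is a placeholder path
            print(title.text)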

  • @karkerayatish 4 years ago +5

    This is good stuff... Can you get into more in-depth stuff like controlling scrolls and a bit more complex JS-rendered pages... sites like Netflix and stuff

    • @JohnWatsonRooney 4 years ago +4

      Sure! I'm planning a more advanced version of this vid to come.

    • @mattmovesmountains1443 4 years ago

      @JohnWatsonRooney along the lines of Yatish's question, do you know why mine only returned 8 results? What aspect of this script tells itself to stop scraping data? It doesn't end with an error per se, but it stops short of reporting all results. Is this in fact a scrolling issue? i.e. is driver.find_elements_by_class_name a function that can only scrape what the human eye could see on the page? BTW this is once again a fantastically explained and very helpful video. Thanks John!

  • @lautarob 3 years ago +1

    Thanks for the enlightening video. Well done!

  • @paulohsgoes1959 4 years ago +3

    Excellent job!

    • @JohnWatsonRooney 4 years ago +1

      Thanks for your kind comments! Glad you are enjoying my videos

  • @carloalbertocarrucciu8473 3 years ago +1

    Really clear. However, "find" in my code gets only visible elements, but if I scroll I can retrieve more elements... how can I scroll to the limit to get as many elements as possible? Thanks

  • @neilgyverabangan6989 3 years ago +2

    Thank you sooo much for this tutorial! Can you also do LinkedIn profile scraping?

  • @blacksheep924 2 years ago +1

    OMG, that little .text is exactly what I was looking for. Thank you sir, you solved my problem

  • @higiniofuentes2551 2 years ago +1

    Thank you for this very useful video!

  • @john-r-edge 3 years ago +1

    Good videos on Selenium.
    Question - did you make that additional vid about doing headless Selenium on a Linux server? You mentioned that would be part 3 of the Selenium vids. Cheers!

    • @JohnWatsonRooney 3 years ago

      I did! That one actually never got made - I am working on something else though that will do the same thing

  • @N1246-c2f 3 years ago +1

    How would you go about grabbing the number of comments under each video? Would you need to click on one video and somehow grab the HTML tag for comments and pass it through a loop? I've tried doing something similar with Beautiful Soup and I'm running into a wall each time I attempt it

  • @Spot4all 4 years ago +1

    Nice, do more videos on scraping dynamic websites. Please try to put up a video on how to download sound samples from the Noiiz sound instrument

  • @ingal.1 2 years ago +1

    Thank you so much John.

  • @fatihekici99 4 years ago +2

    Very simply explained, thanks. I'm going to try it tomorrow

  • @superwolf1603 3 years ago +1

    Hi John, I have been trying this method on Facebook, but I can't seem to get it to work

  • @Guzurti1995 4 years ago +4

    Great, straight-to-the-point tutorial.
    When I include the "." in front of the XPath I get the following error: Message: no such element: Unable to locate element. When I remove the dot I only get the information from the first video.
    Do you know why this might be?
    Thanks

    • @sampurnachapagain2936 3 years ago

      Getting the same error when I try to scrape other sites. Any idea why?

    • @Sece1 2 years ago

      I thought this would be the solution to my problem but it does not seem so. I am having trouble with Zillow and Indeed, and neither Selenium nor BeautifulSoup works for me.

  • @vishalsingh-yj8bk 3 years ago +1

    Good sir! What if I also wanted to extract the "href" from the video title? How can I do that?

  • @0610Ban 3 years ago

    Guys, if you're having trouble, maybe it's the chromedriver.
    Great video, very helpful.

  • @Working55 3 years ago +1

        title = video.find_elements_by_xpath('.//*[@id="video-title"]').text
        AttributeError: 'list' object has no attribute 'text'

    Any ideas why that happens? I have to add [0] before .text to get a result.

  • @exploring_world4353 3 years ago

    Hi, I have one problem: I'm getting a 'no such element' exception. I inspected the elements and tried XPath, CSS selector, and ID to get the element, but I'm still getting the exception. Can you help me understand why that exception occurs?

  • @CaptainBeardDz 2 years ago +1

    Amazing tutorial

  • @jacobmcclelland8864 6 months ago

    Straightforward.

  • @d-rey1758 1 year ago

    How do you click on link elements such as "

  • @locopollo666 3 years ago +2

    Thanks, very well explained!! 👏

  • @Red999u 3 years ago +3

    How long would this work for, typically? Do websites change their divs often enough that it would break this script?

    • @JohnWatsonRooney 3 years ago +2

      Most established websites don’t change that often so usually good to go for a while, it’s just something to be aware of!

  • @GlennMascarenhas 4 years ago +2

    I'm trying to scrape a webpage that loads a table in steps of 10 entries as you scroll down the page. This method doesn't load the entire HTML for me. There are 2000 entries in the table. How do I force it to load all entries?

    • @JohnWatsonRooney 4 years ago +2

      Hi Glenn, you can get Selenium to scroll down the page for you. Check out this Stack Overflow link and try some of their suggestions (a sketch follows at the end of this thread):
      stackoverflow.com/questions/20986631/how-can-i-scroll-a-web-page-using-selenium-webdriver-in-python

    • @GlennMascarenhas 4 years ago

      @JohnWatsonRooney Thanks! This solution kinda worked for me. The page I'm trying to scrape apparently has infinite scrolling, so I went ahead with the given solution. But the results that loaded depended on the sleep time value: I couldn't always get all the results loaded, but sometimes I did. I guess it also depends on my network. Tried to look for a workaround but I gave up. Nevertheless, I'm good.
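
      The usual shape of that scroll-until-loaded pattern, as a sketch along the lines of the Stack Overflow thread above (the 2-second pause is a guess to tune, which matches the note that results depend on sleep time and network):

          import time

          last_height = driver.execute_script("return document.body.scrollHeight")
          while True:
              # jump to the bottom so the page fetches the next batch of entries
              driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
              time.sleep(2)  # give the new entries time to load
              new_height = driver.execute_script("return document.body.scrollHeight")
              if new_height == last_height:
                  break  # page height stopped growing, nothing more to load
              last_height = new_height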

  • @이석호-s4r 3 years ago +3

    This is great! thank you :)

  • @hassanalnajjar8881 3 years ago +1

    Thanks for this video, it was very useful. Keep going dude

  • @juhijoshi4389 2 years ago

    Hello, I am having an issue: when I run the code everything seems to be working fine, but the dataframe is empty. Any reason why this would happen?

  • @fazlaynur4509 3 years ago

    Mr. John, I watch all your tutorials; while working through this one I faced these problems

  •  3 years ago +1

    Great job! Could you make a video guiding us through scraping booking reviews? :>

  • @AKMailing 1 month ago

    Are there any significant differences in performance when finding elements by XPath, class, ID, etc.?

  • @shirsangshudutta 3 years ago

    Very simple, elegant way

  • @sonliste6394 3 years ago +1

    Thanks a lot man, very helpful

  • @abdulmoin3315 3 years ago +1

    Sir, can I ask what types of data clients usually ask for?

  • @jiazz2546 3 years ago

    Hey John, thanks for your video. Just a quick question: when I try to print(title, views, when), I only get one output instead of all of them.
    Do you know what's wrong with my code?

        videos = driver.find_elements_by_class_name('style-scope ytd-grid-renderer')
        for video in videos:
            title = video.find_element_by_xpath('.//*[@id="video-title"]').text
            views = video.find_element_by_xpath('.//*[@id="metadata-line"]/span[1]').text
            when = video.find_element_by_xpath('.//*[@id="metadata-line"]/span[2]').text

        print(title, views, when)
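
    Assuming the pasted indentation mirrors the actual script, the likely cause is that the print sits outside the for loop, so it runs once after the loop finishes and only shows the last video's values. Moving it inside the loop body prints every video:

        for video in videos:
            title = video.find_element_by_xpath('.//*[@id="video-title"]').text
            views = video.find_element_by_xpath('.//*[@id="metadata-line"]/span[1]').text
            when = video.find_element_by_xpath('.//*[@id="metadata-line"]/span[2]').text
            print(title, views, when)  # indented: runs once per video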

  • @brothermalcolm 2 years ago +1

    Way to go from double-digit views to tens of thousands!

  • @balaji9629 3 years ago

    When you print the data you get around 10 videos' details. But how do you get all the video details? Please answer

  • @MatthewMcArthur-i1s 1 year ago

    Great video. I was following along with my own URL and got a successful run in my for loop, but no output is shown even though I have it printing my output. Any suggestions for this?

  • @WhipReviews 5 months ago

    Nice video, but how can I get the results into Excel along with the link of the thing you are trying to scrape?

  • @georgesmith3022 4 years ago +2

    Hello, I just found your channel and subscribed. On channels that have modal pop-ups for GDPR consent, etc., is there any way to use requests or do I have to use Selenium? When I use requests, the function never returns.

    • @JohnWatsonRooney 4 years ago

      If the pop-up is actively blocking the content on the page then unfortunately requests won't work, as we can't interact with the page. Maybe Selenium is the best bet

  • @nwabuezeprecious457 9 months ago

    How can I use Python to search for a list of serial numbers in a document column, employ a search toggle (similar to the YouTube search toggle), and subsequently extract the results obtained for each serial number?

  • @mr.z4075 3 years ago +2

    Thanks man, you helped me a lot

  • @kamaleshpramanik7645 3 years ago

    Very useful video. Thank you very much.

  • @miggiesmalls19 2 years ago

    Very helpful explanation!! Question: what if I wanted to get data within that link? (description or comments, for example)

  • @dnetvaggos4443 4 years ago +4

    Gj dude nice work...

  • @06_it_anirudhbandari47 9 months ago

    When I use XPath in the loop, it keeps printing the first element instead of moving forward

  • @GlennMascarenhas 4 years ago +3

    I've found that Selenium can get terribly slow while locating elements, especially when you locate by XPath. Finding by class name or tag name is seemingly faster. Still, it took about 15-20 mins to process 2000 entries and write to file.

    • @JohnWatsonRooney 4 years ago +1

      It is slow unfortunately, but sometimes the only option other than doing the task manually. Have you tried helium? It’s a selenium wrapper so won’t be faster but can be easier code to write

    • @GlennMascarenhas 4 years ago +3

      @JohnWatsonRooney Thanks for letting me know about Helium! I watched your video on it and tried it out, and it actually does seem faster (and easier for sure) than Selenium in the case of my task, even though the underlying calls are to the Selenium API itself. I guess it does it more efficiently than the script I wrote using Selenium directly.

    • @buddyreg234 1 year ago

      Also, sites can get a REDESIGN, and you will have to rewrite everything... And there is another approach which could help with some subset of redesigns.