Web Scraping with Python: Ecommerce Product Pages. In depth, including troubleshooting

  • Published: 30 Nov 2024

Comments • 277

  • @m1sti_krakow
    @m1sti_krakow 3 years ago +29

    Easily the best video on web scraping in Python I've ever seen. Only 20 minutes, but it has more content than many 1hr+ tutorials. You've also explained many useful cases (e.g. what if we don't have some element). Thank you!

  • @deepak7751
    @deepak7751 3 years ago +3

    Finally, after browsing for 3 hours, I found someone clearing up doubts beautifully. Thank you for sharing such a nice video.

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      Thank you!

    • @deepak7751
      @deepak7751 3 years ago

      @@JohnWatsonRooney Sir, can I get your email ID? I have a query I need your help with. Thank you

  • @fatimaelmansouri9338
    @fatimaelmansouri9338 3 years ago +7

    This is excellent content. I've been browsing for hours looking for a clear and detailed explanation and was lucky enough to find your video. And only 20 mins long! Thank you for sharing!

  • @daveys
    @daveys 1 year ago

    Getting the classes, divs, spans is something that I find quite confusing but I think you explained it well here. Thanks for posting!

  • @irfanshaikh262
    @irfanshaikh262 2 years ago +1

    Applying concurrent.futures to this accomplished the task like a charm.
    Thanks again John

  • @ellisbenm
    @ellisbenm 4 years ago +2

    Really valuable stuff. First web-scraping vid I’ve seen that goes into building a database with the scrape contents.

  • @goodkidnolife
    @goodkidnolife 1 year ago +2

    Fantastic video; not only is it easy to follow along, but the explanations afford a genuine learning opportunity rather than just a simple copy and paste. As someone new to Python, a big thanks is in order!

  • @Neil4Speed
    @Neil4Speed 4 years ago +4

    Great tutorial, just went through. An excellent progression from the last one as most of the scraping that I have wanted to do involves "digging in". I feel that I am finally learning as I noticed the issue with the rating as we were typing it through!

  • @SadamFlu
    @SadamFlu 3 years ago +1

    Bro... You're the man. That was so well explained! You don't fuck around, you just hit it!

  • @barzhikevil6873
    @barzhikevil6873 4 years ago +4

    Thanks John, that was a very helpful video. As an economics major, I really need to be able to gather lots of data and process it efficiently, so web scraping was just a natural thing to learn. Keep up the good work!

  • @JohnAtkinson-ww8qe
    @JohnAtkinson-ww8qe 1 year ago +1

    Nice! These methods are the exact reason I started my journey in learning Python today

  • @djuzla89
    @djuzla89 3 years ago +2

    Never subscribed so fast; your explanation while working is priceless, and the speed is just perfect

  • @tablesawart2728
    @tablesawart2728 1 year ago +4

    I applaud you for your clarity. At 8:50 I run the program and get only '[]' (an empty list)...
    Why??

  • @vvvvv432
    @vvvvv432 2 years ago +1

    Another great video, thank you so much! Your scraping videos are much better and more to the point than online training platforms. I watched a web scraping course on Pluralsight yesterday and learned 2% of what I learned here.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      That’s great I’m glad you’ve learned from my video!

  • @aksontv
    @aksontv 4 years ago +6

    Thank you sir, and please add more advanced tutorials to this playlist

  • @Kicsa
    @Kicsa 1 year ago

    I have only been watching tutorials but this is really inspiring since you used it in a real website, thanks for the great video!

  • @bipinsartape683
    @bipinsartape683 4 years ago +1

    No words to thank you. You made BS4 so easy

  • @rich-xf3sh
    @rich-xf3sh 3 years ago +1

    You deserve way more subscribers, great work, hope you keep posting!

  • @salahgouzi2458
    @salahgouzi2458 4 years ago +1

    Can't believe you only have 1k subs; that was insanely informative, thank you

  • @pandharpurkar_
    @pandharpurkar_ 4 years ago +1

    Thanks John, stay healthy! You have good concept-clearing skills

  • @jasonkesterson2402
    @jasonkesterson2402 4 years ago +4

    I was able to get the rating by splitting on the line break and taking index [0]. I'm not sure if that's the best way but it worked. :)
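    The splitting approach described above can be sketched like this (the sample text is made up, not taken from the video):

    ```python
    def parse_rating(raw: str) -> str:
        """Keep only the first line of a rating element's text,
        dropping the review-count lines that follow it."""
        return raw.strip().split("\n")[0].strip()

    print(parse_rating("4.6\nbased on 12 reviews"))  # prints 4.6
    ```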

  • @_RamjiG
    @_RamjiG 2 years ago +4

    Need some help here! I followed along up to the 4:30 timestamp but only got an empty list, while you got the links for the items.

    • @sai_jonov
      @sai_jonov 8 days ago

      Same here, it is because the site is loaded dynamically

  • @rodgerthat6287
    @rodgerthat6287 4 months ago

    Hey dude, just started my first internship, and this video has been immensely helpful! I really appreciate the effort put in and all the useful tips. Thanks!

  • @phantomsixtrading7094
    @phantomsixtrading7094 3 years ago

    Awesome video. Very thorough instruction. Thank you for going slowly and speaking clearly. Easy to read your screen. Overall phenomenal video.

  • @ahomes6329
    @ahomes6329 3 years ago +2

    Hi. I'm new here. When I run print(productlist), I'm getting back an empty list. What could be the problem?

  • @victormaia4192
    @victormaia4192 3 years ago +2

    Very insightful and nice to follow; now I'll try to do something similar in my projects to extract info from ads, thanks!

  • @fumanchuyn
    @fumanchuyn 3 years ago

    This is the kind of knowledge you'd think people wouldn't share for free; there is hope yet!!! Amazing video m8, keep doing it

  • @finkyfreak8515
    @finkyfreak8515 3 years ago +1

    Wow John, you have a new fan here :D. Super helpful!

  • @amith_1923
    @amith_1923 2 years ago +1

    Just joined your channel, hope to learn more and thanks for the video

  • @studywithrobin2715
    @studywithrobin2715 3 years ago +3

    A few hours later, and your tutorial's already being implemented in my everyday web scraping / data cleaning work at the office. +1 Subscribed!
    Edit: Do you think you'll make a video combining the stuff in this video with using Jupyter Notebook to fill a CSV/Excel file?

  • @enngennng5633
    @enngennng5633 2 months ago +1

    The whole code does not work for me using the requests method, but the WebDriver browser method works fine. The problem is that the page has to load every time, which takes a long time. I also tried headless mode to avoid rendering the page each time, but it still doesn't work. I think the page has to load first to fetch the data, so why does your code return the data fast without loading it? When I use the requests method it returns empty, None, or []. I use exactly the same code as you, so the headers are probably the issue. Which headers did you use?

  • @ephraimmotho887
    @ephraimmotho887 1 year ago +1

    Always enjoy such practical tutorials... Thank you so much for your efforts💯❤

  • @sourabhrananawareyujfestbw9858
    @sourabhrananawareyujfestbw9858 4 years ago +1

    Best Video ever on web scraping ....#liked #commented #Subscribed #Love From India

  • @gomesgomes8206
    @gomesgomes8206 3 years ago +2

    What a great training video. Thank you John. You are a great instructor: explains well, easy to follow, clear, and uses a real-life example (real-life challenges one would come across). Lots of aha moments about things I had been struggling with, including what to do if an element is not present: how to keep your program running (using try/except) as opposed to coming to a complete stop. How easy was that? It only took me 6+ days of searching.
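    A minimal sketch of the try/except pattern mentioned above, run against an inline HTML snippet (the tag and class names are illustrative, not the video's exact markup):

    ```python
    from bs4 import BeautifulSoup

    html = """
    <div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
    <div class="product"><h2>Gadget</h2></div>
    """
    soup = BeautifulSoup(html, "html.parser")

    rows = []
    for item in soup.find_all("div", class_="product"):
        name = item.find("h2").text
        try:
            price = item.find("span", class_="price").text
        except AttributeError:  # find() returned None: the element is missing
            price = "n/a"       # record a placeholder and keep the loop running
        rows.append((name, price))

    print(rows)  # [('Widget', '9.99'), ('Gadget', 'n/a')]
    ```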

  • @business5707
    @business5707 4 years ago +1

    John, very valuable content. Thanks for sharing it with the community

  • @gisleberge4363
    @gisleberge4363 4 years ago +1

    Very useful... thanks for putting it all together in such a clear and easy-to-understand way!

  • @SeanWilston
    @SeanWilston 4 years ago +1

    Thank you John. Very clear and useful information

  • @ashishtiwari1912
    @ashishtiwari1912 4 years ago +1

    This is what I was looking for. Most YouTubers just make a video about how to scrape the first page but don't show how to fetch the data for each product and then do pagination.
    Now it's very clear to me. I am getting one error, "'NoneType' object has no attribute 'text'", after 30 or 40 iterations. I wonder what that means? I tried checking the solution on Stack Overflow but the code shows the same error.
    And yes, this is very useful for beginners to intermediates. Keep making such videos. I have subscribed to your channel.

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Thanks for your kind words! It sounds like maybe you reached the end and got all the pages? See what happens when you go to the last page in the browser
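      A sketch of that end-of-pages check (the `?page=` URL pattern and the class name are placeholders, not necessarily the real site's): loop page numbers and stop as soon as a page yields no products.

      ```python
      import requests
      from bs4 import BeautifulSoup

      def scrape_all_pages(base_url, headers=None):
          """Request ?page=1, 2, 3... and stop as soon as a page
          comes back with no product items, i.e. past the last page."""
          page = 1
          while True:
              r = requests.get(f"{base_url}?page={page}", headers=headers)
              soup = BeautifulSoup(r.text, "html.parser")
              products = soup.find_all("li", class_="product-grid__item")
              if not products:   # empty page: we've run out of results
                  break
              yield from products
              page += 1
      ```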

    • @ashishtiwari1912
      @ashishtiwari1912 4 years ago

      @@JohnWatsonRooney Yes, I have got all the pages. I have one more question to ask; I am putting the link below. I am trying to extract the company details such as name, telephone number etc. The HTML shows they are inside a list tag, and within each list tag there is a span tag with an itemprop. I am trying to use span and itemprop but I am not getting the result I want.
      idn.bizdirlib.com/node/5290

  • @willingwinning
    @willingwinning 3 years ago

    This was a very useful video, thank you! I would have liked to see how you handled the 'in stock' element, because some products didn't have it listed. I tried to use your idea (try/except) to overcome this, but some of the products which didn't have the "in stock" label were actually in stock

  • @ibramou6200
    @ibramou6200 1 year ago +1

    Hi John, thank you for these amazing videos. My question is: how can I deal with variable elements such as images, and save them in one cell in my CSV file,
    like this:
    Images: img1,img2,img3

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Hi - yes, however you can't separate them with a ","; it would need to be something else if it's going to stay a CSV file. If you loop through all the images and concatenate the names together, images = img1 + "-" + img2, that would work
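      The concatenation idea from the reply above, sketched with Python's join (the filenames are made up):

      ```python
      # Hypothetical list of scraped image filenames for one product
      images = ["img1.jpg", "img2.jpg", "img3.jpg"]

      # Join them with "-" so the single CSV cell contains no commas
      cell = "-".join(images)
      print(cell)  # img1.jpg-img2.jpg-img3.jpg
      ```

      Note that Python's csv module also quotes fields containing commas, so ",".join(...) can work too if every row is written through csv.writer rather than by hand.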

  • @adarshdessai6752
    @adarshdessai6752 3 years ago +2

    Amazing 😻 thanks bro. You have made scraping a lot easier.

  • @dhruvipatwa4050
    @dhruvipatwa4050 4 years ago +1

    This is so helpful. I am literally a beginner in Python. Would you recommend any other videos I can watch to learn the basics of Python? Thank you so much!

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Thanks for your kind words. If you are completely new to Python, check out one of the complete courses on the freeCodeCamp channel on YT to help you get started. You don't need loads of knowledge to start web scraping

    • @dhruvipatwa4050
      @dhruvipatwa4050 4 years ago

      @@JohnWatsonRooney Thank you so much! Also, the website I am trying to scrape has product images nested in product images like this: . What should I be using for productlist = soup.find_all('li', class_='ssMAPPriceCheck')? I could be completely wrong on this but just gave it a shot based on what I understood. Can you please help me? Thank you.

  • @icedgodz428
    @icedgodz428 4 years ago +2

    Can you please go over what a user agent is, and why it was necessary to include it?
    Thanks

  • @shreyasdeodhare2574
    @shreyasdeodhare2574 9 months ago

    John, your video was fantastic! I appreciate the clear explanation, but I'm curious: will your approach work for any website? Looking forward to your insights!

  • @pritam1047
    @pritam1047 3 years ago

    Please make a video on scraping full product data from WooCommerce / WordPress websites

  • @DATA_MACHINE22
    @DATA_MACHINE22 2 years ago +1

    Very beautiful, and from scratch 👏👏

  • @datascienceanalytics5512
    @datascienceanalytics5512 2 years ago +2

    How can I avoid the "Max retries exceeded with url" problem?
    Nice video tho, thanks a lot!

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Thanks! Best way in that case would be to slow down your requests using something like time.sleep()
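      The time.sleep() suggestion sketched as a small helper (the 1.5-second delay is an arbitrary choice):

      ```python
      import time

      def throttled(items, delay=1.5):
          """Yield items with a pause between them so requests are spaced out."""
          for i, item in enumerate(items):
              if i:                  # no pause needed before the first item
                  time.sleep(delay)
              yield item

      # usage sketch:
      # for url in throttled(product_links):
      #     r = requests.get(url, headers=headers)
      ```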

  • @im4485
    @im4485 3 years ago +1

    Very nice... Straight to the point

  • @BodrumDrone
    @BodrumDrone 3 years ago

    The requests-html and requests libraries don't work in an AWS Lambda function for me.
    When I use urllib it works!
    My method:
    from urllib.request import urlopen
    from bs4 import BeautifulSoup
    html = urlopen(baseUrl)
    soup = BeautifulSoup(html.read(), 'lxml')

  • @TelstarB
    @TelstarB 2 years ago

    How do you choose the best library for scraping? Time? Complexity? Great video btw

  • @Nafke
    @Nafke 5 months ago

    When you write web scrapers for work, do you wrap the different parts of your code in functions and then call them from a main function, like get_Links(), get_products(), etc., or just leave it as a long script because it's simple enough?
    Also, thank you so much for your content. I'm not a STEM student, but I was able to learn enough to build my own dataset for school even though I'd never programmed before. Thank you so much for taking all this time.

  • @rw569
    @rw569 3 years ago +1

    When I run the code at 4:26, for some reason all I get is [] - can anyone help?

    • @ceciliasanchez658
      @ceciliasanchez658 3 years ago

      productlist = soup.find_all('li', class_='product-grid__item')  # the items are now in a list; just change this line, the class name has changed

  • @roberttuttle4284
    @roberttuttle4284 1 year ago +1

    Thanks for the lessons. I have a website that requires me to input dates, then I get a list of links. I must click on each one and scrape a JSON file. I am doing this with Selenium because I am a beginner and BeautifulSoup doesn't seem to work, but it's super slow. I think the site uses JavaScript. Is there a better way to do this than using a headless browser? Do you have a video that might help me?

  • @CinemaAcademy-cl4ym
    @CinemaAcademy-cl4ym 10 months ago

    I have seen a lot of videos related to scraping. I want to learn one more thing: how to scrape the Reuters website using proxies. I want to learn how to extract the headline, date, and paragraphs for each article. There is also a "load more articles" button used to get the next articles; I also want to learn how to use it. Can you make a separate video on it, please?

  • @kishonoi191
    @kishonoi191 4 years ago +1

    I've started taking Python more seriously to improve my hacking

  • @grub_taless7561
    @grub_taless7561 2 years ago +1

    Is there any other way to scrape data from all the pages other than using the page number in the URL? The website I am trying to get data from does not generate a new URL for every page

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      I suspect it’s being loaded by Ajax / JavaScript- if you check my channel for some of my JavaScript scraping methods it should help

  • @yummywithali
    @yummywithali 3 years ago +1

    Thank you, it was so useful. I have a question: I want to crawl product data and at the same time get the product description, which is behind a link on another page. How can we crawl the product description when it is on another link?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago

      Hi, sure - scrape the URL where the product description is, request that data within a for loop, and add it to your data
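      The reply above, as a sketch (the selector, class name, and dict keys are placeholders for whatever the real product page uses):

      ```python
      import requests
      from bs4 import BeautifulSoup

      def get_description(product_url, headers=None):
          """Fetch one product page and pull its description text."""
          r = requests.get(product_url, headers=headers)
          soup = BeautifulSoup(r.text, "html.parser")
          tag = soup.find("div", class_="product-description")
          return tag.text.strip() if tag else ""

      # usage sketch inside the main loop:
      # for item in items:
      #     item["description"] = get_description(item["link"], headers)
      ```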

  • @MuhammadIsmail-jj1kk
    @MuhammadIsmail-jj1kk 2 years ago

    Hi there, the scraper is fantastic. I have done it all but need to save the data in Excel

  • @Troglodyte2021
    @Troglodyte2021 4 years ago +1

    User Agents blocked me. I think I have to come back to your video again when I need them. Salute!

  • @chartalized.9533
    @chartalized.9533 3 years ago +1

    Can't thank you enough for this mate! So helpful. Love the clarity!

  • @MinistryofyouthLibya
    @MinistryofyouthLibya 1 year ago

    Thank you very much. When I print(productlist) I get [] instead of the list. What should I do? Please help!

  • @athulyesudas
    @athulyesudas 2 years ago +1

    Best web scraping video... keep it up bro

  • @atsource3143
    @atsource3143 2 years ago +1

    Hi John, just wanted to know: is there any way to scrape hidden div tags/elements using Playwright, BeautifulSoup etc.?
    Thanks

    • @fernandodaroynavarro4231
      @fernandodaroynavarro4231 11 months ago

      Hello @atsource3143, did you find the answer to this? I have the same problem about scraping hidden tags.

  • @mukulbahuguna9553
    @mukulbahuguna9553 2 years ago

    Thanks for the info. I have a question: can you show how to scrape e-commerce sites using Java?

  • @archanjd4463
    @archanjd4463 3 years ago +1

    Excellent stuff! Straight to the point

  • @reymartpagente9800
    @reymartpagente9800 4 years ago +1

    Thank you for simplified content as always. We love your videos.
    I hope you can also make a video on how to scrape JSON data under a JavaScript tag. I encounter more and more websites like this. It is a bit more advanced than your previous videos

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Thank you for your feedback. Yes I am planning more advanced scraping soon including JSON and script tags

    • @shubhamkumar857
      @shubhamkumar857 3 years ago

      @@JohnWatsonRooney How do I scrape a random td without any class or id?

  • @sakshikundu6379
    @sakshikundu6379 3 years ago

    I tried to do so but I'm getting an empty list after the print(productlist) command.
    Please help me.

  • @nathannagle6277
    @nathannagle6277 4 years ago

    Great video, thanks! You should do a part two where you scrape the pictures and export the CSV onto your own webpage.

  • @saadsarawan4846
    @saadsarawan4846 2 years ago +1

    Why is it that when we want to pass the user agent in our header, we use requests.get? Shouldn't it be requests.post?

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago +1

      When we send a GET request we include headers with it to identify ourselves, plus other information. It's those headers we are adding the user agent to in this case. You would also include headers when you POST; they go along with the request

    • @saadsarawan4846
      @saadsarawan4846 2 years ago +1

      @@JohnWatsonRooney Thanks a lot for clarifying. Your videos are amazing, they've literally been my guide to getting better at scraping. Thx man
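    The headers pattern discussed in this thread, sketched below (the user-agent string and URL are placeholders):

    ```python
    import requests

    # Headers travel with whichever verb you use; the user agent is just
    # one of them. This UA string is a placeholder, not a recommendation.
    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    }

    def fetch(url):
        r = requests.get(url, headers=headers)   # same kwarg works for requests.post
        r.raise_for_status()
        return r.text
    ```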

  • @DailyRuun
    @DailyRuun 3 years ago

    Hi John.
    Thank you for your work.
    Can you please suggest some software which will scrape (do the same thing as your code)?
    I am not familiar with installing and using Python

  • @novarahman5049
    @novarahman5049 2 years ago

    Thanks so much for the tutorial. I'm attempting it with a similar website, but the review number is behind a clickable tab implemented as a button id. Any idea how to pull the review number from there?

  • @HabibKhan-kj8um
    @HabibKhan-kj8um 3 years ago +2

    You're fucking amazing ! Kudos for such an awesome explanation. This is what I was looking for. Hats off to you

  • @alexlytle089
    @alexlytle089 4 years ago +1

    I really love your videos bro. For scraping webpages do you prefer Beautiful Soup or Selenium?

  • @aaronramsey4922
    @aaronramsey4922 3 years ago

    Please tell me, what do I do if several elements that come one after another have the same class and style? When I parse, only the first element appears instead of the next ones.
    Second question! Let's say the first and second elements have the class and style, but from the third one on there is no class or style, and it stops appearing and gives an error. What to do?

  • @huguititi
    @huguititi 1 year ago

    Hi, I found this tutorial in Oct '23 and the example page seems to block all the possible user agents; I only get the 403 response

  • @rolf8107
    @rolf8107 8 months ago

    Hello, I had a question: which packages do you use in your preferences? I get all kinds of error messages when using your code.

  • @mohammedzareefw203
    @mohammedzareefw203 4 years ago +1

    Finally one good tutorial about store data scraping. Your content is better than the paid Udemy content on Python scraping. Can you make a video for scraping news articles/blogs in bulk, full articles?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +2

      Thanks for your kind words! Yes I have videos planned for covering scraping different sorts of sites in the near future

  • @funtoosh3864
    @funtoosh3864 2 years ago

    Good... but how do you scrape the pages one by one from a search, up to 2k-5k products and beyond?

  • @dgy_are
    @dgy_are 7 months ago

    Hey, I want to scrape a website, but I want to render the whole page in my own website, like the idea of an iframe. How do I do that, please?

  • @Blr046
    @Blr046 3 years ago

    This is really nice. Thank you so much for sharing your knowledge. I want to build a python web scraping project GUI based where the user will select which web site to scrape and initiate the scraping job on click of a button. Please let me know how to achieve this.

  • @AbdihanadMohamed
    @AbdihanadMohamed 1 year ago

    Looks like I get rate-limited (a 403 Forbidden) in the second loop, after looping over each link to get the name, reviews, and price. It got me all the links, but I get a Forbidden when looping over each link to get the data. Any tips?

  • @phillfairclough
    @phillfairclough 3 years ago

    Great stuff, been following for a few days now. How do you manage to trim or replace any urls that you scrape? Some of the sites I have been trying leave whitespace but have a %20 in the actual URL

  • @kel78v2
    @kel78v2 3 years ago

    Just started learning Python and found this video. Easy enough to understand your workflow and steps. Can I just ask what additional steps would be required if the products are behind a login? Mind helping?

  • @expat2010
    @expat2010 4 years ago +2

    Thank you, but it would have been more valuable if there were some place to download the code from so we could study it at our leisure. I don't know about anyone else, but I can barely read the code.

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +1

      Of course, I always try to remember to put the github link in - I’ll find it and add it in

    • @expat2010
      @expat2010 4 years ago

      @@JohnWatsonRooney Thanks. I'll check back :)

  • @aditya26a
    @aditya26a 2 years ago

    Great tutorial... I also want to ask you a question: when I write product schema markup, it's very lengthy and time-consuming to write the schema when there are 1K or more products. Is it possible to automate this process?

  • @sanskarkaazi3830
    @sanskarkaazi3830 4 years ago

    Really good... but how do we show the rating without the line breaks and only the rating? I thought you would show that at the end... lol

  • @iambeszs
    @iambeszs 1 month ago

    Thank you for the video sir, you made my day!👍

  • @abby-cv4xc
    @abby-cv4xc 10 months ago

    My URL doesn't change with any of the actions; the page with the data and the page I first see when going to the website have the same URL. I fail at the first request and don't know where to go from there. Any suggestions?

  • @amazingmechskills
    @amazingmechskills 2 years ago

    It's a very informative video.
    But how do I save to CSV?
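    One common way to do that with the standard library (the field names and rows here are invented for illustration):

    ```python
    import csv

    # Hypothetical scraped results, one dict per product
    results = [
        {"name": "Widget", "price": "9.99", "rating": "4.6"},
        {"name": "Gadget", "price": "19.99", "rating": "4.1"},
    ]

    with open("products.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "price", "rating"])
        writer.writeheader()       # column names first
        writer.writerows(results)  # then one row per product
    ```

    Excel opens the resulting .csv file directly, which also answers the "save in Excel" questions in this thread.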

  • @AV-nh8mp
    @AV-nh8mp 4 years ago

    Thanks for everything John, you are the best! I'm new to this area and I'm really happy to learn from you. I'm stuck with a page that loads again and again until the end; the page doesn't have page numbers that change... could you help me? I looked at more of your videos but couldn't find one that matches... thanks again

  • @jorgemarques2585
    @jorgemarques2585 3 years ago

    Hi John, great tutorial, followed it to the letter, but in my case I have a list with 3 items of the same class. Using 'li' and the class I can only retrieve the first item, not all 3; I only get the 3 together if I use 'ul', which is not ideal. How do I go around this? Thank you very much.

  • @XxMrPlaystation3Xx
    @XxMrPlaystation3Xx 3 years ago

    .text.strip() produces an error for me; I'm unsure as to why

  • @dragon3602010
    @dragon3602010 3 years ago +2

    Is it worth doing this kind of web scraping with Scrapy? Thanks

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      I’d say use scrapy for any serious projects, but for more one time scraping jobs it doesn’t matter so much

  • @nandiniguntur4509
    @nandiniguntur4509 2 years ago

    What if we don't know how many pages there are? How do we set the range for the pages?

  • @alejandrofrank7900
    @alejandrofrank7900 4 years ago +1

    Oh man, this is insanely good, keep it up!!

  • @lukerobertson1000
    @lukerobertson1000 4 years ago

    Love it!! Thank you mate, very clear and simple to understand.

  • @hanibech
    @hanibech 3 years ago

    Please, John, I'm looking for plugins or extensions for scraping a product store site without using an API

  • @lautje4919
    @lautje4919 1 year ago +1

    Hi! I know this is a really late comment, but I've been trying to follow every guide on the internet and somehow it won't return any contents of the site. I only get the divs with class, body, etc., but no content inside them. I use a login and find_all. It's like I only get the top of the tree. Do you know what this problem is called / how to solve it? I know it's pretty vague, but I don't want to make you read a whole paragraph haha

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago

      Try saving/printing the whole response as text and check to see if the data you want is actually there - if not try the same site but use playwright to load it with a browser instead and see if that works
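      A sketch of the Playwright fallback suggested above (requires `pip install playwright` and `playwright install`; the URL is a placeholder):

      ```python
      def fetch_rendered_html(url):
          """Load a page in a headless browser so JavaScript-built content
          is present in the HTML we hand to BeautifulSoup."""
          # imported lazily so the rest of a script still runs without Playwright
          from playwright.sync_api import sync_playwright

          with sync_playwright() as p:
              browser = p.chromium.launch(headless=True)
              page = browser.new_page()
              page.goto(url)
              html = page.content()
              browser.close()
          return html

      # html = fetch_rendered_html("https://example.com/products")
      # then feed html into BeautifulSoup as usual
      ```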

  • @kavitakhatavkar
    @kavitakhatavkar 3 years ago

    By any chance do you have a video on how to scrape a nested-doctype HTML website?

  • @stevennagliati6959
    @stevennagliati6959 4 years ago +1

    This was really useful and clear, thank you! I'm just getting started with web scraping and I see you have multiple videos and playlists on the subject - which ones would you recommend I'd start with? Cheers from Spain!

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      Glad it was helpful! Try my Modern Webscraping playlist I think you'll find some useful things in there - ruclips.net/p/PLRzwgpycm-Fio7EyivRKOBN4D3tfQ_rpu

    • @stevennagliati6959
      @stevennagliati6959 4 years ago

      @@JohnWatsonRooney brilliant will do that, thanks!