What I'd Add FIRST To a new Scrapy Project

  • Published: 12 Dec 2024

Comments • 66

  • @linuxinstalled
    @linuxinstalled 3 years ago +6

    I wish this video had more exposure. I greatly appreciate that you took the time to put this series together. Being able to see these examples of the various mechanics behind scrapy has been hugely helpful. Thank you again.

  • @janekstern
    @janekstern 2 years ago +3

    Your videos helped me understand Scrapy more than any other resource, ty!

  • @davyroger3773
    @davyroger3773 3 years ago +7

    Thanks! The documentation did not go into enough depth, and I'm glad someone made a comprehensive video on it.

  • @shihlun5291
    @shihlun5291 2 years ago +1

    Thanks for the tutorial; after watching it, I have a better understanding of the Scrapy ItemLoader documentation.

  • @victormaia4192
    @victormaia4192 3 years ago +6

    Great tutorial! Very easy to follow, I had no problems. About the typos: I'm the worst typist ever, but Tabnine always saves my life.

  • @hendrikfeddersen6768
    @hendrikfeddersen6768 4 years ago +3

    Thanks a lot. The videos are very clear. Would you mind explaining, in one of your next videos, the correct folder structure of a Scrapy project and which file goes where and why?
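
    For reference, this is the default layout that "scrapy startproject" generates (as far as I recall from the Scrapy docs; "myproject" is a placeholder name):

        myproject/
            scrapy.cfg            # deploy/config file, lives at the project root
            myproject/            # the project's Python package
                __init__.py
                items.py          # item definitions
                middlewares.py    # spider and downloader middlewares
                pipelines.py      # item pipelines
                settings.py       # project settings
                spiders/          # one module per spider goes in here
                    __init__.py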

  • @gwulfwud
    @gwulfwud 3 years ago +1

    Thank you! I watched the previous video and then this one, and it feels like I already know so much about Scrapy. Really, really good videos. Keep it up!

  • @woldemarkiev
    @woldemarkiev 2 years ago +2

    Great tutorial!! It really helps me understand.

  • @amineboutaghou4714
    @amineboutaghou4714 4 years ago +2

    Another great video! Very well done, John 👏🏼

  • @codewithnacho
    @codewithnacho 3 years ago +3

    Awesome vid! It answered my questions about Item Loaders. The docs were confusing me haha

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +2

      I know! The docs are good but also, not so good haha

  • @justinames5439
    @justinames5439 2 years ago +1

    As the others have said, thanks for your time and effort, a great help. The links connecting to Amazon (e.g. the lighting link) are dead, and you might want to update them. On another front, have you added a video on caching? All in all, really well done, and, again, thanks.

    • @JohnWatsonRooney
      @JohnWatsonRooney  2 years ago

      Thanks! One of the issues with a lot of the scrapers I wrote is that they don't always age well. I haven't actually done anything on caching yet, no; I'll add it to my list.

  • @yangvictor5349
    @yangvictor5349 2 years ago +1

    thank you for sharing

  • @vidproli4231
    @vidproli4231 3 years ago +1

    Great tutorial, explains exactly the thing I was looking for, thank you.

  • @milank9857
    @milank9857 2 years ago +1

    Great explanation as always, really helpful tutorial

  • @byroncodes
    @byroncodes 3 years ago +2

    Hello John, could you do a video on how to host Scrapy scripts?

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      Hi! Yes, I've been wanting to cover this for a while. Unfortunately Scrapyd doesn't work with the latest version of Scrapy, so the best alternative I could come up with was hosting the spider on a Linux server and using a cronjob to run it every X hours. Would that be of interest?
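
      For illustration, a minimal sketch of that "Linux server + cron" idea: a small runner script that cron calls from the project directory (the project, spider and file names are placeholders, not from the video). A crontab line such as "0 */6 * * * cd /home/user/whiskyspider && python3 run.py" would run it every six hours.

          # run.py -- start the project's spider programmatically so cron can call it
          from scrapy.crawler import CrawlerProcess
          from scrapy.utils.project import get_project_settings

          process = CrawlerProcess(get_project_settings())   # picks up settings.py via scrapy.cfg
          process.crawl("whisky")                             # spider name registered in the project
          process.start()                                     # blocks until the crawl finishes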

    • @byroncodes
      @byroncodes 3 years ago

      @@JohnWatsonRooney Sounds great. Looking forward to that. I've been having challenges as to how best to host my scraping scripts, and I know there are some among us who face the same challenge. Thanks, your efforts are much appreciated.

  • @cosmicblack
    @cosmicblack 2 years ago +1

    Great video. Thanks!!!

  • @RahulYadav-w1v4l
    @RahulYadav-w1v4l 1 month ago

    Amazing tutorial 😍
    I do have a question: I am trying to get basic information from a shoe website, and the spider is only returning half the items on the website because of the DUPEFILTER setting. Maybe the same link is used for the same shoe in different colours, or multiple items share the same link, but if I try to change the filter setting it goes into an infinite loop. Is there a way around that?
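
    To illustrate one possibility (the site, selectors and callback names below are assumptions): rather than disabling the dupefilter globally, which can loop forever on circular links, duplicate filtering can be bypassed per request with dont_filter=True where repeated URLs are expected.

        import scrapy

        class ShoeSpider(scrapy.Spider):
            name = "shoes"
            start_urls = ["https://example.com/shoes"]

            def parse(self, response):
                for href in response.css("a.product::attr(href)").getall():
                    # let this request through even if the URL was seen before,
                    # e.g. one product URL shared by several colourways
                    yield response.follow(href, callback=self.parse_product, dont_filter=True)

            def parse_product(self, response):
                yield {"name": response.css("h1::text").get()}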

  • @nadyamoscow2461
    @nadyamoscow2461 3 years ago +1

    Thanks a lot, what you do is amazing.

  • @carinafelnecan7802
    @carinafelnecan7802 2 years ago +1

    Thank you, I learned a lot from this video:)

  • @MohAmuza
    @MohAmuza 3 years ago +2

    I scraped a product, and some items don't have some data, so the result is a NoneType, which means None.
    In items.py I created a function to check whether it is None and return something else:

        def check_gift(value):
            if value is None:
                return "No gift"
            else:
                return value

    but it doesn't work. Where is the problem?
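
    A rough sketch of how such a function is usually wired in (the field and selector names are assumptions). One likely catch: if the CSS/XPath selector matches nothing, the loader adds no value at all, so a None check inside the processor may never run; a common pattern is to also add a fallback value and keep the first non-empty result.

        import scrapy
        from itemloaders.processors import MapCompose, TakeFirst

        def check_gift(value):
            # empty strings and None both fall back to the default text
            return value or "No gift"

        class ProductItem(scrapy.Item):
            gift = scrapy.Field(
                input_processor=MapCompose(str.strip, check_gift),
                output_processor=TakeFirst(),
            )

        # In the spider callback (sketch):
        # l = ItemLoader(item=ProductItem(), selector=product)
        # l.add_css("gift", ".gift::text")
        # l.add_value("gift", "No gift")   # fallback in case the selector matched nothing
        # yield l.load_item()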

  • @user8ZAKC1X6KC
    @user8ZAKC1X6KC 2 years ago

    I am having an issue where it seems like fetch(req) is going a bit too fast, so it's only catching part of the page. Is there a way to slow it down? I can find it for when the crawler is running, but not for when you're working in the shell. Thoughts?

  • @Scuurpro
    @Scuurpro 2 years ago

    How would I change a stock item in an item loader? It only returns "In Stock" or " " when things are out of stock. Would I create a function that takes the value and uses an if/else statement?

  • @vitalij09
    @vitalij09 3 years ago +1

    Thanks man!

  • @dcevansuk
    @dcevansuk 3 years ago

    Another excellent video!!!
    I have one question: this works with the parent URL data; is there a way to also use ItemLoader() with data scraped from the associated child URL, to end up with one combined yield l.load_item()?
    It could be an interesting video.
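
    For illustration, a minimal sketch of one way to do that (item, field and selector names are assumptions): load what you can from the parent page, pass the partly filled item to the child request via cb_kwargs, and keep loading into the same item there.

        import scrapy
        from scrapy.loader import ItemLoader
        from myproject.items import ProductItem   # hypothetical project/item names

        class CombinedSpider(scrapy.Spider):
            name = "combined"
            start_urls = ["https://example.com/products"]

            def parse(self, response):
                for product in response.css("div.product"):
                    l = ItemLoader(item=ProductItem(), selector=product)
                    l.add_css("name", "h3::text")
                    yield response.follow(
                        product.css("a::attr(href)").get(),
                        callback=self.parse_detail,
                        cb_kwargs={"item": l.load_item()},   # carry the parent data along
                    )

            def parse_detail(self, response, item):
                # a second loader continues filling the same item from the child page
                l = ItemLoader(item=item, response=response)
                l.add_css("description", "div.description::text")
                yield l.load_item()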

  • @alessandr2
    @alessandr2 3 years ago

    Thanks for the tutorial!! One question: what part of the new code prevents the error from appearing if there is no price info? Thanks in advance!!!

  • @sheikhakbar2067
    @sheikhakbar2067 3 years ago +1

    Thanks a lot, that was very helpful.

  • @kevin_daang
    @kevin_daang 2 years ago

    If I wanted to include when a whisky bottle was sold out, how would I do it with the item loader?

  • @ferilukmansyah3037
    @ferilukmansyah3037 4 years ago +1

    Thanks for the best tutorial.

  • @salimbo4577
    @salimbo4577 3 years ago

    Thank you so much. Is there a way I can scrape audio data, like sound data?

  • @JnWayn
    @JnWayn 2 years ago

    Nice to know what the competition is. I got a wisdom tooth. Is it possible with Scrapy to mark a checkbox, then click a button to get to the next page?

  • @NatureLover02005
    @NatureLover02005 4 years ago +1

    Excellent!!!

  • @abukaium2106
    @abukaium2106 4 years ago +2

    Great video. I would love a video from you on using proxies with Scrapy.

  • @Daviuliano
    @Daviuliano 1 year ago +1

    Super nice! However, I am struggling to understand how that would work with a dynamic website where I am following a GET request which returns data in JSON format. I do a bit of a workaround and convert it to a dictionary, but I can't seem to get it to return an item… any ideas that can help me?

    • @JohnWatsonRooney
      @JohnWatsonRooney  1 year ago +1

      I think you'd still need to parse through the JSON and then load it into the item loader and item. It's been a while since I've done that though, so I'm not 100% sure, sorry.
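
      To illustrate that idea roughly (the endpoint, keys and item names are assumptions): parse the JSON body into a dict and feed the values to the loader with add_value().

          import json

          import scrapy
          from scrapy.loader import ItemLoader
          from myproject.items import ProductItem   # hypothetical project/item names

          class ApiSpider(scrapy.Spider):
              name = "api"
              start_urls = ["https://example.com/api/products"]

              def parse(self, response):
                  data = json.loads(response.text)          # the JSON response as a dict
                  for entry in data.get("products", []):
                      l = ItemLoader(item=ProductItem())
                      l.add_value("name", entry.get("name"))
                      l.add_value("price", entry.get("price"))
                      yield l.load_item()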

    • @Daviuliano
      @Daviuliano 1 year ago +1

      @@JohnWatsonRooney thank you… I managed to do it now. Had to yield them all individually. But it’s working 👍🏼

    • @fatihkarakus6189
      @fatihkarakus6189 1 year ago

      @@JohnWatsonRooney When I import items, I get an error like this: "attempted relative import with no known parent package". How can I solve this error?
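
      A common fix, sketched here with "myproject" as a placeholder for the actual project package: use an absolute import from the project package rather than a relative one, and start the spider with "scrapy crawl <name>" from the directory containing scrapy.cfg instead of running the spider file directly with python, since running a file as a script is what leaves it with no parent package.

          # in the spider module
          from myproject.items import ProductItem   # instead of: from ..items import ProductItem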

  • @thewheeldeal8439
    @thewheeldeal8439 3 years ago

    This is a great video, thanks!
    Question: can Scrapy save item objects to pickle binary files? If so, how? I just find it really convenient to save my scraped data as pickled objects that can be used quickly in other files, but I can't find any docs on that for Scrapy...
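
    For illustration: as far as I know, Scrapy's built-in feed exports include a pickle exporter, so a FEEDS entry in settings.py is one low-effort option (the path below is just an example). The exporter writes one pickled dict per item, so reading the file back means calling pickle.load() repeatedly.

        # settings.py
        FEEDS = {
            "output/items.pickle": {"format": "pickle"},
        }

        # elsewhere, reading the items back:
        # import pickle
        # items = []
        # with open("output/items.pickle", "rb") as f:
        #     while True:
        #         try:
        #             items.append(pickle.load(f))   # one pickled dict per item
        #         except EOFError:
        #             break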

  • @abdulcute
    @abdulcute 3 years ago

    Best vid for Scrapy and best explanation, @John Watson Rooney and others.
    I have one question about the item loader: how do we extract data if the element has more than one piece of information? (E.g. if an element has two cell numbers, the item loader picks only the first number, not the second one.) As I learned from your previous vid, we'd use getall().
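
    As a sketch (the field and processor choices here are assumptions): add_css()/add_xpath() actually collect every match; it is typically a TakeFirst() output processor that keeps only the first one. Swapping it for Join() (or leaving it out) keeps all the values.

        import scrapy
        from itemloaders.processors import Join, MapCompose

        class ContactItem(scrapy.Item):
            phone = scrapy.Field(
                input_processor=MapCompose(str.strip),
                output_processor=Join(", "),   # "111..., 222..." instead of just the first match
            )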

  • @maheshsharma-zq2uc
    @maheshsharma-zq2uc 2 years ago

    Can you make one project with Scrapy to extract stock information along with historical data?

  • @alexportugal3986
    @alexportugal3986 1 year ago

    Hi, I just don't quite get why you use the item loader part and all of that stuff when you can do it within the parse function. It seems to me that it gets more complicated to get the same result. Surely there is something I am missing.

  • @dokanplugincustomization1587
    @dokanplugincustomization1587 3 years ago

    Awesome playlist! But I have one question: products which are sold out don't give us any data in the price field. I tried to place an alternative value there, something like what you did in a previous video using a try/except block, but I failed to do so. Please guide me.

  • @alfakih7247
    @alfakih7247 1 year ago

    More Scrapy blogs, please.

  • @karthikkarthik100
    @karthikkarthik100 1 year ago

    Thanks for the informative video. Can't we just write "if next_page:" instead of "if next_page is not None"?
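
    For illustration, the two checks only differ when next_page can be a "falsy" value other than None, such as an empty string:

        next_page = ""                 # e.g. an href attribute that exists but is empty
        print(next_page is not None)   # True  -> the "is not None" version would still follow it
        print(bool(next_page))         # False -> "if next_page:" would skip it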

  • @leleemagnu6831
    @leleemagnu6831 4 years ago +2

    John,
    Another great video.
    In the title, the first word should read Scrapy or the video won't come up in a search.
    Let me wish you a well-deserved, fantastic Christmas!

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago

      Oh wow I didn’t notice! Thank you for pointing that out, I’ve changed it. Happy Christmas to you too!

  • @TheWhoIsTom
    @TheWhoIsTom 3 years ago +1

    Nice tutorial!! It would be nice if you could show how to store the data from THIS code (item loader) in MongoDB. :)

    • @JohnWatsonRooney
      @JohnWatsonRooney  3 years ago +1

      Thanks! Sure, I’m going to extend this project to cover more of Scrapy’s features, including pipelines and databases
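
      In the meantime, for illustration, a sketch of an item pipeline along the lines of the MongoDB example in the Scrapy docs (setting and collection names are assumptions); it would be enabled via ITEM_PIPELINES in settings.py.

          import pymongo
          from itemadapter import ItemAdapter

          class MongoPipeline:
              def __init__(self, mongo_uri, mongo_db):
                  self.mongo_uri = mongo_uri
                  self.mongo_db = mongo_db

              @classmethod
              def from_crawler(cls, crawler):
                  return cls(
                      mongo_uri=crawler.settings.get("MONGO_URI", "mongodb://localhost:27017"),
                      mongo_db=crawler.settings.get("MONGO_DATABASE", "whisky"),
                  )

              def open_spider(self, spider):
                  self.client = pymongo.MongoClient(self.mongo_uri)
                  self.db = self.client[self.mongo_db]

              def close_spider(self, spider):
                  self.client.close()

              def process_item(self, item, spider):
                  # store each loaded item as a plain dict in the "items" collection
                  self.db["items"].insert_one(ItemAdapter(item).asdict())
                  return item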

    • @TheWhoIsTom
      @TheWhoIsTom 3 years ago

      @@JohnWatsonRooney Awesome. Thanks a lot :)

  • @Abdul_Rafay_Pal
    @Abdul_Rafay_Pal 1 year ago +1

    What would you recommend: Splash or Playwright?

  • @GordonShamway1984
    @GordonShamway1984 3 years ago +1

    Super

  • @KhalilYasser
    @KhalilYasser 4 years ago +1

    Amazing tutorial. Thank you very much. Can you share the code as usual?

    • @JohnWatsonRooney
      @JohnWatsonRooney  4 years ago +2

      Yes, sure I've updated my repo here: github.com/jhnwr/whiskyspider

  • @ShahidulsPerspective
    @ShahidulsPerspective 2 years ago

    How do I save the URL of the extracted page when using ItemLoader?
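
    For illustration (the field name is an assumption and must be defined on the item): the page URL isn't selected from the HTML, so it can be attached with add_value().

        import scrapy
        from scrapy.loader import ItemLoader

        class ProductItem(scrapy.Item):
            url = scrapy.Field()

        # inside a spider callback:
        # l = ItemLoader(item=ProductItem(), response=response)
        # l.add_value("url", response.url)   # add_value() takes literal data, not a selector
        # yield l.load_item()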

  • @isabelsilva-wf8vg
    @isabelsilva-wf8vg 2 years ago

    How do I use this with XPath? I tried it exactly like this, but it didn't work: l.add_xpath('title', './/h1[@class="product__title"]')
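
    A small sketch for illustration (the class name is taken from the comment above; the field name must match a Field on the item): two things worth checking are that the field name has no stray spaces, and that the XPath ends in /text(), otherwise the loader receives the whole <h1> element markup rather than the title text.

        import scrapy
        from scrapy.loader import ItemLoader

        class WhiskyItem(scrapy.Item):
            title = scrapy.Field()   # field name must match what add_xpath() uses

        # inside a spider callback (sketch):
        # l = ItemLoader(item=WhiskyItem(), response=response)
        # l.add_xpath("title", '//h1[@class="product__title"]/text()')   # /text() grabs the heading text
        # yield l.load_item()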