Hidden APIs with Scrapy - easy JSON data extraction

  • Published: 10 Sep 2024
  • I've shown this web scraping method before but never using Scrapy, and given that the Scrapy framework gives us some really good features I thought it was about time I demo'd this. This is it in its most basic form.
    This Scrapy project will show you the basic methods for scraping API-like data from a website, be it a proper API or the API endpoint you find when scraping a website.
    Support Me:
    Patreon: / johnwatsonrooney (NEW)
    Amazon UK: amzn.to/2OYuMwo
    Hosting: Digital Ocean: m.do.co/c/c7c9...
    Gear Used: jhnwr.com/gear/ (NEW)
    -------------------------------------
    Disclaimer: These are affiliate links and as an Amazon Associate I earn from qualifying purchases
    -------------------------------------

Comments • 78

  • @tubelessHuma
    @tubelessHuma 3 years ago +8

    Good to see it in Scrapy. Your channel needs more Scrapy tutorials. 👍

  • @isaialawaniyasana5209
    @isaialawaniyasana5209 3 years ago +5

    Awesome videos John. I wish I had found you before I paid money to learn everything you're explaining here more succinctly and free 👏

  • @brothermalcolm
    @brothermalcolm 2 years ago +1

    i requested for your scrapy x api video and voila it's right here, thank you!

  • @jessematherly5617
    @jessematherly5617 2 years ago +2

    Tremendous help - thank you so much.

  • @Mad0ba
    @Mad0ba 19 days ago

    I tried using this same method but did not get an object attribute. The object values were blank, e.g. d[]. But if I check the Response tab I see the values in JSON format. Any idea what could cause this?

  • @Scuurpro
    @Scuurpro 2 years ago +1

    I'm getting a 429 unknown error. What type of method should I use to slow down my scraper calls?

  • @decromax
    @decromax 3 years ago +1

    As always, detailed & clear explanation. Threw me off when Pycharm was fired up though 😅

  • @lfcatchall
    @lfcatchall 3 years ago +4

    Love your videos, really helped me get a project off the ground. Could you do a video on not overwhelming an API server with requests? What is the best way to slow the requests to the server down? I would like to hear your thoughts, process, etc. keep up the great work!

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +4

      Great suggestion! I'll add it to my list of video ideas

    • @lfcatchall
      @lfcatchall 3 years ago +1

      @@JohnWatsonRooney thank you, and again really wonderful job you're doing with your videos. Be blessed.

    • @datascience7928
      @datascience7928 2 years ago

      There are several ways to slow down requests. The most common are:
      1) Sleep after each iteration of a for loop
      2) Pause after a batch of requests, for example an extra 30 seconds after every 10 requests
      I usually go with the first one, which is the easiest to implement:
      from time import sleep
      for x in my_links.json()['data']:
          sleep(0.5)  # pause half a second on every iteration
          print(x['id'])
      On each iteration the code sleeps for 0.5 seconds, which reduces the flood of requests.
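
For a Scrapy project specifically, the framework ships its own throttling settings, so a manual sleep is usually not needed. A minimal sketch follows; the setting names are Scrapy's own, but every value here is an illustrative assumption to tune per site, not something taken from the video.

```python
# Illustrative Scrapy throttling settings. The numbers are assumptions, not recommendations.
THROTTLE_SETTINGS = {
    "DOWNLOAD_DELAY": 0.5,                # base wait in seconds between requests to the same site
    "RANDOMIZE_DOWNLOAD_DELAY": True,     # vary the wait between 0.5x and 1.5x of DOWNLOAD_DELAY
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,  # keep a single request in flight per domain
    "AUTOTHROTTLE_ENABLED": True,         # let Scrapy adapt the delay to observed latency
    "AUTOTHROTTLE_START_DELAY": 1.0,      # initial delay used by AutoThrottle
    "AUTOTHROTTLE_MAX_DELAY": 30.0,       # upper bound when the server is slow
    "RETRY_HTTP_CODES": [429, 500, 502, 503, 504],  # retry on "429 Too Many Requests" as well
    "RETRY_TIMES": 3,                     # give up after three retries
}

# Usage: copy the keys into settings.py, or apply them to a single spider:
#
#     class ApiSpider(scrapy.Spider):
#         name = "api"  # hypothetical spider name
#         custom_settings = THROTTLE_SETTINGS

print(THROTTLE_SETTINGS["DOWNLOAD_DELAY"])
```

With one request per domain and a delay in place, a 429 response should become much rarer; if it still appears, raising DOWNLOAD_DELAY is the first thing to try.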

    • @mohamedbhasith90
      @mohamedbhasith90 9 months ago

      @@JohnWatsonRooney Hi, can you make a video or share a guide on how to do the same process as in this video with the POST method? The website I'm trying to scrape returns this data from a POST request. Can you help me, please?

  • @eslamabou-shashaa4652
    @eslamabou-shashaa4652 3 years ago +3

    Thanks a lot 💞💞, amazing video 😍

  • @jamesmining1647
    @jamesmining1647 2 years ago +1

    seems to me every website has its own custom API and blocks access to these type of request to even HTTP GET data

  • @rostranj2504
    @rostranj2504 3 years ago +3

    Could you make a tutorial where you deploy the scraper on a VPS? I've seen many options like using scrapyd or running a cron job. It'd be helpful to see examples.

  • @wangdanny178
    @wangdanny178 2 years ago +1

    Hey john! Many guitars in the back. So any plans for a music youtuber soon?

  • @ХалилМаденбай
    @ХалилМаденбай 3 years ago +1

    congrats with 10K

  • @felixfys
    @felixfys 21 days ago

    helped me, thanks!

  • @abukaium2106
    @abukaium2106 3 years ago +1

    Thanks for this video.

  • @harshyadav2510
    @harshyadav2510 2 years ago

    Hey sir, if the scrapy.Request(-) is not showing anything, what should I do?

  • @bigdatax6512
    @bigdatax6512 1 year ago

    Does this work on a private company network? Mine failed, as if it needs a login or something.

  • @DittoRahmat
    @DittoRahmat 2 years ago +2

    Hi John,
    I tried scraping a website using a hidden API like this, and I succeeded in parsing the first page.
    When I tried to loop the next page, it returned 403 error.
    Now, when I tried going back to parsing one page only, it also returned 403 error
    I have tried changing the user agent in the settings.py, but still no luck
    I can open the API endpoint link just fine in browser. So I think it's not an IP ban
    Can you suggest something ?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      I think you need to copy the cookie over from your browser and put it in the headers you are sending. Copy the request as curl and see what the headers are and try putting them into your code

    • @DittoRahmat
      @DittoRahmat 2 years ago +1

      @@JohnWatsonRooney it turns out I was actually blocked by PerimeterX when I manually visited the website. So I assume this is an IP ban, right?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      @@DittoRahmat Yes, sounds like it: bot protection. Assuming you don't have a static IP from your ISP, you can restart your router for a new IP, or just wait; these blocks don't usually last that long

    • @brothermalcolm
      @brothermalcolm 2 years ago

      @@JohnWatsonRooney how do you put the copied cookie (and other headers) into the scrapy spider code can you cover this please?

    • @brothermalcolm
      @brothermalcolm 2 years ago

      @@JohnWatsonRooney because I'm having the same issue where I'm able to get it to work following your earlier video using requests and insomnia but not in scrapy
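
On the question of getting copied headers and cookies into a spider: Scrapy's Request accepts a headers dict and a separate cookies dict. A rough sketch, assuming the cookie string was copied from the browser's "Copy as cURL" output; the URL, header values, cookie names, and both helper functions are hypothetical placeholders, not taken from the video.

```python
def parse_cookie_header(raw_cookie: str) -> dict:
    """Split a browser cookie string such as 'a=1; b=2' into a dict."""
    cookies = {}
    for part in raw_cookie.split(";"):
        if "=" in part:
            name, value = part.split("=", 1)  # split once, values may contain '='
            cookies[name.strip()] = value.strip()
    return cookies


def build_request_kwargs(url: str, headers: dict, raw_cookie: str) -> dict:
    """Collect keyword arguments that scrapy.Request accepts."""
    return {
        "url": url,
        "headers": headers,
        "cookies": parse_cookie_header(raw_cookie),
    }


# Placeholder values: replace them with what the browser actually sent.
kwargs = build_request_kwargs(
    url="https://example.com/api/products?page=1",
    headers={"User-Agent": "Mozilla/5.0", "Accept": "application/json"},
    raw_cookie="sessionid=abc123; region=uk",
)
print(kwargs["cookies"])

# Inside a spider's start_requests():
#     yield scrapy.Request(callback=self.parse, **kwargs)
```

Passing cookies through the cookies argument rather than as a raw Cookie header lets Scrapy's cookie handling keep track of them across follow-up requests.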

  • @JackyVSO
    @JackyVSO 8 months ago +1

    Is there a good reason to use Scrapy for this instead of the requests library? Isn't it bringing a gun to a knife fight?

    • @JohnWatsonRooney
      @JohnWatsonRooney 8 months ago +1

      I try to show lots of examples and I like the easy expanding use case of Scrapy but yes here it’s not needed as such

    • @JackyVSO
      @JackyVSO 8 months ago

      @@JohnWatsonRooney Good to know, thanks! Your videos are very useful.

  • @fred_vids
    @fred_vids 3 years ago +2

    Do you have a video or can you create a video on how to schedule recurring scraping? Ie, say having the scrape run every hour?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +3

      I’ve covered it in my cronjobs video- and am going to cover it again soon

  • @kevinz1991
    @kevinz1991 3 years ago

    super cool very well explained thanks so much. subscribed :)

  • @renatosardinhalopes6073
    @renatosardinhalopes6073 3 years ago +1

    Hello John, could you compare Python Requests and Python Scrapy? I just found out about scrapy but want to know the caveats between the two.

  • @daniyalmehmood2912
    @daniyalmehmood2912 1 year ago +1

    Thank you man!

  • @rangabharath4253
    @rangabharath4253 3 years ago +1

    Awesome 👍

  • @hirisraharjo
    @hirisraharjo 3 years ago +1

    Awesome! But what if the website doesn't make any xhr requests? Is headless browser the only way (by clicking and pretending to be a user)?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      That is a way yes, but more last resort - if we can render the page with the headless browser and grab the html to parse that way it’s a bit better

  • @vickysharma9227
    @vickysharma9227 2 years ago

    You did it with the GET method.
    How do you do this task if you have a POST request/method?
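
For endpoints that expect POST, the same scrapy.Request call takes a method and a body. A minimal sketch, assuming the endpoint accepts a JSON payload; the URL, the payload fields, and the helper name are made up for illustration.

```python
import json


def build_post_kwargs(url: str, payload: dict) -> dict:
    """Collect scrapy.Request keyword arguments for a JSON POST."""
    return {
        "url": url,
        "method": "POST",
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),  # Request bodies are strings or bytes, not dicts
    }


# Hypothetical endpoint and payload: copy the real ones from the browser's network tab.
kwargs = build_post_kwargs(
    "https://example.com/api/search",
    {"query": "shoes", "page": 1},
)
print(kwargs["body"])

# Inside a spider:
#     yield scrapy.Request(callback=self.parse, **kwargs)
#
# If the site posts form fields instead of JSON, scrapy.FormRequest with a
# formdata dict is the usual alternative.
```

Scrapy also provides scrapy.http.JsonRequest, which serializes a data dict and sets the JSON content type itself; the explicit version above just makes each piece visible.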

  • @JohnDeanRue
    @JohnDeanRue 2 years ago

    I am trying to scrape foreclosure .com and response.body will print just fine but will throw errors when I try to load it as json.loads
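
When json.loads fails on a body that prints fine, the body is often not JSON at all (an HTML block page, say) or carries a byte-order mark. The cause in this particular case is unknown, so the helper below is only a diagnostic sketch: it parses when it can and otherwise shows the start of the body. The function name is hypothetical.

```python
import json


def load_json_or_explain(body: bytes):
    """Return parsed JSON, or raise ValueError with a preview of the body."""
    # utf-8-sig drops a leading byte-order mark if one is present
    text = body.decode("utf-8-sig", errors="replace").strip()
    try:
        return json.loads(text)
    except json.JSONDecodeError as exc:
        preview = text[:80]
        raise ValueError(
            f"Not valid JSON ({exc.msg}); body starts with: {preview!r}"
        ) from exc


print(load_json_or_explain(b'{"items": [1, 2]}'))

try:
    load_json_or_explain(b"<html><body>Access denied</body></html>")
except ValueError as err:
    print(err)

# In a spider callback this would be called as load_json_or_explain(response.body).
```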

  • @zahrastb3869
    @zahrastb3869 1 year ago +1

    Great video! The hidden API I'm trying to work with is a fetch type though, and its response is really not as clean as this one. I don't really know how to work with it

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      Try saving the response to a file and opening it up in a separate Python script and work out how to extract the data you need then reimplement it into your main project. Notebooks are good for this too

    • @zahrastb3869
      @zahrastb3869 1 year ago

      @@JohnWatsonRooney how about I just use selenium? This seems like a lot of work, the text is so jumbled

    • @zahrastb3869
      @zahrastb3869 1 year ago

      @@JohnWatsonRooney though I never worked with selenium before
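
The save-to-file approach suggested earlier in this thread can be sketched roughly as follows: write the response to disk once, then explore its structure offline. The sample payload, the file name, and the summarize helper are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def summarize(node, depth=0, max_depth=2):
    """Return lines describing keys and value types near the top of a JSON structure."""
    lines = []
    if depth > max_depth:
        return lines
    if isinstance(node, dict):
        for key, value in node.items():
            lines.append("  " * depth + f"{key}: {type(value).__name__}")
            lines.extend(summarize(value, depth + 1, max_depth))
    elif isinstance(node, list) and node:
        # Only the first element is inspected, assuming the list is homogeneous
        lines.extend(summarize(node[0], depth, max_depth))
    return lines


# Stand-in for a messy API response; in the spider this would be
# path.write_bytes(response.body) instead.
sample = {"data": {"products": [{"id": 1, "name": "Widget"}], "total": 1}}
path = Path(tempfile.gettempdir()) / "response_sample.json"
path.write_text(json.dumps(sample), encoding="utf-8")

# Separate script or notebook: load the saved file and look at its shape.
loaded = json.loads(path.read_text(encoding="utf-8"))
print("\n".join(summarize(loaded)))
```

Once the path to the wanted fields is clear from the summary, the same key lookups can be moved back into the spider's parse method.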

  • @muhammadrehan3030
    @muhammadrehan3030 3 years ago +1

    Bravo

  • @sudhanshuyaa
    @sudhanshuyaa 3 years ago +1

    Hi John
    Can you please guide about instagram scraping
    Thanks

  • @zheyuan2394
    @zheyuan2394 1 year ago

    Great video. I am wondering if Scrapy can build that long URL by itself instead of us copying and pasting it?
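
Scrapy will not discover the endpoint on its own, but once the URL's query parameters are known, the long string can be assembled in code instead of pasted. A small sketch using the standard library; the base URL and the parameter names (page, pageSize, sort) are hypothetical.

```python
from urllib.parse import urlencode


def build_api_url(base_url: str, **params) -> str:
    """Join a base URL and query parameters into one request URL."""
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and parameters: read the real ones off the copied URL.
urls = [
    build_api_url("https://example.com/api/products", page=page, pageSize=48, sort="price")
    for page in range(1, 4)
]
for url in urls:
    print(url)

# In a spider, each URL would be yielded as scrapy.Request(url, callback=self.parse).
```

Keeping the parameters as named arguments also makes pagination a matter of changing one number rather than editing the pasted string.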

  • @heisenbergwhite5845
    @heisenbergwhite5845 3 years ago +2

    Loved the video!
    Any plans for a web scraping course soon?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      Yeah I am planning one, just not sure where or how to release it. Or just make it free on yt

    • @mandarraut9565
      @mandarraut9565 3 years ago

      @@JohnWatsonRooney You can try to upload on Udemy. As i have checked there is not much content available on Web scraping. And thanks for making short tutorial on Yt like always

    • @JG-ms4rb
      @JG-ms4rb 3 years ago

      @@JohnWatsonRooney would be great to learn how to do price comparison from website to website and how to track that data / store it.

  • @ХалилМаденбай
    @ХалилМаденбай 3 years ago +1

    Fine

  • @xiaohongchen8343
    @xiaohongchen8343 3 years ago

    That's a nice video, like always. Hi John, can you post a video to show how to scrape Home Depot product reviews? Thank you.

  • @user-ur1xd2sh6u
    @user-ur1xd2sh6u 3 years ago

    Thanks for video!!!!!!

  • @rahmatmuhammad8736
    @rahmatmuhammad8736 1 year ago

    I did this but unfortunately the API is salted 😢

  • @Ahmed7255
    @Ahmed7255 2 years ago

    Do you have an example for a POST request?

  • @univej5787
    @univej5787 3 years ago

    What is the software with the GET editor?

  • @mushinart
    @mushinart 3 years ago +1

    God bless 😎👍🏻

  • @LLlikeme
    @LLlikeme 3 years ago

    John, I came across your YouTube channel and it is an amazing resource! Right now I am working on a scraper project, but I have issues with ng-class elements on the website. I have done my research but without luck. Can you recommend something, or a video on your channel? (I'm coding in Python.)
    Regards!

  • @HoustonKhanyile
    @HoustonKhanyile 3 years ago

    Hi John, my comment is unrelated to this video. I've been trying to scrape music data from streaming platforms like SoundCloud (the data only, not the actual music) to create an analytics platform for independent musicians. These websites are loaded dynamically, so it's been giving me a problem. I tried everything from selenium to requests_html but it is just not happening. Could you please do a video on it so I can learn?

    • @brothermalcolm
      @brothermalcolm 2 years ago

      what's the website and which fields are you trying to scrape?

    • @HoustonKhanyile
      @HoustonKhanyile 2 years ago

      @@brothermalcolm spotify, amazon music, tidal & youtube music. the fields are name, song name, streams, date uploaded and so forth.

  • @shaunpx1
    @shaunpx1 2 years ago

    Can you do a video showing how to scrape wikidata info on people like famous programmers?

  • @rizalsofyans
    @rizalsofyans 2 years ago +1

    Hi, I follow your content and it's all awesome! Could I get a tutorial on scraping GraphQL with Scrapy and following links after the first request? Thank you! 😉

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      Hey! Thanks. I’ll be covering some of those topics coming up, I’ll see if I can drop that in too!

  • @perticomanonalto
    @perticomanonalto 2 years ago +1

    This is really cool but also kinda illegal, I guess it depends on what data you are fetching

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      The legality is a bit grey but we are only getting data that is publicly available online, it’s not behind a login nor are we abusing the website with 1000s of requests. I think if you use the data for personal consumption ie don’t try to sell it it’s ok

    • @perticomanonalto
      @perticomanonalto 2 years ago

      @@JohnWatsonRooney thank you for the response!

  • @randyallen8610
    @randyallen8610 1 year ago

    I need help scraping data from a website that has a firewall. Will pay

  • @fulkerknupp26
    @fulkerknupp26 2 years ago

    Can you make a video about android app scraping?

  • @madisopabul
    @madisopabul 1 year ago +1

    these are not hidden APIs -.-

  • @psycode5569
    @psycode5569 3 years ago +1

    Hi John, I sent you an email. I'm having trouble with something I hope you can help me.