Python and Scrapy - Scraping Dynamic Site (Populated with JavaScript)

  • Published: 25 Oct 2024

Comments • 202

  • @codeRECODE
    @codeRECODE  3 years ago +8

    Hi everyone, I need your support to get this channel running. *Please SUBSCRIBE and Like!*
    Leave a comment with your questions, suggestions, or a word of appreciation :-)
    I would love your suggestions for new videos.

  • @harshnambiar
    @harshnambiar 4 years ago +25

    You did this without even using docker or splash. That is pretty cool. 🌸

  • @julian.borisov
    @julian.borisov 4 years ago +20

    "Without Selenium" caught my attention!

    • @klarnorbert
      @klarnorbert 3 years ago +1

      I mean, Selenium is not really for web scraping (it's mostly used for automating web app testing). If you can reverse engineer the API, like in this video, Scrapy is more than enough.

    • @k.m.jiaulislamjibon1443
      @k.m.jiaulislamjibon1443 3 years ago

      @@klarnorbert But sometimes you have no way other than to use Selenium. Some web app developers are clever enough to encapsulate the function calls so that the page doesn't show an XHR request. I had to use Selenium for parsing data in a web app.

  • @osmarribeiro
    @osmarribeiro 4 years ago +7

    OMG! Amazing video. I'm learning Scrapy now, and this video helped me a lot.

  • @kenrosenberg8835
    @kenrosenberg8835 3 years ago +3

    Wow! You are a very smart programmer. I never thought of making REST API calls directly and then parsing the response. Very nice. There is a lot to learn in your videos, more than just scraping.
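The technique praised here boils down to two requests: hit the site's list endpoint once, then build one detail request per item from the JSON it returns. A minimal stdlib sketch of that flow, assuming endpoint paths and field names modelled loosely on the video's example site (they are illustrative, not verified):

```python
import json

# Hypothetical sketch of the hidden-API flow: parse the list-endpoint
# response and build one detail URL per item. Paths and the
# "itSchoolCode" field are assumptions for illustration.
BASE = "https://directory.ntschools.net/api/System/"

def detail_urls(list_json):
    """Build a detail-request URL for each item in the list response."""
    schools = json.loads(list_json)
    return [BASE + "GetSchool?itSchoolCode=" + s["itSchoolCode"] for s in schools]

# Example list response (shape assumed for illustration)
sample = '[{"itSchoolCode": "alaws"}, {"itSchoolCode": "parap"}]'
print(detail_urls(sample))
```

In a real spider, each URL in that list would become a `scrapy.Request` whose callback parses the detail JSON.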

  • @igorwarzee
    @igorwarzee 3 years ago +3

    It really helped me a lot. Thank you and congrats. Cheers from Brazil!

  • @gamelin1234
    @gamelin1234 3 years ago +3

    Just used this technique to scrape a huge dataset after struggling for a couple of hours with requests+BS. Thank you so much for the great content!

  • @lambissol7423
    @lambissol7423 3 years ago +3

    Excellent!! I feel like you doubled my knowledge of web scraping!

  • @moviesaddaNR
    @moviesaddaNR 4 months ago

    The video is really good. I am trying to learn Scrapy and thought it was far too difficult for me to understand, but you made it simple.

  • @helloworld-sk1hr
    @helloworld-sk1hr 4 years ago +2

    Before watching this video I was doing this with Selenium; watching your video, I laughed at what I had been doing.
    This video has saved my day.
    Your videos are amazing 🔥

  • @ruksharalam173
    @ruksharalam173 1 year ago

    Wow, learning something new about Scrapy every day

  • @EnglishRain
    @EnglishRain 1 year ago

    FANTASTIC explanation!!

  • @sebleaf8433
    @sebleaf8433 3 years ago +4

    Wow!! This is awesome! Thank you so much for teaching us new things with scrapy :)

    • @codeRECODE
      @codeRECODE  3 years ago

      Thank you :-)

    • @mohamedbhasith90
      @mohamedbhasith90 11 months ago

      @@codeRECODE Hi sir, I'm trying to scrape a website with hidden APIs like you did in this video, but the data is in a POST request, not a GET request like in the video. I'm really stuck here. Can you make a video on scraping a hidden API with a POST request? I hope you find this comment.

  • @RonZuidema
    @RonZuidema 4 years ago +3

    Great video, thanks for the simple but precise instruction!

  • @yusufrifqi5006
    @yusufrifqi5006 2 years ago

    All of your tutorials are very helpful, big thanks to you. I will wait for more Scrapy content.

    • @codeRECODE
      @codeRECODE  2 years ago +1

      Coming soon!

    • @yusufrifqi5006
      @yusufrifqi5006 2 years ago

      @@codeRECODE Nice! I will be waiting for a video on asynchronous programming in Scrapy.

  • @carryminatifan9928
    @carryminatifan9928 3 years ago +2

    Beautifully shown, and Selenium is not for large-scale data scraping.
    Scrapy is best 👍

  • @stealthseeker18
    @stealthseeker18 4 years ago +3

    Can you do web scraping if the website is behind Cloudflare version 2?

  • @lorderiksson3377
    @lorderiksson3377 6 months ago +1

    This technique is fantastic, and thanks a lot for the great content on your YouTube page. Keep up the great job.
    But how do you implement pagination? Bit of a shame it wasn't shown here.
    Let's say the schools are in a list of 25 items per page, 10 pages in total. How would you do it then?

    • @codeRECODE
      @codeRECODE  5 months ago

      Shame is a strong word, no?
      I try to cover one single topic per video, and pagination is a topic in itself. I have a video on that too.
      If you'd rather learn in a structured manner, you can try my course for a week.
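For the commenter's scenario (25 items per page, 10 pages), a hidden JSON API usually exposes page/offset query parameters, and pagination reduces to generating one URL per page. A sketch with made-up parameter names — the real names must be read from the site's own XHR calls:

```python
# Sketch: many JSON APIs paginate with page/size query parameters.
# The base URL and the "page"/"size" names here are assumptions;
# check the site's XHR requests for the real ones.
from urllib.parse import urlencode

def page_urls(base_url, pages, per_page=25):
    """Build one URL per page for an API that takes page/size parameters."""
    return [f"{base_url}?{urlencode({'page': p, 'size': per_page})}"
            for p in range(1, pages + 1)]

urls = page_urls("https://example.com/api/schools", pages=10)
print(urls[0])    # https://example.com/api/schools?page=1&size=25
print(len(urls))  # 10
```

In a spider, these URLs would typically be yielded from `start_requests()`, one `scrapy.Request` each.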

  • @cueva_mc
    @cueva_mc 3 years ago +2

    This is amazing, thank you!

  • @Chris-vx6eb
    @Chris-vx6eb 4 years ago +6

    This took me 2 days to figure out. If you're having trouble with json.loads(), I found out that the JSON data I scraped was actually a byte string, so I had to decode it BEFORE using json.loads. So where he had (9:47)
    *raw_data = response.body*
    replace with: *raw_data = response.body.decode("utf-8")*
    then continue on with: *data = json.loads(raw_data)*
    TO CHECK IF YOU NEED TO DO THIS, RUN THIS TEST:
    *raw_data = repr(response.body)* #repr() is a built-in function that (1) turns Python objects into printable objects, so you can see what you're dealing with, and (2) in my case, if it prints out your object, you can find out whether you have a byte string, because you will see a 'b' in front of your string.
    *print(raw_data)*
    output>>> b'{ {data:...}, otherdata: [{...},{...}] }'
    If you see this b, use the method I described above. Hope I saved someone time; Stack Overflow doesn't have a question for this yet (:

    • @codeRECODE
      @codeRECODE  4 years ago +2

      @chris - Good catch!
      Short answer: replace response.body.decode("utf-8") with response.text
      Detailed answer:
      Let's understand text and body:
      response.body contains the raw response without any decoding.
      response.text contains the decoded response as a string.
      In this video, response.body worked because no special decoding was required.
      Your method is correct. An even better approach would be to use response.text, as the response is actually a TextResponse, which is an encoding-aware object.
      Bonus tip: install ipython and you will have a much better Python console.
      Good luck!

    • @Chris-vx6eb
      @Chris-vx6eb 4 years ago +1

      @@codeRECODE awesome, thanks!

    • @tokoindependen7458
      @tokoindependen7458 3 years ago +1

      Bro, paste this as an article on a website so many people can find it easily.

    • @pythonically
      @pythonically 2 years ago

      raise TypeError(f'the JSON object must be str, bytes or bytearray, '
      TypeError: the JSON object must be str, bytes or bytearray, not tuple
      Is this the same error?
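The byte-string issue discussed above is easy to check for yourself. Note that on Python 3.6+ `json.loads` accepts bytes directly, so the explicit decode mainly matters on older versions; on the Scrapy side, `response.text` is the cleaner fix, as the reply says. A small stdlib demonstration:

```python
import json

raw_bytes = b'{"data": [1, 2, 3]}'   # what response.body returns (bytes)
as_text = raw_bytes.decode("utf-8")  # what response.text would give (str)

# On Python 3.6+ both routes produce the same object:
assert json.loads(raw_bytes) == json.loads(as_text) == {"data": [1, 2, 3]}

# repr() exposes the b'' prefix that identifies a byte string:
print(repr(raw_bytes))  # b'{"data": [1, 2, 3]}'
```

The `not tuple` error in the last reply is different: it means a tuple (often from a stray trailing comma) was passed to `json.loads`, not a byte string.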

  • @tunoajohnson256
    @tunoajohnson256 4 years ago +1

    This is a great tutorial. You taught me a lot and my app runs way faster than using Selenium now. Many Thanks, I hope to encourage you to keep teaching!

  • @nadyamoscow2461
    @nadyamoscow2461 3 years ago

    Many thanks! I've learned a lot and it all works fine.

  • @BreakItGaming
    @BreakItGaming 4 years ago +2

    Sir, please complete this series up to the advanced level. I have looked at many YouTube channels, but I didn't find any series which is complete.
    So it is my kind request.
    Anyway, thanks for starting such an initiative.

    • @codeRECODE
      @codeRECODE  4 years ago

      Glad that you liked it. I will add more videos in the future for sure :-)

  • @joaocarlosariedifilho4934
    @joaocarlosariedifilho4934 4 years ago +4

    Excellent. Sometimes there is no reason to use Splash; we only need to understand what requests the JS is making and how. Thank you!

    • @codeRECODE
      @codeRECODE  4 years ago +2

      Exactly! It's much faster, and the web server doesn't have to send all those CSS, JS, images, etc. Everyone is happier :-)

    • @shashikiranneelakantaiah6237
      @shashikiranneelakantaiah6237 4 years ago

      @@codeRECODE Hi there, I am facing an issue with a website: I can hit the first page, but any request I make after that redirects back to the first page. It would be of great help if you could summarise why this behaviour occurs on some sites. Thanks. And if I make the request to the same URL with scrapy-splash, I get a lot of timeout errors.

    • @codeRECODE
      @codeRECODE  4 years ago +1

      @@shashikiranneelakantaiah6237 - Double-check that you are passing all the request headers except cookie and content-length.
      Cookies will be handled by Scrapy.
      Content-length will vary and will break things instead of fixing them.

    • @shashikiranneelakantaiah6237
      @shashikiranneelakantaiah6237 4 years ago +1

      @@codeRECODE Thank you for replying, will give it a try. Please do more videos on Scrapy; your way of explaining the topics is excellent. Once again, thank you.

  • @curdyco
    @curdyco 15 days ago

    Please see this comment.
    I am trying to scrape data from almost exactly this setup, but the website I am trying it on requires me to select sections:
    1. Going to the website
    2. Then I have to select one of the options
    3. Then I get a scrollbar (which I don't have to interact with, as it only has one option)
    4. Then clicking submit
    5. Then I get a table of 500 rows, and in the last column of each row there is a "view" button that gives me data in pop-ups
    In all 5 steps the URL of the page remains the same.
    Clicking on buttons is a bit of a challenge, but what is harder is that the responses for each "view" have the SAME URL. I don't know how to deal with it. I can give you the website link so you can understand better, but I desperately need the guidance here.

  • @emmanuelowino4291
    @emmanuelowino4291 3 years ago +1

    Thanks for this, it really helped. But what if, instead of a JSON file, it returns an XHR response?

    • @codeRECODE
      @codeRECODE  3 years ago

      Nothing changes. JSON and XHR are just the browser's way of logically grouping information in this case.

  • @charisthawhite2793
    @charisthawhite2793 3 years ago

    Your video is very helpful; you deserve a subscribe.

  • @hayathbasha4519
    @hayathbasha4519 3 years ago

    Hi,
    Please advise me on how to improve / speed up the Scrapy process.

    • @codeRECODE
      @codeRECODE  3 years ago

      You can increase the CONCURRENT_REQUESTS from default 16 to a higher number.
      In most cases, you will need proxies if you want to scrape faster.
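The settings mentioned in the reply can be set per-spider via `custom_settings`. A sketch with illustrative values (`CONCURRENT_REQUESTS`, `CONCURRENT_REQUESTS_PER_DOMAIN`, and `DOWNLOAD_DELAY` are real Scrapy settings; the numbers are just examples — raise them gradually and respect the target site):

```python
# Sketch: speed-related Scrapy settings, placed in a spider class as
# custom_settings (or in settings.py). Values are illustrative only.
custom_settings = {
    "CONCURRENT_REQUESTS": 32,             # default is 16
    "CONCURRENT_REQUESTS_PER_DOMAIN": 16,  # default is 8
    "DOWNLOAD_DELAY": 0,                   # seconds between requests
}
```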

  • @orlandespiritu2961
    @orlandespiritu2961 3 years ago

    Hi, can you help me write code that grabs hotel data from Agoda using this? I've been stuck, and I'm running out of time for an exercise. I just started learning Python 3 weeks ago.

  • @jagdish1o1
    @jagdish1o1 3 years ago +1

    It's an awesome tutorial. I've learned a lot, thanks. I have a question: I want to set a default value if there's no value.
    I've tried with pipelines, but item.setdefault('field', 'value') in process_item is not working.

    • @codeRECODE
      @codeRECODE  3 years ago

      def process_item(self, item, spider):
          for field in item.fields:
              if item.get(field) is None:  # Any other checks you need
                  item[field] = "-1"
          return item
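The pipeline logic in the reply can be exercised outside Scrapy with a plain dict standing in for the item (a real Scrapy item exposes its declared fields via `.fields`; the field names and the "-1" default here are just illustrative):

```python
# Stand-in for the pipeline logic above. Real Scrapy items expose .fields;
# a fixed tuple of field names is used here so the logic runs anywhere.
FIELDS = ("name", "phone", "email")

def fill_defaults(item):
    """Fill any missing or None field with a default value."""
    for field in FIELDS:
        if item.get(field) is None:  # any other checks you need
            item[field] = "-1"
    return item

print(fill_defaults({"name": "ABC School"}))
# {'name': 'ABC School', 'phone': '-1', 'email': '-1'}
```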

  • @cueva_mc
    @cueva_mc 3 years ago

    Is it possible to parse the "base_url" instead of copying it?

    • @cueva_mc
      @cueva_mc 3 years ago +1

      Or is it possible to parse the XHR URLs from Python?

    • @codeRECODE
      @codeRECODE  3 years ago

      I am not sure what you want to ask, can you expand your question?

  • @AmitKumar-qv2or
    @AmitKumar-qv2or 4 years ago +1

    Thank you so much, sir!

  • @RahulT-oy1br
    @RahulT-oy1br 4 years ago +3

    You just earned ₹7000 in 30 mins. Wowza

    • @codeRECODE
      @codeRECODE  4 years ago +5

      Thank you, but let's be honest. This is NOT a get rich quick scheme. There is work involved in learning, analyzing the site, and finally, finding someone who will pay YOU for this task. Involves hard work :-)
      That being said, this is one of the fastest paths to actually earn money as a freelancer.

    • @RahulT-oy1br
      @RahulT-oy1br 4 years ago +1

      @@codeRECODE Any particular freelancing or online short-term internship sites you'd recommend?

    • @codeRECODE
      @codeRECODE  4 years ago +3

      @@RahulT-oy1br Any of the freelancing sites is fine. Practice with jobs already closed. Once you are confident, start applying for new jobs

    • @fabiof.deaquino4731
      @fabiof.deaquino4731 4 years ago

      @@codeRECODE great recommendations. Really appreciate all the work that you have been doing! Thanks a lot.

    • @zangruver132
      @zangruver132 4 years ago

      @@codeRECODE Well, I have never done freelancing, nor do I have any idea about it. Can you still suggest at least one or two sites for me to start web scraping freelancing in India? Also, do I need any prior experience?

  • @Ankush_1991
    @Ankush_1991 4 years ago

    Hi Sir, the video is great because of its simplicity and clarity. I am a beginner in web scraping and I have been stuck at a point for a very long time. Can you help me? How do we contact you with our doubts? Please mention something in your video descriptions.

    • @codeRECODE
      @codeRECODE  4 years ago +1

      You can post your doubts here or in the comments section of my website. It is not always possible to reply to every question due to the sheer volume, though. I am planning to start a Facebook group where everyone can help everyone else. Let me know how that sounds.

  • @kamaralam914
    @kamaralam914 1 year ago

    Sir, in my case I am using it for IndiaMART and not getting any data on the response tab!

  • @felinetech9215
    @felinetech9215 4 years ago +1

    I followed all your videos to be able to scrape a JavaScript-generated webpage, but the data I want to scrape isn't in the XHR tab. Any suggestions, sir?

    • @codeRECODE
      @codeRECODE  4 years ago

      Check the source of the main document

    • @felinetech9215
      @felinetech9215 4 years ago

      @@codeRECODE Any info on how to do that, sir?

  • @l0remipsum991
    @l0remipsum991 3 years ago

    Thank you so much. 1437! You literally saved my a$$. Subbed!

  • @gracyfg
    @gracyfg 5 months ago

    Can you extend this and show us how to scrape all the next pages and all product details, and make it a production-quality product? Or some points on making this production-quality code, with exception handling etc.?

    • @codeRECODE
      @codeRECODE  5 months ago

      All these topics need a lot of detail. Most of them are covered across many videos.
      You can also try my course and ask for a refund within a week if you don't like it.
      Happy learning!

  • @157sk8er
    @157sk8er 3 years ago

    I am trying to scrape information from a weather site, but the request is not showing up in the XHR tab; it is showing up in the JS tab. How do I scrape data from this tab?

    • @codeRECODE
      @codeRECODE  3 years ago +1

      Nothing changes! JS, XHR and the rest are Chrome's way of organizing URLs. You will find everything under the All tab as well. Just use the same technique.

  • @Pablo-wh4vl
    @Pablo-wh4vl 4 years ago +1

    How would you go about it if, instead of in XHR, the content is loaded with calls in the JS tab? Is it still possible with requests?

    • @codeRECODE
      @codeRECODE  4 years ago +1

      Tabs are only for logical grouping. You can extract info from any request, just that the code will change based on how data is organized.

  • @gsudhanshu
    @gsudhanshu 4 years ago +1

    I am trying to copy what you did in the video, but with the same code I am getting an error on fetching the first API, i.e. getAllSchools. 2020-08-23 18:57:38 [scrapy.core.scraper] ERROR: Spider error processing (referer: directory.ntschools.net/)
    Traceback (most recent call last):
    File "/home/sudhanshu/.local/lib/python3.6/site-packages/scrapy/utils/defer.py", line 120, in iter_errback
    yield next(it)
    File "/home/sudhanshu/.local/lib/python3.6/site-packages/scrapy/utils/python.py", line 347, in __next__
    return next(self.data)

  • @codingfun915
    @codingfun915 4 years ago

    How can I get the information if I have all the links of the schools and want to extract data from those links? Where should I keep all the links? In start_urls, or where? Please help me ASAP.

  • @mmelonmann
    @mmelonmann 3 years ago

    What happens when you encounter a 400 code with the API link address? I can't seem to get past the API, as response.text shows "No API key found in request."

    • @codeRECODE
      @codeRECODE  3 years ago

      Find the API key and add it to headers

  • @andycruz7
    @andycruz7 2 months ago

    Thanks man

  • @UmmairRadi
    @UmmairRadi 1 year ago

    Thank you, this is awesome. What about a website that gets data using GraphQL?

  • @MedhanshCM
    @MedhanshCM 4 years ago

    Hi,
    Could you help me solve a similar kind of problem? I tried these headers but am still not getting any data.

  • @sowson4347
    @sowson4347 4 years ago +1

    Thank you for the easy-to-follow videos done in a calm, unhurried manner. I notice you used VS Code for part of the work and CMD for running Scrapy. I found it extremely difficult to load Scrapy into VS Code, even with a virtual environment; I could not run it in the VS Code terminal. How did you do it?

    • @codeRECODE
      @codeRECODE  4 years ago +2

      I work on Scrapy a lot, so I have it installed at the system level ("pip install scrapy" at cmd with admin rights). It just saves me a few steps. When I have to distribute the code, I always create a virtual environment and use Scrapy inside it.
      If I want to use the VS Code terminal, I just use the bottom-left area where the Python environment in use is listed, click it, and set it to the current virtual environment.

    • @sowson4347
      @sowson4347 4 years ago +1

      @@codeRECODE Thank you for responding so quickly. I was under the impression that Scrapy could run in VSCode just like BS. I solved the issue after watching your video many times over and reading up numerous other sites. What I had failed to comprehend was Scrapy has to be run in the Anaconda cmd environment not within a VSCode notebook. VSCode is just an editor being used to create the spider file. Your use of ntschools.py file in C:\Users\Work also confused me. I have now created my first Scrapy spider and can follow your videos better. Thanks keep up the good work.
      Scrapy refused to install at the system level. I had to use Anaconda.

    • @codeRECODE
      @codeRECODE  4 years ago

      Good that the issue is resolved. Never had a problem installing scrapy with elevated cmd (run as administrator) or sudo pip3 install
      Don't know why you faced a problem
      BTW, "work" was just my user id.

    • @sowson4347
      @sowson4347 4 years ago

      @@codeRECODE User Error 101 - RTFM

  • @dashkandhar
    @dashkandhar 4 years ago +1

    Very knowledgeable and clear content, kudos!
    And what if an API is taking time to return response data? How do I handle that?

    • @codeRECODE
      @codeRECODE  4 years ago

      Thanks!
      If it is taking time, change the DOWNLOAD_TIMEOUT in settings. Add this to your spider class:
      custom_settings = {
          'DOWNLOAD_TIMEOUT': 360  # in seconds. Default is 180 seconds
      }

  • @ThangHuynhTu
    @ThangHuynhTu 2 years ago

    (7:00): How can you copy-paste the headers like that? When I try to copy as you do, I have to add the quotes myself. Is there any way to copy as fast as yours?

    • @codeRECODE
      @codeRECODE  2 years ago +1

      Oh, I understand the confusion. I removed that part to keep the video short. Anyway, you can make it quick and easy by following these steps:
      pip install scraper-helper
      This library contains some useful functions that I created for my personal use and later made open source.
      Once you have it installed, you can use the headers that you copied directly, without formatting. Simply use the function get_dict() and send the headers in a triple-quoted string:
      headers = scraper_helper.get_dict('''
      accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-
      accept-encoding: gzip, deflate, br
      accept-language: en-GB,en;q=0.9
      ''')
      It will also take care of cleaning up unwanted headers like cookie, content-length, etc. Good luck!

    • @ThangHuynhTu
      @ThangHuynhTu 2 years ago

      @@codeRECODE Really nice. Thanks for clarifying!
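The core of what `get_dict()` does — turning a header block pasted from DevTools into a dict and dropping headers Scrapy should manage itself — can be approximated in a few lines. This stand-in is hypothetical, not the library's actual code:

```python
# Hypothetical stand-in for scraper_helper.get_dict(): parse headers
# copied from DevTools into a dict, skipping cookie and content-length
# (Scrapy manages cookies itself; content-length varies per request).
SKIP = {"cookie", "content-length"}

def headers_to_dict(raw):
    headers = {}
    for line in raw.strip().splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() not in SKIP and value:
            headers[key.strip()] = value.strip()
    return headers

headers = headers_to_dict("""
accept: text/html,application/xhtml+xml
accept-language: en-GB,en;q=0.9
cookie: session=abc123
""")
print(headers)
# {'accept': 'text/html,application/xhtml+xml', 'accept-language': 'en-GB,en;q=0.9'}
```

The resulting dict can be passed as the `headers=` argument of a `scrapy.Request`.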

  • @himanshuranjan7456
    @himanshuranjan7456 4 years ago

    Just one question: does Scrapy have async support? Looking at libraries like requests or requests-html, they have async support, so the time consumed during scraping is much less.

    • @codeRECODE
      @codeRECODE  4 years ago

      Yes, and better!
      It is based on Twisted. The whole framework is built around the idea of async. You would have to use it to appreciate how fast it is.

  • @FBR2169
    @FBR2169 2 years ago

    Hello Sir. A quick question: what if the request method of the website is POST instead of GET? Will this still work? If not, what should I do?

    • @codeRECODE
      @codeRECODE  2 years ago

      Yes, it will.
      See my many videos on POST requests - ruclips.net/user/CodeRECODEsearch?query=post
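The hidden-API technique works the same for POST endpoints; in Scrapy that means `scrapy.FormRequest` (form-encoded body) or `scrapy.http.JsonRequest` (JSON body). As a stdlib sketch, the request below is only built, never sent, to show what such a POST carries — the URL and payload are invented:

```python
# Sketch: building (not sending) a JSON POST request with the stdlib,
# analogous to what Scrapy's JsonRequest assembles. URL/payload invented.
import json
import urllib.request

payload = {"page": 1, "query": "schools"}
req = urllib.request.Request(
    "https://example.com/api/search",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

print(req.get_method())  # POST (implied once a body is attached)
```

In a spider, the equivalent would be `JsonRequest(url, data=payload, callback=self.parse)`, with the payload copied from the request body shown in DevTools.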

  • @bibashacharya2637
    @bibashacharya2637 2 years ago

    Hello sir, my question is: can we do exactly the same thing with Docker and Splash? Please reply.

    • @codeRECODE
      @codeRECODE  2 years ago

      Yes -- See this ruclips.net/video/RgdaP54RvUM/видео.html

  • @niteeshmishra2790
    @niteeshmishra2790 1 year ago

    Hi, I am wondering how to scrape multiple fields. Suppose I searched for mobiles on Amazon; now I want each mobile's brand name, description, link, and complete details, along with the next page.

    • @codeRECODE
      @codeRECODE  1 year ago

      See this ruclips.net/video/LfSsbJtby-M/видео.html

  • @amarchinta4463
    @amarchinta4463 3 years ago

    Hi sir, I have a question not about this tutorial. I want to fetch multiple different domains having the same page structure with a single spider. How can I achieve this? Please help.

    • @codeRECODE
      @codeRECODE  3 years ago

      If same structure means same selectors for all those domains, just add them to start_urls or create a crawl spider.

  • @muhammedjaabir2609
    @muhammedjaabir2609 4 years ago

    Why am I getting this error?
    "raise JSONDecodeError("Expecting value", s, err.value) from None
    "

  • @chapidi99
    @chapidi99 3 years ago

    Hello, is there an example of how to scrape if there is paging?

    • @codeRECODE
      @codeRECODE  3 years ago

      I have covered pagination in many videos. I am planning to create one video covering all kinds of pagination.

  • @abukaium2106
    @abukaium2106 4 years ago

    Hello sir, I have made a spider following your code, but it shows twisted.python.failure.Failure twisted.internet.error.ConnectionLost: Connection to the other side was lost in a non-clean fashion: Connection lost. What can I do to solve it? Please reply. Thanks.

    • @codeRECODE
      @codeRECODE  4 years ago

      Some connectivity issue. See if you can connect using scrapy shell.

  • @harshgupta-ds2cw
    @harshgupta-ds2cw 4 years ago

    I have been trying to find a web scraper which will work on OTT platforms. Your method didn't give me any results. I need help.

    • @codeRECODE
      @codeRECODE  4 years ago

      Scraping OTT is almost impossible for technical reasons - they have multiple layers of defenses to stop piracy - AND legal reasons. I am not going to attempt it for sure :-)

  • @arunk6435
    @arunk6435 2 years ago

    Hello, Mr Upendra. Every time I start to scrape, my data usage reaches its limit too fast. What is your data plan? I mean, how many GBs are you allowed to use per day?

    • @codeRECODE
      @codeRECODE  2 years ago

      It's really hard to calculate how many GBs your project is going to consume. You can probably run your project on any of the cloud services.
      For any serious work, I would suggest getting a broadband connection with no data cap.

    • @arunk6435
      @arunk6435 2 years ago

      @@codeRECODE Thank you, Mr Upendra. I would like to know what data plan you use. What is your daily data limit?

  • @daddyofalltrades
    @daddyofalltrades 3 years ago +2

    Sir, thanks a lot!! This series will definitely help me ❤️

  • @adityapandit7344
    @adityapandit7344 3 years ago

    Hi Sir,
    How can we scrape JSON data from a website using Scrapy?

    • @codeRECODE
      @codeRECODE  3 years ago

      Create a regular Scrapy request for the URL that contains the JSON data. In the callback method (for example, parse) you can access the JSON directly using response.json() in the newer versions.

    • @adityapandit7344
      @adityapandit7344 3 years ago

      @@codeRECODE Hi sir, have you posted any video on it?

  • @stalluri11
    @stalluri11 3 years ago

    Is there a way to scrape web pages in Python when the URL does not change with page numbers?

    • @codeRECODE
      @codeRECODE  3 years ago +1

      Yes, I have covered this in many videos. I am planning to do a dedicated video on pagination.

    • @stalluri11
      @stalluri11 3 years ago

      @@codeRECODE Looking forward to it. I can't find a video on this.

  • @chakrabmonoj
    @chakrabmonoj 3 years ago

    In fact, I followed your steps into the XHR tab and 1. it does not show accept: json (but the site is run by JS, which I checked using the trick you showed here); 2. it also says 'eval' not allowed on the site (not sure what that means) - it shows no file being generated as you have shown for this site.
    What could be happening here?
    I am trying to sort all my connections by the total number of reactions their posts have got.
    Can you help with a suggestion for coding this?
    Thanks

    • @codeRECODE
      @codeRECODE  3 years ago +1

      I am attaching the link to the code. I just tried it and it works. Make sure that you run this with *scrapy runspider ntschools.py*, not like a python script.
      Source: gist.github.com/eupendra/7900849c56872925635d0c6c6b8f78f5

    • @chakrabmonoj
      @chakrabmonoj 3 years ago

      @@codeRECODE Thanks for the quick revert. What I forgot to mention is I was trying to use your code on LinkedIn. Does it have excessive privacy policies because of which it is not showing any Json file being generated? Any help appreciated.

  • @shubhamsaxena3220
    @shubhamsaxena3220 2 years ago

    Can we scrape any dynamic website using this method?

    • @codeRECODE
      @codeRECODE  2 years ago +1

      Short answer - No. There are multiple techniques to scrape dynamic websites. Every site is different and would need a different technique.

  • @AndresPerez-qd8pn
    @AndresPerez-qd8pn 4 years ago +1

    Hey, I love your videos.
    I'm a little stuck with some code, could you help me? That would be very nice (some tutoring).

  • @maysgumir3972
    @maysgumir3972 4 years ago

    Hi,
    I need your help. I am trying to scrape details from the e-commerce site www.banggood.com. The price is AJAX-loaded and I cannot retrieve it with Scrapy, so I tried to find the AJAX request manually as you teach in the video, but I cannot find the exact path for the request. Could you please make a video on this particular website (finding the AJAX request manually)? Your help will be much appreciated. You can choose any category for scraping details.
    @Code / RECODE

  • @beebeeoii5461
    @beebeeoii5461 3 years ago

    Hi, great video, but sadly this will not work if the site does some hashing/encrypting of its API. E.g., a token has to be attached as a header, and the token can only be obtained through some kind of computation done by the webpage.

    • @codeRECODE
      @codeRECODE  3 years ago +2

      If your browser can handle encryption, hashing, you can do that with Scrapy too. Most of the time, they will just send some unique key which you have to send in the next request.
      If you don't have time to examine how it is working, you can use splash/selenium or something similar and save time. It will be faster to code but slower in execution.
      If you do figure out APIs, the scrapes are going to be very fast, especially when you want to get millions of items every day.
      Finally, just think of it as another tool in your arsenal. Use the one that suits the problem at hand :-)
      Good luck!

  • @azwan1992
    @azwan1992 2 years ago

    Nice!

  • @the_akpathi
    @the_akpathi 2 years ago

    Is it legally OK to send headers from a script like this? Especially headers like user-agent?

    • @codeRECODE
      @codeRECODE  2 years ago

      This is an educational video aiming to teach how things work. For legal issues, you would need to talk to your lawyer.

  • @yashnenwani9261
    @yashnenwani9261 3 years ago

    Sir, I want to use the search bar to search for a particular thing and then extract the related data.
    Please help!

    • @codeRECODE
      @codeRECODE  3 years ago

      Open dev tools and check the network tab. See what happens when you click search.
      If you can't figure it out, use selenium

  • @HoustonKhanyile
    @HoustonKhanyile 3 years ago

    Could you please make a video scraping a music streaming service like SoundCloud.

  • @engineerbaaniya4846
    @engineerbaaniya4846 4 years ago

    Where can I get the detailed tutorial?

    • @codeRECODE
      @codeRECODE  4 years ago

      courses.coderecode.com/p/mastering-web-scraping-with-python

  • @adityapandit7344
    @adityapandit7344 3 years ago

    Hi sir, when I load the JSON data I am facing "json decode error: expecting value line 1". What is the solution?

    • @codeRECODE
      @codeRECODE  3 years ago

      It means that the string you are trying to load as JSON is not valid JSON. It may need some cleanup.

    • @adityapandit7344
      @adityapandit7344 3 years ago

      @@codeRECODE Yes sir, the error has been resolved. Now can you give me an idea of how I can link Scrapy with Django? It would be great. Sorry, I am asking too many questions, but I am doing it practically, and that's why I am facing these problems.

  • @harshnambiar
    @harshnambiar 4 years ago

    Also, can you scrape bseindia this way?

    • @codeRECODE
      @codeRECODE  4 years ago

      Haven't tried bse. Have a look at my blog to see how I did it for NSE.
      coderecode.com/scrapy-json-simple-spider/

  • @taimoor722
    @taimoor722 4 years ago

    I need help regarding how to approach clients for web scraping projects.

    • @codeRECODE
      @codeRECODE  4 years ago

      I would be including some tips in my upcoming courses and videos

  • @zaferbagdu5001
    @zaferbagdu5001 4 years ago

    Hi, I tried to write the code, but the query response returns 'Failed to load response data'. As a result there are jQuery links; should I use them?

    • @codeRECODE
      @codeRECODE  4 years ago +1

      Share your code on Pastebin or something similar. I will try to find the problem.

    • @zaferbagdu5001
      @zaferbagdu5001 4 years ago

      @@codeRECODE Code here: pastebin.pl/view/ee0b7d3d
      Shortly, the real page is www.tjk.org/TR/YarisSever/Info/Page/GunlukYarisSonuclari
      I want to scrape the tables on this page.
      Thanks for everything.

  • @adityapandit7344
    @adityapandit7344 3 years ago

    Hi sir

    • @codeRECODE
      @codeRECODE  3 years ago

      Please watch the XPath video I posted. That will help you. It will be something like this:
      //script[@type="application/ld+json"]

    • @adityapandit7344
      @adityapandit7344 3 years ago

      @@codeRECODE Yes, but it's the second script tag on this page. How can we select the second one?

    • @codeRECODE
      @codeRECODE  3 years ago

      Just add [2].

    • @adityapandit7344
      @adityapandit7344 3 years ago

      @@codeRECODE Where do I add the 2? Can you tell me?
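To make the [2] concrete: XPath positions are 1-based, so the second matching block is (//script[@type="application/ld+json"])[2]. Here is a small stdlib sketch of the same idea (the HTML is made up); ElementTree returns matches in document order, so Python index [1] is the second element:

```python
import xml.etree.ElementTree as ET

# Made-up page with two JSON-LD script blocks.
html = """<html><head>
<script type='application/ld+json'>{"name": "first"}</script>
<script type='application/ld+json'>{"name": "second"}</script>
</head></html>"""

root = ET.fromstring(html)
# In Scrapy/XPath you would write (//script[@type="application/ld+json"])[2];
# with ElementTree, findall gives a list, so [1] picks the second match.
scripts = root.findall(".//script[@type='application/ld+json']")
print(scripts[1].text)  # {"name": "second"}
```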

  • @naijalaff6946
    @naijalaff6946 4 years ago +1

    Great video. Thank you so much.

  • @udayposia5069
    @udayposia5069 3 years ago

    I want to send a null value for one of the form fields using FormRequest.from_response. How should I pass a null value? It's not accepting '' or None.

    • @codeRECODE
      @codeRECODE  3 years ago

      Share your code. Usually blank strings work.
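One thing worth checking here: a form-encoded body (what FormRequest sends) has no concept of null; every value is a string, so None cannot survive the trip. If the endpoint actually accepts JSON, you can build the body yourself, since json.dumps turns None into a proper null. The field names below are hypothetical:

```python
import json

# Hypothetical payload where one field must be null on the wire.
payload = {"query": "shoes", "category": None}

body = json.dumps(payload)
print(body)  # {"query": "shoes", "category": null}
```

You would then send this with a plain scrapy.Request(method="POST", body=body, headers={"Content-Type": "application/json"}) instead of a FormRequest.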

  • @monika6800
    @monika6800 4 years ago

    Hi,
    could you please help me with scraping a dynamic site?

    • @codeRECODE
      @codeRECODE  4 years ago +1

      Which site is that? What is the problem you are facing?

  • @WDMatt02
    @WDMatt02 3 years ago +1

    i love u indian buddy, thanks to ur rook sacrifice

    • @codeRECODE
      @codeRECODE  3 years ago

      Glad that my videos are helpful :-)

  • @sunilghimire6990
    @sunilghimire6990 4 years ago

    scrapy crawl generates an error like:
    DEBUG: Rule at line 1702 without any user agent to enforce it on.
    Help me

    • @codeRECODE
      @codeRECODE  4 years ago

      What exactly are you trying to achieve? Are you going through the same exercise as I showed in the video?

    • @sunilghimire6990
      @sunilghimire6990 4 years ago

      I am following your tutorials, and I tried to scrape a website:
      title = response.css('title::text').extract()
      yield title
      I got the title, but I also got the unusual error mentioned above.

    • @codeRECODE
      @codeRECODE  4 years ago +1

      @@sunilghimire6990
      It looks like you are either not passing the headers in the request, OR something is wrong with the user-agent part of the header dictionary, OR the header dictionary itself is not correctly formatted.
      Here are a few other things I can suggest:
      1. You are using extract(), which is the same as getall(). This is confusing, and that's why it is outdated now.
      2. You are probably using "scrapy crawl" to run the spider. What I have created here is a standalone spider, which needs to be run using "scrapy runspider".
      3. Take up my free course to get the basics clear. I am sure it will help you. Here it is: coderecode.com/scrapy-crash-course
      4. Once you register for the free course, you will find the complete source code that you can run. If you face any problem, you can attach a screenshot and code in the comments on my course, and I will surely help in detail.

    • @sunilghimire6990
      @sunilghimire6990 4 years ago

      @@codeRECODE Thank you, sir!

  • @oktayozkan2256
    @oktayozkan2256 2 years ago

    This is API scraping. Some websites use CSRF tokens and sessions in their APIs, which makes them nearly impossible to scrape through the API.

    • @codeRECODE
      @codeRECODE  2 years ago

      While CSRF tokens and sessions can be handled, I do agree that this technique does not work everywhere.
      However, it should be the first thing we try. Rendering using Selenium/Playwright should be the last resort.
      Even after that, many websites will not work, and there will be no workaround. 🙂
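A rough sketch of how the token dance often works: the site embeds the CSRF token in the page HTML, so you pull it out of the first response and echo it back with the API request. The field name and HTML fragment below are hypothetical; in Scrapy, FormRequest.from_response does this automatically for hidden form fields.

```python
import re

# Hypothetical page fragment containing a hidden CSRF field.
html = '<input type="hidden" name="csrfmiddlewaretoken" value="abc123">'

# Extract the token and include it in the follow-up POST data.
match = re.search(r'name="csrfmiddlewaretoken"\s+value="([^"]+)"', html)
token = match.group(1)
form_data = {"csrfmiddlewaretoken": token, "q": "laptops"}
print(token)  # abc123
```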

  • @nimoDiary
    @nimoDiary 4 years ago

    Can you please teach us how to scrape badminton players' data from the PBL site?

    • @codeRECODE
      @codeRECODE  4 years ago

      What's the site URL? What have you tried, and what problem are you facing?

    • @nimoDiary
      @nimoDiary 4 years ago

      www.pbl-india.com/
      I am trying to extract the squad data for all teams, with all their details including names, country, world rank, etc.

    • @codeRECODE
      @codeRECODE  4 years ago

      @@naijalaff6946 Thank you for the mention in readme. Feels good :-)

  • @ashish23555
    @ashish23555 3 years ago

    Why do we need Scrapy or Selenium if they are not helpful with AJAX?

    • @codeRECODE
      @codeRECODE  3 years ago

      I am not sure I understand your question. Can you elaborate?

    • @ashish23555
      @ashish23555 3 years ago

      @@codeRECODE How do I scrape pages from a website protected with reCAPTCHA?

    • @codeRECODE
      @codeRECODE  3 years ago +1

      @@ashish23555 Use a service like 2captcha.com

  • @ashish23555
    @ashish23555 3 years ago +1

    Scrapy really is the best, but it takes time to become a pro.

  • @Ahmad-sn9kh
    @Ahmad-sn9kh 2 months ago

    I want to scrape data from TikTok. Can you help me?

  • @zangruver132
    @zangruver132 4 years ago +1

    Hey, I wanted to scrape the number of comments for each game at the following link (fitgirl-repacks.site/all-my-repacks-a-z/), but I can't find it anywhere in the Network tab. Yes, the HTML without JS does include a comment count, but it is an outdated one.

    • @codeRECODE
      @codeRECODE  4 years ago +1

      It's there! Here is how to find it. Open the site, press F12, go to the Network tab, and open any listing. At the top, you will see something like "238 comments". Now make sure your focus is on the Network tab and press CTRL+F. Search for that number, 238. You will see quite a few results, and one of them will be a .js file that has this data.
      You will note that this comes from a third-party commenting system.
      Reminder: getting this data via web scraping may not be legal. I do not give advice on what is legal and what is not. What I explained is only for learning how websites work. Good luck!

  • @kaifscarbrow
    @kaifscarbrow 2 years ago

    Cool price. I've been doing ~500k records for $100 🥲

  • @KartikSir_
    @KartikSir_ 2 years ago

    Getting an error:
    [scrapy.core.engine] DEBUG: Crawled (403)

  • @shannoncole6425
    @shannoncole6425 3 years ago

    Nice!