Create Your Own Scraper API with FastAPI and Python

  • Published: 6 Nov 2024

Comments • 69

  • @Christian-mn8dh
    @Christian-mn8dh 3 years ago +3

    Just came across this channel and am so surprised that you don't have at least 70k subscribers. Such quality content! Keep at it.

  • @samman5980
    @samman5980 3 years ago +10

    I was literally just thinking about doing this. Are you somehow web scraping my brain?!?

  • @pythonantole9892
    @pythonantole9892 3 years ago +2

    Just blown away at how simple this is.

  • @celerystalk390
    @celerystalk390 3 years ago +5

    Another great and super useful tutorial! Thank you and keep up the good work John.

  • @loverboykimi
    @loverboykimi 2 years ago +1

    It was smooth. Thanks.

  • @c.caratti2848
    @c.caratti2848 3 years ago +1

    Great intro tutorial to FastAPI with scraping! Thanks!

  • @thekarthik
    @thekarthik 3 years ago +2

    Great tutorial, thank you. I was just thinking about doing this haha. Will this work with multiple endpoints too?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Thanks! Yes it would, you just need to code them in.

  • @yusufrumi1626
    @yusufrumi1626 2 years ago +1

    Happy new year John.

  • @hayat_soft_skills
    @hayat_soft_skills 3 years ago +3

    I love the content that you are sharing.
    Your teaching method is awesome!
    Kindly make more videos on hosting scrapers on DigitalOcean, like a Selenium script or a Scrapy script.
    Thanks!

  • @karthikb.s.k.4486
    @karthikb.s.k.4486 3 years ago +2

    Nice tutorial. What is the command prompt that you opened in Windows for running FastAPI? Please let me know.

    • @hassanrahmani4764
      @hassanrahmani4764 3 years ago

      It is WSL, the Windows Subsystem for Linux. You can get it from the Windows Store on Windows 10 😊

    • @karthikb.s.k.4486
      @karthikb.s.k.4486 3 years ago

      @hassanrahmani4764 Thank you 😊

    • @hassanrahmani4764
      @hassanrahmani4764 3 years ago

      @karthikb.s.k.4486 Welcome. I am thinking of starting a web scraping and programming channel, with freelancing-related stuff and programming. Should I make it? Will it grow?

    • @karthikb.s.k.4486
      @karthikb.s.k.4486 3 years ago

      @hassanrahmani4764 Sorry, I am not sure on this.

  • @bighneswar98
    @bighneswar98 2 years ago +1

    Nicely explained mate. Unable to use FastAPI along with Scrapy. It throws an internal error every time, and on the command line it shows that the Twisted reactor is already running. Any fix?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      In that case I think you’d need them separate: use Scrapy to save to a database, then FastAPI to display the information.

    • @bighneswar98
      @bighneswar98 2 years ago

      @JohnWatsonRooney So there is no way to run Scrapy scrapers directly using FastAPI, like the requests-html ones?
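
The split suggested in the reply above (scraper writes to a database, API only reads from it) could look roughly like the sketch below. It uses only the standard-library sqlite3 module so it runs anywhere. The table layout and the names init_db, save_items and read_items are made up for illustration. In a real project save_items would be called from the Scrapy side and read_items from a FastAPI route; neither is wired up here.

```python
import sqlite3


def init_db(conn):
    # Hypothetical table layout, chosen only for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (title TEXT PRIMARY KEY, price REAL)"
    )


def save_items(conn, items):
    """Scraper side: store scraped items, overwriting older rows with the same title."""
    conn.executemany(
        "INSERT OR REPLACE INTO items (title, price) VALUES (?, ?)",
        [(item["title"], item["price"]) for item in items],
    )
    conn.commit()


def read_items(conn):
    """API side: return plain dicts, which an endpoint could return as JSON."""
    rows = conn.execute("SELECT title, price FROM items ORDER BY title").fetchall()
    return [{"title": title, "price": price} for title, price in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # a file path would be used in practice
    init_db(conn)
    save_items(conn, [{"title": "Widget", "price": 9.99}])
    print(read_items(conn))
```

Because the API process never starts the scraper, the "reactor already running" clash between Twisted and the web server does not come up in this arrangement.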

  • @s6yx
    @s6yx 3 years ago

    dope. I was meaning to just use bs4 and flask for my old webscraping code but decided on fastapi because of this video

  • @techamusement6714
    @techamusement6714 3 years ago +3

    Thank you sir, you provided just what we wanted. Absolutely exquisite.

  • @zacky7862
    @zacky7862 3 years ago +1

    Great tutorial. I'm gonna need this.

  • @chitranshjain5931
    @chitranshjain5931 3 years ago +1

    Thanks for sharing this!
    Really like your stuff and it makes my life easier. 👍

  • @acikast
    @acikast 2 years ago +1

    Thank you John for sharing your knowledge!

  • @ricksanchez7077
    @ricksanchez7077 2 years ago +1

    Thanks for sharing 😀

  • @adityaraj1284
    @adityaraj1284 2 years ago

    I have watched this video and it's fantastic. I got everything I wanted except one thing: how to deploy it online so that I can use the API for my Android app projects. Please make a video on that, as it would really help me and I would be very thankful to you.

  • @willyle23
    @willyle23 2 years ago +1

    Hello John, I have a question. I was able to create everything like you have shown, but I have used render() for a JS website. When I go to host it on FastAPI, it tells me to use the AsyncHTMLSession instead. Have you run into this problem?

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Hey! Did you put the render part inside an async function? If so change it to a regular sync function with just “def” and it should work

    • @willyle23
      @willyle23 2 years ago

      @JohnWatsonRooney Thanks for your reply. No, I did not. What I did was create a scraper class with a function called load_rls, which goes through all of the pages available on the given URL (by looping through r.html and calling a helper function, scape_other_pages, to scrape data from each page and add it to a list). Both functions use the HTMLSession class and r.html.render(). This ran through an entire JavaScript page and returned a response in the Python terminal, but when I try to serve it through FastAPI, I get an error. I did what you suggested and now I am getting a 404 Not Found error. Thank you for the help!
      Edit: I found out why it was giving a 404 error: the cat parameter I take in has its / converted into %2f. I changed my function to take just 1 parameter, and when I run it I get RuntimeError: There is no current event loop in thread "AnyIO worker thread"
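
The RuntimeError in the edit above is what asyncio raises when code asks for the current event loop from a thread that does not have one, and FastAPI runs plain "def" endpoints in worker threads. The standard-library sketch below reproduces that situation and shows one generic workaround, creating a loop for the thread first. The function name check_loop_in_worker is made up for illustration, and this has not been tested against requests_html itself. Whether it is enough for render() in a given setup is an assumption. Using AsyncHTMLSession inside an "async def" endpoint is the other route, as the earlier error message suggests.

```python
import asyncio
import threading


def check_loop_in_worker(create_loop: bool) -> str:
    """Run a check in a fresh thread and report whether an event loop is available."""
    outcome = []

    def worker() -> None:
        loop = None
        if create_loop:
            # Workaround: give this thread its own event loop before any
            # library code asks for the "current" one.
            loop = asyncio.new_event_loop()
            asyncio.set_event_loop(loop)
        try:
            asyncio.get_event_loop()
            outcome.append("ok")
        except RuntimeError as exc:
            outcome.append(f"RuntimeError: {exc}")
        finally:
            if loop is not None:
                # Detach and close the loop so nothing leaks from the thread.
                asyncio.set_event_loop(None)
                loop.close()

    thread = threading.Thread(target=worker, name="worker-thread")
    thread.start()
    thread.join()
    return outcome[0]


if __name__ == "__main__":
    print(check_loop_in_worker(create_loop=False))  # reports the RuntimeError
    print(check_loop_in_worker(create_loop=True))
```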

  • @valuetraveler2026
    @valuetraveler2026 1 year ago +1

    What do you think about caching these FastAPI requests? I am using render() within the scraper function so it is slow. How would you handle dynamic data which changes frequently?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      Since making this video I’ve found that separating it out is much better: a separate script does the scraping and writes to the database, which frees up the API to just serve the data. If you want more of a scrape-on-request service, you really need to think about using something like Celery to manage the tasks and return the data when the scrape is complete.
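
For the caching question in this thread, one simple option is a time-based cache in front of the slow scrape function. The sketch below is a minimal in-memory version using only the standard library. The class name TTLCache and its get_or_compute method are made up for illustration, and it is not the approach from the video or a replacement for a task queue like Celery. The time-to-live would be picked according to how quickly the scraped data goes stale.

```python
import time


class TTLCache:
    """Keep computed values for a fixed number of seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the expiry logic can be tested
        self._store = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]  # still fresh, skip the slow call
        value = compute()  # e.g. the slow render() based scrape
        self._store[key] = (now + self.ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)

    def slow_scrape():
        # Stand-in for a real scrape; returns fixed data here.
        return {"title": "example"}

    print(cache.get_or_compute("page-1", slow_scrape))
    print(cache.get_or_compute("page-1", slow_scrape))  # served from the cache
```

This cache lives inside one process and is lost on restart, so it only suits a single-worker setup.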

  • @NotBeHaris
    @NotBeHaris 3 years ago

    I have a question not related to this video, sir: how do you scrape sites built with Vue or React, meaning with JavaScript? Their behavior is different from simple HTML sites.

  • @paulmiller591
    @paulmiller591 3 years ago +1

    Cool stuff John

  • @kannadastocktrader3369
    @kannadastocktrader3369 1 year ago +1

    Can I scrape the Amazon website for product details and prices?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      Yes, I have videos on my channel to show you how.

    • @kannadastocktrader3369
      @kannadastocktrader3369 1 year ago

      By creating an API?

    • @kannadastocktrader3369
      @kannadastocktrader3369 1 year ago

      I have more than 1000 pages to scrape and it takes me 2 hours, but I want to automate this and also reduce the time.

  • @artof-war
    @artof-war 1 year ago

    How can you scrape more than one tag?

  • @ferilukmansyah3037
    @ferilukmansyah3037 3 years ago +1

    Great tutorial, thanks John.

  • @forbidden_lion
    @forbidden_lion 3 years ago

    Been trying to deploy this thing on Heroku for the past 3 hours, no solution! Can you please make a video on deploying a FastAPI scraper on Heroku?

  • @DM-py7pj
    @DM-py7pj 3 years ago

    Have you done a video on setting up a virtual environment in Python?

  • @ahmd09
    @ahmd09 3 years ago +1

    I just want to know how you got the JSON formatted so nicely in the browser.
    I hate unformatted JSON data.

    • @Klausi-uq4xq
      @Klausi-uq4xq 3 years ago +2

      In Firefox it is built in. For Chrome you have to install an extension like JSON Viewer.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      I think I use “json formatter” chrome plugin

    • @ahmd09
      @ahmd09 3 years ago

      @JohnWatsonRooney thanks

    • @ahmd09
      @ahmd09 3 years ago

      @Klausi-uq4xq thank you

  • @MuhammadAbdullah-fy6sg
    @MuhammadAbdullah-fy6sg 3 years ago

    Man, you need to start streaming. Honestly there are very few programming streamers on Twitch.

  • @paul_devos
    @paul_devos 1 year ago +2

    I've been scraping websites for 10+ years as a hobby data scientist, with bs4 and requests. On the rare occasion I have to, I use Selenium.
    You have the most interesting content I've seen in some time on YouTube or the web in general, using tools I hadn't even heard of: requests_html, selectolax, httpx, etc. I had used Scrapy before (found it too bloated for my needs).
    That said, regarding FastAPI + web scraping -- I hadn't ever really considered FastAPI, as I mostly just make "packages" (e.g. a Reddit API wrapper like praw) for any websites I scrape regularly.
    I'm guessing the added advantage of this is that you can sell/expose the API instead of the package... otherwise, is there any inherent advantage to using FastAPI versus just creating the aforementioned package wrapper?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      It's very use case dependent. Things like this can be useful integrating into applications, and having a simple API to access your data or run scraping jobs is quite useful but you are right, most are just standalone scripts/projects that run and have a specific task.

  • @hassanrahmani4764
    @hassanrahmani4764 3 years ago +2

    I need to become an expert in web scraping, what should I do? 😁

    • @-__--__aaaa
      @-__--__aaaa 3 years ago +2

      Maybe don't use bs4 to scrape everything, and you should also try using regex btw.

    • @hassanrahmani4764
      @hassanrahmani4764 3 years ago +2

      @-__--__aaaa Hmmmm okay, I'll see.

    • @thekarthik
      @thekarthik 3 years ago

      @hassanrahmani4764 Honestly, have a look at this channel. It's all web scraping and he explains it in an easy-to-understand manner.

    • @mariabazueva7204
      @mariabazueva7204 3 years ago

      @thekarthik Do you know where he works now?

  • @rameshks5281
    @rameshks5281 3 years ago

    Hi sir, can you create a video about importing multiple URLs from an external file such as an Excel sheet, looping through them one by one, scraping each, and appending the scraped data to the same file?

  • @alfakih7247
    @alfakih7247 1 year ago

    How to update negative reviews on Google

  • @waliullah3321
    @waliullah3321 1 year ago

    Please create an API for Turnitin (plagiarism checker) using FastAPI, Flask or Django.

  • @theinstigatorr
    @theinstigatorr 3 years ago

    I have to confess I don’t really know what an API is still, particularly in the context you’re using it. I thought an API was a way for people hosting data to provide outsiders access to data without being able to access everything carte blanche. To me this just looks like you’re creating a scraper project. What am I missing?

  • @DanishKhan-ob3xh
    @DanishKhan-ob3xh 3 years ago

    How to remove watermarks using Python and FastAPI?

  • @tpag20
    @tpag20 2 years ago

    I wish to learn Python with FastAPI, and GraphQL with Strawberry.

  • @randyallen8610
    @randyallen8610 1 year ago

    How can I contact you?

  • @tpag20
    @tpag20 2 years ago

    awesome!

  • @faldofajri6796
    @faldofajri6796 4 months ago

    THANK YOU VERY MUCH, MISTER