Just came across this channel and am so surprised that you don't have at least 70k subscribers. Such quality content! Keep at it.
Thanks, will get there some day I hope!
I was literally just thinking about doing this. Are you somehow web scraping my brain?!?
Haha, that would be quite a skill to have!
It's the damn truth
Just blown away at how simple this is.
Another great and super useful tutorial! Thank you and keep up the good work John.
Thanks mate!
It was smooth. Thanks.
Great intro tutorial to FastAPI with scraping! Thanks!
Great tutorial, thank you. I was just thinking about doing this haha. Will this work with multiple endpoints too?
Thanks! Yes it would, you just need to code them in.
Happy new year John.
Happy new year!
I love the content that you are sharing.
Your teaching method is awesome!
Kindly make more videos on hosting scrapers on DigitalOcean, like Selenium scripts and Scrapy scripts.
Thanks!
Nice tutorial. What is the command prompt that you opened in Windows for running FastAPI? Please let me know.
It is WSL (Windows Subsystem for Linux). You can get it from the Windows Store on Windows 10 😊
@@hassanrahmani4764 Thank you 😊
@@karthikb.s.k.4486 Welcome. I am thinking of making a web scraping and programming channel, with freelancing-related stuff and programming. Should I make it? Will it grow?
@@hassanrahmani4764 sorry I am not sure on this
Nicely explained mate. I'm unable to use FastAPI along with Scrapy. It throws an internal error every time, and the command line shows that the Twisted reactor is already running. Any fix?
In that case I think you’d need them separate - use Scrapy to save to a database, then FastAPI to display the information.
@@JohnWatsonRooney So there's no way to run Scrapy scrapers directly with FastAPI, like the requests_html ones?
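One possible way to keep the Twisted reactor out of the API process entirely, sketched here as an assumption rather than anything shown in the video, is to launch the crawl as a separate process and read back the file it writes. The helper names `run_scraper` and `scrapy_command`, the spider name `quotes`, and the timeout are all hypothetical; `scrapy crawl <spider> -o <file>` is the usual Scrapy command for writing items to a feed file. The sketch also assumes the crawl writes at least one item.

```python
import json
import subprocess
import tempfile
from pathlib import Path


def scrapy_command(out_path, spider="quotes"):
    # "quotes" is a hypothetical spider name; -o writes the scraped items to a feed file.
    return ["scrapy", "crawl", spider, "-o", str(out_path)]


def run_scraper(build_command, cwd=None, timeout=300):
    """Run a scraper in its own process and return the JSON items it wrote."""
    with tempfile.TemporaryDirectory() as tmp:
        out_path = Path(tmp) / "items.json"
        # A fresh process gets a fresh reactor, so nothing is "already running".
        subprocess.run(build_command(out_path), cwd=cwd, check=True, timeout=timeout)
        return json.loads(out_path.read_text(encoding="utf-8"))
```

A plain `def` route could then return `run_scraper(scrapy_command, cwd=...)`, with `cwd` pointing at the Scrapy project directory. Every request pays the full crawl time this way, so saving to a database first is still the simpler setup for anything slow.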
dope. I was meaning to just use bs4 and flask for my old webscraping code but decided on fastapi because of this video
Thank you sir, you provided just what we wanted. Absolutely exquisite.
Great tutorial. I'm gonna need this.
Thanks for sharing this!
Really like your stuff and it makes my life easier. 👍
Thank you John for sharing your knowledge!
Sure thing!
Thanks for sharing 😀
I have watched this video and it's fantastic. I got everything I wanted except one thing: how to deploy it online so that I can use the API for my Android app projects. Please make a video on it, as this is really going to help me and I would be very thankful to you.
Hello John, I have a question. I was able to create everything as you have shown, but I used render() for a JS website. When I go to host it on FastAPI, it tells me to use AsyncHTMLSession instead. Have you run into this problem?
Hey! Did you put the render part inside an async function? If so change it to a regular sync function with just “def” and it should work
@@JohnWatsonRooney Thanks for your reply. No, I did not. What I did was create a scraper class, then a function called load_rls which takes in all of the pages available on the given URL (by looping through r.html and calling a helper function scape_other_pages to scrape data from all of the pages and add it to a list). Both functions use the HTMLSession class and r.html.render(). This was able to run through an entire JavaScript page and return a response in the Python terminal; however, when I try to host it on FastAPI, I get an error. I did what you suggested and I am getting a 404 Not Found error. Thank you for the help!
Edit: I found out why it was giving a 404 error: the parameter cat I take in converts / into %2f. I changed my function to take in just 1 parameter, and when running it I get RuntimeError: There is no current event loop in thread "AnyIO worker thread"
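For context on that last error: FastAPI runs plain `def` routes in worker threads, and asyncio does not give a worker thread an event loop unless one is set explicitly. The sketch below only demonstrates that general mechanism with the standard library. `call_with_event_loop` and `needs_loop` are hypothetical names, and `needs_loop` is a stand-in for library code that looks up the thread's current loop. Whether this alone is enough for `r.html.render()` is not confirmed here.

```python
import asyncio
import threading


def call_with_event_loop(func, *args):
    """Give the current thread its own event loop, call func, then clean up."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        return func(*args)
    finally:
        asyncio.set_event_loop(None)
        loop.close()


def needs_loop():
    # Stand-in for library code that asks for the thread's current event loop.
    loop = asyncio.get_event_loop()
    return loop.run_until_complete(asyncio.sleep(0, result="rendered"))


def run_in_worker_thread():
    """Mimic a worker thread calling loop-dependent code through the wrapper."""
    outcome = {}

    def worker():
        outcome["value"] = call_with_event_loop(needs_loop)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome["value"]
```

Calling `needs_loop()` directly inside the worker thread, without the wrapper, is what produces the "no current event loop" error.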
What do you think about caching these FastAPI requests? I am using render() within the scraper function, so it is slow. How would you handle dynamic data which changes frequently?
Since making this video I’ve found that separating it out is much better - having a separate script for scraping and adding to the database, which frees up the API to serve the data itself. If you want more of a scrape-on-request service, you really need to think about using something like Celery to manage the tasks and return the data when the scrape is complete.
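A minimal sketch of that split, assuming SQLite as the database and a made-up `items` table (the table layout and the function names `init_db`, `save_items` and `read_items` are illustrative, not from the video). The scraper script would call `save_items` on a schedule, and the API route would only ever call `read_items`, so a slow render() never blocks a request.

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per scraped page, keyed by URL.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items "
        "(url TEXT PRIMARY KEY, title TEXT, scraped_at TEXT)"
    )


def save_items(conn, items):
    """Scraper side: insert new rows or overwrite stale ones."""
    conn.executemany(
        "INSERT OR REPLACE INTO items (url, title, scraped_at) "
        "VALUES (:url, :title, datetime('now'))",
        items,
    )
    conn.commit()


def read_items(conn, limit=50):
    """API side: return stored rows as plain dicts, ready to serve as JSON."""
    rows = conn.execute(
        "SELECT url, title, scraped_at FROM items ORDER BY url LIMIT ?",
        (limit,),
    ).fetchall()
    return [{"url": u, "title": t, "scraped_at": s} for u, t, s in rows]
```

The `scraped_at` column lets the API report how fresh the data is, which is one way to deal with pages that change often: re-run the scraper more frequently instead of scraping inside the request.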
I have a question not related to this video, sir: how do you scrape websites built with Vue or React, meaning with JavaScript, because their behavior is different from simple HTML sites?
Cool stuff John
Thanks Paul
Can I web scrape the Amazon website for product details and prices?
Yes, I have videos on my channel to show you how.
By creating an API
I have more than 1000 pages to scrape and it is taking me 2 hours, but I want to automate this and also reduce the time.
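If most of those 2 hours are spent waiting on network responses (an assumption, not something stated above), fetching several pages at once with a thread pool is one common way to cut the time. In this sketch `scrape_all` and `scrape_page` are hypothetical names: `scrape_page` stands for whatever function already scrapes a single URL, and `max_workers=8` is an arbitrary starting point.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(urls, scrape_page, max_workers=8):
    """Scrape pages concurrently and return (url, result, error) in input order."""

    def safe(url):
        # Catch per-page failures so one bad page does not abort the whole run.
        try:
            return url, scrape_page(url), None
        except Exception as exc:
            return url, None, exc

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(safe, urls))
```

Keeping `max_workers` modest avoids hammering the target site and reduces the chance of being rate limited.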
How can you scrape more than one tag ?
Great tutor, thanks John
Been trying to upload this thing to Heroku for the past 3 hours, no solution! Can you please make a video on deploying a FastAPI scraper on Heroku?
Have you done a video on setting up a virtual environment in python?
I just want to know how you got JSON formatted in the browser in such a nice way.
I hate unformatted JSON data
In Firefox it is built in. For Chrome you have to install an extension like JSON Viewer.
I think I use the “JSON Formatter” Chrome plugin
@@JohnWatsonRooney thanks
@@Klausi-uq4xq thank you
Man, you need to start streaming. Honestly there are very few programming streamers on Twitch.
I've been scraping websites for 10+ years as a hobby data scientist. bs4 & requests. On the rare occasion that I have to, I use Selenium.
You have the most interesting content I've seen in some time on YouTube or the web in general. Using tools I hadn't even heard of. requests_html, selectolax, httpx, etc. I had used Scrapy before (found it too bloated for my needs).
That said, regarding FastAPI + web scraping -- I hadn't ever really considered FastAPI as I mostly just make "packages" (e.g. Reddit API wrapper like praw) for any websites I scrape regularly.
I'm guessing the added advantage of this is you can sell/expose the API instead of the package... otherwise, is there any inherent advantage to using FastAPI versus just creating the aforementioned package wrapper?
It's very use case dependent. Things like this can be useful for integrating into applications, and having a simple API to access your data or run scraping jobs is quite useful, but you are right, most are just standalone scripts/projects that run and have a specific task.
I need to become an expert in web scraping, what should I do? 😁
Maybe don't use bs4 to scrape everything, and you should also try using regex btw
@@-__--__aaaa Hmm, okay, I'll see
@@hassanrahmani4764 honestly, have a look at this channel. It's all web scraping and he explains in an easy to understand manner
@@thekarthik do you know where he works now?
Hi sir, can you create a video about importing multiple URLs from an external file like Excel, looping through them individually, scraping each one, and appending the scraped data to the same file?
How to update negative reviews on Google
Please create an API for Turnitin (plagiarism checker) using FastAPI, Flask or Django
I have to confess I don’t really know what an API is still, particularly in the context you’re using it. I thought an API was a way for people hosting data to provide outsiders access to data without being able to access everything carte blanche. To me this just looks like you’re creating a scraper project. What am I missing?
How to remove watermarks using Python and FastAPI?
I wish to learn Python with FastAPI and GraphQL with Strawberry.
How can I contact you?
awesome!
THANK YOU VERY MUCH, MISTER