Looking forward to this series, Thu! 🙌🏼 Also, love me some Selenium for web scraping!
Hehe thanks for dropping by Luke!! 🙌🏽💜 Selenium fan here! 👋 I'm figuring stuff out and experimenting with ideas as I go but I'm really enjoying this project!
@@Thuvu5 Thanks. I have created many web scrapers. I can create other scrapers for free if needed.
As someone who is looking to go into the data field, this was incredibly eye opening on the range of different things it can be applied to. Very grateful for this video, thanks :)
Aw thank you for this! 🤓🙌🏽
Thank you so much for this video! I've struggled with data scraping for my project, but with your tutorial, I managed to get the data I needed!
I'm so thrilled I found your channel - your content is amazing and inspiring!!! Thank you so much for sharing!
This confirmed some doubts I had about the types of frameworks/libraries I'm using. Thanks!
I was waiting for a full-length project with all the minor errors and the possibility of coding along like this.
Great content as always, Thu! Thanks
Hey Filipe, yeahh.. so glad I kept the promise haha. Thank you for watching 🤩
Ah yeah, I love the Witcher books! And showing how to solve real problems in your process is great; since I learn better with visuals, this is a great series!
I've been using Selenium for web automation, never thought to use it for web-scraping. Thanks a lot for that idea 👋
Also, looking forward to the rest of the series.
Cool example! Looking for the next episode
Thanks for your consistently great videos. I can't wait to see the rest of the project!
Aw thank you for consistently checking out my videos, Cole! 😇👋
I've been following you for a while, and I really like your priceless tips. With this project you officially became my guru!!! 🙌 Can't wait for your next video, and can't wait for the 3rd season!!! 😊
Hey Barbara, thanks so much for following my channel! 🤗 me too, can’t wait for the 3rd season 🤩
Love this series. Looking forward to the next one!
This channel is underrated 😭
Aw thank you John!! (I think so too 😂🙌)
It’s finally here!!!
Heck yeah, thank you for your patience 🙈
Excited for the complete project...!!! 🔥🔥🔥
Awesome video on Selenium! You do a very good job of explaining things step by step. Keep up the great content!
Thanks!
What a tutorial. You have gained a new subscriber!!
Thanks for sharing your knowledge, Thu! Besides training my English, I learn more about data science. :)
Thanks for this! I learned to use BeautifulSoup in the Python for Everybody specialization on Coursera, and this video is similar to the final project I made, but with Selenium (in my project I scraped song lyrics). What a great opportunity to learn a new library and make an amazingly fun project along the way.
Thanks for your amazing content!
Hey Matias, that’s so cool! Seems like a very interesting project you did! Thanks for watching 🙌🎉
That was pretty fun to follow along with; I'm really starting to like Python. I hope to see more like this in the future.
Ohh that was nice! 🙌🏽🙌🏽 It's always good to know that the code is actually reproducible 😂. Yess, more is definitely coming! I also try to keep a balance between project vids and non-project ones, though. Thank you for watching and following along!
@@Thuvu5 It's definitely reproducible, I made it to the end of the video with the same results. Then cloned my own repo so I could experiment with the data set. Thanks for sharing this information!
Incredible!!! Very fun project!!! Congrats!!!
gpt4 with vision and browsing is really changing the webscraping game. It's not going to do well with high volume tasks, at least not cheaply, but it will make webscraping much easier in general
0:30 I’m sold here. 😊
( I’m beginner in DS field )
Thanks for that! Thanks to you I managed to use it for the lore of a game I love!
Great to hear Julien!! 🙌
Thank you for your lesson about scraping with Selenium.
You’re very welcome 😊
Hi! Thank you, this video helped me a lot with a little task at my job!! I was confused a little bit by the XPath, but I solved it by right-clicking the element and copying the XPath directly.
Yay 🙌🙌
Hi! Seeing that you're running a YouTube channel about exactly the field I'm pursuing got me really fired up. I'm currently building and maintaining a project involving web crawling. Of course, it's much more complex than the usual basics. Because some sites have business-specific concerns, they have anti-crawl mechanisms that use third-party services, Cloudflare for example. In those cases you need specialized drivers like undetected-chromedriver. Hope you release many more great videos for everyone to learn from 😁
OK, thank you for sharing! 🤗
Very Impressive Initiative!
Incredible!!! Very fun project!!! Thanks!!!
yes!! can't wait to see how it goes!
Please continue this series
Thank you! This was just what I needed!
So glad it helped, Tulkin 🙌🏽
Very useful and easy to understand
Aw so glad to hear, Anari! 🙌🏽
Thanks a lot. The video was very helpful.
Very good content; congratulations on the videos and the didactics. It's a lot of fun to study and follow your content; it made me enjoy using Python again.
Wow, this is brilliant. Great explanation of the differences between those three libraries. I think you should link the 2nd part of the series in the end cards instead of those two other videos about stats etc.
Also, if this title doesn't work well, maybe you could try "Selenium vs Scrapy vs BS4: what do I use in a REAL project" or something like that. I feel like that represents the content of the video very well and could be clickable for many.
Wow thank you so much for these suggestions! 🙌 You’re absolutely right! This is so helpful, I’ll adjust the title and end screen 😁
Rest aside...She picked up The Witcher 🤩🤩..Queen, you dropped this 👑...
Yay, so nice to see another The Witcher fan here! 👋 I can't wait for the next season later this year! 🤩
Hey this reminds me of something we did a few years ago, with Instagram... ;')
Btw, I think there's a better approach than using XPath: CSS selectors. They are a bit faster, especially once you use a lot of queries on one page. Firefox never gives CSS selectors as an option, but Chrome does! Helped me quite a bit in writing other (headless) Selenium applications! (Quick sketch below.)
Also, maybe a note on session storing with Selenium / crawling in your next video?
Keep up the good work !
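For anyone curious about the CSS selector suggestion above, here's a minimal sketch of the two query styles side by side (the selector strings are illustrative, not taken from the video):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# The same hypothetical element, located two ways; the CSS form is
# terser and often faster when a page needs many lookups
via_xpath = driver.find_element(By.XPATH, "//div[@class='card']/a")
via_css = driver.find_element(By.CSS_SELECTOR, "div.card > a")

driver.quit()
```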
Omg Tjard!!!! Thanks so much for watching my video and commenting. Yesss, miss our Instagram bot 😂. And good to know about the CSS selector! hope you’re doing well 🤗
This is so good! ❤
Thank you! ❤️
Thank you youtube algorithm for bringing me here
Yay that’s awesome, Vineeta! Thank you for watching 🤗
Awesome :)
part 2 please :D
Link to Part 2 is in the video description 🙂
@@Thuvu5 Thank you!!
Such a good video. I loved it 🙌🙌🙌🙌🙌
Thank you, you're awesome!
Aw thank you for watching! ☺️
Very useful video, Thu. Thanks!
Thanks so much TheMis Blog 🙌🏽. Your comments mean a lot 😀
@@Thuvu5 🥰🥰🥰stop it now
❤👌👌👌 awesome
Wow I admire you so much
Aw thank you! 🤗🙌
BTW find_element_by_class_name is deprecated. Should be find_element(By.CLASS_NAME, "class name"). Like you used for the xpath
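For anyone coding along on Selenium 4+, the change looks like this (a minimal sketch; the class name is a placeholder):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# Old style, deprecated in Selenium 4 and removed in later releases:
# element = driver.find_element_by_class_name("some-class")

# Current style, consistent with the By.XPATH calls in the video:
element = driver.find_element(By.CLASS_NAME, "some-class")
elements = driver.find_elements(By.CLASS_NAME, "some-class")
```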
Thank you!!!
My first comment was deleted, possibly because it had links that I used for troubleshooting. A couple of improvements that I discovered after finishing your video (see the sketch below):
1. I converted the deprecated find_elements_* functions to just find_elements. This removed the warning message.
2. For the driver, I found it went faster if I launched a headless Chrome. I set this with Chrome options. This also seems to make the cookie popup not appear, for some reason.
3. Because I'm running WSL2, I had better performance by installing the chromedriver to a folder within my project folder and running it as a service.
I'll link my alterations in a follow-up comment in case YouTube is auto-filtering comments with links.
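A minimal sketch of points 2 and 3 above, assuming Selenium 4; the driver path and URL are placeholders, not the commenter's actual setup:

```python
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

options = Options()
options.add_argument("--headless")  # run Chrome without a visible window

# Point 3: a chromedriver kept inside the project folder, run as a service
service = Service(executable_path="./drivers/chromedriver")

driver = webdriver.Chrome(service=service, options=options)
driver.get("https://example.com")
print(driver.title)
driver.quit()
```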
Oh thank you so much for the improvement points! I’ll probably update some part of the code as you suggested, otherwise please feel free to send me a pull request! 🙌😀
And thank you for the video. I learned a lot going through this walkthrough. Looking forward to the next installment in the series!
@@piggjf I'm so glad! And great work with the pull request, thank you so much for improving the code! 🙌🏽
Great content, but could you please show the webdriver-manager error that occurs due to the chromedriver?
Nice job
Thanks Ahmed!
Thanks for the video.
I'm having issues with this project; it keeps giving me
AttributeError: module 'selenium' has no attribute 'Chrome'
Please, what can I do? Your input will be highly appreciated.
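That error usually means Chrome is being looked up on the selenium package itself instead of on the webdriver submodule; assuming that's the cause, a minimal sketch of the fix:

```python
# This raises the error above:
# import selenium
# driver = selenium.Chrome()

# Chrome lives on the webdriver submodule:
from selenium import webdriver

driver = webdriver.Chrome()
```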
Thank you for a great video, looking forward to seeing others. I'd like to ask you a question: when you searched for an element by XPath you used (By.XPATH, ...), not the method find_elements_by_xpath. Is that to show both available methods, or is there a reason for it (e.g., faster, easier to read)?
Like @Luke Barousse, I'm looking forward to this project. I have a similar project and this will really help kick mine off. Thank you very much :-)
Oh what a nice coincidence!! Thanks for watching Ken 🙌
Any advice on crawling through a website with a lot of "a href" elements, especially when they are child elements? Using Selenium, it seems to struggle. Is Selenium even the right tool?
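If the links are in the initial HTML, one lighter pattern is to collect every href in a single query instead of walking child elements one by one; a minimal sketch (whether Selenium is the right tool depends on whether the page renders its links with JavaScript):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com")

# One query matches every anchor with an href, nested or not
links = [a.get_attribute("href")
         for a in driver.find_elements(By.CSS_SELECTOR, "a[href]")]
print(len(links), "links found")
driver.quit()
```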
How do I write code to scrape information from websites on the first 3 pages of Google search results? I mostly see solutions for extracting information from a single URL, but I need something more comprehensive. Can I give the website as an argument?
I've got 170 URLs; how can I extract the text from each of them and do text analysis?
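For both of the two questions above, one common pattern is a function that takes the URLs as an argument and loops over them; a minimal sketch using requests and BeautifulSoup (the URLs are placeholders, and note that Google actively blocks automated scraping of its result pages, so that part is a separate problem):

```python
import requests
from bs4 import BeautifulSoup

def extract_text(urls):
    """Fetch each URL and return its visible text, keyed by URL."""
    texts = {}
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        texts[url] = soup.get_text(separator=" ", strip=True)
    return texts

# Works the same for 3 result pages or 170 URLs
pages = extract_text(["https://example.com", "https://example.org"])
```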
I tried:
from selenium import dancemoves
but sadly i got a traceback error "ImportError: cannot import name 'dancemoves' from 'selenium'"
i guess I need to update Selenium....?
LOL yesss I think so! 😂 Or try restarting your computer, it always fixes weird glitches 🤣jk
@@Thuvu5 I'm never happy with my code until all the weird glitches work together to cancel each other out! 🥳
Hi Thu, I have one question - the Wiki Terms and Conditions page forbids any kind of scraping of its content. Isn't this a little, let's say, not legal? Thanks
And the 100th Like goes to me..!!!!
Yaaaay, thanks for the nice number! 😀🙌🏽
I would like to ask for your advice. I have a use case where a website consists of thousands of PDF files which I need to download one by one. Is it possible with web scraping to download all of the PDF files at once?
Hey, I'm not sure, as I've never downloaded PDFs with this method before, but I think it should be possible to use Selenium to click on the download button and download things. I guess it can be done almost all at once if you put a very small waiting time between the downloads.
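A minimal sketch of that idea, assuming the PDFs are plain links on the page; the selector and download folder are placeholders:

```python
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

options = Options()
options.add_experimental_option("prefs", {
    "download.default_directory": "/tmp/pdfs",   # save here, no dialog
    "download.prompt_for_download": False,
    "plugins.always_open_pdf_externally": True,  # download instead of preview
})

driver = webdriver.Chrome(options=options)
driver.get("https://example.com/documents")

for link in driver.find_elements(By.CSS_SELECTOR, "a[href$='.pdf']"):
    link.click()
    time.sleep(0.5)  # the small gap between downloads suggested above
```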
The website I am scraping from gives me a CAPTCHA to check whether I am human or not. Is there any way to avoid this??
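There's no reliable way to solve CAPTCHAs programmatically (they exist precisely to stop bots), but another commenter above mentions undetected-chromedriver, which can reduce how often anti-bot checks trigger in the first place; a minimal sketch, assuming that package is installed:

```python
# pip install undetected-chromedriver
import undetected_chromedriver as uc

# Drop-in replacement for webdriver.Chrome() that masks common
# automation fingerprints
driver = uc.Chrome()
driver.get("https://example.com")
```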
Hi Thu, how can I support you besides making purchases through your affiliate links? Let us know.
Hey Zain, thank you! I really appreciate it! Currently I don't have any other ways you could support me :(, but thank you for offering that!
❤💞❤
Why Selenium for a site whose data is not hidden behind JS?
It seems unnecessary and processing-heavy (Python already being slower than Java).
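Fair point. When the data is already in the initial HTML, a plain HTTP request plus a parser is much lighter than driving a browser; a minimal sketch:

```python
import requests
from bs4 import BeautifulSoup

# No browser process needed when nothing is rendered by JS
html = requests.get("https://example.com", timeout=10).text
soup = BeautifulSoup(html, "html.parser")
print(soup.title.get_text())
```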
I cannot for the life of me get past this error: "WebDriverException: Message: chrome not reachable" whenever I try to run driver.get(page_url)
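That error often comes from a Chrome/chromedriver version mismatch or a browser that died on startup (a guess, since it can have several causes). One common remedy is webdriver-manager, which downloads a driver matching the installed Chrome; a minimal sketch, assuming the package is installed:

```python
# pip install webdriver-manager
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Fetches a chromedriver that matches the locally installed Chrome
service = Service(ChromeDriverManager().install())
driver = webdriver.Chrome(service=service)
driver.get("https://example.com")
```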
ok
To all those who followed along and got the 'time out' error like me: try searching "Stop infinite page load in selenium webdriver python" on Stack Overflow. The bot in me cannot extract characters from more than 4 books. I lost a night's sleep trying to find the source. >,
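For anyone hitting the same thing, the Stack Overflow answer mentioned above boils down to capping the page-load time and halting the load once it's exceeded; a minimal sketch:

```python
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.set_page_load_timeout(15)  # give up on loading after 15 seconds

try:
    driver.get("https://example.com")
except TimeoutException:
    # Stop whatever is still loading and work with the DOM as-is
    driver.execute_script("window.stop();")
```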
Oh I'm so sorry to hear this 😅. Thank you for sharing this tip!! 🙌🏽
playwright >
You're so rushed.
Not everyone is a nerd; it's kinda hard to follow. I watched it at a slower pace. Make sure the sequel is for non-nerds.
You are my crush 🥰
she may be your crush but after this video she is my god 🤣
LOL that's funny ;P
SIMP DETECTED!
@@Thuvu5 how it's funny?
@@Thuvu5 I have nothing to do with data science; still, idk why I watch your videos
I have a huge crush on you crush 🥰 😍☺️
Why does it return an empty list in book_categories when I type "book_categories = driver.find_elements(By.CLASS_NAME, 'category-page_member-link')"?
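One thing worth double-checking (an assumption, since the page isn't shown here): Fandom wikis typically use BEM-style class names with a double underscore, e.g. category-page__member-link, so a single underscore would match nothing. A quick way to verify, with an explicit wait in case the list renders late (the URL is illustrative):

```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://witcher.fandom.com/wiki/Category:Characters")

# Note the double underscore in the class name
book_categories = WebDriverWait(driver, 10).until(
    EC.presence_of_all_elements_located(
        (By.CLASS_NAME, "category-page__member-link")))
print(len(book_categories))
```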
driver.get(pageURL) is showing an error. Can you help me?
same here