Scraping Dynamic JavaScript Websites - Beautiful Soup Python

Oxylabs

67K views

Published: Dec 22, 2024

Comments • 80

@oxylabs 2 years ago ⁺¹
Thank you for watching! We hope you find this video helpful! Please leave a comment if you have any questions. If you are interested in web scraping and tutorial videos, subscribe to our RUclips channel: ruclips.net/user/Oxylabs
@abhijitboda 3 years ago ⁺¹⁶
This is a gold guideline. It literally covered most of the cases.
@oxylabs 3 years ago
We're very delighted to read this!
@DigitalAlligator 1 year ago ⁺²
This is right on the spot; most other videos don't even come close to mentioning it all.
@roystonfurtado 10 months ago
Wow. Great Video! I was looking for a video that highlights realistic and efficient web scraping and this is it. Thanks.
@mehdiahmed7836 3 years ago ⁺⁸
Short, detailed, very informative. That's how a good tutorial is made.
Thanks
@oxylabs 3 years ago
Thank you so much!
@carlostoledoFLA 8 months ago ⁺¹
Hi, thanks for this video! For me, on dynamic sites, using Selenium to get the page source doesn't work! It still responds with JavaScript tags. Is the path of the server request and response: browser request -> server response -> JavaScript response -> API response -> browser? Thanks
@oxylabs 8 months ago
Thank you for the feedback. It’s difficult to say without seeing the code, but have you tried using different web drivers? The situation might change if you add some actions to the page. Also, it might be handy for you to check the code in textual form: oxylabs.io/blog/dynamic-web-scraping-python.
@dakooki 1 year ago
Is this possible with a website that requires user input, for example adding a quantity or selecting a shipping service?
@toshirv341 1 year ago
Thank you for the informative tutorial! I will probably try web scraping over the next month, so I'll comment here again if I have any problems!
@oxylabs 1 year ago
Thanks for watching, and definitely reach out if you need any help!
@mantasda1904 1 year ago
Why are you not using the requests-html library? It seems to achieve the same in a simpler way.
@oxylabs 1 year ago
Good point, thanks for the feedback!
@voqz6667 9 months ago
it's dead
@kihemboallan2645 3 months ago
Can I use Jupyter notebooks for what you just did?
@oxylabs 2 months ago
Hello, yes, you can. However, it's mostly useful for practice or small-scale tasks. You can learn more here: oxylabs.io/blog/what-is-jupyter-notebook
@shreyasoni1302 2 years ago
How do I get the data when the tag source is not None and there is a file mentioned instead?
@oxylabs 2 years ago
Hello! Thank you for asking :)
If there is a src= attribute, then you need to get the file content by making an additional request to the URL defined in the src= attribute.
source = soup.find("script")
link_to_file = source["src"]
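A minimal sketch of the step described in the reply above, assuming a hypothetical page at example.com whose script tag points to an external file. The page URL, the file name and the helper name `external_script_url` are made up for illustration. The sketch only resolves the full URL of the file; the additional request itself is left as a comment.

```python
from typing import Optional
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical page: the data lives in an external file, not inline in the tag.
PAGE_URL = "https://example.com/js/"
HTML = '<html><body><script src="/static/data.js"></script></body></html>'


def external_script_url(html: str, page_url: str) -> Optional[str]:
    """Return the absolute URL of the first script tag with a src, or None."""
    soup = BeautifulSoup(html, "html.parser")
    source = soup.find("script", src=True)  # only tags that have a src attribute
    if source is None:
        return None
    # src is often relative, so resolve it against the page URL
    return urljoin(page_url, source["src"])


link_to_file = external_script_url(HTML, PAGE_URL)
print(link_to_file)
# Next step (not run here): download link_to_file, for example with
# requests.get(link_to_file).text, and parse the returned file content.
```

Filtering with `src=True` avoids a KeyError on pages where the first script tag is inline and has no src attribute.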
@ureshkayastha 1 year ago
Can you make a new video on dynamic web scraping with Selenium 4?
@oxylabs 1 year ago
We'll keep that in mind for our future videos :)
@gianni_ari 6 months ago
Hi! I was not able to install the chrome driver. Do you have any suggestions?
@oxylabs 6 months ago
Hi! Could you specify why you couldn't install the driver? Was there an error message of any sort?
@Ned478 2 years ago
6:14 line 10.
Is that the path to the folder with chromedriver or to chromedriver.exe? Either way, mine won't work.
@oxylabs 2 years ago
Hey. That's a path to a chromedriver executable. Your question is hard to answer since we don't know what kind of error you are getting. If it's a File Not Found error, you would need to make sure that the path leading to the chromedriver is correct - try to use a full path instead of a relative one if it is failing for you. Also, if you're running on Windows OS, the chromedriver should have an .exe at the end. Hope this helps!
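The path advice above can be checked before starting a browser. The following is only a sketch using the standard library: the helper name `resolve_driver_path` and the location `drivers/chromedriver` are hypothetical, and it does not start Selenium.

```python
import platform
from pathlib import Path


def resolve_driver_path(path: str, system: str = platform.system()) -> Path:
    """Turn a possibly relative chromedriver path into a full path."""
    driver = Path(path).expanduser()
    # On Windows the executable is expected to end in .exe
    if system == "Windows" and driver.suffix.lower() != ".exe":
        driver = driver.with_name(driver.name + ".exe")
    return driver.resolve()


# Hypothetical location of the downloaded driver
driver_path = resolve_driver_path("drivers/chromedriver")
if not driver_path.is_file():
    print(f"chromedriver not found at {driver_path}")
```

If the message is printed, the path points to a folder or to a file that does not exist, which is the File Not Found case mentioned in the reply.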
@vladimirantonov4506 1 year ago
Hello!
And why do all parsers analyze the same site? Interesting different approaches...
Thanks for the interesting example!
@masterbe591 1 year ago
Thanks for this video. Never thought about using F12 and the Network tab to find the source of a website's data. Greetings
@legit_nyel 1 year ago
This method is impossible if the script has a src, especially reCaptcha, right?
@oxylabs 1 year ago
The presented method works on any and all types of web pages, both static and dynamic. As a page containing the reCaptcha is a type of a dynamic page, you can read, extract and manipulate data that is present on the page. :) Hope this helps you!
@angellaz3869 2 years ago
😭😭😭 Idk how to say thank you.. I've been searching for help with this AJAX stuff. This is the one I can say made my day
@oxylabs 2 years ago
That's so sweet to hear! Glad you enjoyed it!
@ismaelperezmesa524 1 year ago
Thanks a lot for all the content you constantly share. I would like to ask you something: would this tutorial example work if I want to deploy it on the web as an API and consume it later? Thank you so much
@oxylabs 1 year ago
Hey, we're glad you like our content! As for the tutorial, it's focused on showing how to build your own scraper. In case you want an easy way out, try Oxylabs Scraper APIs for free: oxy.yt/2iM
@filippodiconno9923 2 years ago
9:15, what does line 10 do?
@oxylabs 2 years ago
Hey! It defines a regular expression to match certain combinations of characters within a document. This one specifically looks for whatever text is between var data = and a new line.
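A rough sketch of that kind of regular expression, run on a made-up page source. The HTML string, the exact pattern and the sample quote are assumptions for illustration, not the code shown in the video.

```python
import json
import re

# Hypothetical page source with the data embedded in a script tag
html = """<script>
    var data = [{"author": "Jane Doe", "text": "An example quote"}];
    for (var i in data) { /* rendering code */ }
</script>"""

# Capture everything between "var data = " and the ";" that ends that line.
# re.MULTILINE makes "$" match at the end of each line, not only at the
# end of the whole string.
pattern = re.compile(r"var data = (.*);$", re.MULTILINE)

match = pattern.search(html)
# Fall back to an empty list if the page does not contain the variable
data = json.loads(match.group(1)) if match else []

for quote in data:
    print(quote["author"], "-", quote["text"])
```

Assigning `data` in both branches means the later loop does not raise a NameError when the pattern finds nothing.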
@КатеринаВоровская-п6р 2 years ago
Thanks for the video. Could you please explain where you took value 'h3 >a' for select at the end of the video?
@oxylabs 2 years ago
Hello. We’re glad you enjoyed it!
The h3 > a syntax just tells Beautiful Soup to get a tags that are directly beneath h3.
You can try going to the link displayed in the video (librivox.org/search/?q=time%20machine&search_form=advanced), right-click on any book title and select “inspect”. This should open the exact place where you can see that a is directly below h3. Have fun!
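To see what the h3 > a selector does, here is a small sketch on made-up markup. The HTML below is an assumption and is not the real LibriVox page structure. It compares the child selector with the looser descendant selector.

```python
from bs4 import BeautifulSoup

# Made-up markup: one link directly inside h3, one nested deeper in a span
html = """
<ul>
  <li><h3><a href="/the-time-machine">The Time Machine</a></h3>
      <p>Read by <a href="/reader/1">a volunteer</a></p></li>
  <li><h3><span><a href="/nested">Nested link</a></span></h3></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# "h3 > a" matches only a tags that are direct children of h3
direct = [a.get_text() for a in soup.select("h3 > a")]

# "h3 a" matches a tags at any depth inside h3
any_depth = [a.get_text() for a in soup.select("h3 a")]

print(direct)
print(any_depth)
```

The link inside the paragraph is matched by neither selector, because it is not inside an h3 at all.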
@python689 2 years ago
Hello, help me please, how to get the text out "Wilson Tour Premier All Court 4B"
soup = BeautifulSoup(html, 'lxml')
title = soup.find('h1', class_='product--title')
Tennis balls Wilson Tour Premier All Court 4B
@oxylabs 2 years ago
Hey, thanks for asking!
There are a couple of ways (without modifying your find function)
1. Retrieve the list of the tag's children and select the last one on the list. Strip the white space afterwards:
title.contents[-1].strip()
2. Retrieve the whole text of the title, split it by a double space and select the last string on the list:
title.text.split("  ")[-1]
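Both options can be tried on a small example. The markup below is a guess at what the product title might look like (the tags in the question were lost), and the sketch uses the built-in html.parser instead of lxml to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

DOUBLE_SPACE = " " * 2  # written this way so the two spaces stay visible

# Assumed markup: a category label in a span, then the product name
html = (
    '<h1 class="product--title"><span>Tennis balls</span>'
    + DOUBLE_SPACE
    + "Wilson Tour Premier All Court 4B\n</h1>"
)

soup = BeautifulSoup(html, "html.parser")
title = soup.find("h1", class_="product--title")

# Option 1: the last child of the tag is the bare text after the span
name_from_children = title.contents[-1].strip()

# Option 2: split the full text on the double space and take the last part
name_from_split = title.text.split(DOUBLE_SPACE)[-1].strip()

print(name_from_children)
print(name_from_split)
```

Option 1 is less fragile, since it does not depend on the exact amount of white space between the label and the name.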
@drewgatch4488 1 year ago
Excellent video. Quick question - when I click Ctrl+U on the website, my source page looks different. I don't have <script> anywhere and I have separating each section. Does this matter or was "script" just used to locate the data needed?
@oxylabs 1 year ago ⁺²
Hello! If you're unable to locate <script> in the source page of the website, do make sure that you visited "quotes.toscrape.com/js" (adding the `/js` at the end of the URL).
This was important to ensure that you can follow the script parsing portion of the guide, as we then extracted information found in the <script> tag.
You can see the differences when comparing the source codes of "quotes.toscrape.com/js/" vs "quotes.toscrape.com"
@vbernard1 2 years ago
Hi, it was very good. Thanks. But I'm facing a problem: at line 13, it is reported to me that "NameError: name 'data' is not defined". Any idea how to fix it?
@oxylabs 2 years ago
Hello! It seems that you’re trying to access a variable called data but it doesn't exist. Please double check the names of the variables you have defined. Also, there are a couple more scenarios in which this error might get triggered - this article summarizes them quite well:
www.geeksforgeeks.org/handling-nameerror-exception-in-python/
@ivandechiara3415 2 years ago
THANKS ! You saved my life! :)
@oxylabs 2 years ago
Happy we could help!
@programmingpriest4862 2 years ago ⁺¹
Thank you so much. Much needed.
@oxylabs 2 years ago
We're glad it helped!
@mihailazarescu1297 1 year ago
Great video! Thanks!
Hi, could you demonstrate how to asynchronously request pages that require JavaScript rendering?
@oxylabs 1 year ago ⁺¹
That's a great idea for a future video, we'll keep that in mind, thanks 😊
@jjaemc7389 2 years ago
I'm soooooo appreciative of you😄
@oxylabs 2 years ago
Thanks for your feedback. It's much appreciated!
@mhrasoulian 3 years ago ⁺¹
It was very useful. Thank you!
@oxylabs 3 years ago
We're very glad you liked it!
@tarunbhardwaj121 2 years ago
This video helped me a lot.
@oxylabs 2 years ago ⁺¹
Thank you!
@anjifeldspar8804 2 years ago
Thanks for making this
@capunzel5859 1 year ago
Love the explanation, but also loved the music. Can you share the track id?
@oxylabs 1 year ago
Hey! So happy you enjoyed it :) The track is this one: Purple Planet Music - Corporate Planning
@Johnbrown-op5xt 2 years ago
Thank you so much. Great useful info.
@oxylabs 2 years ago
Glad you liked it!
@kumarsunil3219 2 years ago
Hello. This video seems very interesting and helpful, but I need some more assistance if you can.
@oxylabs 2 years ago
Hey! How can we help you?
@maikwurl1484 6 months ago
Thank You
@KhalilYasser 3 years ago
Awesome. Thank you very much.
@oxylabs 3 years ago
Thank YOU for the support!
@muneebrehman7842 3 years ago
You saved my day
@oxylabs 3 years ago
We're very happy to hear that!
@lorsco4107 3 years ago
Awesome!
@oxylabs 3 years ago
Glad to hear!
@nwizugbesamson6718 2 years ago
This is so perfect
@oxylabs 2 years ago
Thank you!
@kamaleshpramanik7645 2 years ago
Thank you very much Madam ...
@oxylabs 2 years ago
You're welcome!
@0805-k4o 2 years ago ⁺¹
It doesn't work now (crying~~~~~~~~)
@oxylabs 2 years ago
Hey. Could you specify where exactly it's not working for you? Maybe we can help!
@benasvalancius 1 year ago
Thank you, useful!
@Grizzler231 4 months ago
The way she says html
@PresleyMcquade 2 years ago
Awesome!
@oxylabs 2 years ago
Thank you!
