Scraping with Playwright 101 - Easy Mode

  • Published: 27 Sep 2024

Comments • 32

  • @alexanderkomanov4151
    @alexanderkomanov4151 6 months ago +2

    Great one!
    I think using the pytest-playwright package can save several lines of code in the initialization part, because you can just use the page: Page fixture.

  • @NomadicDmitry
    @NomadicDmitry 1 month ago +1

    Really great tutorial! Thanks, John!

  • @robertramirez2167
    @robertramirez2167 6 months ago +3

    I like that image blocking tip!
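
For reference, image blocking in Playwright is usually done with request routing. The sketch below is one way to do it, not necessarily the exact code from the video; the set of blocked resource types and the function names are assumptions.

```python
# Resource types to skip; an assumed set, adjust to the target site.
BLOCKED_RESOURCE_TYPES = {"image", "media", "font"}


def should_block(resource_type: str) -> bool:
    """Return True for resource types that are not needed for scraping."""
    return resource_type in BLOCKED_RESOURCE_TYPES


def block_heavy_resources(route) -> None:
    """Route handler: abort blocked requests, let everything else through."""
    if should_block(route.request.resource_type):
        route.abort()
    else:
        route.continue_()


def scrape_title(url: str) -> str:
    """Open `url` with heavy resources blocked and return the page title."""
    # Imported here so the helpers above can be reused without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.route("**/*", block_heavy_resources)  # intercept every request
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```

Calling `scrape_title("https://example.com/")` would load the page without downloading images, which mainly saves bandwidth and load time.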

  • @bgriffin5447
    @bgriffin5447 2 months ago +1

    That split move was nice

  • @Extrey
    @Extrey 6 months ago +1

    Nooooo waaaay, I just found schema on other websites. Nice trick anyway, but I find it more efficient to read the info from the category pages. Thanks for your videos, they always inspire me!!!
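
Assuming the "schema" mentioned here is the schema.org JSON-LD block that many product pages embed in a `<script type="application/ld+json">` tag, a standard-library sketch for pulling it out of raw HTML could look like this. The sample HTML and the class name are made up for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of every JSON-LD <script> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._chunks: list[str] = []
        self.items: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)  # script text may arrive in pieces

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._chunks)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks instead of failing the scrape


def extract_json_ld(html: str) -> list:
    parser = JsonLdParser()
    parser.feed(html)
    return parser.items


# Made-up sample page, for illustration only.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{"@type": "Product", "name": "Example Camera", "offers": {"price": "499.00"}}
</script>
</head></html>
"""

products = extract_json_ld(SAMPLE_HTML)
```

With Playwright, the HTML string could come from `page.content()`; the parsing step stays the same.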

  • @donaldandmijung
    @donaldandmijung 1 month ago

    Really well explained! Is there a way to run the loop in the original browser? Say, if we're only interested in the first page of the pagination and only the products on page 1.

  • @user-wu4ip7mp3z
    @user-wu4ip7mp3z 4 months ago +1

    I'm following this exact code in VS Code and only the initial page is opened; it doesn't open the subsequent pages for each product. No idea how to fix this...

    • @user-wu4ip7mp3z
      @user-wu4ip7mp3z 4 months ago

      nvm, fixed it. Turns out the data-selenium=...GridView... selector has been changed to [data-selenium='miniProductPageProductNameLink']
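
A selector that silently stops matching is easy to miss, because the loop over product links simply runs zero times. The sketch below uses the updated selector from this reply and raises an error when nothing matches; the function name and the base URL handling are assumptions, and `page` is expected to be a Playwright `Page`.

```python
from urllib.parse import urljoin

# Selector reported in the reply above; it may change again.
PRODUCT_LINK_SELECTOR = "[data-selenium='miniProductPageProductNameLink']"


def collect_product_links(page, base_url: str,
                          selector: str = PRODUCT_LINK_SELECTOR) -> list[str]:
    """Return absolute product URLs found on the current listing page."""
    links = []
    for anchor in page.locator(selector).all():
        href = anchor.get_attribute("href")
        if href:
            links.append(urljoin(base_url, href))  # resolve relative hrefs
    if not links:
        # Fail loudly so a changed selector is noticed straight away.
        raise RuntimeError(f"No elements matched selector: {selector}")
    return links
```

The returned list can then be looped over with `page.goto(link)` for each product.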

  • @IshaqKhan010
    @IshaqKhan010 5 months ago +1

    Sir, can you please make a video on how to deploy a Playwright script on a Google Cloud Function / VPC?

  • @elu1
    @elu1 6 months ago

    Thank you, John, for the teaching. I seem to have an issue with Xvfb when running 'headless'. Any suggestions or resources that I can learn from?

  • @fredde7356
    @fredde7356 6 months ago

    Hey John, can you please continue the scraping livestream with your test site? 😃
    Would love to see how to handle drop-down menus, JavaScript, and stricter Cloudflare rules.
    Would be happy to hear some news! Enjoy Easter :)

    • @munchcup
      @munchcup 5 months ago

      On Cloudflare: one idea is to use undetected-chromedriver to avoid Cloudflare. You can add a delay while logging in so you can solve the captchas the first time, then save the cookies. After that you no longer need to solve captchas; it will be automatic.

  • @carloiurcovici
    @carloiurcovici 6 months ago +1

    Thank you John, I've been really enjoying your videos recently and applying everything at work, where it comes in really handy. Would you consider creating a Python/scraping course on Udemy or a similar platform?

    • @JohnWatsonRooney
      @JohnWatsonRooney 6 months ago

      Thanks for watching. I have thought about creating a course, but no serious plans yet, I'm afraid.

    • @carloiurcovici
      @carloiurcovici 6 months ago

      @JohnWatsonRooney thanks for the reply. If you change your mind, you've got my money 😂

  • @bigoper
    @bigoper 3 months ago +1

    This is awesome!!
    As an API Security Specialist, I always start by looking at the HTTP calls, searching for an API call that might have that same info. Saving me time from scraping the page. Most of the time I’m having success with that approach, especially when dealing with solid companies/websites/platforms.

  • @badrenanna3961
    @badrenanna3961 6 months ago +3

    Can you please start talking about some difficult cases:
    - scraping a website that has Cloudflare protection against bots (even proxy rotation didn't work)
    - scraping websites that have captcha protection
    ...
    Thank you

    • @munchcup
      @munchcup 5 months ago +2

      One idea is to use undetected-chromedriver to avoid Cloudflare. You can add a delay while logging in so you can solve the captchas the first time, then save the cookies. After that you no longer need to solve captchas; it will be automatic.
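
The reply above refers to undetected-chromedriver, which is a Selenium-based tool. The sketch below covers only the "solve once, save the cookies, reuse them" part, and it swaps in Playwright's storage state to match the video's tooling. The file name, wait time, and function names are assumptions, and nothing here is claimed to get past Cloudflare on its own.

```python
from pathlib import Path

# Hypothetical location for the saved cookies and local storage.
STATE_FILE = Path("state.json")


def context_options(state_file: Path = STATE_FILE) -> dict:
    """Keyword arguments for browser.new_context(): reuse a saved session if present."""
    if state_file.exists():
        return {"storage_state": str(state_file)}
    return {}


def open_with_saved_session(url: str, state_file: Path = STATE_FILE,
                            manual_wait_ms: int = 60_000) -> None:
    """Open `url`; on the first run, pause so a human can log in or solve a captcha."""
    # Imported here so context_options() can be used without a browser installed.
    from playwright.sync_api import sync_playwright

    first_run = not state_file.exists()
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)
        context = browser.new_context(**context_options(state_file))
        page = context.new_page()
        page.goto(url)
        if first_run:
            page.wait_for_timeout(manual_wait_ms)  # time for the manual step
        context.storage_state(path=str(state_file))  # save cookies for next run
        browser.close()
```

Whether the saved cookies keep working depends on how long the site treats them as valid.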

  • @archiee1337
    @archiee1337 3 months ago

    why not headless?

  • @s6yx
    @s6yx 5 months ago

    Can’t you just set a viewport for the screen size, plus a header, and run it headless with no issue?
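
One way to try this idea, assuming "header" means the User-Agent: pass a viewport and a user agent when creating the browser context and launch headless. The user agent string and the function names are illustrative, and some sites may still detect headless browsers by other means.

```python
# Illustrative desktop user agent; any current browser string could be used.
DESKTOP_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36"
)


def desktop_context_options(width: int = 1920, height: int = 1080,
                            user_agent: str = DESKTOP_USER_AGENT) -> dict:
    """Keyword arguments for browser.new_context() that mimic a desktop browser."""
    return {
        "viewport": {"width": width, "height": height},
        "user_agent": user_agent,
    }


def fetch_title_headless(url: str) -> str:
    """Load `url` in a headless browser with a desktop-sized viewport."""
    # Imported here so the options helper works without a browser installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(**desktop_context_options())
        page = context.new_page()
        page.goto(url)
        title = page.title()
        browser.close()
        return title
```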

  • @danueecitizen
    @danueecitizen 5 months ago

    Can this work with Amazon? 🤔

  • @pkavenger9990
    @pkavenger9990 2 months ago +1

    Your content is good, but I think you should engage with your audience more instead of speaking as if you are talking to yourself. You will see that you get many more views. Take the Gotham Chess channel, for example: he is not a chess grandmaster, but his channels have more views and subscribers than Hikaru's and Magnus's because of his communication skills.

  • @mohsinhassan88
    @mohsinhassan88 6 months ago +3

    Omg why the white editor??

    • @РНТ
      @РНТ 6 months ago

      Exactly. When I saw it I immediately remembered this video: ruclips.net/video/XlgqZeeoOtI/видео.html 😂

    • @tendosingh5682
      @tendosingh5682 6 months ago +1

      For some it's easier on the eyes. My eyes can't stand the dark themes.

    • @mohsinhassan88
      @mohsinhassan88 6 months ago

      @РНТ Exactly how I felt. Especially since John usually has amazing videos and everything is so perfectly balanced in terms of theme and ease on the eyes.
      It was a super shock.

  • @graczew
    @graczew 6 months ago +1

    Good content as always. Enjoy your Easter break 😉👍

  • @alexdin1565
    @alexdin1565 6 months ago

    Thanks, John, but nowadays most websites don't allow you to open links like you do; they will block you after 3 or 4 pages are opened at the same time.
    Another question: if you could make a video on how to use Playwright inside Docker with a proxy to make many requests at the same time, that would be very nice.
    Sorry for my English, I'm not a native speaker.