Scraping Google News the Easy Way with Python and pygooglenews

  • Published: 11 Nov 2024

Comments • 84

  • @artembugara1332
    @artembugara1332 3 years ago +107

    Creator of the package here. Nice video!

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +13

      Hey! Thanks for creating it for us to use, I’m glad you liked the video!

    • @barunbodhak720
      @barunbodhak720 3 years ago +3

      You are a lifesaver.

    • @MrSigi1990
      @MrSigi1990 3 years ago +6

      I get a base64 error when I try to import pygooglenews, because installing pygooglenews uninstalls feedparser 6.0 and replaces it with 5.2. How can I solve this?

    • @GamingOzzz
      @GamingOzzz 3 years ago +1

      Hey man, is there any way we can get the thumbnails for the headlines?

    • @GamingOzzz
      @GamingOzzz 3 years ago +4

      @MrSigi1990 pip install feedparser==6.0

  • @celerystalk390
    @celerystalk390 3 years ago +3

    This video and the pygooglenews source code are a great intro to scraping RSS feeds. Thanks John!

  • @frenchy2216
    @frenchy2216 3 years ago +2

    I honestly love YouTube, you've just done everything I wanted to do for me! Now just to try it out, thanks!

  • @informationdominance6434
    @informationdominance6434 2 years ago +6

    Had an issue with the install. Had to downgrade setuptools to an older version that still supports 2to3:
    pip install "setuptools<58"
    pip install -U --no-deps "feedparser>=6.0.8"

    • @racsomtz9546
      @racsomtz9546 8 months ago

      I'm still having the problem, any suggestion?
      Collecting pygooglenews
      Using cached pygooglenews-0.1.2-py3-none-any.whl.metadata (19 kB)
      Requirement already satisfied: beautifulsoup4=4.9.1 in c:\python312\lib\site-packages (from pygooglenews) (4.12.2)
      Collecting dateparser=0.7.6 (from pygooglenews)
      Using cached dateparser-0.7.6-py2.py3-none-any.whl (362 kB)
      Collecting feedparser=5.2.1 (from pygooglenews)
      Using cached feedparser-5.2.1.zip (1.2 MB)
      Preparing metadata (setup.py) ... error
      error: subprocess-exited-with-error
      × python setup.py egg_info did not run successfully.
      │ exit code: 1
      ╰─> [1 lines of output]
      ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
      [end of output]
      note: This error originates from a subprocess, and is likely not a problem with pip.
      error: metadata-generation-failed
      × Encountered error while generating package metadata.
      ╰─> See above for output.
      note: This is an issue with the package mentioned above, not pip.
      hint: See above for details.

  • @kevin-wg5iv
    @kevin-wg5iv 3 years ago +2

    Hi John, I just wanted to say your videos are awesome. Thanks!

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Thanks!

    • @kevin-wg5iv
      @kevin-wg5iv 3 years ago

      @JohnWatsonRooney Can you please do a video on how to use AsyncHTMLSession with arender()? I have to hit around 50 JavaScript-rendered URLs, with each request using a different proxy IP from a list. Is that possible? I'm trying to hit a few pages and the content is missing, and doing one at a time would take forever. Thank you.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      @kevin-wg5iv It's coming up, don't worry :)

    • @kevin-wg5iv
      @kevin-wg5iv 3 years ago

      @JohnWatsonRooney Awesome. I wasn't sure whether, when I need to rotate proxies, it would be better to use AsyncHTMLSession or a normal HTMLSession threaded with concurrent.futures. I kept getting errors with concurrent.futures and couldn't figure it out. Thanks.

  • @tubelessHuma
    @tubelessHuma 3 years ago +1

    Very helpful. You are always looking for new ways. 💕

  • @alessandrodf888
    @alessandrodf888 3 years ago +3

    AttributeError: module 'base64' has no attribute 'decodestring'
    My Python version is 3.9.6.

  • @adnaneguettaf5461
    @adnaneguettaf5461 1 year ago +1

    Great video. You could have used the built-in 'json' package to display the JSON data in a more readable way.
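A minimal sketch of that suggestion, assuming a feed entry has already been reduced to a plain dictionary. The `entry` values here are made up for illustration and are not real feed output.

```python
import json

# Hypothetical entry shaped like one item from a parsed news feed.
entry = {
    "title": "Example headline",
    "link": "https://example.com/story",
    "published": "Mon, 01 Jan 2024 00:00:00 GMT",
}

# indent=2 puts each key on its own line; ensure_ascii=False keeps
# non-English headlines readable instead of turning them into \u escapes.
pretty = json.dumps(entry, indent=2, ensure_ascii=False)
print(pretty)
```

Objects that are not plain dicts, lists, strings, or numbers would need converting first, since `json.dumps` raises `TypeError` on types it does not know.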

  • @ryankrueger8538
    @ryankrueger8538 3 years ago +2

    Thank you very much, this tutorial helped me with my current Python project. :)

    • @ryankrueger8538
      @ryankrueger8538 3 years ago

      Also, it didn't really work, I got an error:
      Traceback (most recent call last):
      File "C:\Users\Ryan\PycharmProjects\Stock\Stock Searcher.py", line 1, in <module>
      from pygooglenews import GoogleNews
      File "C:\Users\Ryan\PycharmProjects\Stock\venv\lib\site-packages\pygooglenews\__init__.py", line 1, in <module>
      import feedparser
      File "C:\Users\Ryan\PycharmProjects\Stock\venv\lib\site-packages\feedparser.py", line 93, in <module>
      _base64decode = getattr(base64, 'decodebytes', base64.decodestring)
      AttributeError: module 'base64' has no attribute 'decodestring'
      Process finished with exit code 1

  • @danyylpochtar4017
    @danyylpochtar4017 3 years ago +2

    Hello, I have this problem 'base64' has no attribute 'decodestring'. I'm trying to solve it but I can't understand how to do it. Can you help me?

    • @ankitranjan30
      @ankitranjan30 2 years ago

      Try this, it just worked for me now:
      !pip install "setuptools<58"
      !pip install -U --no-deps "feedparser>=6.0.8"
      !pip install pygooglenews==0.1.2

  • @food.lovestory9847
    @food.lovestory9847 3 years ago +1

    How do I scrape the content behind each news headline using pygooglenews?

  • @daveplatter7610
    @daveplatter7610 1 year ago +1

    Fantastic video. Thank you!

  • @JakubJacobSobotka
    @JakubJacobSobotka 3 years ago +2

    Thank you. Can I get the images used by Google News?

  • @KhalilYasser
    @KhalilYasser 3 years ago +1

    Thank you very much. Awesome tutorial as usual.

  • @lala-rj5di
    @lala-rj5di 2 years ago +1

    Interesting! Ty!! Now I have more inspiration on what to look at. (;

  • @tanishqraj3690
    @tanishqraj3690 3 years ago +2

    "AttributeError: module 'base64' has no attribute 'decodestring'"
    I'm getting this error when I use the pygooglenews package. Can you help me out?

    • @MrSigi1990
      @MrSigi1990 3 years ago +1

      pygooglenews uninstalls feedparser 6.0 and replaces it with 5.2, which doesn't work on Python 3.9, so that's annoying. I guess it's possible to add an exception so it doesn't do that, or to install an older version of Python that runs feedparser 5.2.

    • @GamingOzzz
      @GamingOzzz 3 years ago +4

      pip install feedparser==6.0

  • @mjjabarian8246
    @mjjabarian8246 2 years ago +2

    Is there any way to get the whole article, or at least the first x characters of the article, for each story?

    • @anthonyfrancq
      @anthonyfrancq 9 months ago

      You can use the newspaper3k package to do that
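A rough sketch of how that might look, assuming newspaper3k is installed (`pip install newspaper3k`, imported as `newspaper`). The helper names `fetch_article_text` and `first_chars` are made up for this example, and extraction quality varies from site to site.

```python
def first_chars(text: str, limit: int) -> str:
    """Return at most `limit` characters after collapsing runs of whitespace."""
    return " ".join(text.split())[:limit]


def fetch_article_text(url: str) -> str:
    """Download one article and return its extracted body text."""
    # Imported here so first_chars() stays usable without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the page HTML
    article.parse()     # extract title, authors, and body text
    return article.text


print(first_chars("  Hello   world  ", 7))
```

For each story, the link from the feed would be passed to `fetch_article_text`, and the result trimmed with `first_chars` if only a preview is needed.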

  • @kenyeresgellert
    @kenyeresgellert 3 years ago +2

    Great content! Thank you. When will those guitars be played, though?

  • @kishorem7625
    @kishorem7625 1 year ago

    Sir, what if I want to search for both LOCKDOWN and FOOTBALL together? Please help me, I'm currently stuck on this issue.
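One possible approach, sketched under assumptions: Google News search appears to treat space-separated words as terms that should all match, and `OR` as an either-or operator. The helper `google_news_search_url` and its default language and country values are hypothetical. The URL pattern follows the public Google News RSS search address.

```python
from urllib.parse import urlencode


def google_news_search_url(query: str, lang: str = "en", country: str = "US") -> str:
    """Build a Google News RSS search URL for a query string."""
    params = {
        "q": query,
        "hl": f"{lang}-{country}",
        "gl": country,
        "ceid": f"{country}:{lang}",
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


# Both words in one query (assumed to require both terms).
both = google_news_search_url("lockdown football")
# Either word (assumed OR operator).
either = google_news_search_url("lockdown OR football")
print(both)
print(either)
```

The same query strings can be tried in whatever search function is in use. Whether the results honor the operators is up to Google News.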

  • @eksalailia4352
    @eksalailia4352 2 years ago

    Hello, I have a question. I want to add the content of the article, so what should I declare? Please answer my question, and thank you 🙏

  • @beastvirus
    @beastvirus 3 years ago +1

    Please create a video on how to scrape Google SERPs for mobile and desktop and keep track of rankings.

  • @jjw7362
    @jjw7362 1 year ago +1

    Thank you. But can it not collect the news content?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      Hi, unfortunately I think this package has stopped working.

  • @wataruterada9351
    @wataruterada9351 3 years ago +2

    Thank you John for this helpful video. Is there any way to remove the limit of 100 results?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +1

      Thanks! I am not aware of a way to remove the limit. I believe it is Google News that controls that.

  • @aaronbell759
    @aaronbell759 2 years ago +1

    What's a workaround for getting 429 responses? I'm trying to scrape Google search results.

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago

      I think this package was abandoned and no longer works I’m afraid

    • @aaronbell759
      @aaronbell759 2 years ago

      @JohnWatsonRooney I'm actually not using this package, just bs4 to scrape Google search results. They seem to have some advanced rate-limit tracking, so maybe a proxy is the way to go, but I'd prefer to figure out a better way.
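One common way to handle HTTP 429 without proxies is to slow down and retry with exponential backoff. The sketch below is generic and makes assumptions: `fetch_with_backoff` is a hypothetical helper, and `fetch` stands for any zero-argument callable that returns a `(status_code, body)` pair, for example a small wrapper around `requests.get`. It does not bypass rate limiting. It only spaces the requests out.

```python
import time


def fetch_with_backoff(fetch, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning 429, doubling the wait each time."""
    delay = base_delay
    status, body = None, None
    for attempt in range(max_tries):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_tries - 1:
            sleep(delay)  # wait before the next attempt
            delay *= 2    # 1s, 2s, 4s, ...
    return status, body   # still rate limited after max_tries


# Simulated server: two 429 responses, then success.
responses = iter([(429, ""), (429, ""), (200, "ok")])
waits = []
result = fetch_with_backoff(lambda: next(responses), sleep=waits.append)
print(result, waits)
```

With a real HTTP client, the `Retry-After` response header, when present, is usually a better guide for the delay than a fixed schedule.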

  • @p.mistry1986
    @p.mistry1986 3 years ago +2

    Awesome content as usual! How exactly would I go about pairing a list of search terms with the articles it pulls? For example, say I created a list called SearchList = ['Basketball', 'Football', 'Hockey'] and iterated through it to search for all articles related to each item. How would I then associate each item from SearchList with its corresponding articles and print them out neatly, with one column showing the search term and a second column showing the corresponding article? Thank you!

    • @nockmago
      @nockmago 3 years ago +2

      You put the results of the search in a pandas DataFrame and add a new column with the corresponding search term!
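The idea in that reply can be sketched with plain Python first: tag every article with the term that produced it. The `search` function below is a hypothetical stand-in that returns fake articles, not a real news lookup.

```python
search_list = ["Basketball", "Football", "Hockey"]


def search(term):
    """Hypothetical stand-in for a real news search; returns article dicts."""
    return [
        {"title": f"{term} story {i}", "link": f"https://example.com/{term.lower()}/{i}"}
        for i in (1, 2)
    ]


rows = []
for term in search_list:
    for article in search(term):
        # Store the search term next to each article it produced.
        rows.append({"search_term": term, **article})

# Two-column printout: search term, then article title.
for row in rows:
    print(f"{row['search_term']:<12}{row['title']}")
```

With pandas installed, `pandas.DataFrame(rows)` accepts the same list of dictionaries and gives a table with a `search_term` column.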

  • @sommojames
    @sommojames 1 year ago

    It already fails for me on the first line, "from pygooglenews import GoogleNews". My error is "AttributeError: module 'base64' has no attribute 'decodestring'". I am using the latest version of Python. Is it just me?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      I’m afraid this package does not work anymore, it’s a shame

  • @mrsfunny669
    @mrsfunny669 3 years ago

    Can we export the scraping results to HTML files, to make a static site?

  • @azizullah7881
    @azizullah7881 1 year ago

    With respect, your lecture is very valuable and I learned a lot from it, but I have a question about how to save the news to an Excel sheet. I would be thankful for any help.

  • @mohammedzareefw203
    @mohammedzareefw203 3 years ago +2

    Can you make a tutorial on how to display scraped data on a Flask website?

  • @paveldanilov4869
    @paveldanilov4869 2 years ago

    Now the package isn't available :((

  • @miguelnuno928
    @miguelnuno928 2 years ago

    How do I get the thumbnail/image of the article?

  • @ItzelFlores-ku7dz
    @ItzelFlores-ku7dz 7 months ago

    Thank you.

  • @lee2501
    @lee2501 1 year ago +1

    Does anyone know why I get this error message?
    pip install pygooglenews --upgrade
    Collecting pygooglenews
    Using cached pygooglenews-0.1.2-py3-none-any.whl (10 kB)
    Collecting feedparser=5.2.1
    Using cached feedparser-5.2.1.zip (1.2 MB)
    Preparing metadata (setup.py) ... error
    error: subprocess-exited-with-error
    × python setup.py egg_info did not run successfully.
    │ exit code: 1
    ╰─> [1 lines of output]
    error in feedparser setup command: use_2to3 is invalid.
    [end of output]
    note: This error originates from a subprocess, and is likely not a problem with pip.
    error: metadata-generation-failed
    × Encountered error while generating package metadata.
    ╰─> See above for output.
    note: This is an issue with the package mentioned above, not pip.
    hint: See above for details.

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago

      I’m afraid I think this package no longer works, so this video is outdated now.

  • @ArronFinn
    @ArronFinn 3 years ago +1

    Hi John, thanks a mill for this video it has helped a lot. Is there a way to turn the result into an Excel file after getting the list?

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago +2

      Sure - I usually use pandas to create a data frame from the list then export to csv but you can use the csv module in Python if you’d rather
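The standard-library route mentioned in that reply might look roughly like this. The `stories` list and the `write_csv` helper are made up for the example. Excel can open the resulting CSV file directly.

```python
import csv
import io

# Hypothetical scraped results: one dict per story.
stories = [
    {"title": "Hello, world", "link": "https://example.com/1"},
    {"title": "Second story", "link": "https://example.com/2"},
]


def write_csv(rows, fileobj):
    """Write story dicts as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=["title", "link"])
    writer.writeheader()
    writer.writerows(rows)


# Written to an in-memory buffer here; for a real file use
# open("news.csv", "w", newline="", encoding="utf-8") instead.
buffer = io.StringIO()
write_csv(stories, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Titles that contain commas are quoted automatically by the `csv` module, which is the main reason to prefer it over joining strings by hand.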

  • @burtmcgurt3584
    @burtmcgurt3584 2 years ago +1

    I like it!

  • @sanadmasoud9898
    @sanadmasoud9898 1 year ago +1

    Thanks for the video, can we return the actual article content?

    • @JohnWatsonRooney
      @JohnWatsonRooney 1 year ago +1

      Unfortunately I don’t think this package works anymore. You can still scrape news, but you would have to do it directly with requests rather than with this package.
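A minimal sketch of the direct approach, under assumptions: the feed is fetched separately (for example with `requests.get(url, timeout=10).text`) and arrives as standard RSS 2.0. The `SAMPLE_RSS` text and the `parse_items` helper are invented for illustration, so nothing here touches the network.

```python
import xml.etree.ElementTree as ET

# Invented sample in RSS 2.0 shape; a real feed would come from an HTTP request.
SAMPLE_RSS = """<rss version="2.0"><channel>
<title>Example feed</title>
<item>
  <title>First story</title>
  <link>https://example.com/1</link>
  <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Second story</title>
  <link>https://example.com/2</link>
  <pubDate>Tue, 02 Jan 2024 00:00:00 GMT</pubDate>
</item>
</channel></rss>"""


def parse_items(rss_text):
    """Return title, link, and publication date for every <item> in the feed."""
    root = ET.fromstring(rss_text)
    return [
        {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": item.findtext("pubDate"),
        }
        for item in root.iter("item")
    ]


items = parse_items(SAMPLE_RSS)
for story in items:
    print(story["published"], story["title"])
```

An RSS item only carries what the publisher put in the feed, so the full article body still has to be fetched from each link.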

    • @sanadmasoud9898
      @sanadmasoud9898 1 year ago +1

      @JohnWatsonRooney Sounds good, thanks!

  • @oneminute4565
    @oneminute4565 3 years ago

    How do you scrape data from a mobile app? Please consider covering it.

  • @miguelnuno928
    @miguelnuno928 2 years ago

    Did not work for me. The deps feedparser and others need to be updated.

    • @informationdominance6434
      @informationdominance6434 2 years ago +3

      Easy fix...
      Had to downgrade setuptools to an older version that still supports 2to3:
      pip install "setuptools<58"
      pip install -U --no-deps "feedparser>=6.0.8"

    • @miguelnuno928
      @miguelnuno928 2 years ago

      @informationdominance6434 Thank you for your answer. I did what you did. Unfortunately I still get the error ModuleNotFoundError: No module named 'sgmllib'

    • @ankitranjan30
      @ankitranjan30 2 years ago

      @miguelnuno928 Try this, it just worked for me now:
      !pip install "setuptools<58"
      !pip install -U --no-deps "feedparser>=6.0.8"
      !pip install pygooglenews==0.1.2

  • @moustafaezz9158
    @moustafaezz9158 3 years ago

    Hey bro, can you make a tutorial on integrating Scrapy with Django?

  • @Probly
    @Probly 2 years ago +1

    I get a humongous number of errors when trying to parse dates.

    • @JohnWatsonRooney
      @JohnWatsonRooney 2 years ago +1

      Unfortunately I don’t think this works anymore

    • @Probly
      @Probly 2 years ago

      @JohnWatsonRooney No problem, thanks for the reply. I was trying to use it for my dissertation, but it’s due today so I couldn’t make it work.

    • @vispinet
      @vispinet 2 years ago

      @JohnWatsonRooney Same here. It works, but I cannot filter by date. Another thing I'm missing is a short description of each article, which I need in my dataset. Do you have any advice on the best Google News scraping method right now?

  • @jawadahmadkhan1076
    @jawadahmadkhan1076 2 years ago +1

    IP will be blocked in no time.

  • @kaladappanimi4269
    @kaladappanimi4269 3 years ago

    Hi John, I'm still waiting for your email so I can send you the site for the tutorial.

    • @JohnWatsonRooney
      @JohnWatsonRooney 3 years ago

      Hi! It's on my main YouTube page. I'd rather not post it in the comments, as it gets picked up as spam.

  • @atangbingana283
    @atangbingana283 1 year ago

    @JohnWatsonRooney Bro, grow your channel right now: put out some videos showing people how to code all this amazing stuff using ChatGPT and you will be at 100K subs in two months.