Scraping Google News the Easy Way with Python and pygooglenews

John Watson Rooney

34K views

Published: Nov 11, 2024

Comments • 84

@artembugara1332 3 years ago ⁺¹⁰⁷
Creator of the package here. Nice video!
@JohnWatsonRooney 3 years ago ⁺¹³
Hey! Thanks for creating it for us to use, I’m glad you liked the video!
@barunbodhak720 3 years ago ⁺³
You are a lifesaver.
@MrSigi1990 3 years ago ⁺⁶
I get a base64 error when I try to import pygooglenews, because installing the pygooglenews package uninstalls feedparser 6.0 and replaces it with 5.2. How can I solve this?
@GamingOzzz 3 years ago ⁺¹
Hey man, is there any way we can get the thumbnails of the headlines?
@GamingOzzz 3 years ago ⁺⁴
@@MrSigi1990 pip install feedparser==6.0
@celerystalk390 3 years ago ⁺³
This video and the pygooglenews source code are a great intro to scraping RSS feeds. Thanks John!
@frenchy2216 3 years ago ⁺²
I honestly love YouTube, you've just done everything I wanted to do for me! Now just to try it out, thanks!
@JohnWatsonRooney 3 years ago
Haha glad to help!
@informationdominance6434 2 years ago ⁺⁶
Had an issue with the install. Had to downgrade setuptools to an older version that still supports 2to3:
pip install "setuptools<58"
pip install -U --no-deps "feedparser>=6.0.8"
@racsomtz9546 8 months ago
I'm still having the problem, any suggestions?
Collecting pygooglenews
Using cached pygooglenews-0.1.2-py3-none-any.whl.metadata (19 kB)
Requirement already satisfied: beautifulsoup4=4.9.1 in c:\python312\lib\site-packages (from pygooglenews) (4.12.2)
Collecting dateparser=0.7.6 (from pygooglenews)
Using cached dateparser-0.7.6-py2.py3-none-any.whl (362 kB)
Collecting feedparser=5.2.1 (from pygooglenews)
Using cached feedparser-5.2.1.zip (1.2 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
ERROR: Can not execute `setup.py` since setuptools is not available in the build environment.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
@kevin-wg5iv 3 years ago ⁺²
Hi John, I just wanted to say your videos are awesome, thanks.
@JohnWatsonRooney 3 years ago
Thanks!
@kevin-wg5iv 3 years ago
@@JohnWatsonRooney Can you please do a video on how to use AsyncHTMLSession/arender(), where I have to hit around 50 URLs that render with JavaScript, with each request using a different proxy IP from a list? Is that possible? I'm trying to hit a few pages and the content is missing, and doing them one at a time would take forever. Thank you.
@JohnWatsonRooney 3 years ago ⁺¹
@@kevin-wg5iv It's coming up, don't worry :)
@kevin-wg5iv 3 years ago
@@JohnWatsonRooney Awesome. In that scenario, where I need to rotate proxies, I wasn't sure whether it would be better to use AsyncHTMLSession or a normal HTMLSession threaded with concurrent.futures. I kept getting errors with concurrent.futures and couldn't figure it out. Thanks.
@tubelessHuma 3 years ago ⁺¹
Very helpful. You are always looking for new ways. 💕
@JohnWatsonRooney 3 years ago ⁺¹
Glad you think so!
@alessandrodf888 3 years ago ⁺³
AttributeError: module 'base64' has no attribute 'decodestring'. My Python is 3.9.6.
@informationdominance6434 2 years ago ⁺¹
Did you get it working? I had the same issue and removed some dependencies.
@adnaneguettaf5461 1 year ago ⁺¹
Great video. You could have used the built-in 'json' package to display the JSON data in a more readable way.
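The suggestion above can be sketched with the standard-library json module. The `entry` dictionary is made-up sample data standing in for one parsed news item; it is not real output from pygooglenews.

```python
import json

# Hypothetical sample standing in for one parsed news entry
# (an assumption for illustration, not real pygooglenews output).
entry = {
    "title": "Example headline",
    "link": "https://example.com/story",
    "published": "Mon, 01 Jan 2024 00:00:00 GMT",
}

# indent=2 puts each key on its own line;
# ensure_ascii=False keeps non-ASCII headlines readable instead of escaping them.
pretty = json.dumps(entry, indent=2, ensure_ascii=False)
print(pretty)
```

The same call works on a whole list of entries, since json.dumps accepts nested lists and dictionaries.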
@ryankrueger8538 3 years ago ⁺²
Thank you very much, this tutorial helped me with my current Python project. :)
@ryankrueger8538 3 years ago
Also, it didn't really work, I got an error:
Traceback (most recent call last):
  File "C:\Users\Ryan\PycharmProjects\Stock\Stock Searcher.py", line 1, in <module>
    from pygooglenews import GoogleNews
  File "C:\Users\Ryan\PycharmProjects\Stock\venv\lib\site-packages\pygooglenews\__init__.py", line 1, in <module>
    import feedparser
  File "C:\Users\Ryan\PycharmProjects\Stock\venv\lib\site-packages\feedparser.py", line 93, in <module>
    _base64decode = getattr(base64, 'decodebytes', base64.decodestring)
AttributeError: module 'base64' has no attribute 'decodestring'
Process finished with exit code 1
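A short sketch of why the line in that traceback fails: Python evaluates the fallback argument `base64.decodestring` before `getattr` even runs, and that alias was removed in Python 3.9, so the lookup raises even though `decodebytes` exists. The `safe_decode` name below is made up for illustration and is not part of feedparser.

```python
import base64
import sys

# decodebytes() is the supported name on every Python 3 release.
decoded = base64.decodebytes(b"aGVsbG8=\n")
print(decoded)

# The old decodestring() alias only exists on interpreters older than 3.9.
has_old_alias = hasattr(base64, "decodestring")
print(sys.version_info[:2], has_old_alias)

# A lazy fallback: the removed alias is only looked up if decodebytes is missing,
# so this line does not raise on Python 3.9 and later.
safe_decode = getattr(base64, "decodebytes", None) or getattr(base64, "decodestring")
print(safe_decode(b"aGVsbG8="))
```

This only explains the error; the practical fix discussed in the thread is still to run a feedparser release from the 6.x line.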
@danyylpochtar4017 3 years ago ⁺²
Hello, I have this problem: 'base64' has no attribute 'decodestring'. I'm trying to solve it but I can't work out how. Can you help me?
@ankitranjan30 2 years ago
Try this, it just worked for me now:
!pip install "setuptools<58"
!pip install -U --no-deps "feedparser>=6.0.8"
!pip install pygooglenews==0.1.2
@food.lovestory9847 3 years ago ⁺¹
How do I scrape the content of a news headline using pygooglenews?
@daveplatter7610 1 year ago ⁺¹
Fantastic video. Thank you!
@JakubJacobSobotka 3 years ago ⁺²
Thank you. Can I get the images used by Google News?
@KhalilYasser 3 years ago ⁺¹
Thank you very much. Awesome tutorial as usual.
@lala-rj5di 2 years ago ⁺¹
Interesting! Ty!! Now I have more inspiration on what to look at. (;
@tanishqraj3690 3 years ago ⁺²
"AttributeError: module 'base64' has no attribute 'decodestring'"
I'm getting this error when I use the pygooglenews package. Can you help me out?
@MrSigi1990 3 years ago ⁺¹
pygooglenews uninstalls feedparser 6.0 and replaces it with 5.2, which doesn't work on Python 3.9, so that's annoying. I guess it's possible to write exceptions to not do it, or to install a lower version of Python that runs feedparser 5.2.
@GamingOzzz 3 years ago ⁺⁴
pip install feedparser==6.0
@mjjabarian8246 2 years ago ⁺²
Is there any way to get the whole article, or at least the first x characters of the article, for each story?
@anthonyfrancq 9 months ago
You can use the newspaper3k package to do that.
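A minimal sketch of that idea, assuming newspaper3k is installed (`pip install newspaper3k`) and that the URL points at the publisher's page. Google News links may be redirects that need resolving first, which is not shown here. The helper names `fetch_article_text` and `first_chars` are made up for illustration.

```python
def fetch_article_text(url):
    """Download and parse one article with newspaper3k and return its body text."""
    # Imported lazily so first_chars() below can be used without newspaper3k installed.
    from newspaper import Article

    article = Article(url)
    article.download()  # fetch the HTML
    article.parse()     # extract title, text and other fields
    return article.text


def first_chars(text, x):
    """Return at most the first x characters, with runs of whitespace collapsed."""
    return " ".join(text.split())[:x]


# Example call (needs network access and newspaper3k):
# print(first_chars(fetch_article_text("https://example.com/story"), 200))
print(first_chars("Hello   world\nagain", 11))
```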
@kenyeresgellert 3 years ago ⁺²
Great content! Thank you. When will those guitars be played, though?
@kishorem7625 1 year ago
Sir, what if I wanted to search for both LOCKDOWN and FOOTBALL together? Please help me, I'm currently facing this issue.
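One way to approach this question is to put both words in a single query string. The sketch below builds a Google News RSS search URL with the standard library. It assumes that space-separated terms are treated as "all of these words" and that the `hl`, `gl` and `ceid` parameters behave as shown, which may change. The function name `google_news_search_url` is made up for illustration.

```python
from urllib.parse import urlencode


def google_news_search_url(terms, lang="en-US", country="US"):
    """Build an RSS search URL whose query contains every term in `terms`."""
    params = {
        "q": " ".join(terms),                         # e.g. "lockdown football"
        "hl": lang,                                   # interface language (assumed parameter)
        "gl": country,                                # country edition (assumed parameter)
        "ceid": f"{country}:{lang.split('-')[0]}",    # edition id such as "US:en"
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


url = google_news_search_url(["lockdown", "football"])
print(url)
```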
@eksalailia4352 2 years ago
Hello, I have a question. I want to add the content of the article. What should I declare? Please answer my question, and thank you 🙏
@beastvirus 3 years ago ⁺¹
Please create a video on how to scrape Google SERPs for mobile and desktop and keep track of rankings.
@jjw7362 1 year ago ⁺¹
Thank you. But can it not collect the news content?
@JohnWatsonRooney 1 year ago
Hi, unfortunately I think this package has stopped working.
@wataruterada9351 3 years ago ⁺²
Thank you John for this helpful video. Is there any way to remove the limit of 100 results?
@JohnWatsonRooney 3 years ago ⁺¹
Thanks! I am not aware of a way to remove the limit; I believe it is Google News that controls that.
@aaronbell759 2 years ago ⁺¹
What's a workaround for getting 429 responses? I'm trying to scrape Google search results.
@JohnWatsonRooney 2 years ago
I think this package was abandoned and no longer works, I’m afraid.
@aaronbell759 2 years ago
@@JohnWatsonRooney I'm actually not using this package, just bs4 to scrape Google search results. They seem to have some advanced rate-limit tracking, so maybe a proxy is the way to go, but I'd prefer to figure out a better way.
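A generic technique for 429 responses is to slow down and retry with exponential backoff. The sketch below is not what the commenter used and may not be enough against stricter rate limiting; the function names and the default delays are assumptions chosen for illustration.

```python
import random
import time
import urllib.error
import urllib.request


def backoff_delays(retries, base=1.0, cap=60.0):
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def fetch_with_backoff(url, retries=5):
    """Fetch url, sleeping between attempts whenever the server answers 429."""
    for delay in backoff_delays(retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # other errors are not rate limiting, so do not retry them
            # Prefer the server's own hint when it sends one as whole seconds.
            retry_after = err.headers.get("Retry-After", "")
            wait = float(retry_after) if retry_after.isdigit() else delay
            time.sleep(wait + random.uniform(0, 1))  # jitter spreads out retries
    raise RuntimeError(f"still rate limited after {retries} attempts: {url}")


print(backoff_delays(4))
```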
@p.mistry1986 3 years ago ⁺²
Awesome content as usual! How exactly would I go about pairing a list of search terms with the articles it pulls? For example, say I created a list called SearchList = ['Basketball', 'Football', 'Hockey'] and iterated through it to search for all articles related to each item. How would I then associate each item from SearchList with its corresponding articles and print the result in a neat fashion, with one column showing the search term and a second column showing the corresponding article? Thank you!
@nockmago 3 years ago ⁺²
You put the results of the search in a pandas DataFrame and add a new column with the corresponding search term!
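The pairing described above can be sketched without any third-party package by tagging each result row with the term that produced it. `fake_search` is a hypothetical stand-in that returns canned results; a real search function would take its place.

```python
def pair_terms_with_articles(terms, search):
    """Return one row per article, each tagged with the term that found it."""
    rows = []
    for term in terms:
        for article in search(term):
            rows.append(
                {"search_term": term, "title": article["title"], "link": article["link"]}
            )
    return rows


def fake_search(term):
    """Hypothetical stand-in for a real news search; returns two canned results."""
    return [
        {"title": f"{term} story {i}", "link": f"https://example.com/{term.lower()}/{i}"}
        for i in (1, 2)
    ]


search_list = ["Basketball", "Football", "Hockey"]
rows = pair_terms_with_articles(search_list, fake_search)

# Two aligned columns: search term, then article title.
for row in rows:
    print(f"{row['search_term']:<12}{row['title']}")
```

With pandas installed, `pandas.DataFrame(rows)` turns the same list of dictionaries into a table with a `search_term` column.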
@sommojames 1 year ago
I already fail on the first line, "from pygooglenews import GoogleNews". My error is "AttributeError: module 'base64' has no attribute 'decodestring'". I am using the latest version of Python. Is it just me?
@JohnWatsonRooney 1 year ago ⁺¹
I’m afraid this package does not work anymore, it’s a shame.
@mrsfunny669 3 years ago
Can we export the scraping results to HTML files, to make a static site?
@azizullah7881 1 year ago
With respect, your lecture is very valuable and I learned a lot from it, but I have a question about how to save the news to an Excel sheet. I would be thankful.
@mohammedzareefw203 3 years ago ⁺²
Can you make a tutorial on how to display scraped data on a Flask website?
@JohnWatsonRooney 3 years ago ⁺²
Yes I’m working on one!
@paveldanilov4869 2 years ago
Now the package isn't available :((
@miguelnuno928 2 years ago
How do I get the thumbnail/image of the article?
@ItzelFlores-ku7dz 7 months ago
Thank you.
@lee2501 1 year ago ⁺¹
Does anyone know why I get this error message?
pip install pygooglenews --upgrade
Collecting pygooglenews
Using cached pygooglenews-0.1.2-py3-none-any.whl (10 kB)
Collecting feedparser=5.2.1
Using cached feedparser-5.2.1.zip (1.2 MB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [1 lines of output]
error in feedparser setup command: use_2to3 is invalid.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
@JohnWatsonRooney 1 year ago
I’m afraid I think this package no longer works, so this video is outdated now.
@ArronFinn 3 years ago ⁺¹
Hi John, thanks a mill for this video, it has helped a lot. Is there a way to turn the result into an Excel file after getting the list?
@JohnWatsonRooney 3 years ago ⁺²
Sure - I usually use pandas to create a DataFrame from the list and then export to CSV, but you can use the csv module in Python if you’d rather.
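The csv-module route mentioned above might look like the sketch below. The `rows` list is made-up sample data, and the column names `title` and `link` are assumptions about what was scraped.

```python
import csv
import io

# Hypothetical sample rows standing in for scraped results.
rows = [
    {"title": "Example headline", "link": "https://example.com/a"},
    {"title": "Another, with a comma", "link": "https://example.com/b"},
]


def write_rows(rows, handle):
    """Write a header line and one CSV line per row to an open text handle."""
    writer = csv.DictWriter(handle, fieldnames=["title", "link"])
    writer.writeheader()
    writer.writerows(rows)  # fields containing commas are quoted automatically


# Write to memory here so the result can be printed.
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())

# To write a file that Excel can open:
# with open("news.csv", "w", newline="", encoding="utf-8-sig") as f:
#     write_rows(rows, f)
```

The `utf-8-sig` encoding adds a byte order mark, which helps Excel detect UTF-8 when non-ASCII headlines are present.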
@burtmcgurt3584 2 years ago ⁺¹
I like it!
@sanadmasoud9898 1 year ago ⁺¹
Thanks for the video, can we return the actual article content?
@JohnWatsonRooney 1 year ago ⁺¹
Unfortunately I don’t think this package works anymore. You can still scrape news, but you would do it directly with requests rather than with this package.
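A rough sketch of the direct approach: fetch a news RSS feed yourself and parse it. To stay runnable offline, this example parses an embedded sample feed with the standard library's ElementTree instead of downloading one. `SAMPLE_RSS` and `parse_items` are made up for illustration, and real feeds may carry extra fields.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 2.0 document standing in for a downloaded feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item>
      <title>Example headline</title>
      <link>https://example.com/story</link>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_items(xml_text):
    """Return a list of dicts with the title, link and date of each <item>."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
        }
        for item in root.iter("item")
    ]


items = parse_items(SAMPLE_RSS)
print(items)

# With requests installed, a live feed could be passed in the same way:
# items = parse_items(requests.get(feed_url, timeout=10).text)
```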
@sanadmasoud9898 1 year ago ⁺¹
@@JohnWatsonRooney Sounds good thanks!
@oneminute4565 3 years ago
How do I scrape data from a mobile app? Please consider it.
@miguelnuno928 2 years ago
Did not work for me. The feedparser dependency and others need to be updated.
@informationdominance6434 2 years ago ⁺³
Easy fix...
Had to downgrade setuptools to an older version that still supports 2to3:
pip install "setuptools<58"
pip install -U --no-deps "feedparser>=6.0.8"
@miguelnuno928 2 years ago
@@informationdominance6434 Thank you for your answer. I did what you did. Unfortunately I still get the error ModuleNotFoundError: No module named 'sgmllib'
@ankitranjan30 2 years ago
@@miguelnuno928 Try this, it just worked for me now:
!pip install "setuptools<58"
!pip install -U --no-deps "feedparser>=6.0.8"
!pip install pygooglenews==0.1.2
@moustafaezz9158 3 years ago
Hey bro, can you make a tutorial on integrating Scrapy with Django?
@Probly 2 years ago ⁺¹
I get a humongous number of errors when trying to parse dates.
@JohnWatsonRooney 2 years ago ⁺¹
Unfortunately I don’t think this works anymore.
@Probly 2 years ago
@@JohnWatsonRooney No problem, thanks for the reply. I was trying to use it for my dissertation, but it’s due today so I couldn’t make it work.
@vispinet 2 years ago
@@JohnWatsonRooney Same here. It works but cannot filter by date. Another thing I'm missing is a short description of the article, which I need in my dataset. Do you have any advice on the best Google News scraping method right now?
@jawadahmadkhan1076 2 years ago ⁺¹
Your IP will be blocked in no time.
@kaladappanimi4269 3 years ago
Hi John, I'm still waiting for your email so I can send you the site for the tutorial.
@JohnWatsonRooney 3 years ago
Hi! It's on my main YouTube page; I'd rather not post it in comments as it gets picked up as spam.
@atangbingana283 1 year ago
@JohnWatsonRooney Bro, grow your channel right now: put out some videos showing people how to code all this amazing stuff using ChatGPT and you will be at 100K subs in two months.
