Scrape Twitter with 5 Lines of Code
- Published: 22 May 2024
- In this video I show how you can scrape Twitter data using the Python library snscrape to easily pull millions of historic tweets and save them on your computer. Use this to create data for analysis or just archive your tweets quickly and easily.
snscrape github: github.com/JustAnotherArchivi...
Timeline:
00:00 Intro
01:16 Twitter Scraper
03:37 Loop and save bulk
05:35 Add a Progress bar
Follow me on twitch for live coding streams: / medallionstallion_
My other videos:
Speed Up Your Pandas Code: • Make Your Pandas Code ...
Speed up Pandas Code: • Make Your Pandas Code ...
Intro to Pandas video: • A Gentle Introduction ...
Exploratory Data Analysis Video: • Exploratory Data Analy...
Working with Audio data in Python: • Audio Data Processing ...
Efficient Pandas Dataframes: • Speed Up Your Pandas D...
* YouTube: youtube.com/@robmulla?sub_con...
* Discord: / discord
* Twitch: / medallionstallion_
* Twitter: / rob_mulla
* Kaggle: www.kaggle.com/robikscube
#webscraping #python #twitter - Science
Unfortunately, does not work anymore after changes. Returns 404 constantly.
Hi Rob, I'm new to data science and I was wondering if you can make a video that shares the basics of web scraping. More specifically, how to go about navigating and reading the parsed HTML of a page to locate the strings we want, like, say, the content of a table or the reviews on a webpage.
Thanks for posting such easy to digest videos and sharing valuable tips to do things in a more efficient way!
Presently doing my MSc project and this video just pushed me forward so much!
Thank youuuuu
Really glad to hear that.
You make it look so easy. Amazing vid!
Thank you! 😊 Hopefully you can put it to some good use.
oh man, this is incredible and exactly what I was looking for 🤩 Let us know when you decide to launch a Patreon!
Thanks man! No Patreon for now, just share with others who might also appreciate it.
Awesome. Simple and straight 👌
Thanks for watching!
Amazing, will help me a lot with a project. Keep up with the awesome content!!
Great to hear! Thanks for commenting!
Hi, thanks for the video super useful.
I am quite new to all of this. What software are you using? And do you make those titles/write in each box separately?
I am currently using PyCharm.
He's using Jupyter Notebook
Another nice vid! Good job!
Glad you enjoyed! Share with a friend!
@@robmulla Always do!
hi, thank you for the tutorial. do you know how to set the data for specified location or range?
This is awesome, nice and straight to the point!
and its not working
thanks! your progress bar is awesome!
I appreciate it! I actually have a whole video on tqdm (the progress bar package) you should check out.
I'm getting a 403 Forbidden error while trying to scrape tweets using the library. Does it mean there is a restriction?
I wish you were my lecturer/ tutor, I’m doing a conversion masters and the way you explain code is stopping me from having a breakdown 🤣🤣
Ha! Thanks, I appreciate that. I try to make things easy to follow, glad to hear you're learning from it!
@@robmulla you’re doing an amazing job at making it easy to follow. Thank you!
Hello, I wrote the code for the video, but I'm getting an error. It still runs as of this date due to the issue with the APIs that need to be paid for on Twitter.
Sir, do you have a chat group where people ask questions? I followed your flight EDA, do you think predictions could be done with random forest ? Do you have something about this? (newbie here) . Thanks. BTW i like your videos.
Yes I do! Join the discord: discord.gg/HZszek7DQc
Hello,Would you mind if ask if you can demonstrate how to use a crawler to capture twitter space cations. If the audio file is very long (more than ten or twenty hours), how to quickly capture it?
Thanks for the knowledge sharing ....
When i am trying the same i am getting the 404 error .. Can you help me with the error rectification..
Hi Rob, thank you a lot for this video. I have two questions:
1. How to proceed if we want to scrape data on Twitter with more queries like: #father; #mother; #baby; #foot etc.? Are we going to write one piece of code for each query?
2. Did you make a video on scraping Facebook, Instagram, Mastodon, Reddit, Telegram like you did for Twitter?
use a for loop
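A minimal sketch of that for loop, so one piece of code covers every hashtag. The names `collect_for_queries` and `fake_fetch` are hypothetical; `fake_fetch` is an offline stand-in for whatever function actually fetches tweets for a single query, so the example runs without network access.

```python
from typing import Callable, Dict, Iterable, List


def collect_for_queries(
    queries: Iterable[str],
    fetch: Callable[[str], Iterable[str]],
) -> Dict[str, List[str]]:
    """Run one fetch per query and keep the results grouped by query."""
    results = {}
    for query in queries:
        results[query] = list(fetch(query))
    return results


def fake_fetch(query: str) -> List[str]:
    # Hypothetical stand-in for a real scraper call; returns canned text.
    return [f"tweet about {query} #{n}" for n in range(2)]


hashtags = ["#father", "#mother", "#baby", "#foot"]
tweets_by_query = collect_for_queries(hashtags, fake_fetch)
```

Keeping the results keyed by query makes it easy to tell afterwards which hashtag each tweet came from.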
Thanks for the awesome video, are you planning on doing NLP project?
Thanks for watching! Have you seen my video on sentiment analysis? That has some NLP ruclips.net/video/QpzMWQvxXWk/видео.html
Instead of capturing tweets based on some hashtag, can we pull tweets from specific accounts? Like I have 5 accounts for whom I want the tweets data for a given time frame, is it doable from this method?
Just subscribed ❤
Thanks for subbing. Hope you like the videos.
Hi, I am getting an error at the for loop on line 6: "Error retrieving ...", "Errors: blocked (403)", and then the request fails with "giving up".
Excellent Rob. Would you happen to have tutorial sessions? if so kindly link me to it
Yes! Subscribe and hit the bell icon and you’ll be notified when I go live. Also follow me on twitch.
It seems like the snscrape module doesn't work anymore; it returns an error. Do you have any information?
This is gold. 6 minutes and so much information given without any unnecessary fluff... I'm amazed. Thank you so much.
Could you make it work? I guess the most recent reply is yours, I get " blocked (404) " error with Python 3.10.
@@ozgurbuldum6829 yeah, blocked!
@@ozgurbuldum6829 it does not work anymore right?
Thank you for this wonderful tutorial.
Q: How can I retrieve infos like tweet content , username ... ,for a specific tweet using its id or url?
Thank you so much.
I have one question.
When I use snscrape, I can only use "tweet.content, count, date, id, index, outlinks, outlinksss, tcooutlinks, tcooutlinksss, url".
But I want to explore like counts, image url and so on. Would you tell me how to scrape them?
I believe likes are in there as well as the attached images. But you might want to test it out. If snscrape doesn’t work you’ll need to get a Twitter API key.
I'd really love to see you Forecasting Time Series using Transformers based Neural Nets.
I plan on making a video like this at some point. Only downside is that in my experience these models don't perform well unless the data is huge.
@@robmulla You're right, but given the versatility of Transformers Architecture and the actual importance on the State of the Art it's worthy to know it, I personally haven't found good materials and since your videos are awesome I'd very much like to see it.
Hi guys, I have a problem with snscrape. The code works well but without any reason the process early stop at 188 tweets :/
Hey I am getting a 403 blocked error, I suppose the snscrape is going through API
Mate, how are u so good? What's your strategy? You code every single day? Or maybe you have many yrs of experience? What sort of courses or books do you study? How often do you read papers?
I'm asking cz I learn so much jst thru recreating some of your projects and usually u have a very unique approach that I am used to, with very accurate results.
I sure wanna be like you some day.
Also could u keep the projects coming mate? I know there are tons of them on kaggle but it's quite frustrating to get stuck for us newbies a video explanation really helps.
Thanks for the kind words. I really don’t think of myself as being better than any other person with the same amount of experience. I think the main thing I try to do is learn constantly and be patient. Things take time to learn. Check out my video about how to get started in ML for a more detailed breakdown of the skills to learn. Good luck.
Good stuff! Why did use a list first (data) rather than a data.frame?
Thanks! Appending to a list is much faster than appending to a dataframe, so typically I prefer to just create the dataframe after all the data is collected.
Amazing man
Thanks 🔥
with this twitter x update is not working anymore? I tried to run the "for tweet in..." and it did not work "Error retrieving ...."
Yea, I think Elon broke it. :(
I tried but showing error after "for" statement, Error like : blocked (403), Error retrieving and message "Bad Authentication data" code 215 what I do now? Actually I have some twitter link and from that link I want like , view , RT count .. can you help me.. ?? How I code
very nice. Thanks
Most welcome
Very effective teaching! Could you make a video on exploring data scraped from Twitter?
Thanks! I'd suggest checking out one of my live streams!
How do I get a specific tweet and retrieve the quotes and replies?
Thank you very much for the video,,,
SNScrape used to work for me before but now it shows blocked 404, after some research I found that twitter now is blocking scraping without login.....
any idea how to solve the problem ?
Many thanx for the efforts
Interesting, I was just going to start using this. 😢
Hi Rob. Thanks for the video.
Did twitter update their security? Using your exact code from the first 3 minutes of the video, I keep getting an error retrieving api requests. Any tips on how to get past that?
Sameee. Did you work around it?
@@eve9587 I did not unfortunately. What about you?
Dude use top=True
It works now
@@eve9587 Hi, I am having the same problem with snscrape recently. I am quite beginner on this. Can you explain where to apply top=true? Thanks in advance.
Looks to be an issue everyone is experiencing. You can follow the github issues here and they may update. Hopefully just a temporary issue...
github.com/JustAnotherArchivist/snscrape/issues
Can i use this same code in vscode
how can i scrape specific tweets? pls answer me it is very important .
HI, Rob. I don't understand why i have this error:
Errors: blocked (403), blocked (403), blocked (403), blocked (403)
that's when i call scraper.get_items()
same errror
Does it still work with new API packages?
Hi Rob! How much time it takes per 100k tweet? Just to compare with my scraper built in another language which takes approximately 10 to 15min to scape each 100k tweets.
I think if you watch the end of the video the progress bar shows 35 tweets per second. So like 1 hour for 100k tweets?
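As a quick check of that estimate, here is the arithmetic as a small sketch. It assumes the rate stays constant at the 35 tweets per second shown on the progress bar; `estimated_minutes` is a hypothetical helper name.

```python
def estimated_minutes(n_tweets: int, tweets_per_second: float) -> float:
    """Return the expected scrape time in minutes at a constant rate."""
    return n_tweets / tweets_per_second / 60


minutes = estimated_minutes(100_000, 35)
print(f"{minutes:.1f} minutes")  # prints "47.6 minutes"
```

So at that rate 100k tweets would take a bit under 50 minutes, which is in line with the rough "1 hour" figure.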
Is your method still working today ?
hi rob, i think this api doesn't work anymore. i think twitter blocked it
if I have the tweet's url or ID, can I get its content?
The data seems to be truncated. could anyone please tell me how to deal with that ?
It is not working now days.
Do you have any other alternatives?
Hello, Thanks for the nice tutorial, I just wonder if there is a way to scrape only tweets instead of reply and tweets
I’m not sure but it should be possible in the query method.
May I ask if I want to scrape from one specific user? How can I do that?
Great question. You can add any query search terms like you would in the twitter search bar. Here is a cheat sheet: media.sproutsocial.com/uploads/2016/02/Twitter-Search-Operators-Cheatsheet-1.pdf
So to search a specific user you should be able to do something like "from:Rob_Mulla".
Hope that helps.
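A small sketch of how such a query string could be assembled. It assumes the standard Twitter search operators (`from:`, `lang:`, `since:`, `until:`) from the cheat sheet; `build_query` is a hypothetical helper, not part of snscrape.

```python
from typing import Optional


def build_query(
    terms: str = "",
    from_user: Optional[str] = None,
    lang: Optional[str] = None,
    since: Optional[str] = None,
    until: Optional[str] = None,
) -> str:
    """Join search terms and optional operators into one query string."""
    parts = [terms] if terms else []
    if from_user:
        parts.append(f"from:{from_user}")
    if lang:
        parts.append(f"lang:{lang}")
    if since:
        parts.append(f"since:{since}")  # dates as YYYY-MM-DD
    if until:
        parts.append(f"until:{until}")
    return " ".join(parts)


user_query = build_query(from_user="Rob_Mulla")
english_query = build_query("#python", lang="en", since="2023-01-01", until="2023-02-01")
print(user_query)     # from:Rob_Mulla
print(english_query)  # #python lang:en since:2023-01-01 until:2023-02-01
```

The resulting string is what you would pass to the scraper in place of a plain hashtag.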
@@robmulla Thanks for answering me! I'll definitely try that method 😁
The code is no longer working; it reports a problem of tweets blocking. Could you please check and help solve this issue?
Yeah, noticed this also, something seems to be blocked.
How can I solve the 404 request error that I get when looping through the scraper object?
Any idea?
I think they’ve blocked it unfortunately
why the tweets extracted showing only half text? why not full text plz help
Oh no, I haven't experienced that. Have you checked the github issues to see if others have the same problem? github.com/JustAnotherArchivist/snscrape/issues
it's not working for me, I'm getting an error in retrieving the data 🤕🤕
When I run the code the progress bar turns red and becomes "1001/?". Did I make a mistake in the code or can I fix this?
I think it's because that loop will actually gather 1002 results with n_tweets set at 1000. I think Rob was just trying to get ~1000, not 1000 exactly.
Reason: enumerate starts from 0, and the loop is only broken once that number goes ABOVE n_tweets (e.g. 1001), so you end up with an extra result at both ends.
Quick Fix: break the loop when i > (n_tweets - 2)
Good catch! This is a bug. It can be fixed by subtracting 2 from the number of tweets in the line before the break.
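To see the off-by-two without hitting Twitter, here is a sketch that replays the same loop shape against a fake source. `itertools.count()` stands in for `scraper.get_items()`, and `collect` is a hypothetical helper.

```python
from itertools import count


def collect(break_above: int) -> list:
    """Mimic the video's loop: append first, then break once i > break_above."""
    collected = []
    for i, item in enumerate(count()):  # count() stands in for scraper.get_items()
        collected.append(item)
        if i > break_above:
            break
    return collected


n_tweets = 1000
original = collect(n_tweets)       # condition from the video: i > n_tweets
fixed = collect(n_tweets - 2)      # suggested fix: i > n_tweets - 2
print(len(original), len(fixed))   # prints "1002 1000"
```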
is there a way to pull say, latest 50 tweets from a specific user?
Yes! I mentioned in a different response but you can use twitter's query to filter to specific users or dates. Check this cheat sheet:
media.sproutsocial.com/uploads/2016/02/Twitter-Search-Operators-Cheatsheet-1.pdf
So do something like "from:Rob_Mulla"... And setting your loop to only go through 50 would get you the latest tweets.
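One way to cap the loop at 50 items is `itertools.islice`, which stops pulling from a generator after a fixed count. This is a sketch: `fake_timeline` is a hypothetical stand-in for `scraper.get_items()` and assumes results arrive newest first.

```python
from itertools import islice


def fake_timeline():
    """Hypothetical stand-in for scraper.get_items(): yields newest first."""
    tweet_id = 1000
    while True:
        yield {"id": tweet_id, "user": "Rob_Mulla"}
        tweet_id -= 1


# islice consumes exactly 50 items and then stops, with no manual break.
latest_50 = list(islice(fake_timeline(), 50))
print(len(latest_50))  # prints "50"
```

Because `islice` takes an exact count, it also avoids the off-by-one that a manual `if i > n: break` check can introduce.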
Everything works (in VS Code notebook) except the progress bar isn't displaying? It runs without error. Here is the code:
n_tweets = 5000
for i, tweet in tqdm(enumerate(scraper.get_items()), total=n_tweets):
    data = [
        tweet.date,
        tweet.id,
        tweet.content,
        tweet.user.username,
        tweet.likeCount,
        tweet.retweetCount,
    ]
    tweets.append(data)
    if i > n_tweets:
        break
try importing the base tqdm instead of tqdm notebook:
replace:
from tqdm.notebook import tqdm
with:
from tqdm import tqdm
@@robmulla Thanks that did the trick!!!
Is it possible to scrape profile information and followers list? thank you
That’s a great question. There are ways to do this but I don’t believe snscrape can do it. You might need to apply for the Twitter api and use something like tweepy
big headstart in a project
does it still work? i doubt that. Was getting a 404 and 429 status codes last time i tried
Does Twitter block you if you create a bot with this and get tweets every day?
I’m not sure. Things are changing every day at Twitter. This package has been relatively unstable lately so you might want to check out their GitHub (linked in the description)
how do i scrape the english tweets alone?
Thank you for sharing this amazing tool and wonderful tutorial.
Q: How can I retrieve all replies to a selection of tweets?
Thanks. I don’t know if that’s possible with snscrape. You might need the official Twitter api.
what software do you use for write the code ?
it looks like jupyterlab
Yes, jupyterlab - I have a whole video about it!
is this still valid with x though?
Why your jupyter notebook interface looks different than mine?
I got into detail about my jupyter setup towards the end of this video: ruclips.net/video/5pf0_bpNbkw/видео.html - Hope that helps. In short: jupyterlab with solarized dark theme.
@@robmulla Thanks. Imma do it with my setup
Hi, just wanted to ask, we need twitter API access to do this scraping, right? Cause when I run the same code, it sends a request to twitter API website and then an error is displayed.
did you find any solution
@@khadijadar3156 yes, I can scrape tweets now. It was a very simple code.
@@rishavbhardwaj8044 can you tell me how you fixed that?
tell us how!
@@TemporaryForstudy
I was using this code, but as of 2 days ago, it stopped working:
import snscrape.modules.twitter as sntwitter
import pandas as pd

limit = 1000
tweets_list = []
for tweet in sntwitter.TwitterSearchScraper("NestleIndia").get_items():
    if len(tweets_list) == limit:
        break
    else:
        tweets_list.append([tweet.date, tweet.content, tweet.user.username, tweet.user.displayname,
                            tweet.user.description, tweet.replyCount, tweet.retweetCount, tweet.likeCount,
                            tweet.lang])
# One column name per collected field (the original list had 5 names for 9 fields)
df = pd.DataFrame(tweets_list, columns=['Date', 'Content', 'Username', 'Display Name', 'Description',
                                        'Replies', 'Retweets', 'Likes', 'Language'])
# Save the DataFrame to a CSV file
df.to_csv('tweets_nestle.csv', index=False)
print("Data saved to tweets_nestle.csv")
Awesome
Thanks Patrick!
Can u limit the tweets to only tweets in a certain language say English?
Yes, use something like "lang:en" in the query. Check out this cheatsheet on twitter queries. media.sproutsocial.com/uploads/2016/02/Twitter-Search-Operators-Cheatsheet-1.pdf
how can we install snscrape?
an error 403 from twitter api occurs
at get items
Sir, is it legal to scrape data on twitter without using twitter api now? I heard twitter shut down their free api.
I don’t know. I’m not a lawyer but you should be ok.
@@robmulla thank you
I have seen that you use jupyter lab in dark mode, how to activate dark mode?
Solarized dark theme! Check out my tutorial video where I explain how: ruclips.net/video/5pf0_bpNbkw/видео.html
This with tweepy makes for a very interesting reposting bot
Interesting idea.
My code is not working
What would be helpful for me would be a video on how to connect Python to a SQL Server database.!!
It's really easy with pandas. pd.read_sql() - you just need to create the connection object which depends on your database type.
Hi Rob, I cannot seem to make it work, I am not sure if its something I am doing but it produces a 'Errors: blocked (404), blocked (404), blocked (404), blocked (404)' message and states 'failed, giving up', with a lot of links to twitter that show I don't have access. Do you have any suggestions? Thanks!
same issue
snscrape no longer works for twitter. Elon has changed the policies making it hard to scrape without using the official API.
how do I get only people account and also their follower and following count?
Not sure if you can do this with snscrape, but with tweepy you could.
@@robmulla thank you sir!
can we access tweets older than 7 days?
At least you used to be able to
Will I be able to scrape data using Google Colab?
You should be able to!
sure
Hi Rob - I tried but it is not working anymore.
What is the mark after the "for i ,"
The comma after the "i" is to separate out the two things enumerate returns (index, value) in to the index "i" and value "tweet".
Does that make sense?
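A tiny illustration of that unpacking, using a made-up list in place of real tweets:

```python
tweets = ["first tweet", "second tweet", "third tweet"]

# enumerate yields (index, value) tuples.
pairs = list(enumerate(tweets))
print(pairs[0])  # prints "(0, 'first tweet')"

# "for i, tweet in ..." unpacks each tuple into two names.
for i, tweet in enumerate(tweets):
    print(i, tweet)
```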
What Tim said! Thanks Tim!
ok people this aint working now dont waste your time here
How can I get follower count of different users using this??
I don’t think so. The Twitter api should give this easy enough though.
Edit: It looks like this package is no longer working for some people (Thanks Elon) If you have issues here are some tips:
From the SNScrape github page:
```
If you discover an issue with snscrape, please report it at github.com/JustAnotherArchivist/snscrape/issues. If you use the CLI, please run snscrape with -vv and include the log output in the issue. If you use snscrape as a module, please enable debug-level logging using import logging; logging.basicConfig(level = logging.DEBUG) (before using snscrape at all) and include the log output in the issue.
```
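A minimal sketch of that logging setup for module use. The `force=True` argument is an addition not mentioned in the snscrape README: it is a standard `logging.basicConfig` option (Python 3.8+) that replaces handlers already installed, which can matter in notebooks. The logger name `scrape_debug` is hypothetical.

```python
import logging

# Configure debug logging first, before importing or using snscrape.
# force=True replaces any handlers that were already installed.
logging.basicConfig(level=logging.DEBUG, force=True)

log = logging.getLogger("scrape_debug")  # hypothetical logger name
log.debug("debug logging is enabled")

# import snscrape.modules.twitter as sntwitter  # import only after logging is set up
```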
Even if I install the latest version, it still won't work
Yes it's not working 😭😭
please help with the 403 error if you can.
It’s possible they may have blocked this
@@robmulla Fix it
the coding is easy but currently, it's not working. I tried many times 🤕
I've tried now but sntwitter cannot get any items
Oh no. Did Elon shut it down?!
The request was blocked by the twitter.
i have an error "Unable to find guest token"
I'm not sure why that would be, works for me! Let me know if you get it working.
I came across this error. You need to initialize the scraper object before every run of .get_items()
What is the thing you’re using to run the script? I know nothing about coding so I’m confused
Jupyter notebook, just google it and you’ll find it … or search anaconda
It’s not the snake I promise…
Hi Rob, do you have a new trick for scraping? I need this for college. Thank you for your attention Rob, GBU.
Hi Rob, thank you very much for your video. I tried to scrape Twitter with snscrape but it seams like the new API version blocks snscrape. Could you do a new video to scrape it without the API? Is there any way to do it?
thank god i saw your comment before wasting my time just like i did on tweepy😭
apis aren't working now! f$u elon
I AM TRYING TO FIGURE IT OUT... DID YOU FIND ANY WAY TO SCRAPE TWITTER?
LIKE I SPENT WEEKS TO FIND OUT A SOLUTION BUT FAILED..
What shall we do for reddit,
Make one analysing WallStreetBets.
Great idea. Maybe I’ll make another video.
not working anymore
Does anyone know if this method still works today?
nope
brother can you make a video on followers scraping
Ohh. Great idea. I need to figure it out first. I think the Twitter api would handle that
it is not working anymore !
Hello , is this code running now with all the v2 new api Elon announced ? Thank you a lot