Hi, thanks for the video! I want to scrape 100 tweets, so I tried changing the line `for card in page_cards[-15:]:` to `for card in page_cards[-100:]:`, but it only collects 10 tweets. Do you know how to solve it?
Hey, thank you for the tutorial. Unfortunately, Selenium skips some tweets. I haven't been able to find out why, and it doesn't give me any errors; tweets are missing even before the "cards" list is built.
One adjustment I would recommend is to set a default value for all the items in the tweet instead of returning early. This will let you see what is getting missed. I know there are a few reasons this can happen. For example: sponsored or promoted content (it doesn't have a post date), and some tweets have content disclaimers that hide the tweet, so those get dropped too. You can also change the number of tweets in the lookback... I think I'm currently looking back 15, but you could increase that number.
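A minimal sketch of that "default value instead of returning" idea (the helper name and usage line are mine, not from the tutorial code):

```python
def safe_extract(fetch, default=""):
    """Run a zero-argument extraction callable and fall back to a
    default instead of aborting the whole tweet when one field
    (e.g. the post date on promoted content) is missing."""
    try:
        return fetch()
    except Exception:
        return default

# usage sketch, assuming `card` is a Selenium tweet element:
# posted = safe_extract(lambda: card.find_element_by_xpath('.//time').get_attribute('datetime'))
```

With defaults in place, a tweet with a missing field still lands in the CSV, so you can see which fields are failing instead of silently dropping the record.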
Hi, I've been watching this over and over, running into all the problems in the comments and slowly solving them one by one... I don't intend to scrape the whole page and all the tweets; some tweets at random would do just fine for me. But I ran into this problem: at the end of the script the CSV file is empty... I even added a print(len(data)) to double-check, and it returns "0"... I'm running the code in PyCharm; does that have something to do with it? (I'm using a modified version of the GitHub code so it doesn't scrape all the tweets.) Really good video; I have little clue about Python in general, but I learned a lot from following this! Thank you! :)
I'd start further back... you want to make sure that the tweets are actually getting collected first. Try printing out the records that you are saving in the tweets list to see if they are getting that far.
Hey Izzy, I am unable to log in using your code. In fact, I am not able to log in to any of my accounts in that particular window, be it Gmail or Twitter, even manually. Is there something wrong with the browser that Selenium opens up?
I don't see anything obvious. But, you might be able to create a new url pattern by using the Advanced search options and then seeing how Twitter builds the url: twitter.com/search-advanced?lang=en
It should be scraping more than that. If not, try checking out the GitHub version I try to keep updated. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
Hi! This is so helpful :) I just have a question I'm having trouble finding solutions to online: when I run the last function, I get an error saying 'unhashable type: 'set''. I have been trying other mutable objects, but many do not have a .add method. Any insights would be so appreciated. Thanks :)
If you need something mutable, you can always convert the set to a list with the 'list()' function. Here's the project site with updated code: github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks so much! It's up and working pretty well. Any suggestions to optimize the number of tweets scraped? I increased the number of seconds per scroll from 0.5 to 3 and then to 10; every time, it doubles the data scraped. Any other insights would be greatly appreciated :)
Hey Izzy, great content, and the scraper is working fine on my end except that it skips a few tweets. I am using the advanced search to scrape historical tweet data. Can you please recommend what changes I can make to the code to ensure higher accuracy? (Time is not a constraint, so I am okay with wait time.) Again, thanks a lot!
Hey there, very helpful, and many thanks! I do not fetch all cards while automatically scrolling, even when increasing the sleep time to allow for longer loading. Do you have any recommendation for how this might be improved?
Hello, I am using parts of your tutorial, yet the find_element_by_xpath call to find the tweet is returning an error. Any chance you could help?
My code:
from selenium import webdriver
link = [link which i'm trying to access]
def open_browser(link):
    driver = webdriver.Chrome(executable_path="/Users/[my name]/Downloads/chromedriver 4")
    driver.get(link)
    return driver
driver = open_browser(link)
card = driver.find_element_by_xpath('//div[@data-testid="tweet"]')
The error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@data-testid="tweet"]"}
Thank you :)
Correct, the singular will return a single item. If you're talking about 8:01, I believe I'm using the plural "elements" in that section, which returns a list.
Not exactly sure without seeing your code. But I would check to make sure you are using single and double quotes in the right places: single quotes should be used on the outside of the XPath block, and double quotes on the inside. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
@@izzyanalytics4145 When I search a hashtag, there are thousands of results under it, but when I try to scrape it, I only get the first 6 or 7. Is there a way to get all the results? I tried infinite scrolling and then scraping the entire page, but it didn't work. Thank you!!!!
Not sure without looking at the code. Try using this script and see if you get better results: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
This is really interesting. Thanks for this video. Could you explain how I would scrape data from a specific location that is not near me, and from a specific time in the past, for example within a time period in 2020? I also want to extract tweets containing specific words that are not English. Please give me some tips?
Yes, it looks like the link is contained in the `a` tag that contains the tweet timestamp. So, start with the timestamp, get its parent, then get the href from the parent.
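Once you have that href, the permalink normally has the shape `/<user>/status/<id>`, so the tweet id can be pulled out with plain string handling (helper name is my own):

```python
def tweet_id_from_href(href):
    """Extract the numeric status id from a tweet permalink,
    e.g. 'https://twitter.com/jack/status/20' -> '20'."""
    return href.rstrip("/").rsplit("/", 1)[-1]
```

The id makes a handy deduplication key when you're collecting tweets across scrolls.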
Hey Izzy. This has been amazing. Thank you for this. I run into one problem consistently: about every 9 or 10 tweets, the scraper will fail to pick up about 3 tweets, then pick up the next 9 or 10 tweets, then miss the next 3 or so. I tried playing with the scroll script and adding some sleep time in between, but none of it makes a difference. I think it may be a subtle change in the HTML every few tweets? Do you have any thoughts, troubleshooting tips, etc.? Thank you again, dude!
Have you tried running the script from GitHub? I made some tweaks since originally posting. You could also increase the number of tweets that are reviewed in each scroll.
I'm not quite sure what you mean. You can checkout the code and compare to see what may be the issue. I've also added updates since I originally published this. github.com/israel-dryer/Twitter-Scraper
Holy shit, this is good, clear content. Rare on YouTube. Keep posting stuff, because this is what we need, not more rehashed content. Anyway, Izzy, my problem is that I'm not grabbing all the tweets. I'm skipping over large numbers of them, and I don't know why, because I'm identifying them by data-testid="tweet", so if I get some, shouldn't I be getting them all? I'm missing more than half sometimes, and it misses the same ones on every page... Weird...
I made some changes today that should help. Unfortunately, I wasn't able to test some of the changes because I hit the limit, so I'll have to wait until tomorrow. Check out the ".py" file. If it's not working now, it will work as soon as I'm able to get back on and test it. github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks bro I'm getting a higher success of scraping. I'm using it to pull dirt on racists at work and ruin their lives Izzy. You are doing god's work my man!
Hey bro, absolutely great video, and it works perfectly for me... keep it up!! I'm just having one small issue: I'm gathering rather old tweets, and I found that the code stops working if there's a tweet which is unavailable but still shown in the Twitter timeline. Any idea how to skip these?
I've made updates to the code over time... I'd check out the latest version; I think it may handle some of these issues: github.com/israel-dryer/Twitter-Scraper
Thank you for the tutorial :) It helps me a lot with my bachelor's thesis! Speaking of which: how should I cite you in my sources? Is your GitHub page sufficient?
@@izzyanalytics4145 Hey :) How come you don't run into issues with JavaScript? I have a list of tweets I'd like to scan for the content of their links, and everything I try keeps throwing errors.
@@izzyanalytics4145 That is very kind! I am amazed at how quickly you reply and how much you care about the community! But I actually found my solution just now :) I took another look at your GitHub code and found out that I just didn't wait long enough for the page to fully load. Because of that, JavaScript obstructed the interaction.
I'm not sure if I used it in that code, but you can use the Wait object to wait for certain conditions before proceeding. This makes the scraper a lot more reliable.
If I were to search a very common term that would basically give an infinite scroll, is there a way to limit the number of tweets gathered, or cut it off once the tweets reach a certain date? Also, thanks so much; this is a really great tutorial!
You could add a piece of code in a while loop that checks the number of tweets gathered and breaks when that number is reached. I've got a bit of code online (linked below) that improves the scrolling functionality and breaks out when the scrolling is done... you could easily add such a check in the "while scrolling" block. As for dates, Twitter actually has an advanced search. Go to the URL, add a few parameters, and then see what it does to the URL. You can then use this pattern to filter by date range, tags, and other parameters. Twitter Advanced Search: twitter.com/search-advanced?lang=en GitHub Code: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
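The count check can be sketched independently of Selenium; here `batches` stands in for the list of tweet records collected on each scroll (function and variable names are mine):

```python
def collect_until(batches, limit):
    """Accumulate unique tweet records from successive scrolls,
    stopping as soon as `limit` unique items are collected.

    `batches` is any iterable of iterables, one per scroll."""
    seen = set()
    for batch in batches:
        for item in batch:
            seen.add(item)
            if len(seen) >= limit:
                return seen
    return seen
```

In the real scraper the same `if len(data) >= limit: break` line would go inside the "while scrolling" loop, right after the cards from the current view are processed.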
Thanks a lot, Izzy! This tutorial really helped me a lot!! But I am struggling with some issues :( I've noticed that some of the tweets were missing from the collected data. I have looked through the updated code you've uploaded to GitHub, but still couldn't solve the problem. I think something went wrong with the scrolling: watching the monitor, there were times it scrolled down way too much, skipping some tweets. Is this an issue related to the lookback limit? If so, how can I find the appropriate number for my computer? Thank you so much!!
Possibly. You would need to build in more substantial error handling and check on it periodically. Web scrapers are inherently fragile because they rely on an external document structure that you have no control over.
Hi, your video helped me a lot. Thanks! Can you make a tutorial on how to get Twitter account profile data, please? To get when the account was created, the following count, and the followers count. Thanks a bunch!
Yes. I'll post some code as soon as I get some time. In the meantime, how I would do this is build the profile URL from the user name scraped on the page. Then I would use the driver to navigate to that URL and scrape the profile info from the top of the page using the same methods I used in this video.
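Building the profile URL from the scraped handle is one line of string work (function name is mine):

```python
def profile_url(handle):
    """Turn a scraped '@handle' (or bare handle) into a profile URL
    that the driver can navigate to."""
    return "https://twitter.com/" + handle.lstrip("@")

# then: driver.get(profile_url("@jack")) and scrape the header section
```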
I saw some YouTube videos saying you need a Twitter developer account to scrape Twitter data. Can you explain the differences and limitations of your method versus theirs? Big thanks, btw; you are exactly what I am looking for.
You would need a developer account if you want to use the API. An API is a lot easier to use, but it's often limited in how much you can pull or how far back it goes, and expanded access costs money. You can check it out here to see if it would work for your case: developer.twitter.com/en
@@izzyanalytics4145 Please let us know if you're planning to code it and make it available on your GitHub repo before the video comes out. That way we don't need to wait for the video, and we'd be happy to test it :)
Hey, I'm trying to replicate what you've done, and the code seems to be working fine. However, when I check the CSV file after scraping, it's empty. I've also attempted to use your code from GitHub, but I'm facing the same issue. Could you please help me figure out the reason behind this?
Hi @izzy_analytics. This code works well for infinite scrolling, but when I try to scrape a limited number of tweets (around 200), it skips tweets, reaches the last page very fast, and stops scrolling; the file ends up with 120-130 tweets while the account has 200. Please help me fix this, because I want to scrape many profiles that contain a very small number of tweets. I'm using the Chrome web driver on Windows 7, but I don't think that's the cause. Thanks in advance :)
@@izzyanalytics4145 The problem stays the same: when I search @gregmaffei, the profile shows that he has made 380 tweets, but when I try to scrape, I get 224 tweets, and the number of scraped tweets changes every time. My code is also updated to match your GitHub code, but the result remains the same. Kindly help me. Thanks.
One thing I tried to get around 280 was to change the slice in `for card in page_cards[-15:]` to `[-40:]`. This looks back over more tweets... however, there could be several things going on, and it will take time to diagnose properly. I know that it's getting to the end, because I can see results from the first tweets in the timeline. So, tweets are presumably getting lost between the beginning and the end. Some things that need to be looked into: it says there are 380 tweets, but is that how many are really available? Is there something different on some cards that's causing the `get tweet data` function to return empty? I'll keep you posted.
Thanks for this tutorial. While I was trying to replicate it, I got an error on this line of code; please advise.
password.send_keys(Keys.RETURN) # same action as clicking the login button
AttributeError: 'NoneType' object has no attribute 'send_keys'
@@izzyanalytics4145 Message: stale element reference: element is not attached to the page document. I got this when I tried to do card.find_element_by_xpath('.//span').text. Do you maybe know why? Cheers
Possibly the page might not have fully loaded. You could try adding a delay using the sleep function before that code. Sometimes I will also use implicitly_wait at the beginning of my code. See these sources for more info: (1) www.geeksforgeeks.org/implicitly_wait-driver-method-selenium-python/ (2) sqa.stackexchange.com/questions/40942/stale-element-reference-element-is-not-attached-to-the-page-of-the-document
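Another option for stale-element errors is to simply re-run the extraction a couple of times before giving up. A generic sketch (names are mine; the `sleeper` parameter only exists so the pause can be swapped out):

```python
import time

def retry(fetch, attempts=3, delay=1.0, sleeper=time.sleep):
    """Call `fetch` up to `attempts` times, pausing `delay` seconds
    between tries; re-raise the last error if every attempt fails.
    Useful when an element goes stale mid-extraction because the
    page re-rendered underneath you."""
    for i in range(attempts):
        try:
            return fetch()
        except Exception:
            if i == attempts - 1:
                raise
            sleeper(delay)

# usage sketch: text = retry(lambda: card.find_element_by_xpath('.//span').text)
```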
No more twint, tweetscrapper, or tweepy... this is "the way". You are a legend!
Thanks!
This content is exactly what I have been looking for for days; please add Selenium to your video title. Anyhow, excellent work. 👍
Glad to hear. Thanks for the feedback.
This is a very approachable way of teaching this project and hacking together a little demo. Just what I was looking for.
This is absolutely beautiful. A little side note: if you are experiencing internet lag and are unable to locate elements, use sleep(2) before you try to obtain elements. sleep(2) means the program will pause 2 seconds before it carries on. It gives the page time to load when your internet is slow.
You can check out the updated code on GitHub. I've implemented the Wait object in several spots to wait for certain conditions before proceeding.
@@izzyanalytics4145 Yes, I saw that update, but I was having trouble locating the "Latest" tab because my internet sucks lol
Can you tell me a way to get image URLs from the post?
I want something to alert me when a specific Twitter user starts following another Twitter account.
Hello, thank you for the program. My keyword captures the posts of a specific day ('lang:en until:2019-11-30 since:2019-11-29 -filter:links -filter:replies'), but the returned data starts at 23:59 (Timestamp: 2019-11-29T23:59:59.000Z). Is there a way to capture data during a specific time period, for example from 12:00 noon to 1 pm of that day?
Here's the URL to the advanced search: twitter.com/search-advanced?lang=en
Can someone help me with a problem? I am using PyCharm, and when I run my code it opens up a new window every single time, which is really annoying. Is there any way I can use one window for multiple executions?
Hello, when I execute the code that collects tweets, the scrolling won't stop.
I completed the whole thing, but I get an error message ( Message: no such element: Unable to locate element: {"method":"xpath","selector":"//input[@aria-label="Search query"]"} ) for the search function ( input[@aria-label="Search query"] ).
It was doing the same thing prior to the final completion, but I would just copy & paste the same thing and it worked.
Once it's all wrapped in one I can't do it; any ideas?
Hard to say without seeing the full code. But feel free to check out the GitHub repository. I've been making changes and updates as time goes by to make it better. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
@@izzyanalytics4145 Facing the same problem at that point. Checking the console log, it seems like a 401 (Unauthorized) response.
Try adding sleep(). For me, the page needed time to load, and because of that it couldn't find the search query.
Now it doesn't work for me, because Twitter has a new interface: the username has to be entered first, and after clicking Next there's a separate password screen.
Hey, it keeps giving an error:
Unable to locate element: {"method":"xpath","selector":"//input[@name="session[username_or_email]"]"}
I've updated this project since I originally published. This may help. github.com/israel-dryer/Twitter-Scraper
Whenever I run it, it says "list index out of range". Can anyone help me understand why this is happening?
I've changed this script to scrape the usernames of all the followers of any account. The issue is that after getting a few thousand followers, if you go to Task Manager you can see the Chrome tab start using a lot of memory because of the continuous scrolling, and after some time I get a memory error in the web page. Is it possible to free the memory after a few iterations? The sole cause looks like the sheer amount of scrolling.
Bumped into an error saying "'WebElement' object is not subscriptable" while assigning cards[0] to card. Any suggestions?
Usually that error occurs if you use 'find_element...' instead of 'find_elements...'. The plural returns a list, while the singular returns a single WebElement.
Please make a Wikipedia scraping tutorial that can scrape tabular data and text data; using an API is optional.
I'll check into it. Thanks for the suggestion.
Hi! I want to share an error I ran into. When I try to access the cards, like cards[7], this message appears: "Message: stale element reference: element is not attached to the page document". How can I solve this?
I've updated since I originally published this. It may help. github.com/israel-dryer/Twitter-Scraper
Whenever I do cards = driver.find_elements_by_xpath('//div[@data-testid="tweet"]') it results in an empty list.
I've tried other xpaths but no luck getting the tweets.
it's hard to say without seeing the full code. However, you're more than welcome to check out the code on GitHub: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
Same with me; have you found any solutions yet?
Add sleep(1) after you open Twitter, before you search for the user input.
I also get the same error. Did you find a solution? Thank you.
Hi @Izzy Analytics, how can I scrape the number of followers and following for each tweet's user?
Hey, it's very helpful, but there is a problem with send_keys() in the Chrome web driver. It seems like it is not sending keys to the required fields. Kindly help. Thanks.
Hi Izzy, when I run the code 'card = cards[0]', it turns out that 'IndexError: list index out of range'. I am so confused; can you please help me out? Thanks very much.
Man, this is an awesome video, so straightforward, and it has everything in it! Kudos to your efforts!
happy analyzing
Hello Izzy. Your tutorial is amazing, thank you. The scraping is done and I've saved the data to CSV, but when I open it there is no data captured. Please let me know if you know the solution.
Hello Fauzan,
I struggled with the same thing. I had a look at the HTML and found a couple of things that hopefully will solve your problem:
1. The "collect_all_tweets_from_current_view" function needs to be updated. The page_card element is no longer rendered as a div element. You would need to change it to an article element to comply with Twitter's latest changes.
previously:
page_cards = driver.find_elements_by_xpath('//div[@data-testid="tweet"]')
updated:
page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
2. The "extract_data_from_current_tweet_card" function might need to be updated for the _comment and _responding variables. The HTML has changed a bit since Izzy updated his GitHub. Try making the following changes:
previously:
_comment = card.find_element_by_xpath('.//div[2]/div[2]/div[1]').text
_responding = card.find_element_by_xpath('.//div[2]/div[2]/div[2]').text
updated:
_comment = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]').text
_responding = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[2]').text
Try to apply these changes, and hopefully it will start adding data to your csv (at least it did for me).
@@lars-magnusunderhaug2576 Thank you so much. This really helps to solve my problem.
@@lars-magnusunderhaug2576 Hey, can you send me the latest code for comment and responding? It's not working:
_comment = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]').text
_responding = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[2]').text
How to count followers?
After we store all the tweets as a list in the 'cards' variable and then call the first element, it shows an "IndexError: list index out of range" error. How do I solve this?
having the same issue. Did you solve it?
Why do I always fail to log in, when my password and username are correct?
Thanks bro! I just had to make some changes due to updates in the library methods and to adapt it to Chrome. I couldn't make the loop stop, but with a keyboard interrupt it still worked. Thanks again!
I'm having trouble with inputting username. It opens the browser but then doesn't input anything into the username field.
I've been making updates to the code. Try this one out: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
I ran into some trouble with 'There was unusual login activity on your account.' How can I avoid it? I tried a try/except statement where I first try email and, if it fails, use the username. However, it does not work, because it says it cannot find driver.find_element_by_xpath('//input[@aria-label="Search query"]')
You'll have to wait a day or two. You got flagged for logging in too many times. Happened to me too when I was experimenting, but it goes away.
Hey all, if you want to do some advanced filtering... check out the patterns from twitter.com/search-advanced. As an example, if I include the following text in the search box, I can scrape all tweets that include the keyword "web scraping" with the hashtag #Python, with a minimum of 5 likes, and within a specified date range:
web scraping (#Python) min_faves:5 until:2020-09-30 since:2020-08-01
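If you end up generating many of these searches, the pattern is easy to assemble programmatically; the helper and its parameter names are my own, not part of the tutorial:

```python
def build_query(keywords, hashtag=None, min_faves=None, until=None, since=None):
    """Assemble a Twitter advanced-search string like the example above."""
    parts = [keywords]
    if hashtag:
        parts.append(f"(#{hashtag})")
    if min_faves:
        parts.append(f"min_faves:{min_faves}")
    if until:
        parts.append(f"until:{until}")
    if since:
        parts.append(f"since:{since}")
    return " ".join(parts)
```

The result is typed (or sent via send_keys) into the search box exactly as shown in the example line above.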
Hello, thank you for uploading this tutorial. However, I wanted to find out how you deal with the StaleElementReferenceException, because I am trying to scrape tweets. The code works fine, but since I am scraping a lot of tweets, at some point the bot breaks and returns a StaleElementReferenceException. Can you please help me?
You can do all this headless using the requests library.
Hi, I have a problem when I run the script. It shows this error: NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//span"}
(Session info: MicrosoftEdge=87.0.664.47)
Do you know why this happens? Thank you, and great video!
It's hard to say without seeing your code. Have you tried running the code from GitHub? github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
I had a similar problem. After adding the code below, the problem was solved. Thanks for sharing the code and excellent video @Izzy Analytics ---> driver.maximize_window()
Is there any way to retrieve emoji along with the body of the tweet? Please help.
You can, kind of. If I remember correctly, the emoji is actually an image. However, the file name of that image is the character's Unicode code point. So all you need to do is scrape the file name.
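To make that idea concrete, here's a minimal sketch. It assumes the emoji image filename encodes the hex code point(s), e.g. `1f600.svg`, with multi-codepoint emoji joined by `-` (the exact filename format and the XPath in the comment are assumptions, not confirmed from the source):

```python
def emoji_from_filename(src):
    """Recover an emoji character from an image URL or filename whose
    name is the hex Unicode code point, e.g. '.../1f600.svg'.
    Multi-codepoint emoji are assumed to be joined with '-' in the name."""
    name = src.rsplit("/", 1)[-1].split(".")[0]
    return "".join(chr(int(part, 16)) for part in name.split("-"))

# With Selenium you might pull `src` from the emoji <img> tag, e.g. (selector is a guess):
# src = card.find_element_by_xpath('.//img[contains(@src, "emoji")]').get_attribute("src")
```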
Izzy, I am facing an issue. When I run this code it works fine, but it only scrapes 40 to 50 tweets; I want 2 to 3K tweets for my research work. Can you please help me out with this?
Have you tried running the script from GitHub to see if you get the same result? github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Yes, I tried the same code, but only a few tweets were scraped. For example, @realdonaldtrump has around 53.4k tweets, but this code scrapes only 100 or 150. Can you please share your email? I will share both files with you: the code and the CSV file of scraped tweets.
@@TechBankVideos israel.dryer@gmail.com
Thank you so much Izzy. Really helped me with a college project I was doing. Much appreciated
Question:
Why wouldn't you use BeautifulSoup headless rather than a browser GUI?
You can use Selenium headless by adding the '--headless' and '--disable-gpu' arguments to the options. However, Beautiful Soup can't render dynamic pages. You could parse the HTML from driver.page_source with Beautiful Soup, but since you can already do that with Selenium, it's an unnecessary step. I do like to use BeautifulSoup when I can.
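A small sketch of enabling headless mode (the helper name is mine; it works with any browser's Options object):

```python
# Arguments that switch Chrome/Edge into headless mode (no visible window).
HEADLESS_ARGS = ["--headless", "--disable-gpu"]

def make_headless(options):
    """Add the headless arguments to a Selenium Options object."""
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    return options

# Typical usage (assumes selenium and a Chrome driver are installed):
# from selenium import webdriver
# driver = webdriver.Chrome(options=make_headless(webdriver.ChromeOptions()))
```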
Thank you so much, the video is so clear! Waiting for a video on Facebook scraping!
Glad it was helpful!
Just curious; if you could scrape data from Facebook, what would you scrape? Profile info? Group participants?
@@izzyanalytics4145 Ya! I want to scrape the profile info of everyone in each post, page, and group. I want to see the behavior of each person, so I can analyze and filter the data for several useful cases! Sorry for my bad English!
I want to scrape data from Facebook regarding COVID-19 comments, from whoever is commenting about COVID-19. Is it possible to extract that data?
Thanks Izzy, this helps me a lot! But I have a question: how does this Selenium approach get quote tweets, retweets, and reply tweets?
I've updated this since I originally published. This may help. github.com/israel-dryer/Twitter-Scraper
Hi, thanks for the video! I want to scrape 100 tweets, so I tried to change this line
for card in page_cards[-15:]:
to
for card in page_cards[-100:]:
but it only collects 10 tweets. Do you know how to solve it?
You can compare your code with the version on my GitHub to make sure it is the same: github.com/israel-dryer/Twitter-Scraper
When I run the file in cmd, the function doesn't return the tweet info.
Have you tried running the script on GitHub? github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
Hey, thank you for the tutorial. Unfortunately Selenium skips some tweets; I haven't been able to find out why, and it doesn't give me any errors. Tweets are missing even before making the "cards" array.
One adjustment I would recommend is to set a default value for all the items in the tweet instead of returning early. This will let you see what is getting missed. I know there are a few reasons this can happen, for example: sponsored or promoted content (which doesn't have a post date), and some tweets have content disclaimers that hide the tweet, so these get dropped too. You can also change the number of tweets in the lookback... I think I'm currently looking back 15, but you could increase that number.
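A sketch of the "default value" idea. The XPath selectors here are placeholders, not Twitter's real markup, and `safe_text` is my own helper name:

```python
def get_tweet_data(card):
    """Collect each field with a safe default instead of returning early,
    so one missing element (e.g. no post date on promoted content)
    doesn't silently drop the whole tweet."""
    def safe_text(xpath, default=""):
        try:
            return card.find_element_by_xpath(xpath).text
        except Exception:  # ideally catch selenium's NoSuchElementException
            return default
    return {
        "handle": safe_text('.//span[contains(text(), "@")]'),
        "postdate": safe_text(".//time", default="MISSING"),
        "comment": safe_text(".//div[2]/div[2]/div[1]"),
    }
```

Rows with "MISSING" fields then show up in the output instead of vanishing, which makes the skipped tweets easy to inspect.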
Hi, I've been watching this over and over, running into all the problems in the comments and slowly solving them one by one... I don't intend to scrape the whole page and all the tweets; some tweets at random would do just fine for me. But I ran into this problem: at the end of the script the CSV file is empty... I even added a print(len(data)) to double-check, and it returns "0"... I'm running the code in PyCharm, does that have something to do with it? (I'm using a modified version of the GitHub code so it doesn't scrape all the tweets.)
Really good video! I have little knowledge of Python in general, but I learned a lot from following this. Thank you! :)
I'd start further back... you want to make sure the tweets are actually getting collected first. Try printing out the records you are saving in the tweets list to see if they are getting that far.
@@izzyanalytics4145 thank you! let's try that
Hey Izzy, I am unable to login using your code. In fact I am not able to login to any of my accounts in that particular window be it Gmail or Twitter using the manual method. Is there something wrong with the browser that Selenium opens up?
It's hard to say without seeing your setup, code, etc... Are you getting any error messages?
@@izzyanalytics4145 Now I am not. Your code helped me a lot for my college project. Thank you so much!!
Hi there,
Could you please tell me how to scrape tweets from a particular region, for example tweets from, let's say, India only?
I don't see anything obvious. But you might be able to create a new URL pattern by using the Advanced Search options and then seeing how Twitter builds the URL: twitter.com/search-advanced?lang=en
How can I scrape all tweets, not only the last 15?
It should be scraping more than that. If not, try checking out the GitHub version I try to keep updated. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
@@izzyanalytics4145 thank u
Do you have any idea if I can do a sentiment analysis with multilingual Tweets?
@@mariembenhamouda6403 try medium.com/analytics-vidhya/how-to-succeed-in-multilingual-sentiment-analysis-without-transformers-f1a98c76c30c
Hi! This is so helpful :) I just have a question I'm having trouble finding solutions to online: when I run the last function, I get an error saying 'unhashable type: 'set''. I have been trying other mutable objects, but many do not have a .add method. Any insights would be appreciated. Thanks :)
If you need something mutable, you can always convert the set to a list with the 'list()' function. Here's the project site with updated code: github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 thanks so much :)
@@izzyanalytics4145 Thanks so much! It's up and working pretty well. Any suggestions to optimize the number of tweets scraped? I increased the number of seconds to scroll from 0.5 to 3 and then 10 sec; every time, it doubles the data scraped. Any other insights would be greatly appreciated :)
It only scrapes 20-30 tweets... how can I scrape more tweets?
It's hard to say without seeing the code. You can check out mine on GitHub: github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks for the reply. I referred to your code, and now it's working... Thanks again!
@@funtooz9513 How did you solve this issue? Can you please guide me?
@@abdulrahim-fq1oj By managing the sleep time and scroll attempts.
@@funtooz9513 Can you please send me the configuration?
Can I scrape thousands of followers with it? Twint doesn't work anymore.
You can attempt to scrape as many as you want. Here's an updated version if you're interested. github.com/israel-dryer/Twitter-Scraper
Is this tutorial applicable on Windows 7?
Yes. But instead of using the Edge web driver, you would probably want to install chromedriver or Firefox's geckodriver.
Hey Izzy, great content. The scraper is working fine on my end, except it skips a few tweets. I am using the advanced search to scrape historical tweet data. Can you please recommend what changes I can make to the code to ensure higher accuracy? (Time is not a constraint, so I am okay with wait time.)
Again, thanks a lot!
I've updated the project on GitHub. You may find the update code works a bit better: github.com/israel-dryer/Twitter-Scraper
Hey there, very helpful and many thanks! I don't fetch all cards while automatically scrolling, even when increasing the sleep time to allow for longer loading. Do you have any recommendation for how this might be improved?
I've been making updates to the script. Check this one out: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
@@izzyanalytics4145 Many thanks again!
Hello, I am using parts of your tutorial yet the find_element_by_xpath to find the tweet is returning an error. Any chance you could help?
My code:
from selenium import webdriver
link = [link which i'm trying to access]
def open_browser(link):
    driver = webdriver.Chrome(executable_path="/Users/[my name]/Downloads/chromedriver 4")
    driver.get(link)
    return driver
driver = open_browser(link)
card = driver.find_element_by_xpath('//div[@data-testid="tweet"]')
The error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@data-testid="tweet"]"}
Thank you :)
find_element_by_xpath returns a single item, not a list. I guess it will produce an error at card=cards[0] or card=cards[7]
Correct, the singular will return a single item. I believe I'm using the plural "elements" in this section, which will return a list, if you're talking about 8:01
@@izzyanalytics4145 Thanks. You are right. My bad.
Uhh, did you mention that Twitter does not display every tweet on timelines and searches? It's due to index limitations.
Thanks for making such a wonderful video. Could you please make a video on extracting reply contents too, rather than just the counts?
Bro! Please comeback. You are awesome.
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//input[@aria-label="Search query"]"}
Not exactly sure without seeing your code. But I would check that you are using single and double quotes in the right places: single quotes should be used on the outside of the XPath expression, and double quotes on the inside. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
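To make the quoting rule concrete:

```python
# Single quotes on the outside, double quotes inside the XPath:
good = '//input[@aria-label="Search query"]'

# Swapping them breaks the Python string literal before the XPath is even evaluated:
# bad = "//input[@aria-label="Search query"]"  # SyntaxError
```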
@@izzyanalytics4145 Really appreciate your video!
I can only get a limited number of tweets. Is there a way to get all of them?
Thanks in advance
Is the scraper not scraping all of the results provided, or is your search not producing all of the results you expected?
@@izzyanalytics4145 When I search a hashtag there are thousands of results, but when I try to scrape it, I only get the first 6 or 7. Is there a way to get all the results?
I tried infinite scrolling and then scraping the entire page, but it didn't work.
Thank you!!!!!
Not sure without looking at the code. Try using this script and see if you get better results: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
@@izzyanalytics4145 Thanks a lot. It worked.
@@Suwaniify Hi, how did you resolve this issue of limited tweets? I also want tweets in bulk; please help me resolve this issue. Thanks!
This is really interesting, thanks for this video. Could you explain how to scrape data from a specific location that is not near me, and from a specific time in the past, for example within a time period in 2020? I also want to extract tweets containing specific words that are not English. Please give me some tips?
Is there any way to also get the tweet link?
Yes, it looks like the link is contained in the `a` tag that contains the tweet timestamp. So, start with the timestamp, get its parent, then get the href from the parent.
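A sketch of that timestamp-to-parent approach. The `.//time/..` selector assumes the timestamp's immediate parent is the permalink anchor, which may change with Twitter's markup:

```python
def get_tweet_url(card):
    """Return the tweet permalink from a tweet card element.
    The <time> timestamp sits inside the <a> that links to the tweet,
    so select the timestamp's parent and read its href."""
    link = card.find_element_by_xpath(".//time/..")
    return link.get_attribute("href")
```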
Hi Izzy, this is really great information, but can your method extract emails too?
Hey Izzy. This has been amazing, thank you. I run into one problem consistently: about every 9 or 10 tweets, the scraper fails to pick up about 3 tweets, then picks up the next 9 or 10, then misses the next 3 or so. I tried playing with the scroll script and adding some sleep time in between, but none of it makes a difference. I think it may be a subtle change in the HTML every few tweets? Do you have any thoughts, troubleshooting tips, etc.? Thank you again, dude!
Have you tried running the script from GitHub? I made some tweaks since originally posting. You could also increase the number of tweets that are reviewed in each scroll.
@@izzyanalytics4145 That worked! you are a lifesaver
Hi, love your video! I actually have a problem at the very end; it says 'Webdriver'. How can I fix this error?
I'm not quite sure what you mean. You can checkout the code and compare to see what may be the issue. I've also added updates since I originally published this. github.com/israel-dryer/Twitter-Scraper
Holy shit this is good clear content. Rare on RUclips. Keep posting stuff because this is what we need, not more rehashed shit content
Anyway Izzy, my problem is I'm not grabbing all the tweets. I'm skipping over large numbers of them, and I don't know why, because I'm identifying them by data-testid="tweet", so if I get some, shouldn't I be getting them all?
I'm missing more than half sometimes, and it misses the same ones on every page... Weird...
I made some changes today that should help. Unfortunately, I wasn't able to test some of the changes because I hit the limit, so I'll have to wait until tomorrow. Check out the ".py" file. If it's not working now, it will work as soon as I'm able to get back on and test it. github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks bro I'm getting a higher success of scraping. I'm using it to pull dirt on racists at work and ruin their lives Izzy. You are doing god's work my man!
The best tutorial! Thanks!
Glad it helped!
Hey bro, absolutely great video and it works perfectly for me. Keep it up!! I'm just having one small issue: I'm gathering rather old tweets, and I just noticed that the code stops working if there's a tweet which is unavailable but still shown in the Twitter timeline. Any idea how to skip these?
I've made updates to the code over time... I'd check out the latest version; I think it may handle some of these issues: github.com/israel-dryer/Twitter-Scraper
Your tutorial is very good, keep it up!
Thanks, will do!
Thank you for the tutorial :) It helps me a lot with my bachelor's thesis! Speaking of which: how should I cite you in my sources? Is your GitHub page sufficient?
Sure! Glad you found it helpful.
@@izzyanalytics4145 Hey :) How come you don't run into issues with JavaScript? I have a list of tweets I'd like to scan for the content of their links, and everything I try keeps throwing errors.
@@coolundbidda7611 i can take a look at your code if you wish israel.dryer@gmail.com
@@izzyanalytics4145 That is very kind! I am amazed at how quick you are to reply and how much you care about the community! But I actually found my solution just now :)
I took another look at your GitHub code and found out that I just didn't wait long enough for the page to fully load. Because of that, JavaScript obstructed the interaction.
I'm not sure if I used it in that code, but you can use the Wait object to wait for certain conditions before proceeding. This makes the scraper a lot safer.
Thanks a lot Izzy! Great help👍
Happy to help!
Thank you so much for this video. It helped me immensely.
If I were to search a very common term that would basically be an infinite scroll, is there a way to limit the number tweets gathered or cut it off once you reach a certain date of the tweet? Also thanks so much, this is a really great tutorial!
I'm also really interested on how we could set up a definite number of scrolling if possible! Thanks
You could add a piece of code in a while loop that checks the number of tweets gathered and breaks when that number is reached. I've got a bit of code online (linked below) that improves the scrolling functionality and breaks out when the scrolling is done... however, you could easily add a check in the "while scrolling" block. As for dates, there is actually an advanced search in Twitter. Go to the URL, add a few parameters, and then see what it does to the URL. You can then use this pattern to filter by date range, tags, and other parameters.
Twitter Advanced Search: twitter.com/search-advanced?lang=en
GitHub Code: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
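The "break when you have enough" idea can be sketched generically. Here `fetch_batch` stands in for one scroll-and-parse pass, and all the names are mine, not from the scraper itself:

```python
def collect_until(fetch_batch, max_items, max_empty_scrolls=3):
    """Call fetch_batch() repeatedly (e.g. once per page scroll) until
    max_items unique items are collected, or several passes in a row
    produce nothing new (i.e. the end of the results was reached)."""
    collected, seen, empty = [], set(), 0
    while len(collected) < max_items and empty < max_empty_scrolls:
        new = [item for item in fetch_batch() if item not in seen]
        if new:
            seen.update(new)
            collected.extend(new)
            empty = 0
        else:
            empty += 1
    return collected[:max_items]
```

In the scraper, `fetch_batch` would scroll the driver and return the tweet IDs parsed from the newly visible cards.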
See my reply to Olivia, and let me know if this works.
Dude. You are a beast. Thank you
Np. Glad it was helpful.
Thanks a lot man, you made my day ❤️❤️
Thanks a lot Izzy! This tutorial really helped me a lot!!
But I am struggling with some issues :(
I've noticed that some of the tweets were missing from the collected data. I have looked through the updated code you've uploaded to GitHub, but still couldn't solve the problem.
I think something went wrong with the scrolling. Watching the monitor, there were times it scrolled down way too much, skipping some tweets.
Is this an issue related to the lookback limit? If so, how can I find the appropriate number for my computer?
Thank you so much!!
I wonder if screen size makes a difference in terms of how much you would need to scroll. Thoughts? Or have you solved for this already?
Can anyone help me, please? I run into an infinite loop.
Very good my friend!
Thanks!
Thank you so much helped me a lot for my research project.
Glad it helped!
Can such bots be used in production systems, when we, for example, want to scrape tweets twice per day?
Possibly. You would need to build in more substantial error handling and check on it periodically. Web scrapers are inherently fragile because they rely on an external document structure that you have no control over.
Can you make video on scraping google maps??
Interesting. What data would you scrape from Google Maps?
You are brilliant!!!
Thank you! Cheers!
I like it, good work.
Thanks!
Hi, your video helped me a lot. Thanks!
Can you make a tutorial on how to get Twitter account profile data, please?
To get when the account was created, the following count, and the followers count.
Thanks a bunch!
Yes. I'll post some code as soon as I get some time. In the meantime, here's how I would do it: build the profile URL from the user name scraped on the page, then use the driver to navigate to that URL and scrape the profile info from the top of the page using the same methods I used in this video.
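A sketch of the URL-building step described above. The "@name" handle format is what the scraper collects; the commented follower selector is only a guess:

```python
def profile_url(handle):
    """Turn a scraped handle like '@jack' into the profile page URL."""
    return "https://twitter.com/" + handle.lstrip("@")

# Then navigate and scrape the profile header with the same card techniques:
# driver.get(profile_url("@jack"))
# followers = driver.find_element_by_xpath('//a[contains(@href, "/followers")]').text  # selector is a guess
```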
Awesome tutorial! Thank you.
Thanks
I saw some YouTube videos saying you need a Twitter developer account to scrape Twitter data. Can you explain the differences and limitations between your method and theirs?
Big thanks btw, you are exactly what I am looking for
You would need a developer account if you want to use the API. An API is a lot easier to use, but it's often limited in how much you can pull or how far back it goes, and it costs money for expanded access. You can check it out here to see if it would work for your case: developer.twitter.com/en
Great video!
Very useful!
Thanks
fire tutorial!!
Just amazing!
Bro, please make the same video on how to scrape eBay and AliExpress. I am waiting for your kind response.
github.com/israel-dryer/Ebay-Scraper
@@izzyanalytics4145 thanks bro may you live a long and happy life
Bro, please make a video on how to scrape eBay and AliExpress. I am waiting for your kind response.
github.com/israel-dryer/Ebay-Scraper
@@izzyanalytics4145 thanks bro thankyou very much very much appreciated
@@izzyanalytics4145 thankyou thankyou thankyou
Awesome and great, thanks for your helpful video.
Can you do a tutorial on how to extract an Instagram followers list: scrape all the followers of a user, including their user profiles, to a .csv?
good idea. I'll put it on my list.
@@izzyanalytics4145 May we know when we can see it? :)
Hopefully end of November or beginning of December. I haven't had any time to do videos lately because of work. 😔 But I do have a lot of items on the list.
@@izzyanalytics4145 Please let us know if you plan to code it and make it available on your GitHub repo before the video comes out. That way, we don't need to wait until the video is out and will be happy to test it :)
Thank you, your video helped me so much.
Glad to hear it!
Hey, I'm trying to replicate what you've done, and the code seems to be working fine. However, when I check the CSV file after scraping, it's empty. I've also attempted to use your code from GitHub, but I'm facing the same issue. Could you please help me figure out the reason behind this?
Did you solve it successfully? It seems I've run into the same question.
Hi @izzy_analytics.
This code works well for infinite scrolling, but when I try to scrape a limited number of tweets (around 200), it skips tweets, reaches the last page very fast, and stops scrolling; the file ends up with 120-130 tweets while the account has 200. So please help me fix this issue, because I want to scrape many profiles that contain a very small number of tweets.
I'm using the Chrome web driver on Windows 7; I don't think that's the cause of the problem.
Thanks in advance :)
Hi, try using the script I've posted on GitHub. I've added a few more things that might be helpful: github.com/israel-dryer/Twitter-Scraper
Let me know if this is still an issue after you try my code
@@izzyanalytics4145 The problem remains the same. When I search @gregmaffei, the profile shows that he made 380 tweets, but when I try to scrape it I get 224 tweets, and the number of scraped tweets changes every time.
Also, my latest code is updated according to your GitHub code, but the result remains the same. Kindly help me.
Thanks.
@izzy analytics Did you check that issue? I have to complete my analysis within a few days; kindly help me.
Thanks.
One thing I tried to get around 280 was to change the slice in `for card in page_cards[-15:]` to 40. This looks back at more tweets... however, there could be several things going on, and it will take time to properly diagnose. I know that it's getting to the end, because I can see results from the first tweets in the timeline. So, tweets are presumably getting lost between the beginning and the end. Some things that need to be looked into: it says there are 380 tweets, but is that how many are really available? Is there something different on some cards that's causing the `get tweet data` function to return empty? I'll keep you posted.
Thanks for this tutorial. While I was trying to replicate it, I got an error for this line of code; please advise.
password.send_keys(Keys.RETURN) #same action as clicking the login button
AttributeError: 'NoneType' object has no attribute 'send_keys'
Can you just log in to Twitter and skip that part?
It looks like it's not finding the password field. Double check the code for identifying the password element.
@@izzyanalytics4145 thanks , I would check
@@hkchan1339 thanks chan...i tried that part and it did work
Hey Izzy, how are you? Can you make a web scraper for AliExpress/Alibaba, and also make a video on AliExpress? I will be very thankful to you.
13:33
Did you steal the video from TechWithTim
Thanks for this video
Most welcome
good work
Thanks
@@izzyanalytics4145 Message: stale element reference: element is not attached to the page document
when I tried to do: card.find_element_by_xpath('.//span').text
Do you maybe know why? Cheers
Possibly the page might not have fully loaded. You could try adding a delay using the sleep function before that code. Sometimes I will also use implicitly_wait at the beginning of my code. See these sources for more info:
(1) www.geeksforgeeks.org/implicitly_wait-driver-method-selenium-python/
(2) sqa.stackexchange.com/questions/40942/stale-element-reference-element-is-not-attached-to-the-page-of-the-document
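Another option besides a fixed sleep is to retry the lookup when it goes stale. A small sketch (the `retry_stale` helper is mine; it falls back to a dummy exception class so the sketch can run even without selenium installed):

```python
from time import sleep

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:  # lets the sketch run without selenium installed
    class StaleElementReferenceException(Exception):
        pass

def retry_stale(action, attempts=3, delay=1.0):
    """Re-run `action` (e.g. lambda: card.find_element_by_xpath('.//span').text)
    if the element detaches while the page is still re-rendering."""
    for attempt in range(attempts):
        try:
            return action()
        except StaleElementReferenceException:
            if attempt == attempts - 1:
                raise  # still stale after all attempts
            sleep(delay)  # give the page time to settle, then retry
```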