Hi, thanks for the video! I want to scrape 100 tweets, so I tried changing the line `for card in page_cards[-15:]:` to `for card in page_cards[-100:]:`, but it only collects 10 tweets. Do you know how to solve it?
Hey, thank you for the tutorial. Unfortunately, Selenium skips some tweets. I haven't been able to find out why, and it doesn't give me any errors; tweets are missing even before the "cards" list is built.
One adjustment I would recommend is to set a default value for all the items in the tweet instead of returning early. This will let you see what is getting missed. I know there are a few reasons this can happen. For example: sponsored or promoted content (it doesn't have a post date), and some tweets have content disclaimers that hide the tweet, so those get dropped too. You can also change the number of tweets in the lookback... I think I'm currently looking back 15, but you could increase that number.
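A minimal sketch of that "default value instead of returning" idea (the helper name and usage line are mine, not from the tutorial code):

```python
def safe_extract(fetch, default=""):
    """Run a zero-argument extraction callable and fall back to a
    default instead of aborting the whole tweet when one field
    (e.g. the post date on promoted content) is missing."""
    try:
        return fetch()
    except Exception:
        return default

# usage sketch, assuming `card` is a Selenium tweet element:
# posted = safe_extract(lambda: card.find_element_by_xpath('.//time').get_attribute('datetime'))
```

With defaults in place, a tweet with a missing field still lands in the CSV, so you can see which fields are failing instead of silently dropping the record.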
Hi, I've been watching this over and over, running into all the problems in the comments and slowly solving them one by one... I don't intend to scrape the whole page and all the tweets; some tweets at random would do just fine for me. But I ran into this problem: at the end of the script the CSV file is empty... I even added a print(len(data)) to double-check, and it returns "0"... I'm running the code in PyCharm; does that have something to do with it? (I'm using a modified version of the GitHub code so it doesn't scrape all the tweets.) Really good video; I have little clue about Python in general, but I learned a lot from following this! Thank you! :)
I'd start further back... you want to make sure that the tweets are actually getting collected first. Try printing out the records that you are saving in the tweets list to see if they are getting that far.
Hey Izzy, I am unable to log in using your code. In fact, I am not able to log in to any of my accounts in that particular window, be it Gmail or Twitter, even manually. Is there something wrong with the browser that Selenium opens up?
I don't see anything obvious. But, you might be able to create a new url pattern by using the Advanced search options and then seeing how Twitter builds the url: twitter.com/search-advanced?lang=en
It should be scraping more than that. If not, try checking out the GitHub version I try to keep updated. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
Hi! This is so helpful :) I just have a question I'm having trouble finding solutions to online: when I run the last function, I get an error saying 'unhashable type: 'set''. I have been trying other mutable objects, but many do not have a .add method. Any insights would be so appreciated. Thanks :)
If you need something mutable, you can always convert the set to a list with the 'list()' function. Here's the project site with updated code: github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks so much! It's up and working pretty well. Any suggestions to optimize the number of tweets scraped? I increased the number of seconds per scroll from 0.5 to 3 and then to 10; every time, it doubles the data scraped. Any other insights would be greatly appreciated :)
Hey Izzy, great content, and the scraper is working fine on my end except that it skips a few tweets. I am using the advanced search to scrape historical tweet data. Can you please recommend what changes I can make to the code to ensure higher accuracy? (Time is not a constraint, so I am okay with wait time.) Again, thanks a lot!
Hey there, very helpful, and many thanks! I do not fetch all cards while automatically scrolling, even when increasing the sleep time to allow for longer loading. Do you have any recommendation for how this might be improved?
Hello, I am using parts of your tutorial, yet the find_element_by_xpath call to find the tweet is returning an error. Any chance you could help?
My code:
from selenium import webdriver
link = [link which i'm trying to access]
def open_browser(link):
    driver = webdriver.Chrome(executable_path="/Users/[my name]/Downloads/chromedriver 4")
    driver.get(link)
    return driver
driver = open_browser(link)
card = driver.find_element_by_xpath('//div[@data-testid="tweet"]')
The error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@data-testid="tweet"]"}
Thank you :)
Correct, the singular will return a single item. If you're talking about 8:01, I believe I'm using the plural "elements" in that section, which returns a list.
Not exactly sure without seeing your code. But I would check to make sure you are using single and double quotes in the right places: single quotes should be used on the outside of the XPath block, and double quotes on the inside. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
@@izzyanalytics4145 When I search a hashtag, there are thousands of results under it, but when I try to scrape it, I only get the first 6 or 7. Is there a way to get all the results? I tried infinite scrolling and then scraping the entire page, but it didn't work. Thank you!!!!
Not sure without looking at the code. Try using this script and see if you get better results: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
This is really interesting. Thanks for this video. Could you explain how I would scrape data from a specific location that is not near me, and from a specific time in the past, for example within a time period in 2020? I also want to extract tweets containing specific words that are not English. Please give me some tips?
Yes, it looks like the link is contained in the `a` tag that contains the tweet timestamp. So, start with the timestamp, get its parent, then get the href from the parent.
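Once you have that href, the permalink normally has the shape `/<user>/status/<id>`, so the tweet id can be pulled out with plain string handling (helper name is my own):

```python
def tweet_id_from_href(href):
    """Extract the numeric status id from a tweet permalink,
    e.g. 'https://twitter.com/jack/status/20' -> '20'."""
    return href.rstrip("/").rsplit("/", 1)[-1]
```

The id makes a handy deduplication key when you're collecting tweets across scrolls.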
Hey Izzy. This has been amazing. Thank you for this. I run into one problem consistently: about every 9 or 10 tweets, the scraper will fail to pick up about 3 tweets, then pick up the next 9 or 10 tweets, then miss the next 3 or so. I tried playing with the scroll script and adding some sleep time in between, but none of it makes a difference. I think it may be a subtle change in the HTML every few tweets? Do you have any thoughts, troubleshooting tips, etc.? Thank you again, dude!
Have you tried running the script from GitHub? I made some tweaks since originally posting. You could also increase the number of tweets that are reviewed in each scroll.
I'm not quite sure what you mean. You can checkout the code and compare to see what may be the issue. I've also added updates since I originally published this. github.com/israel-dryer/Twitter-Scraper
Holy shit, this is good, clear content. Rare on YouTube. Keep posting stuff, because this is what we need, not more rehashed content. Anyway, Izzy, my problem is that I'm not grabbing all the tweets. I'm skipping over large numbers of them, and I don't know why, because I'm identifying them by data-testid="tweet", so if I get some, shouldn't I be getting them all? I'm missing more than half sometimes, and it misses the same ones on every page... Weird...
I made some changes today that should help. Unfortunately, I wasn't able to test some of the changes because I hit the limit, so I'll have to wait until tomorrow. Check out the ".py" file. If it's not working now, it will work as soon as I'm able to get back on and test it. github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks bro I'm getting a higher success of scraping. I'm using it to pull dirt on racists at work and ruin their lives Izzy. You are doing god's work my man!
Hey bro, absolutely great video, and it works perfectly for me... keep it up!! I'm just having one small issue: I'm gathering rather old tweets, and I found that the code stops working if there's a tweet which is unavailable but still shown in the Twitter timeline. Any idea how to skip these?
I've made updates to the code over time... I'd check out the latest version; I think it may handle some of these issues: github.com/israel-dryer/Twitter-Scraper
Thank you for the tutorial :) It helps me a lot with my bachelor's thesis! Speaking of which: how should I cite you in my sources? Is your GitHub page sufficient?
@@izzyanalytics4145 Hey :) How come you don't run into issues with JavaScript? I have a list of tweets I'd like to scan for the content of their links, and everything I try keeps throwing errors.
@@izzyanalytics4145 That is very kind! I am amazed at how quickly you reply and how much you care about the community! But I actually found my solution just now :) I took another look at your GitHub code and found out that I just didn't wait long enough for the page to fully load. Because of that, JavaScript obstructed the interaction.
I'm not sure if I used it in that code, but you can use the Wait object to wait for certain conditions before proceeding. This makes the scraper a lot more reliable.
If I were to search a very common term that would basically give an infinite scroll, is there a way to limit the number of tweets gathered, or cut it off once the tweets reach a certain date? Also, thanks so much; this is a really great tutorial!
You could add a piece of code in a while loop that checks the number of tweets gathered and breaks when that number is reached. I've got a bit of code online (linked below) that improves the scrolling functionality and breaks out when the scrolling is done... you could easily add such a check in the "while scrolling" block. As for dates, Twitter actually has an advanced search. Go to the URL, add a few parameters, and then see what it does to the URL. You can then use this pattern to filter by date range, tags, and other parameters. Twitter Advanced Search: twitter.com/search-advanced?lang=en GitHub Code: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
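The count check can be sketched independently of Selenium; here `batches` stands in for the list of tweet records collected on each scroll (function and variable names are mine):

```python
def collect_until(batches, limit):
    """Accumulate unique tweet records from successive scrolls,
    stopping as soon as `limit` unique items are collected.

    `batches` is any iterable of iterables, one per scroll."""
    seen = set()
    for batch in batches:
        for item in batch:
            seen.add(item)
            if len(seen) >= limit:
                return seen
    return seen
```

In the real scraper the same `if len(data) >= limit: break` line would go inside the "while scrolling" loop, right after the cards from the current view are processed.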
Thanks a lot, Izzy! This tutorial really helped me a lot!! But I am struggling with some issues :( I've noticed that some of the tweets were missing from the collected data. I have looked through the updated code you've uploaded to GitHub, but still couldn't solve the problem. I think something went wrong with the scrolling: watching the monitor, there were times it scrolled down way too much, skipping some tweets. Is this an issue related to the lookback limit? If so, how can I find the appropriate number for my computer? Thank you so much!!
Possibly. You would need to build in more substantial error handling and check on it periodically. Web scrapers are inherently fragile because they rely on an external document structure that you have no control over.
Hi, your video helped me a lot. Thanks! Can you make a tutorial on how to get Twitter account profile data, please? To get when the account was created, the following count, and the followers count. Thanks a bunch!
Yes. I'll post some code as soon as I get some time. In the meantime, how I would do this is build the profile URL from the user name scraped on the page. Then I would use the driver to navigate to that URL and scrape the profile info from the top of the page using the same methods I used in this video.
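Building the profile URL from the scraped handle is one line of string work (function name is mine):

```python
def profile_url(handle):
    """Turn a scraped '@handle' (or bare handle) into a profile URL
    that the driver can navigate to."""
    return "https://twitter.com/" + handle.lstrip("@")

# then: driver.get(profile_url("@jack")) and scrape the header section
```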
I saw some YouTube videos saying you need a Twitter developer account to scrape Twitter data. Can you explain the differences and limitations of your method versus theirs? Big thanks, btw; you are exactly what I am looking for.
You would need a developer account if you want to use the API. An API is a lot easier to use, but it's often limited in how much you can pull or how far back it goes, and expanded access costs money. You can check it out here to see if it would work for your case: developer.twitter.com/en
@@izzyanalytics4145 Please let us know if you're planning to code it and make it available on your GitHub repo before the video comes out. That way we don't need to wait for the video, and we'd be happy to test it :)
Hey, I'm trying to replicate what you've done, and the code seems to be working fine. However, when I check the CSV file after scraping, it's empty. I've also attempted to use your code from GitHub, but I'm facing the same issue. Could you please help me figure out the reason behind this?
Hi @izzy_analytics. This code works well for infinite scrolling, but when I try to scrape a limited number of tweets (around 200), it skips tweets, reaches the last page very fast, and stops scrolling; the file ends up with 120-130 tweets while the account has 200. Please help me fix this, because I want to scrape many profiles that contain a very small number of tweets. I'm using the Chrome web driver on Windows 7, but I don't think that's the cause. Thanks in advance :)
@@izzyanalytics4145 The problem stays the same: when I search @gregmaffei, the profile shows that he has made 380 tweets, but when I try to scrape, I get 224 tweets, and the number of scraped tweets changes every time. My code is also updated to match your GitHub code, but the result remains the same. Kindly help me. Thanks.
One thing I tried to get around 280 was to change the slice in `for card in page_cards[-15:]` to `[-40:]`. This looks back over more tweets... however, there could be several things going on, and it will take time to diagnose properly. I know that it's getting to the end, because I can see results from the first tweets in the timeline. So, tweets are presumably getting lost between the beginning and the end. Some things that need to be looked into: it says there are 380 tweets, but is that how many are really available? Is there something different on some cards that's causing the `get tweet data` function to return empty? I'll keep you posted.
Thanks for this tutorial. While I was trying to replicate it, I got an error on this line of code; please advise.
password.send_keys(Keys.RETURN) # same action as clicking the login button
AttributeError: 'NoneType' object has no attribute 'send_keys'
@@izzyanalytics4145 Message: stale element reference: element is not attached to the page document. I got this when I tried to do card.find_element_by_xpath('.//span').text. Do you maybe know why? Cheers
Possibly the page might not have fully loaded. You could try adding a delay using the sleep function before that code. Sometimes I will also use implicitly_wait at the beginning of my code. See these sources for more info: (1) www.geeksforgeeks.org/implicitly_wait-driver-method-selenium-python/ (2) sqa.stackexchange.com/questions/40942/stale-element-reference-element-is-not-attached-to-the-page-of-the-document
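Another option for stale-element errors is to simply re-run the extraction a couple of times before giving up. A generic sketch (names are mine; the `sleeper` parameter only exists so the pause can be swapped out):

```python
import time

def retry(fetch, attempts=3, delay=1.0, sleeper=time.sleep):
    """Call `fetch` up to `attempts` times, pausing `delay` seconds
    between tries; re-raise the last error if every attempt fails.
    Useful when an element goes stale mid-extraction because the
    page re-rendered underneath you."""
    for i in range(attempts):
        try:
            return fetch()
        except Exception:
            if i == attempts - 1:
                raise
            sleeper(delay)

# usage sketch: text = retry(lambda: card.find_element_by_xpath('.//span').text)
```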
No more twint, tweetscrapper, or tweepy... this is "the way". You are a legend!
Thanks!
This content is exactly what I have been looking for for days; please add Selenium to your video title. Anyhow, excellent work. 👍
Glad to hear. Thanks for the feedback.
This is a very approachable way of teaching this project and hacking together a little demo. Just what I was looking for.
This is absolutely beautiful. A little side note: if you are experiencing internet lag and are unable to locate elements, use sleep(2) before you try to obtain elements. sleep(2) means the program will pause 2 seconds before it carries on. It gives the page time to load when your internet is slow.
You can check out the updated code on GitHub. I've implemented the Wait object in several spots to wait for certain conditions before proceeding.
@@izzyanalytics4145 Yes, I saw that update, but I was having trouble locating the "Latest" tab because my internet sucks lol
Can you tell me a way to get image URLs from the post?
I want something to alert me when a specific Twitter user starts following another Twitter account.
Hello, thank you for the program. My keyword captures the posts of a specific day ('lang:en until:2019-11-30 since:2019-11-29 -filter:links -filter:replies'), but the returned data starts at 23:59 (Timestamp: 2019-11-29T23:59:59.000Z). Is there a way to capture data during a specific time period, for example from 12:00 noon to 1 pm of that day?
Here's the URL to the advanced search: twitter.com/search-advanced?lang=en
Can someone help me with a problem? I am using PyCharm, and when I run my code it opens up a new window every single time, which is really annoying. Is there any way I can use one window for multiple executions?
Hello, when I execute the code that collects tweets, the scrolling won't stop.
I completed the whole thing, but I get an error message ( Message: no such element: Unable to locate element: {"method":"xpath","selector":"//input[@aria-label="Search query"]"} ) for the search function ( input[@aria-label="Search query"] ).
It was doing the same thing prior to the final completion, but I would just copy & paste the same thing and it worked.
Once it's all wrapped in one I can't do it; any ideas?
Hard to say without seeing the full code. But feel free to check out the GitHub repository. I've been making changes and updates as time goes by to make it better. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
@@izzyanalytics4145 Facing the same problem at that point. Checking the console log, it seems like a 401 (Unauthorized) response.
Try adding sleep(). For me, the page needed time to load, and because of that it couldn't find the search query.
Now it doesn't work for me, because Twitter has a new interface: the username has to be entered first, and after clicking Next there's a separate password screen.
Hey, it keeps giving an error:
Unable to locate element: {"method":"xpath","selector":"//input[@name="session[username_or_email]"]"}
I've updated this project since I originally published. This may help. github.com/israel-dryer/Twitter-Scraper
Whenever I run it, it says "list index out of range". Can anyone help me understand why this is happening?
I've changed this script to scrape the usernames of all the followers of any account. The issue is that after getting a few thousand followers, if you go to Task Manager you can see the Chrome tab start using a lot of memory because of the continuous scrolling, and after some time I get a memory error in the web page. Is it possible to free the memory after a few iterations? The sole cause looks like the sheer amount of scrolling.
Bumped into an error saying "'WebElement' object is not subscriptable" while assigning cards[0] to card. Any suggestions?
Usually that error occurs if you use 'find_element...' instead of 'find_elements...'. The plural returns a list, while the singular returns a single WebElement.
Please make a Wikipedia scraping tutorial that can scrape tabular data and text data; using an API is optional.
I'll check into it. Thanks for the suggestion.
Hi! I want to share an error I ran into. When I try to access the cards, like cards[7], this message appears: "Message: stale element reference: element is not attached to the page document". How can I solve this?
I've updated since I originally published this. It may help. github.com/israel-dryer/Twitter-Scraper
Whenever I do cards = driver.find_elements_by_xpath('//div[@data-testid="tweet"]') it results in an empty list.
I've tried other xpaths but no luck getting the tweets.
it's hard to say without seeing the full code. However, you're more than welcome to check out the code on GitHub: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
Same with me; have you found any solutions yet?
Add sleep(1) after you open Twitter, before you search for the user input.
I also get the same error. Did you find a solution? Thank you.
Hi @Izzy Analytics, how can I scrape the number of followers and following for each tweet's user?
Hey, it's very helpful, but there is a problem with send_keys() in the Chrome web driver. It seems like it is not sending keys to the required fields. Kindly help. Thanks.
Hi Izzy, when I run the code 'card = cards[0]', it turns out that 'IndexError: list index out of range'. I am so confused; can you please help me out? Thanks very much.
Man, this is an awesome video, so straightforward, and it has everything in it! Kudos to your efforts!
happy analyzing
Hello Izzy. Your tutorial is amazing, thank you. The scraping is done and I've saved the data to CSV, but when I open it there is no data captured. Please let me know if you know the solution.
Hello Fauzan,
I struggled with the same thing. I had a look at the HTML and found a couple of things that hopefully will solve your problem:
1. The "collect_all_tweets_from_current_view" function needs to be updated. The page_card element is no longer rendered as a div element. You would need to change it to an article element to comply with Twitter's latest changes.
previously:
page_cards = driver.find_elements_by_xpath('//div[@data-testid="tweet"]')
updated:
page_cards = driver.find_elements_by_xpath('//article[@data-testid="tweet"]')
2. The "extract_data_from_current_tweet_card" function might need to be updated for the _comment and _responding variables. The HTML has changed a bit since Izzy updated his GitHub. Try making the following changes:
previously:
_comment = card.find_element_by_xpath('.//div[2]/div[2]/div[1]').text
_responding = card.find_element_by_xpath('.//div[2]/div[2]/div[2]').text
updated:
_comment = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]').text
_responding = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[2]').text
Try to apply these changes, and hopefully it will start adding data to your csv (at least it did for me).
@@lars-magnusunderhaug2576 Thank you so much. This really helps to solve my problem.
@@lars-magnusunderhaug2576 Hey, can you send me the latest code for comment and responding? It's not working:
_comment = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[1]').text
_responding = card.find_element_by_xpath('.//div[1]/div[1]/div[1]/div[2]/div[2]/div[2]/div[2]').text
How to count followers?
After we store all the tweets as a list in the 'cards' variable and then call the first element, it shows an "IndexError: list index out of range" error. How do I solve this?
having the same issue. Did you solve it?
Why do I always fail to log in, when my password and username are correct?
Thanks bro! I just had to make some changes due to updates in the library methods and to adapt it to Chrome. I couldn't make the loop stop, but with a keyboard interrupt it still worked. Thanks again!
I'm having trouble with inputting username. It opens the browser but then doesn't input anything into the username field.
I've been making updates to the code. Try this one out: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
I ran into some trouble with 'There was unusual login activity on your account.' How can I avoid it? I tried a try/except statement where I first try email and, if it fails, use the username. However, it does not work, because it says it cannot find driver.find_element_by_xpath('//input[@aria-label="Search query"]')
You'll have to wait a day or two. You got flagged for logging in too many times. Happened to me too when I was experimenting, but it goes away.
Hey all, if you want to do some advanced filtering... check out the patterns from twitter.com/search-advanced. As an example, if I include the following text in the search box, I can scrape all tweets that include the keyword "web scraping" with the hashtag #Python, with a minimum of 5 likes, and within a specified date range:
web scraping (#Python) min_faves:5 until:2020-09-30 since:2020-08-01
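If you end up generating many of these searches, the pattern is easy to assemble programmatically; the helper and its parameter names are my own, not part of the tutorial:

```python
def build_query(keywords, hashtag=None, min_faves=None, until=None, since=None):
    """Assemble a Twitter advanced-search string like the example above."""
    parts = [keywords]
    if hashtag:
        parts.append(f"(#{hashtag})")
    if min_faves:
        parts.append(f"min_faves:{min_faves}")
    if until:
        parts.append(f"until:{until}")
    if since:
        parts.append(f"since:{since}")
    return " ".join(parts)
```

The result is typed (or sent via send_keys) into the search box exactly as shown in the example line above.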
Hello, thank you for uploading this tutorial. However, I wanted to find out how you deal with the StaleElementReferenceException, because I am trying to scrape tweets. The code works fine, but since I am scraping a lot of tweets, at some point the bot breaks and returns a StaleElementReferenceException. Can you please help me?
You can do all this headless using the requests library.
Hi, I have a problem when I run the script. It shows this error: NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":".//span"}
(Session info: MicrosoftEdge=87.0.664.47)
Do you know why this happens? Thank you, and great video!
It's hard to say without seeing your code. Have you tried running the code from GitHub? github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
I had a similar problem. After adding the code below, the problem was solved. Thanks for sharing the code and excellent video @Izzy Analytics ---> driver.maximize_window()
Is there any way to retrieve emoji along with the body of the tweet? Please help.
You can, kind of. If I remember correctly, the emoji is actually an image. However, the file name of that image is the character's Unicode code point. So all you need to do is scrape the file name.
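To make that idea concrete, here's a minimal sketch. It assumes the emoji image filename encodes the hex code point(s), e.g. `1f600.svg`, with multi-codepoint emoji joined by `-` (the exact filename format and the XPath in the comment are assumptions, not confirmed from the source):

```python
def emoji_from_filename(src):
    """Recover an emoji character from an image URL or filename whose
    name is the hex Unicode code point, e.g. '.../1f600.svg'.
    Multi-codepoint emoji are assumed to be joined with '-' in the name."""
    name = src.rsplit("/", 1)[-1].split(".")[0]
    return "".join(chr(int(part, 16)) for part in name.split("-"))

# With Selenium you might pull `src` from the emoji <img> tag, e.g. (selector is a guess):
# src = card.find_element_by_xpath('.//img[contains(@src, "emoji")]').get_attribute("src")
```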
Izzy, I am facing an issue. When I run this code it works fine, but it only scrapes 40 to 50 tweets; I want 2 to 3K tweets for my research work. Can you please help me out with this?
Have you tried running the script from GitHub to see if you get the same result? github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Yes, I tried the same code, but only a few tweets were scraped. For example, @realdonaldtrump has around 53.4k tweets, but this code scrapes only 100 or 150. Can you please share your email? I will share both files with you: the code and the CSV file of scraped tweets.
@@TechBankVideos israel.dryer@gmail.com
Thank you so much Izzy. Really helped me with a college project I was doing. Much appreciated
Question:
Why wouldn't you use BeautifulSoup headless rather than a browser GUI?
You can use Selenium headless by adding the '--headless' and '--disable-gpu' arguments to the options. However, Beautiful Soup can't render dynamic pages. You could parse the HTML from driver.page_source with Beautiful Soup, but since you can already do that with Selenium, it's an unnecessary step. I do like to use BeautifulSoup when I can.
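A small sketch of enabling headless mode (the helper name is mine; it works with any browser's Options object):

```python
# Arguments that switch Chrome/Edge into headless mode (no visible window).
HEADLESS_ARGS = ["--headless", "--disable-gpu"]

def make_headless(options):
    """Add the headless arguments to a Selenium Options object."""
    for arg in HEADLESS_ARGS:
        options.add_argument(arg)
    return options

# Typical usage (assumes selenium and a Chrome driver are installed):
# from selenium import webdriver
# driver = webdriver.Chrome(options=make_headless(webdriver.ChromeOptions()))
```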
Thank you so much, the video is so clear! Waiting for a video on Facebook scraping!
Glad it was helpful!
Just curious; if you could scrape data from Facebook, what would you scrape? Profile info? Group participants?
@@izzyanalytics4145 Ya! I want to scrape the profile info of everyone in each post, page, and group. I want to see the behavior of each person, so I can analyze and filter the data for several useful cases! Sorry for my bad English!
I want to scrape data from Facebook regarding COVID-19 comments, from whoever is commenting about COVID-19. Is it possible to extract that data?
Thanks Izzy, this helps me a lot! But I have a question: how does this Selenium approach get quote tweets, retweets, and reply tweets?
I've updated this since I originally published. This may help. github.com/israel-dryer/Twitter-Scraper
Hi, thanks for the video! I want to scrape 100 tweets, so I tried to change this line
for card in page_cards[-15:]:
to
for card in page_cards[-100:]:
but it only collects 10 tweets. Do you know how to solve it?
You can compare your code with the version on my GitHub to make sure it is the same: github.com/israel-dryer/Twitter-Scraper
When I run the file in cmd, the function doesn't return the tweet info.
Have you tried running the script on GitHub? github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
Hey, thank you for the tutorial. Unfortunately Selenium skips some tweets; I haven't been able to find out why, and it doesn't give me any errors. Tweets are missing even before making the "cards" array.
One adjustment I would recommend is to set a default value for all the items in the tweet instead of returning early. This will let you see what is getting missed. I know there are a few reasons this can happen, for example: sponsored or promoted content (which doesn't have a post date), and some tweets have content disclaimers that hide the tweet, so these get dropped too. You can also change the number of tweets in the lookback... I think I'm currently looking back 15, but you could increase that number.
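A sketch of the "default value" idea. The XPath selectors here are placeholders, not Twitter's real markup, and `safe_text` is my own helper name:

```python
def get_tweet_data(card):
    """Collect each field with a safe default instead of returning early,
    so one missing element (e.g. no post date on promoted content)
    doesn't silently drop the whole tweet."""
    def safe_text(xpath, default=""):
        try:
            return card.find_element_by_xpath(xpath).text
        except Exception:  # ideally catch selenium's NoSuchElementException
            return default
    return {
        "handle": safe_text('.//span[contains(text(), "@")]'),
        "postdate": safe_text(".//time", default="MISSING"),
        "comment": safe_text(".//div[2]/div[2]/div[1]"),
    }
```

Rows with "MISSING" fields then show up in the output instead of vanishing, which makes the skipped tweets easy to inspect.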
Hi, I've been watching this over and over, running into all the problems in the comments and slowly solving them one by one... I don't intend to scrape the whole page and all the tweets; some tweets at random would do just fine for me. But I ran into this problem: at the end of the script the CSV file is empty... I even added a print(len(data)) to double-check, and it returns "0"... I'm running the code in PyCharm, does that have something to do with it? (I'm using a modified version of the GitHub code so it doesn't scrape all the tweets.)
Really good video! I have little knowledge of Python in general, but I learned a lot from following this. Thank you! :)
I'd start further back... you want to make sure the tweets are actually getting collected first. Try printing out the records you are saving in the tweets list to see if they are getting that far.
@@izzyanalytics4145 thank you! let's try that
Hey Izzy, I am unable to login using your code. In fact I am not able to login to any of my accounts in that particular window be it Gmail or Twitter using the manual method. Is there something wrong with the browser that Selenium opens up?
It's hard to say without seeing your setup, code, etc... Are you getting any error messages?
@@izzyanalytics4145 Now I am not. Your code helped me a lot for my college project. Thank you so much!!
Hi there,
Could you please tell me how to scrape tweets from a particular region, for example tweets from, let's say, India only?
I don't see anything obvious. But you might be able to create a new URL pattern by using the Advanced Search options and then seeing how Twitter builds the URL: twitter.com/search-advanced?lang=en
How can I scrape all tweets, not only the last 15?
It should be scraping more than that. If not, try checking out the GitHub version I try to keep updated. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
@@izzyanalytics4145 thank u
Do you have any idea if I can do a sentiment analysis with multilingual Tweets?
@@mariembenhamouda6403 try medium.com/analytics-vidhya/how-to-succeed-in-multilingual-sentiment-analysis-without-transformers-f1a98c76c30c
Hi! This is so helpful :) I just have a question I'm having trouble finding solutions to online: when I run the last function, I get an error saying 'unhashable type: 'set''. I have been trying other mutable objects, but many do not have a .add method. Any insights would be appreciated. Thanks :)
If you need something mutable, you can always convert the set to a list with the 'list()' function. Here's the project site with updated code: github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 thanks so much :)
@@izzyanalytics4145 Thanks so much! It's up and working pretty well. Any suggestions to optimize the number of tweets scraped? I increased the number of seconds to scroll from 0.5 to 3 and then 10 sec; every time, it doubles the data scraped. Any other insights would be greatly appreciated :)
It only scrapes 20-30 tweets... how can I scrape more tweets?
It's hard to say without seeing the code. You can check out mine on GitHub: github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks for the reply. I referred to your code, and now it's working... Thanks again!
@@funtooz9513 How did you solve this issue? Can you please guide me?
@@abdulrahim-fq1oj By managing the sleep time and scroll attempts.
@@funtooz9513 Can you please send me the configuration?
Can I scrape thousands of followers with it? Twint doesn't work anymore.
You can attempt to scrape as many as you want. Here's an updated version if you're interested. github.com/israel-dryer/Twitter-Scraper
Is this tutorial applicable on Windows 7?
Yes. But instead of using the Edge web driver, you would probably want to install chromedriver or Firefox's geckodriver.
Hey Izzy, great content. The scraper is working fine on my end, except it skips a few tweets. I am using the advanced search to scrape historical tweet data. Can you please recommend what changes I can make to the code to ensure higher accuracy? (Time is not a constraint, so I am okay with wait time.)
Again, thanks a lot!
I've updated the project on GitHub. You may find the update code works a bit better: github.com/israel-dryer/Twitter-Scraper
Hey there, very helpful and many thanks! I don't fetch all cards while automatically scrolling, even when increasing the sleep time to allow for longer loading. Do you have any recommendation for how this might be improved?
I've been making updates to the script. Check this one out: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter_scraper.py
@@izzyanalytics4145 Many thanks again!
Hello, I am using parts of your tutorial yet the find_element_by_xpath to find the tweet is returning an error. Any chance you could help?
My code:
from selenium import webdriver
link = [link which i'm trying to access]
def open_browser(link):
    driver = webdriver.Chrome(executable_path="/Users/[my name]/Downloads/chromedriver 4")
    driver.get(link)
    return driver
driver = open_browser(link)
card = driver.find_element_by_xpath('//div[@data-testid="tweet"]')
The error:
selenium.common.exceptions.NoSuchElementException: Message: no such element: Unable to locate element: {"method":"xpath","selector":"//div[@data-testid="tweet"]"}
Thank you :)
find_element_by_xpath returns a single item, not a list. I guess it will produce an error at card=cards[0] or card=cards[7]
Correct, the singular will return a single item. I believe I'm using the plural "elements" in this section, which will return a list, if you're talking about 8:01
@@izzyanalytics4145 Thanks. You are right. My bad.
Uhh, did you mention that Twitter does not display every tweet on timelines and searches? It's due to index limitations.
Thanks for making such a wonderful video. Could you please make a video on extracting reply contents too, rather than just the counts?
Bro! Please comeback. You are awesome.
Message: no such element: Unable to locate element: {"method":"xpath","selector":"//input[@aria-label="Search query"]"}
Not exactly sure without seeing your code. But I would check that you are using single and double quotes in the right places: single quotes should be used on the outside of the XPath expression, and double quotes on the inside. github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
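To make the quoting rule concrete:

```python
# Single quotes on the outside, double quotes inside the XPath:
good = '//input[@aria-label="Search query"]'

# Swapping them breaks the Python string literal before the XPath is even evaluated:
# bad = "//input[@aria-label="Search query"]"  # SyntaxError
```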
@@izzyanalytics4145 Really appreciate your video!
I can only get a limited number of tweets. Is there a way to get all of them?
Thanks in advance
Is the scraper not scraping all of the results provided, or is your search not producing all of the results you expected?
@@izzyanalytics4145 When I search a hashtag there are thousands of results, but when I try to scrape it, I only get the first 6 or 7. Is there a way to get all the results?
I tried infinite scrolling and then scraping the entire page, but it didn't work.
Thank you!!!!!
Not sure without looking at the code. Try using this script and see if you get better results: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
@@izzyanalytics4145 Thanks a lot. It worked.
@@Suwaniify Hi, how did you resolve this issue of limited tweets? I also want tweets in bulk; please help me resolve this issue. Thanks!
This is really interesting, thanks for this video. Could you explain how to scrape data from a specific location that is not near me, and from a specific time in the past, for example within a time period in 2020? I also want to extract tweets containing specific words that are not English. Please give me some tips?
Is there any way to also get the tweet link?
Yes, it looks like the link is contained in the `a` tag that contains the tweet timestamp. So, start with the timestamp, get its parent, then get the href from the parent.
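A sketch of that timestamp-to-parent approach. The `.//time/..` selector assumes the timestamp's immediate parent is the permalink anchor, which may change with Twitter's markup:

```python
def get_tweet_url(card):
    """Return the tweet permalink from a tweet card element.
    The <time> timestamp sits inside the <a> that links to the tweet,
    so select the timestamp's parent and read its href."""
    link = card.find_element_by_xpath(".//time/..")
    return link.get_attribute("href")
```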
Hi Izzy, this is really great information, but can your method extract emails too?
Hey Izzy. This has been amazing, thank you. I run into one problem consistently: about every 9 or 10 tweets, the scraper fails to pick up about 3 tweets, then picks up the next 9 or 10, then misses the next 3 or so. I tried playing with the scroll script and adding some sleep time in between, but none of it makes a difference. I think it may be a subtle change in the HTML every few tweets? Do you have any thoughts, troubleshooting tips, etc.? Thank you again, dude!
Have you tried running the script from GitHub? I made some tweaks since originally posting. You could also increase the number of tweets that are reviewed in each scroll.
@@izzyanalytics4145 That worked! you are a lifesaver
Hi, love your video! I actually have a problem at the very end; it says 'Webdriver'. How can I fix this error?
I'm not quite sure what you mean. You can checkout the code and compare to see what may be the issue. I've also added updates since I originally published this. github.com/israel-dryer/Twitter-Scraper
Holy shit this is good clear content. Rare on RUclips. Keep posting stuff because this is what we need, not more rehashed shit content
Anyway Izzy, my problem is I'm not grabbing all the tweets. I'm skipping over large numbers of them, and I don't know why, because I'm identifying them by data-testid="tweet", so if I get some, shouldn't I be getting them all?
I'm missing more than half sometimes, and it misses the same ones on every page... Weird...
I made some changes today that should help. Unfortunately, I wasn't able to test some of the changes because I hit the limit, so I'll have to wait until tomorrow. Check out the ".py" file. If it's not working now, it will work as soon as I'm able to get back on and test it. github.com/israel-dryer/Twitter-Scraper
@@izzyanalytics4145 Thanks bro I'm getting a higher success of scraping. I'm using it to pull dirt on racists at work and ruin their lives Izzy. You are doing god's work my man!
The best tutorial! Thanks!
Glad it helped!
Hey bro, absolutely great video and it works perfectly for me. Keep it up!! I'm just having one small issue: I'm gathering rather old tweets, and I just noticed that the code stops working if there's a tweet which is unavailable but still shown in the Twitter timeline. Any idea how to skip these?
I've made updates to the code over time... I'd check out the latest version; I think it may handle some of these issues: github.com/israel-dryer/Twitter-Scraper
Your tutorial is very good, keep it up!
Thanks, will do!
Thank you for the tutorial :) It helps me a lot with my bachelor's thesis! Speaking of which: how should I cite you in my sources? Is your GitHub page sufficient?
Sure! Glad you found it helpful.
@@izzyanalytics4145 Hey :) How come you don't run into issues with JavaScript? I have a list of tweets I'd like to scan for the content of their links, and everything I try keeps throwing errors.
@@coolundbidda7611 i can take a look at your code if you wish israel.dryer@gmail.com
@@izzyanalytics4145 That is very kind! I am amazed at how quick you are to reply and how much you care about the community! But I actually found my solution just now :)
I took another look at your GitHub code and found out that I just didn't wait long enough for the page to fully load. Because of that, JavaScript obstructed the interaction.
I'm not sure if I used it in that code, but you can use the Wait object to wait for certain conditions before proceeding. This makes the scraper a lot safer.
Thanks a lot Izzy! Great help👍
Happy to help!
Thank you so much for this video. It helped me immensely.
If I were to search a very common term that would basically be an infinite scroll, is there a way to limit the number tweets gathered or cut it off once you reach a certain date of the tweet? Also thanks so much, this is a really great tutorial!
I'm also really interested on how we could set up a definite number of scrolling if possible! Thanks
You could add a piece of code in a while loop that checks the number of tweets gathered and breaks when that number is reached. I've got a bit of code online (linked below) that improves the scrolling functionality and breaks out when the scrolling is done... however, you could easily add a check in the "while scrolling" block. As for dates, there is actually an advanced search in Twitter. Go to the URL, add a few parameters, and then see what it does to the URL. You can then use this pattern to filter by date range, tags, and other parameters.
Twitter Advanced Search: twitter.com/search-advanced?lang=en
GitHub Code: github.com/israel-dryer/Twitter-Scraper/blob/main/twitter-scraper-tut.ipynb
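The "break when you have enough" idea can be sketched generically. Here `fetch_batch` stands in for one scroll-and-parse pass, and all the names are mine, not from the scraper itself:

```python
def collect_until(fetch_batch, max_items, max_empty_scrolls=3):
    """Call fetch_batch() repeatedly (e.g. once per page scroll) until
    max_items unique items are collected, or several passes in a row
    produce nothing new (i.e. the end of the results was reached)."""
    collected, seen, empty = [], set(), 0
    while len(collected) < max_items and empty < max_empty_scrolls:
        new = [item for item in fetch_batch() if item not in seen]
        if new:
            seen.update(new)
            collected.extend(new)
            empty = 0
        else:
            empty += 1
    return collected[:max_items]
```

In the scraper, `fetch_batch` would scroll the driver and return the tweet IDs parsed from the newly visible cards.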
See my reply to Olivia, and let me know if this works.
Dude. You are a beast. Thank you
Np. Glad it was helpful.
Thanks a lot man, you made my day ❤️❤️
Thanks a lot Izzy! This tutorial really helped me a lot!!
But I am struggling with some issues :(
I've noticed that some of the tweets were missing from the collected data. I have looked through the updated code you've uploaded to GitHub, but still couldn't solve the problem.
I think something went wrong with the scrolling. Watching the monitor, there were times it scrolled down way too much, skipping some tweets.
Is this an issue related to the lookback limit? If so, how can I find the appropriate number for my computer?
Thank you so much!!
I wonder if screen size makes a difference in terms of how much you would need to scroll. Thoughts? Or have you solved for this already?
Can anyone help me, please? I run into an infinite loop.
Very good my friend!
Thanks!
Thank you so much helped me a lot for my research project.
Glad it helped!
Can such bots be used in production systems, when we, for example, want to scrape tweets twice per day?
Possibly. You would need to build in more substantial error handling and check on it periodically. Web scrapers are inherently fragile because they rely on an external document structure that you have no control over.
Can you make video on scraping google maps??
Interesting. What data would you scrape from Google Maps?
You are brilliant!!!
Thank you! Cheers!
I like it, good work.
Thanks!
Hi, your video helped me a lot. Thanks!
Can you make a tutorial on how to get Twitter account profile data, please?
To get when the account was created, the following count, and the followers count.
Thanks a bunch!
Yes. I'll post some code as soon as I get some time. In the meantime, here's how I would do it: build the profile URL from the user name scraped on the page, then use the driver to navigate to that URL and scrape the profile info from the top of the page using the same methods I used in this video.
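A sketch of the URL-building step described above. The "@name" handle format is what the scraper collects; the commented follower selector is only a guess:

```python
def profile_url(handle):
    """Turn a scraped handle like '@jack' into the profile page URL."""
    return "https://twitter.com/" + handle.lstrip("@")

# Then navigate and scrape the profile header with the same card techniques:
# driver.get(profile_url("@jack"))
# followers = driver.find_element_by_xpath('//a[contains(@href, "/followers")]').text  # selector is a guess
```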
Awesome tutorial! Thank you.
Thanks
I saw some YouTube videos saying you need a Twitter developer account to scrape Twitter data. Can you explain the differences and limitations between your method and theirs?
Big thanks btw, you are exactly what I am looking for
You would need a developer account if you want to use the API. An API is a lot easier to use, but it's often limited in how much you can pull or how far back it goes, and it costs money for expanded access. You can check it out here to see if it would work for your case: developer.twitter.com/en
Great video!
Very useful!
Thanks
fire tutorial!!
Just amazing!
Bro, please make the same video on how to scrape eBay and AliExpress. I am waiting for your kind response.
github.com/israel-dryer/Ebay-Scraper
@@izzyanalytics4145 thanks bro may you live a long and happy life
Bro, please make a video on how to scrape eBay and AliExpress. I am waiting for your kind response.
github.com/israel-dryer/Ebay-Scraper
@@izzyanalytics4145 thanks bro thankyou very much very much appreciated
@@izzyanalytics4145 thankyou thankyou thankyou
Awesome and great, thanks for your helpful video.
Can you do a tutorial on how to extract an Instagram followers list: scrape all the followers of a user, including their user profiles, to a .csv?
good idea. I'll put it on my list.
@@izzyanalytics4145 May we know when we can see it? :)
Hopefully end of November or beginning of December. I haven't had any time to do videos lately because of work. 😔 But I do have a lot of items on the list.
@@izzyanalytics4145 Please let us know if you plan to code it and make it available on your GitHub repo before the video comes out. That way, we don't need to wait until the video is out and will be happy to test it :)
Thank you, your video helped me so much.
Glad to hear it!
Hey, I'm trying to replicate what you've done, and the code seems to be working fine. However, when I check the CSV file after scraping, it's empty. I've also attempted to use your code from GitHub, but I'm facing the same issue. Could you please help me figure out the reason behind this?
Did you solve it successfully? It seems I've run into the same question.
Hi @izzy_analytics.
This code works well for infinite scrolling, but when I try to scrape a limited number of tweets (around 200), it skips tweets, reaches the last page very fast, and stops scrolling; the file ends up with 120-130 tweets while the account has 200. So please help me fix this issue, because I want to scrape many profiles that contain a very small number of tweets.
I'm using the Chrome web driver on Windows 7; I don't think that's the cause of the problem.
Thanks in advance :)
Hi, try using the script I've posted on GitHub. I've added a few more things that might be helpful: github.com/israel-dryer/Twitter-Scraper
Let me know if this is still an issue after you try my code
@@izzyanalytics4145 The problem remains the same. When I search @gregmaffei, the profile shows that he made 380 tweets, but when I try to scrape it I get 224 tweets, and the number of scraped tweets changes every time.
Also, my latest code is updated according to your GitHub code, but the result remains the same. Kindly help me.
Thanks.
@izzy analytics Did you check that issue? I have to complete my analysis within a few days; kindly help me.
Thanks.
One thing I tried to get around 280 was to change the slice in `for card in page_cards[-15:]` to 40. This looks back at more tweets... however, there could be several things going on, and it will take time to properly diagnose. I know that it's getting to the end, because I can see results from the first tweets in the timeline. So, tweets are presumably getting lost between the beginning and the end. Some things that need to be looked into: it says there are 380 tweets, but is that how many are really available? Is there something different on some cards that's causing the `get tweet data` function to return empty? I'll keep you posted.
Thanks for this tutorial. While I was trying to replicate it, I got an error for this line of code; please advise.
password.send_keys(Keys.RETURN) #same action as clicking the login button
AttributeError: 'NoneType' object has no attribute 'send_keys'
Can you just log in to Twitter and skip that part?
It looks like it's not finding the password field. Double check the code for identifying the password element.
@@izzyanalytics4145 thanks , I would check
@@hkchan1339 thanks chan...i tried that part and it did work
Hey Izzy, how are you? Can you make a web scraper for AliExpress/Alibaba, and also make a video on AliExpress? I will be very thankful to you.
13:33
Did you steal the video from TechWithTim
Thanks for this video
Most welcome
good work
Thanks
@@izzyanalytics4145 Message: stale element reference: element is not attached to the page document
when I tried to do: card.find_element_by_xpath('.//span').text
Do you maybe know why? Cheers
Possibly the page might not have fully loaded. You could try adding a delay using the sleep function before that code. Sometimes I will also use implicitly_wait at the beginning of my code. See these sources for more info:
(1) www.geeksforgeeks.org/implicitly_wait-driver-method-selenium-python/
(2) sqa.stackexchange.com/questions/40942/stale-element-reference-element-is-not-attached-to-the-page-of-the-document
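Another option besides a fixed sleep is to retry the lookup when it goes stale. A small sketch (the `retry_stale` helper is mine; it falls back to a dummy exception class so the sketch can run even without selenium installed):

```python
from time import sleep

try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:  # lets the sketch run without selenium installed
    class StaleElementReferenceException(Exception):
        pass

def retry_stale(action, attempts=3, delay=1.0):
    """Re-run `action` (e.g. lambda: card.find_element_by_xpath('.//span').text)
    if the element detaches while the page is still re-rendering."""
    for attempt in range(attempts):
        try:
            return action()
        except StaleElementReferenceException:
            if attempt == attempts - 1:
                raise  # still stale after all attempts
            sleep(delay)  # give the page time to settle, then retry
```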