I'm a simple man, I see new sentdex video, I like
Brilliant video, dude!!!!! Thanks a million (keep them coming). I just ran the code on a Dutch corpus and it's doing fine!
this is the greatest video i've ever seen
Big fan of your work sentdex, I'm a Master's degree student in NLP (about that, your series on NLTK was very useful). and it's always a pleasure to learn from your videos. Currently I'm very interested in words embedding, will you make a video or a series about word embedding, word2vec, etc ?
Also, it's obvious TextBlob is "slightly better", since your condition for counting a sentence as valid is that it should have (for positive accuracy) a polarity greater or equal to 0.0001. It's then obvious that any sentence that passes that test will have a polarity greater than 0. You are counting them twice. You will always get 100% accuracy.
The reason I used the thresholds was to filter out degrees of uncertainty. The 100.0% accuracy is a certainty, but what I was moreso trying to look for was the sample # at that point.
Ok, but still you cannot conclude that TextBlob performs better than the other library, as you are not really comparing the two methods.
Andrea Ramazzina I think he can. Since every single line of the file is set to be either positive or negative, in accordance with the file, you can compare directly the two methods by seeing how many positive results they report.
sentdex To be honest, you could have been a bit clearer about what the two sample files contain :)
Hi, at 7:09, when you mention positive and negative accuracy: that should actually be called precision.
18:27, line 27: why can you leave the = sign out here? In that case pos could equal neg, with both less than 0.1, which cannot really be treated as negative, and yet the counter still increases.
Hi! Thank you for this video. I have a question about the compound scores. The vaderSentiment documentation states the thresholds as:

positive sentiment: compound score >= 0.05
neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
negative sentiment: compound score <= -0.05
Agreed. He got that wrong in the video. It should have been < -0.10 in his example.
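For reference, VADER's documented compound-score cutoffs can be written as a tiny classifier. This is just a sketch of those thresholds; the function name is my own, not part of the library:

```python
def classify_compound(compound):
    """Map a VADER compound score to a label using the
    documented cutoffs: >= 0.05 positive, <= -0.05 negative,
    anything strictly in between neutral."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"

# Scores on either side of the cutoffs land in the expected buckets.
print(classify_compound(0.6))    # positive
print(classify_compound(0.0))    # neutral
print(classify_compound(-0.2))   # negative
```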
I am glad u r finally using sublime...
Big fan always 😊
Ikr? Now one more step until he starts using VS Code with a proper debugger and programming environment.
The text editor wars never cease.
I've been trying to convert to VS Code but it just doesn't *feel* as comfortable as my sublime setup.
Michael Beckett I like it cause I have all my tools in one place, and I found Sublime too clunky to set up. But I understand what you mean. I used to use Visual Studio Community a lot, but when I started doing scripting at my work place I had to stop using it due to the obvious licensing issues. So I did spend a lot of time trying to find an IDE that does what I need it to do. I tried IDLE, Spyder, Sublime. No luck with them, only frustration. And when VS Code came out, it was perfect. Although, to be honest, I should have given PyCharm a try.
I probably will convert in the next few weeks based on people's recommendations. I think I just feel a certain loyalty to my sublime setup since it took so long to get the way I like.
I still need to get my VS Code to 'see' my folders and change the font...maybe after my latest 'golang' project. :)
Atom is really good as well, and open source. My recommendations are Sublime and Atom. VS Code is good too, but when I do extensive for-loop-heavy coding like neural networks, VS Code lags so much.
Finally you have started using an IDE :D
IDLE is an IDE
@@sentdex They don't understand that.
In the later TextBlob analysis (19:17 onwards) you made a mistake: you only took the samples with polarity greater than 0.5 and then asked whether the polarity was greater than 0. Of course it was, since 0.5 > 0 and > is transitive. That's why you were getting 100% "accuracy", but it's circular: of course you classify correctly on exactly those samples where you classify correctly.
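To see why the check is circular, here is a minimal sketch of the filter-then-test pattern described above, with made-up polarity values standing in for real TextBlob output:

```python
# Hypothetical polarity scores for a handful of sample lines.
polarities = [-0.8, -0.3, 0.2, 0.6, 0.9]

threshold = 0.5  # pre-filter: keep only confidently "positive" samples
kept = [p for p in polarities if p > threshold]

# Now "test" whether each kept sample is positive.
correct = [p for p in kept if p > 0]

# Every sample that passed the filter trivially passes the test,
# so the reported "accuracy" is always 100%.
accuracy = len(correct) / len(kept)
print(accuracy)  # 1.0
```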
Thanks, it is very useful. Could I ask - do I need to define:

for line in f.read().split('\n'):
    analysis = TextBlob(line)
    if analysis.sentiment.polarity
Hi, on line 22 of the code, where we compare the polarity to a threshold: should it be -0.1 instead of +0.1? Just a thought.
It would be interesting to see a comparison with a dataset of tweets or Reddit comments. VADER claims it's "specifically attuned to sentiments expressed in social media". Emoticons, slang, all caps, initialisms, acronyms, etc.
There are research papers out there, I believe, but yes, VADER performs better than something like SentiWordNet for social media.
#HoldIt!
I speak Spanish and the translation seems accurate
mexican here, translation looks legit. one word off but the rest is good.
Python keeps complaining that "ModuleNotFoundError: No module named 'vaderSentiment'". This happens while using Spyder. Kindly help out
Does anyone know if the sentiment scores provided by VADER can be improved using word stemming techniques?
using your final example with textblob, the number of negative samples textblob identifies is only 2072 out of a total of 5332. If we use the negative.txt file to actually test for positive sentiment instead, we get 2345 which is higher than negative. So, textblob is actually saying that our negative sentiment file contains MORE positive sentiment than negative sentiment lol, which is entirely wrong for our case. Is there a better way of representing negative sentiment? Because if we were to go in blind not knowing our dataset contains negative sentiment, we would actually end up with polarity leaning towards the positive end. (first time doing sentiment analysis so trying to wrap my head around this aha)
Hi sentdex!! Is it not possible to fetch tweets older than a week or so?
Is there a playlist for this series?
Hey, so you definitely have a lot more experience with this than I do. When using your methods to make Vader Sentiment more accurate along with Text Blob while using the sample texts provided, I keep running into major accuracy issues. For example, on the negative text file, both Vader and Text Blob end up classifying the text as roughly 50/50 positive and negative, which is no better than flipping a coin. Do you know why this might be?
+1 for the move to sublime! what made you move?
great video as usual, keep em coming!
I have installed VADER using pip, but I cannot get it to work. I have an excel file with tweets that I want to analyse... Can someone please help me?
Could you suggest a library for Turkish-language sentiment analysis? I am working with fastText, and I am wondering about your opinion.
If you are getting this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4645: ordinal not in range(128)

try:

with open("positive.txt", "r", encoding='utf-8') as f:

Also watch how you save your .txt files; make sure you save them as UTF-8.
How do these compare to sentiment analysis built with TensorFlow?
I haven't found any really good sentiment analysis with tensorflow, other than via some API. Despite that, this would almost certainly be far more lightweight than running a neural network. Not sure on the speed. I doubt TF would go faster in classifications, but speed could be comparable. A neural network could conceivably be far more accurate, but at certain costs.
I remember looking at this one. It claims to be 79% accurate
ahmedbesbes.com/sentiment-analysis-on-twitter-using-word2vec-and-keras.html#comment-3548018958
@@sentdex Typically you use a 1D CNN with fastText to do sentiment classification, e.g. using Keras. A deep learning model requires a huge dataset; with a small dataset you'll most likely get much higher variance than with a hard-coded algorithm such as VADER.
In the one where you get 100% accuracy, you first require the polarity to be negative or positive, and then check the same thing in the next statement: if x < -0.001 it is definitely less than 0, and if x > 0.0001 it is clearly greater than 0. How is this logic even working? I am lost. Can you explain it clearly?
At this step, I am purely trying to get the sample count up. It will be 100% accuracy since it passes the first check.
I get it. Thanks
Can we use any of these packages for a review-based project?
Do you know any tool that classifies tweets on the basis of emotions?
Or one that just identifies the emotions in a tweet?
I need it for a project
Did you find one? :D I would also be interested.
PLEASE someone help! I want to follow along so badly and I've spent the whole day trying to fix this error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6573: ordinal not in range(128)

It happens with both VADER and TextBlob, and I can't find any fix for it. It won't even run this simple example:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyzer = SentimentIntensityAnalyzer()
vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
print(vs)

Please, to save my sanity and so I can follow along. Thanks!
You may have found an answer by now. The solution I found for this specific problem was to use the open function from Python's io package, because it lets you specify the encoding. That solves the problem at least for TextBlob, since one of the lines in the provided texts is what causes it.
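A sketch of that fix: io.open accepts an explicit encoding argument (on both Python 2 and 3). The file contents here are my own stand-in example, not the course data:

```python
import io

# Write a sample line containing a non-ASCII character; in UTF-8,
# the é is stored as the byte pair 0xc3 0xa9, which is exactly the
# kind of byte the ascii codec chokes on.
with io.open("positive.txt", "w", encoding="utf-8") as f:
    f.write(u"a clich\u00e9d but charming film\n")

# Opening with an explicit encoding avoids the UnicodeDecodeError
# that the default ascii codec can raise on byte 0xc3.
with io.open("positive.txt", "r", encoding="utf-8") as f:
    lines = [l for l in f.read().split("\n") if l]

print(lines[0])  # a clichéd but charming film
```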
I am getting the following error.
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
1 from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
2
----> 3 analyzer = SentimentIntensityAnalyzer()
4 vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
5 print(vs)
/Applications/anaconda2/lib/python2.7/site-packages/vaderSentiment/vaderSentiment.pyc in __init__(self, lexicon_file, emoji_lexicon)
210 _this_module_file_path_ = os.path.abspath(getsourcefile(lambda: 0))
211 lexicon_full_filepath = os.path.join(os.path.dirname(_this_module_file_path_), lexicon_file)
--> 212 with open(lexicon_full_filepath, encoding='utf-8') as f:
213 self.lexicon_full_filepath = f.read()
214 self.lexicon = self.make_lex_dict()
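For anyone hitting the same thing: the path `/Applications/anaconda2/lib/python2.7/...` in the traceback shows this is Python 2.7, where the built-in open() has no encoding keyword, which is why the library's `open(lexicon_full_filepath, encoding='utf-8')` call raises that TypeError; recent vaderSentiment assumes Python 3. A small sketch of the portable alternative, io.open, shown on a stand-in file of my own rather than the library's lexicon:

```python
import io
import os
import tempfile

# Python 2's built-in open() rejects the encoding keyword:
#     open(path, encoding='utf-8')  ->  TypeError under 2.7
# io.open() accepts it on both Python 2 and 3, so reading a
# UTF-8 file portably looks like this:
path = os.path.join(tempfile.gettempdir(), "demo_lexicon.txt")
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"na\u00efve\t-0.4\n")

with io.open(path, encoding="utf-8") as f:
    contents = f.read()

print(contents.strip())
```

The cleaner fix, though, is simply to run the code under Python 3.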
Hello. Just a quick question off topic. I am using Python35 with Sublime as the IDE. I am trying every possible command to comment out lines but they do not seem to be working. Can anyone suggest a solution? Thanks. I also tried ctrl + / but no luck.
Never mind, I managed to install the necessary package and customize the theme in open resources.
4:22 The translation feature seems to be the old Google translate, so not really good...
Does anyone know the target-language codes for TextBlob().translate()?
I really like your videos, but if possible, use a Chrome plugin to invert the white color of webpages to something darker (e.g. the CareYourEyes plugin). Dark IDE (alt-tab), white web page (alt-tab), dark IDE... it blows my eyes out.
Will this code work on any dataset? For example, I have the Reddit comment dataset, where each line is an individual comment or a reply. How accurately will it predict if I set the polarity band between -0.0001 and 0.0001? Please reply soon!
cannot install vaderSentiment on Anaconda.
I just freaked out when I saw Sublime on the thumbnail...
Why
Where'd u get that shirt?
pythonprogramming.net/store/
does this work with emojis?
vader does
Kindly share a tutorial on how to install VADER in python. Thank you.
How about voice sentiment analysis @sentdex
That's actually a really cool idea! I'll have to ponder on that
@@sentdex Ever watched the TV series Lie to Me? They used it in profiling ...
hey great video!
I'd be interested in the implementation of some unsupervised generative models, like GANs.
print(analysis.translate(from_lang="en",to='ta'))
Thanks bro
thank you sentdex
Please, can we get a new series on Android with Kivy?
If anyone could potentially help me with the issue in the link below, I'd be very appreciative.
stackoverflow.com/questions/51398378/python-nlp-code-not-functioning-as-should
Awesome more dash
You are making the assumption that every single line in the text you're analyzing is positive or negative, respectively. A review can be a mix of positive, neutral, and negative statements. As such, your accuracy metrics are irrelevant.
In the case of my sample data, the text I am analyzing *is* either positive or negative. In reality, not everything is, but, in this case....it is :P
Ok, fair enough. I thought that each sample text is a review, when it looks like they are sentences picked from reviews.
Yep, one file is all "positive" reviews and the other is all "negative" reviews. If you read them, though, you'd probably say not all of them are clearly one way or the other. That's why I like using this set: it's a fairly realistic set that is quite challenging, and maybe even a bit noisy. Good for testing a classifier and its confidence in scoring.
You made a mistake when using textblob the second time, which is why you kept getting 100% accuracy; it should not be 100% accuracy.
eats shoots and leaves.
dataset original source www.cs.cornell.edu/people/pabo/movie-review-data/
Good one, bro. If possible, make a video on a Raspberry Pi coded in Python where the camera detects an object and sends you a mail and a message to your email.
Yes.
Please update the GTA V bot!!!
notification squaaad
You should start by saying what it is all about. "Sentiment Analysis in 4 Minutes" by Siraj is a better video.
No thanks.
Thanks for the comparison.
I have used TextBlob on call centre data and thought it was OK, but wondered if there were alternatives. I had never heard of VADER, only NLTK. Given TextBlob's ability to do sentiment analysis, text classification and tokenisation, I think I'll stick with the Blob. BTW, it is interesting to use Matplotlib to scatter-chart sentiment vs polarity to see how your test data looks.
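A sketch of that Matplotlib idea: since TextBlob's sentiment is a (polarity, subjectivity) pair, plotting one against the other gives a quick picture of the data. The score lists here are made-up stand-ins for real TextBlob output, and the output filename is my own choice:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so no display is needed
import matplotlib.pyplot as plt

# Hypothetical TextBlob results: one (polarity, subjectivity)
# pair per analyzed sentence.
polarity = [-0.8, -0.3, -0.1, 0.0, 0.2, 0.5, 0.9]
subjectivity = [0.9, 0.4, 0.2, 0.0, 0.5, 0.7, 1.0]

plt.scatter(polarity, subjectivity)
plt.xlabel("polarity")
plt.ylabel("subjectivity")
plt.title("TextBlob sentiment scatter")
plt.savefig("sentiment_scatter.png")
```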