Sentiment Analysis in Python with TextBlob and VADER Sentiment (also Dash p.6)

  • Published: 22 Dec 2024

Comments • 96

  • @MarkJay
    @MarkJay 6 years ago +39

    I'm a simple man, I see new sentdex video, I like

  • @MaxLatif
    @MaxLatif 4 years ago

    Brilliant video dude !!!!! thanks a million (keep them coming) I just ran the code on a Dutch corpus and it's doing fine!

  • @mailistfajar7518
    @mailistfajar7518 3 years ago

    this is the greatest video i've ever seen

  • @antoine109
    @antoine109 6 years ago +1

    Big fan of your work sentdex, I'm a Master's degree student in NLP (about that, your series on NLTK was very useful), and it's always a pleasure to learn from your videos. Currently I'm very interested in word embeddings. Will you make a video or a series about word embeddings, word2vec, etc.?

  • @CristiNeagu
    @CristiNeagu 6 years ago +7

    Also, it's obvious TextBlob is "slightly better", since your condition for counting a sentence as valid is that it should have (for positive accuracy) a polarity greater than or equal to 0.0001. It's then obvious that any sentence that passes that test will have a polarity greater than 0. You are counting them twice. You will always get 100% accuracy.

    • @sentdex
      @sentdex 6 years ago

      The reason I used the thresholds was to filter out degrees of uncertainty. The 100.0% accuracy is a certainty, but what I was more so trying to look for was the sample # at that point.

    • @andrear1989
      @andrear1989 6 years ago +1

      Ok, but still you cannot conclude that TextBlob performs better than the other library, as you are not really comparing the two methods.

    • @CristiNeagu
      @CristiNeagu 6 years ago

      Andrea Ramazzina I think he can. Since every single line of the file is set to be either positive or negative, in accordance with the file, you can compare directly the two methods by seeing how many positive results they report.

    • @CristiNeagu
      @CristiNeagu 6 years ago

      sentdex To be honest, you could have been a bit clearer about what the two sample files contain :)
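
    The circularity discussed in this thread can be sketched without either library. This is a minimal illustration, assuming a hypothetical list of polarity scores for lines taken from a positive-only file; the function name and the sample numbers are invented for the example and are not the code from the video.

    ```python
    def thresholded_accuracy(polarities, threshold=0.0001):
        """Filter by a threshold, then check the sign, as described in the thread."""
        # Keep only the samples the classifier is "confident" about.
        confident = [p for p in polarities if p >= threshold]
        # Every kept sample is >= threshold > 0, so this check can never fail.
        correct = [p for p in confident if p > 0]
        accuracy = 100.0 * len(correct) / len(confident) if confident else 0.0
        return accuracy, len(confident)


    # Hypothetical polarity scores for five lines of a positive-only file.
    scores = [0.8, 0.3, 0.0, -0.2, 0.05]
    accuracy, samples = thresholded_accuracy(scores)
    print(accuracy, samples)

    # Dividing by all lines, not only the filtered ones, gives a comparable figure.
    overall = 100.0 * sum(1 for p in scores if p > 0) / len(scores)
    print(overall)
    ```

    With the filter in place, the only number that carries information is the sample count, which is the figure sentdex says he was after.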

  • @MetodNovak
    @MetodNovak 6 years ago +5

    Hi, at 7:09, when you mention positive and negative accuracy, that should actually be precision.
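
    For readers weighing the terminology, the standard definitions can be written out as below. This is only a sketch with hypothetical labels and an invented function name; it does not reproduce the video's code. Which term fits the video's per-file numbers depends on the denominator: on a file that contains only positive lines, the share labelled positive matches the recall formula.

    ```python
    def confusion_metrics(true_labels, predicted_labels, positive="pos"):
        """Return (accuracy, precision, recall) for one class treated as positive."""
        pairs = list(zip(true_labels, predicted_labels))
        tp = sum(1 for t, p in pairs if t == positive and p == positive)
        fp = sum(1 for t, p in pairs if t != positive and p == positive)
        fn = sum(1 for t, p in pairs if t == positive and p != positive)
        tn = sum(1 for t, p in pairs if t != positive and p != positive)
        accuracy = (tp + tn) / len(pairs)                # correct / everything
        precision = tp / (tp + fp) if tp + fp else 0.0   # correct / predicted positive
        recall = tp / (tp + fn) if tp + fn else 0.0      # correct / actually positive
        return accuracy, precision, recall


    # Hypothetical ground truth and predictions for six lines.
    truth = ["pos", "pos", "pos", "pos", "neg", "neg"]
    preds = ["pos", "pos", "pos", "neg", "neg", "neg"]
    accuracy, precision, recall = confusion_metrics(truth, preds)
    print(accuracy, precision, recall)
    ```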

  • @skorpimish
    @skorpimish 6 years ago

    18:27; line 27;
    why here you can leave a = sign? in this case there will be pos = neg and both are less than 0.1, which can not be treated as neg, and therefore increase the counter

  • @ElliV87
    @ElliV87 5 years ago +3

    Hi! Thank you for this video. I have a question about the compound scores. The VADERSentiment documentation states that the threshold is >0.05, = 0.05
    neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
    negative sentiment: compound score

    • @PatrickBateman12420
      @PatrickBateman12420 5 years ago +1

      Agreed. He got that wrong in the video. It should have been < -0.10 in his example.
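
    For reference, the vaderSentiment README gives typical cutoffs for the compound score: positive at 0.05 or above, negative at -0.05 or below, and neutral in between. A minimal sketch of that rule follows; the function name is made up for illustration, and the video itself used different thresholds.

    ```python
    def label_from_compound(compound):
        """Map a VADER compound score to a label using the README's typical cutoffs."""
        if compound >= 0.05:
            return "positive"
        if compound <= -0.05:
            return "negative"
        return "neutral"


    # Hypothetical compound scores, not output from a real analyzer run.
    for score in (0.6, 0.05, 0.0, -0.05, -0.7):
        print(score, label_from_compound(score))
    ```

    In practice the input would be the `compound` entry of the dictionary returned by `SentimentIntensityAnalyzer().polarity_scores(text)`.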

  • @wolfisraging
    @wolfisraging 6 years ago +4

    I am glad u r finally using sublime...
    Big fan always 😊

    • @CristiNeagu
      @CristiNeagu 6 years ago +1

      Ikr? Now one more step until he starts using VS Code with a proper debugger and programming environment.

    • @beckettman42
      @beckettman42 6 years ago

      The text editor wars never cease.
      I've been trying to convert to VS Code but it just doesn't *feel* as comfortable as my sublime setup.

    • @CristiNeagu
      @CristiNeagu 6 years ago +1

      Michael Beckett I like it cause I have all my tools in one place, and I found Sublime too clunky to set up. But I understand what you mean. I used to use Visual Studio Community a lot, but when I started doing scripting at my work place I had to stop using it due to the obvious licensing issues. So I did spend a lot of time trying to find an IDE that does what I need it to do. I tried IDLE, Spyder, Sublime. No luck with them, only frustration. And when VS Code came out, it was perfect. Although, to be honest, I should have given PyCharm a try.

    • @beckettman42
      @beckettman42 6 years ago

      I probably will convert in the next few weeks based on people's recommendations. I think I just feel a certain loyalty to my sublime setup since it took so long to get the way I like.
      I still need to get my VS Code to 'see' my folders and change the font...maybe after my latest 'golang' project. :)

    • @wolfisraging
      @wolfisraging 6 years ago +1

      Well as such Atom is really good and open source..... My recommendations are sublime and atom...
      VS code is good and perfect, but when I do extensive 'for loop' coding like neural networks, then the vscode lags sooooooooo much

  • @nomanssky09
    @nomanssky09 6 years ago +1

    Finally you have started using an IDE :D

  • @SirFloIII
    @SirFloIII 6 years ago +10

    in the later textblob analysis (19:17 onwards) you did a dumb. you only took the samples with polarity greater than 0.5 and then asked if the polarity was greater than 0, and of course it was, since 0.5 > 0 and > is transitive. of course you were getting 100% "accuracy", but that's cheating. of course you classify correctly on those samples where you classify correctly.

  • @plamenyankov8476
    @plamenyankov8476 3 years ago

    Thanks, it is very useful. Could I ask - do I need to define:
    for line in f.read().split('\n'):
        analysis = TextBlob(line)
        if analysis.sentiment.polarity

  • @guruappapadasali7106
    @guruappapadasali7106 6 years ago +1

    Hi, line 22 of the code is where we compare the polarity to a factor. Should it be -0.1 instead of +0.1? Just a thought.

  • @sellen2u
    @sellen2u 6 years ago +1

    It would be interesting to see a comparison with a dataset of tweets or Reddit comments. VADER claims it's "specifically attuned to sentiments expressed in social media". Emoticons, slang, all caps, initialisms, acronyms, etc.

    • @dandogamer
      @dandogamer 6 years ago

      there are research papers out there I believe, but yes vader performs better than something like sentiwordnet for social media

  • @hedleypanama
    @hedleypanama 6 years ago +4

    #HoldIt!
    I speak Spanish and the translation seems accurate

  • @mvaldes
    @mvaldes 6 years ago

    mexican here, translation looks legit. one word off but the rest is good.

  • @GaxtonOkobah
    @GaxtonOkobah 8 months ago

    Python keeps complaining that "ModuleNotFoundError: No module named 'vaderSentiment'". This happens while using Spyder. Kindly help out

  • @ThePellski
    @ThePellski 6 years ago +1

    Does anyone know if the sentiment scores provided by VADER can be improved using word stemming techniques?

  • @TheBluCypher
    @TheBluCypher 5 years ago

    using your final example with textblob, the number of negative samples textblob identifies is only 2072 out of a total of 5332. If we use the negative.txt file to actually test for positive sentiment instead, we get 2345 which is higher than negative. So, textblob is actually saying that our negative sentiment file contains MORE positive sentiment than negative sentiment lol, which is entirely wrong for our case. Is there a better way of representing negative sentiment? Because if we were to go in blind not knowing our dataset contains negative sentiment, we would actually end up with polarity leaning towards the positive end. (first time doing sentiment analysis so trying to wrap my head around this aha)
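
    One way to look at the numbers in this comment is a three-way tally over every line of the file, with nothing filtered out first. Taking the comment's figures at face value, 2072 of 5332 lines is about 38.9% scored negative. The sketch below uses hypothetical polarity scores and an invented function name; it is not the video's code.

    ```python
    from collections import Counter


    def polarity_breakdown(polarities):
        """Count how many scores fall above, below, or exactly at zero."""
        counts = Counter()
        for p in polarities:
            if p > 0:
                counts["positive"] += 1
            elif p < 0:
                counts["negative"] += 1
            else:
                counts["neutral"] += 1
        return counts


    # Hypothetical scores standing in for lines of a negative-only file.
    scores = [-0.6, 0.4, 0.0, 0.2, -0.1, 0.0, 0.3]
    counts = polarity_breakdown(scores)
    # Share of all lines scored negative, with the full file as the denominator.
    negative_share = 100.0 * counts["negative"] / len(scores)
    print(dict(counts), round(negative_share, 1))
    ```

    Keeping the neutral bucket visible shows how many lines the scorer had no opinion on, which a two-way count hides.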

  • @reemawangkheirakpam8165
    @reemawangkheirakpam8165 6 years ago

    hi sentdex!! is it not possible to fetch tweets older than a week or so????

  • @edelciojunior3917
    @edelciojunior3917 4 years ago

    Is there a playlist for this series?

  • @JuggernautProducts
    @JuggernautProducts 5 years ago

    Hey, so you definitely have a lot more experience with this than I do. When using your methods to make Vader Sentiment more accurate along with Text Blob while using the sample texts provided, I keep running into major accuracy issues. For example, on the negative text file, both Vader and Text Blob end up classifying the text as roughly 50/50 positive and negative, which is no better than flipping a coin. Do you know why this might be?

  • @smadgulkar
    @smadgulkar 6 years ago

    +1 for the move to sublime! what made you move?
    great video as usual, keep em coming!

  • @chrisvlachos
    @chrisvlachos 6 years ago

    I have installed VADER using pip, but I cannot get it to work. I have an excel file with tweets that I want to analyse... Can someone please help me?

  • @yusufbaysal7796
    @yusufbaysal7796 6 years ago

    Could you suggest a library for Turkish-language sentiment analysis? I am working with fastText and would like to hear your opinion.

  • @justinhouck1245
    @justinhouck1245 6 years ago

    If you are getting this error:
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 4645: ordinal not in range(128)
    try:
    with open("positive.txt", "r", encoding='utf-8') as f:

    • @justinhouck1245
      @justinhouck1245 6 years ago

      also watch how you save your .txt files. make sure you save as utf-8.
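
    The fix suggested in this thread can be tried in isolation. This is a minimal sketch for Python 3, assuming a throwaway file in a temporary directory that stands in for the tutorial's positive.txt; the sample sentences are invented.

    ```python
    import os
    import tempfile

    # Write a small UTF-8 file; "é" is stored as the bytes 0xc3 0xa9,
    # the same leading byte the error message complains about.
    path = os.path.join(tempfile.mkdtemp(), "positive.txt")
    with open(path, "wb") as f:
        f.write("a charming caf\u00e9 scene\nsimply great".encode("utf-8"))

    # Passing encoding explicitly avoids depending on the platform default.
    with open(path, "r", encoding="utf-8") as f:
        lines = f.read().split("\n")

    print(lines)
    ```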

  • @FuZZbaLLbee
    @FuZZbaLLbee 6 years ago +1

    How do these compare to the sentiment analysis built with tensorflow?

    • @sentdex
      @sentdex 6 years ago

      I haven't found any really good sentiment analysis with tensorflow, other than via some API. Despite that, this would almost certainly be far more lightweight than running a neural network. Not sure on the speed. I doubt TF would go faster in classifications, but speed could be comparable. A neural network could conceivably be far more accurate, but at certain costs.

    • @FuZZbaLLbee
      @FuZZbaLLbee 6 years ago +1

      I remember looking at this one. It claims to be 79% accurate
      ahmedbesbes.com/sentiment-analysis-on-twitter-using-word2vec-and-keras.html#comment-3548018958

    • @PatrickBateman12420
      @PatrickBateman12420 5 years ago

      @@sentdex Typically you use a 1D CNN with FastText to do sentiment classification, e.g. using Keras. A deep learning model requires a huge dataset. With a small dataset, you'll most likely get a much higher variance than with a hard-coded algorithm such as VADER.

  • @TradeWithMVR
    @TradeWithMVR 6 years ago +2

    The one where you get 100% accuracy, you are considering the polarity either to be negative or positive and checking the same in the next statement. if x < -0.001 then it is definitely less than 0 and if x > 0.0001 then it is clearly greater than 0. How is this logic even working? I am lost in this. Can you explain it clearly ?

    • @sentdex
      @sentdex 6 years ago +1

      At this step, I am purely trying to get the sample count up. It will be 100% accuracy since it passes the first check.

    • @TradeWithMVR
      @TradeWithMVR 6 years ago

      I get it. Thanks

  • @muhammadusmanakram406
    @muhammadusmanakram406 5 years ago

    can we use any of these packages for a review-based project??

  • @siddharthabiswas2147
    @siddharthabiswas2147 6 years ago +1

    Do you know any tool that classifies tweets on basis of emotions?
    or just identifies the emotions in that tweet ?
    I need it for a project

    • @RnDcompany
      @RnDcompany 3 years ago

      You found one? :D would be also interested..

  • @mrooney9596
    @mrooney9596 6 years ago

    PLEASE someone help! i want to follow along so bad and spent the whole day trying to fix this error,
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 6573: ordinal not in range(128)
    it happens in both vader and text blob. i can't find any fix for it , it wont even run simple
    from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    analyzer = SentimentIntensityAnalyzer()
    vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
    print(vs)
    pleassssse to save my sanity and so i can follow along.
    thanks

    • @MrFinnagle
      @MrFinnagle 5 years ago

      You may have found an answer by now. The solution I found for this specific problem was to use the open function from the io package in python. I use that open function because you can then specify encoding. That solves the problem for at least TextBlob because one of the lines in the provided texts is causing the problem.
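
      The approach described in this reply might look like the sketch below. `io.open` accepts an `encoding` argument on Python 2 as well, and on Python 3 it is the same function as the built-in `open`. The helper name and the sample file are hypothetical.

      ```python
      import io
      import os
      import tempfile


      def read_lines(path):
          """Read a UTF-8 text file and return its lines."""
          # io.open takes encoding= even where the built-in open does not.
          with io.open(path, "r", encoding="utf-8") as f:
              return f.read().split("\n")


      # Hypothetical sample file standing in for the tutorial's text files.
      sample_path = os.path.join(tempfile.mkdtemp(), "negative.txt")
      with io.open(sample_path, "wb") as f:
          f.write(u"a na\u00efve plot\ntoo long".encode("utf-8"))

      print(read_lines(sample_path))
      ```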

  • @pavan540
    @pavan540 6 years ago +1

    I am getting the following error.
    ---------------------------------------------------------------------------
    TypeError Traceback (most recent call last)
    in ()
    1 from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
    2
    ----> 3 analyzer = SentimentIntensityAnalyzer()
    4 vs = analyzer.polarity_scores("VADER Sentiment looks interesting, I have high hopes!")
    5 print(vs)
    /Applications/anaconda2/lib/python2.7/site-packages/vaderSentiment/vaderSentiment.pyc in __init__(self, lexicon_file, emoji_lexicon)
    210 _this_module_file_path_ = os.path.abspath(getsourcefile(lambda: 0))
    211 lexicon_full_filepath = os.path.join(os.path.dirname(_this_module_file_path_), lexicon_file)
    --> 212 with open(lexicon_full_filepath, encoding='utf-8') as f:
    213 self.lexicon_full_filepath = f.read()
    214 self.lexicon = self.make_lex_dict()

  • @step7steveX
    @step7steveX 5 years ago

    Hello. Just a quick question off topic. I am using Python35 with Sublime as the IDE. I am trying every possible command to comment out lines but they do not seem to be working. Can anyone suggest a solution? Thanks. I also tried ctrl + / but no luck.

    • @step7steveX
      @step7steveX 5 years ago

      Never mind, I managed to install the necessary package and customize the theme in open resources.

  • @eragonritter6436
    @eragonritter6436 6 years ago

    4:22 The translation feature seems to be the old Google translate, so not really good...

  • @debjitchattopadhyay7627
    @debjitchattopadhyay7627 4 years ago

    does anyone know the translate to language codes in TextBlob().translate()?

  • @panoss4149
    @panoss4149 6 years ago +1

    i really like your videos, but if it's possible, use a chrome plugin to invert the white color of webpages to something darker (e.g. the CareYourEyes plugin). The dark ide (alt-tab) white web page (alt-tab) dark ide, blows my eyes

  • @ashishagrawal5483
    @ashishagrawal5483 6 years ago

    will this code work on any dataset, like I have the reddit comment dataset,where each line is an individual comment or a reply. How accurate will it predict if I will set the polarity between 0.0001 to -0.0001..
    Please reply soon!

  • @acid123ist
    @acid123ist 4 years ago

    cannot install vaderSentiment on Anaconda.

  • @Eurley66
    @Eurley66 6 years ago +4

    I just freaked out when I saw Sublime on the thumbnail...

  • @asdfasdfuhf
    @asdfasdfuhf 6 years ago

    Where'd u get that shirt?

    • @sentdex
      @sentdex 6 years ago

      pythonprogramming.net/store/

  • @kirisko3067
    @kirisko3067 3 years ago

    does this work with emojis?

  • @srabanibiswas7285
    @srabanibiswas7285 1 month ago

    Kindly share a tutorial on how to install VADER in python. Thank you.

  • @ashutoshpatole6262
    @ashutoshpatole6262 5 years ago

    How about voice sentiment analysis @sentdex

    • @sentdex
      @sentdex 5 years ago

      That's actually a really cool idea! I'll have to ponder on that

    • @PatrickBateman12420
      @PatrickBateman12420 5 years ago

      @@sentdex Ever watched the TV series Lie to Me? They used it in profiling ...

  • @cruso2711
    @cruso2711 6 years ago

    hey great video!
    I'd be interested in the implementation of some unsupervised generative models, like GANs.

  • @nithiyashrees3456
    @nithiyashrees3456 2 years ago

    print(analysis.translate(from_lang="en",to='ta'))

  • @ankushsharma-gu7co
    @ankushsharma-gu7co 6 years ago

    Thanks bro

  • @pysuhayb15
    @pysuhayb15 6 years ago

    thank you sentdex
    please can get new series from android kivy

  • @bamber101
    @bamber101 6 years ago

    If anyone could potentially help me with the issue in the link below, I'd be very appreciative.
    stackoverflow.com/questions/51398378/python-nlp-code-not-functioning-as-should

  • @iwannawatchDavid
    @iwannawatchDavid 6 years ago

    Awesome more dash

  • @CristiNeagu
    @CristiNeagu 6 years ago

    You are making the assumption that every single line in the text you're analyzing is positive or negative, respectively. A review can be a mix of positive, neutral, and negative statements. As such, your accuracy metrics are irrelevant.

    • @sentdex
      @sentdex 6 years ago

      In the case of my sample data, the text I am analyzing *is* either positive or negative. In reality, not everything is, but, in this case....it is :P

    • @CristiNeagu
      @CristiNeagu 6 years ago

      Ok, fair enough. I thought that each sample text is a review, when it looks like they are sentences picked from reviews.

    • @sentdex
      @sentdex 6 years ago +3

      Yep, one file is all "positive" reviews and the other is all "negative" reviews. If you read them though, you'd probably suggest not all of them are clearly one way or the other. That's why I like using this set, it's a fairly realistic set that is quite challenging, and maybe even a bit noisy. Good for testing a classifier and its confidence in scoring.

  • @yuanyuanfan6666
    @yuanyuanfan6666 4 years ago

    You made a mistake when using textblob the second time, which is why you kept getting 100% accuracy; it should not be 100% accuracy.

  • @bashisobsolete.pythonismyn6321
    @bashisobsolete.pythonismyn6321 6 years ago

    eats shoots and leaves.

  • @asriomar11
    @asriomar11 6 years ago

    dataset original source www.cs.cornell.edu/people/pabo/movie-review-data/

  • @dulalsandip7950
    @dulalsandip7950 6 years ago

    good one bro..if possible make video on raspberry coded with python for camera and it detects the object and send you mail and message in your email

  • @johnnyboss1561
    @johnnyboss1561 6 years ago

    Please update the GTA V bot!!!

  • @thorodinson7467
    @thorodinson7467 6 years ago +1

    notification squaaad

  • @sandeepvk
    @sandeepvk 6 years ago

    You should start by saying what it is all about. Sentiment Analysis in 4 Minutes by Siraj is a better video

  • @xxXXCarbon6XXxx
    @xxXXCarbon6XXxx 6 years ago

    Thanks for the comparison.
    I have used TextBlob on call centre data and thought it was ok, but wondered if there were alternatives. I had never heard of VADER, only NLTK. Given TextBlob's ability to do sentiment analysis, text classification and tokenisation I think I'll stick to the Blob. BTW it is interesting to use Matplotlib to scatter chart out sentiment vs polarity to see how your test data looks.