Text Classification Using Naive Bayes

  • Published: 23 Oct 2024

Comments • 120

  • @nidustash6964
    @nidustash6964 5 years ago +5

    It's been 3 years, but man, your explanation and example just healed my weeks of depression trying to understand this Bayesian and feature-hashing algorithm. My thanks from Vietnam, and thanks from my team members to you! Hope you're having a great life, sir!

  • @CensSka
    @CensSka 8 years ago +2

    Awesome, thank you dude. My teacher's not very good at explaining and I was stuck trying to figure out what this is all about. You made it real easy

  • @jasonthomas2908
    @jasonthomas2908 8 years ago +2

    +Francisco, correct me if I'm wrong, but this looks like multinomial naive Bayes? Is that right? Do you think you should state this in the video?

  • @maqboolurrahimkhan
    @maqboolurrahimkhan 9 years ago +3

    I was searching the whole day for how to use naive Bayes for text classification and failed again and again, but then I found this tutorial, the most amazing and simplest tutorial. Brother, you are great (Y), thanks!

  • @deepaksingh9318
    @deepaksingh9318 4 years ago +1

    Wow, what an explanation.
    The way you explained things can make anyone work through the algorithm by hand without even needing any ML library.
    Loved the way you explained it step by step.
    Subscribed. 👍

    • @hassnain95
      @hassnain95 4 years ago

      Loved your data series on Medium :)

  • @MrDYLANCOPELAND
    @MrDYLANCOPELAND 9 years ago +1

    Hey,
    Just wanted to say thanks.
    I appreciated this video, and found it helpful.
    Don't be deterred from making more videos because of low views - it was very easy to understand and follow along.
    Kudos!

    • @HimalayaGarg
      @HimalayaGarg 4 years ago

      Low views? See after 4 years...
      Motivation for others too 🙂

  • @edupyxel
    @edupyxel 8 years ago +1

    You calculate P(+) as positive docs over total docs, but I think you should calculate it as positive words over total words, i.e. 14/20 = 0.7 instead of the 3/5 = 0.6 you use. The way you do it, you are mixing probabilities over docs with probabilities over words. If we calculate P(I|+) = P(+|I)P(I)/P(+), it gives 1/14 using P(+) = 14/20 = 0.7; instead, if we use your P(+) = 0.6, it gives us 1/12. Let me know if my thoughts on this are correct, please.
    Thanks for the video!
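
    For reference, the standard multinomial naive Bayes estimates mix these two levels deliberately: the prior is a document-level fraction, while the word likelihoods are word-level fractions. Using the numbers quoted in the comment above (a sketch of the usual convention, not a ruling on what the video intended):

        P(+) = (positive documents) / (total documents) = 3/5
        P(w_k | +) = (count of w_k in positive documents) / (total words in positive documents) = n_k / 14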

  • @Garet43
    @Garet43 8 years ago

    This is the best explanation of the Laplace smoothing method I've found in hours of searching. Thank you so much!!

  • @naraendrareddy273
    @naraendrareddy273 2 years ago

    Bro you're a lifesaver. Thanks

  • @JyoPari
    @JyoPari 8 years ago +2

    Such a great video, it was really worth the watch!

  • @teamsarmuliadi6960
    @teamsarmuliadi6960 7 years ago

    Concise and straight to the point. Thanks!

  • @annwang2990
    @annwang2990 4 years ago

    The best explanation! Thank you!

  • @cupidvogel
    @cupidvogel 8 years ago +1

    I have one question: if in place of "I hated the poor acting", the sentence was "I hated the poor acting, the direction was even more poor and I hated that too", the words "hated" and "poor" are there like in the previous example, but twice now. Where in the derivation will this be taken into account?

    • @cardmaverick
      @cardmaverick 8 years ago

      +Kaustav Mukherjee look up Multinomial Naive Bayes.

    • @jasonthomas2908
      @jasonthomas2908 8 years ago +1

      If a word occurs twice in a sentence, then the table the video showed would show the value 2 instead of 1. Then the formula is the same.
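
    In multinomial naive Bayes the per-word counts of the new sentence enter as repeated factors (equivalently, as exponents), which is where the doubled "hated" and "poor" would show up:

        P(c | doc) ∝ P(c) * Π_k P(w_k | c)^(count of w_k in doc)

    So for the longer sentence above, P(poor | c) and P(hated | c) each appear squared in both class scores.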

  • @prasenjitgiri919
    @prasenjitgiri919 8 years ago

    Hi Francisco - thank you, but I wasn't able to understand what the 14 & 10 are. I tried to count everything but I wasn't able to get to those numbers. Would you kindly explain?

  • @alialgeboory7727
    @alialgeboory7727 8 years ago +1

    Do you have any videos about the SVM or decision tree algorithms?

  • @krishnaprasad9378
    @krishnaprasad9378 4 years ago

    The word 'acting' appears once in both classes, but while calculating the probability for 'acting' you only considered the positive-class count and not the negative-class count at all. Any reason?

  • @deepakc83
    @deepakc83 2 years ago

    What are we supposed to do if we have a word repeated in the target sentence? And when a target word is not in the training sentence?

  • @hunglikehuang
    @hunglikehuang 9 years ago +2

    Thanks for this video!
    I have a confusion about how you derived the formula P(w_k|+) = (n_k + 1) / (n + |Vocabulary|).
    Intuitively, as you said, it seems like the formula should simply be n_k / n, the number of instances of that word in the positive case out of the total number of words in the positive case.
    You said the extra parts are there so that if the word does not occur, the probability is not 0. Why would you not want the probability to be 0 if the word doesn't occur? If the word doesn't occur, the probability that it occurs is 0? That seems very intuitive.
    It seems very arbitrary to me. Could you explain it?

    • @fiacobelli
      @fiacobelli  9 years ago

      It is a way to simulate the occurrence of a new word in a way that is less probable than the existing words.

    • @luisramirez8136
      @luisramirez8136 7 years ago +1

      Because this example uses multinomial naive Bayes, which is better for this kind of classifier: even if some word wasn't seen during training, when you test the model there is still a chance of the document being classified.

    • @syedmdismail7478
      @syedmdismail7478 5 years ago +1

      Suppose there is a sentence like "How are you doing" and we don't have 'you' in our vocabulary list; then the probability of the whole sentence might become 0 (P(How|+) * P(Are|+) * 0 * P(Doing|+)) because of the absence of one new word. So I think it is the concept of Laplacian smoothing, where even the absence of a word from the current vocabulary list won't zero out the probability of a sentence.
      Please don't hurl negative comments at me in case I am wrong; I read that topic today and thought that this situation relates to it.

    • @vishalrajmane5857
      @vishalrajmane5857 5 years ago

      @@syedmdismail7478 You are right bro.
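
    A tiny numeric sketch of the add-one (Laplace) smoothing discussed in this thread, using toy counts rather than the video's exact table: every word in the vocabulary, including one never seen in the class, gets a small nonzero probability, and the smoothed estimates still sum to 1.

        from collections import Counter

        vocab = ["i", "loved", "hated", "the", "movie", "poor", "acting", "great"]  # toy vocabulary
        positive_words = ["i", "loved", "the", "movie", "great", "movie"]           # toy positive-class words

        counts = Counter(positive_words)
        n = len(positive_words)   # total number of words in the positive class
        V = len(vocab)            # vocabulary size

        # Laplace-smoothed estimate: P(w|+) = (n_k + 1) / (n + |V|)
        p = {w: (counts[w] + 1) / (n + V) for w in vocab}

        print(p["movie"])         # seen twice      -> (2 + 1) / (6 + 8)
        print(p["hated"])         # never seen in + ->  1 / (6 + 8), small but nonzero
        print(sum(p.values()))    # sums to 1.0 over the vocabulary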

  • @muditjha7235
    @muditjha7235 6 years ago

    When finding P(word|class), why is the denominator n + |Vocabulary| instead of just n?

  • @rrmm122
    @rrmm122 7 years ago +5

    At 8:50, the word movie occurs 4 times, so n_k = 4, and not 2.

  •  4 years ago

    Why is "number of times the word appears in that class / total number of words in the class" used instead of "number of times the word appears in that class / total number of documents of the class"?
    Does the first option maintain independence among features?

  • @Trynx91
    @Trynx91 8 years ago

    Can we pretend that in the new sentence the word 'poor' occurs twice? Do I have to include the probabilities P(poor|+) and P(poor|-) twice, then? The word 'poor' is an indication that the sentence is a critique and belongs to -. So having the word 'poor' more often in one sentence should enlarge the probability that it belongs to -. Am I right?

  • @someirrelevantthings6198
    @someirrelevantthings6198 7 years ago

    What if some other words come into the picture that are not in the training data? Suppose the sentence is "the movie is excellent"; here 'excellent' is not in the training data, so what probability do we have to take for it?

  • @NelsonZepeda733
    @NelsonZepeda733 3 years ago

    Great explanation!

  • @tpof314
    @tpof314 8 years ago

    Thanks for this fantastic video. It is quite clear and really really easy to understand.

  • @AtifImamAatuif
    @AtifImamAatuif 5 years ago

    Before converting the texts into vectors, why didn't you remove the stop words?

  • @musickeys1838
    @musickeys1838 7 years ago

    Dear Francisco,
    I have a doubt about this video. Consider the statement "I am not happy". The statement is actually negative, but how are the words 'not' and 'happy' processed? How is this statement classified?

  • @unboxordinary
    @unboxordinary 7 years ago

    But assigning a very small number to unknown words in the query can make the product roughly equal to 0; can't we simply neglect them? Very nice explanation though, loved it.

  • @aysegulsayn7984
    @aysegulsayn7984 7 years ago +5

    Thanks for this video!
    I'm trying to make 3 classes (positive, negative and neutral) with naive Bayes. How can I adapt this tutorial?

    • @mishalgautam2697
      @mishalgautam2697 5 years ago +1

      Can you please forward a detailed report on this topic? I need it for my project; kindly make a link available for the same. You would help me tremendously. Thanks.
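
    A minimal sketch of the three-class (positive / negative / neutral) setup asked about above, assuming scikit-learn is available; the toy sentences and labels are hypothetical, not from the video:

        from sklearn.feature_extraction.text import CountVectorizer
        from sklearn.naive_bayes import MultinomialNB

        texts = [
            "I loved the movie, great acting",
            "I hated the poor acting",
            "The movie was released last Friday",
        ]
        labels = ["positive", "negative", "neutral"]

        vectorizer = CountVectorizer()
        X = vectorizer.fit_transform(texts)   # word-count features, as in the video

        clf = MultinomialNB()                 # multinomial naive Bayes with add-one smoothing by default
        clf.fit(X, labels)

        print(clf.predict(vectorizer.transform(["what a great movie"])))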

  • @floweraddictxx6238
    @floweraddictxx6238 3 years ago +1

    In an actual program, can we leave the data like that? Don't we need to convert the words to numbers so the computer can read the data? What about tf-idf? Please, I'm really confused now.

    • @fiacobelli
      @fiacobelli  3 years ago +1

      You can do any number of optimizations on this. For example, giving each word an integer Id, saving only nonzero probabilities, and if you want you can use tfidf instead of word presence. Then you would have to discretize those values. All those are optimizations on the basic algorithm.

    • @floweraddictxx6238
      @floweraddictxx6238 3 years ago +1

      @@fiacobelli thanks sir 👍👍

    • @kris3246
      @kris3246 3 years ago

      @@fiacobelli So tf-idf functions just like an optimization for the algorithm??
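
    As a sketch of the tf-idf option mentioned in this thread (again assuming scikit-learn; the toy data is hypothetical), it is a drop-in change of vectorizer:

        from sklearn.feature_extraction.text import TfidfVectorizer
        from sklearn.naive_bayes import MultinomialNB

        texts = ["I loved the movie", "I hated the poor acting"]   # hypothetical toy data
        labels = ["+", "-"]

        vec = TfidfVectorizer()                 # tf-idf weights instead of raw word counts
        clf = MultinomialNB().fit(vec.fit_transform(texts), labels)
        print(clf.predict(vec.transform(["loved the acting"])))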

  • @Geetwilight
    @Geetwilight 8 years ago

    This is so good! Do you have any suggestions on how I can learn to implement this? I have a corpus that I need to classify into more than 2 categories. Hope you can help! Thanks!

  • @khld163
    @khld163 6 years ago +1

    I have implemented this code in Python after watching this tutorial. I hope you won't mind if I attach your tutorial link in my description.

    • @ahsanhabib5521
      @ahsanhabib5521 5 years ago +1

      Will you please give me the link to your code (Python)? It would be very helpful.
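
    Not the commenter's code, but a minimal from-scratch sketch of the procedure in the video (document-level prior, add-one smoothed word likelihoods), with log probabilities summed instead of multiplying the raw tiny numbers; the training sentences are hypothetical stand-ins:

        import math
        from collections import Counter

        # hypothetical labelled training data
        train = [
            ("i loved the movie", "+"),
            ("great acting and a great movie", "+"),
            ("i hated the poor acting", "-"),
        ]

        # count documents per class, words per class, and build the vocabulary
        class_docs = Counter(label for _, label in train)
        word_counts = {c: Counter() for c in class_docs}
        for text, label in train:
            word_counts[label].update(text.split())
        vocab = {w for counts in word_counts.values() for w in counts}

        def score(text, c):
            """log P(c) + sum of log P(w|c) with add-one (Laplace) smoothing."""
            n = sum(word_counts[c].values())              # total words in class c
            logp = math.log(class_docs[c] / len(train))   # document-level prior
            for w in text.split():
                logp += math.log((word_counts[c][w] + 1) / (n + len(vocab)))
            return logp

        def classify(text):
            return max(class_docs, key=lambda c: score(text, c))

        print(classify("i hated the poor acting"))   # expected: "-"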

  • @mihirjoshi2187
    @mihirjoshi2187 7 years ago +4

    If I am not wrong this is multinomial naive bayes?

    • @unboxordinary
      @unboxordinary 7 years ago +1

      yes it is

    • @insane2539
      @insane2539 5 years ago

      @@unboxordinary At 13:50, what does he mean by 'if the value is positive' in P(+)? Which value is he talking about?

    • @unboxordinary
      @unboxordinary 5 years ago

      @@insane2539 lol i forgot. now m graduated :P

  • @AnAN-bn1ol
    @AnAN-bn1ol 5 years ago

    Do you have the dataset, to compare the documents with what you show?

  • @floweraddictxx6238
    @floweraddictxx6238 3 years ago +1

    Please, where can I get source code for naive Bayes classification analysis?

  • @trungho1989lx
    @trungho1989lx 8 years ago

    Hi Francisco Iacobelli,
    Thanks for the video. There is one thing I didn't understand, which is the -7 and -5 at minute 13:09. Can you clarify this for me, please?

    • @jasonthomas2908
      @jasonthomas2908 8 years ago

      Did you get this in the end @Kenny? The -7 and -5 refer to the position of the decimal point. The -7 means the decimal point moves seven spots to the left. So therefore, -5 is larger than -7 and the second number is larger
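
    In other words, the two scores are just numbers written in scientific notation, and the one with the larger (less negative) exponent is the bigger number; with illustrative values (not necessarily the video's exact figures):

        6.0 x 10^-7 = 0.0000006  <  1.2 x 10^-5 = 0.000012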

  • @khushboomrugendershah3786
    @khushboomrugendershah3786 7 years ago

    Is this the same as Bernoulli classification ?

  • @yusufxyz9144
    @yusufxyz9144 6 years ago

    Thank you so much, this video is really helping me understand this algorithm. Hope you'll answer my questions if I have further problems with this case.

  • @akshayakki3631
    @akshayakki3631 3 years ago

    How do I use this theorem in digital marketing?

  • @luis96xd
    @luis96xd 6 years ago

    Excellent video! Thanks!

  • @rachittrivedi5243
    @rachittrivedi5243 7 years ago

    I made a program in which I have lists of positive and negative words. I am doing sentiment analysis based on their weights. Is that the KNN algorithm or not?

    • @yashbansal1548
      @yashbansal1548 6 years ago

      This is an example of a supervised learning algorithm, where you are provided training sets with class labels. So at the start a few sentences are given which are already labelled as +ve or -ve, and on that basis we already know that 'good' belongs to the +ve class and 'hated' is from the -ve class.

  • @ussefnames1536
    @ussefnames1536 7 years ago +1

    Hi, please can you tell me how to deal with the "don't" case?

  • @aakritigupta5841
    @aakritigupta5841 7 years ago

    Thanks.. nice video... quite easy to understand.

  • @stevemartin7775
    @stevemartin7775 8 years ago

    any idea on how I can assign values to the unknown words? any algorithm?

  • @dheerajnair1998
    @dheerajnair1998 7 years ago

    Thank you so much for the wonderful explanation :)

  • @soobinkim8960
    @soobinkim8960 7 years ago

    Great explanation! Thank you

  • @yu-anchung6769
    @yu-anchung6769 8 years ago

    Great tutorial, thanks for your effort! Can you provide the slides?

  • @fiacobelli
    @fiacobelli  9 years ago +3

    Thanks!

  • @arunjose8687
    @arunjose8687 8 years ago

    Nice video.... Thanks for the explanation....

  • @Dosflamingos
    @Dosflamingos 7 years ago

    you are a life saver, thanks

  • @anuragmiglani
    @anuragmiglani 8 years ago

    Where can I get the slides on this video?

  • @lionheart2352
    @lionheart2352 8 years ago

    great explanation ! thanks.

  • @shahrzadkananizadeh7442
    @shahrzadkananizadeh7442 5 years ago

    great video! Thanks

  • @ravi_krishna_reddy
    @ravi_krishna_reddy 6 years ago

    Great explanation, thanks a lot. :)

  • @TimJosephRyan
    @TimJosephRyan 9 years ago

    Really helpful video, thank you

  • @larissafeliciana7071
    @larissafeliciana7071 6 years ago

    How can I use tf-idf with naive Bayes?

  • @janithasarangakapilarathna1969
    @janithasarangakapilarathna1969 9 years ago

    Thanks for a nice tutorial with simple

  • @XYZmmc
    @XYZmmc 4 years ago

    thank you very much

  • @sanjay.choudhary
    @sanjay.choudhary 7 years ago

    Thanks for this video, now all my doubts are clear.

  • @systemsoftwareandcompilers3440
    @systemsoftwareandcompilers3440 5 years ago

    Well explained. Thank you sir

  • @MrWilliWin
    @MrWilliWin 6 years ago

    great video!

  • @someirrelevantthings6198
    @someirrelevantthings6198 7 years ago

    Why not remove the stop words like "I"?

  • @sarthakpawar1477
    @sarthakpawar1477 8 years ago

    very good explanation

  • @tmpcox
    @tmpcox 4 years ago

    Life saver!

  • @Kgotso_Koete
    @Kgotso_Koete 6 years ago

    Life saver!

  • @KJ..
    @KJ.. 9 years ago

    Thank you for this video :)

  • @GauravSHegde
    @GauravSHegde 9 years ago

    Nice video! Thank You! :)

  • @beibut6799
    @beibut6799 9 years ago

    Can you upload your lecture slides (PPT)?

  • @MrMdellias
    @MrMdellias 8 years ago

    Thanks a lot! You helped me a lot in understanding the concept of NB in text classification.
    However, I have a question about classifying text by topic. For example, I have 5 texts and I want to classify each of them by topic, e.g. politics, jokes, advertisement, entertainment and health.
    Does this mean I have to prepare 5 training datasets for politics, jokes, advertisement, entertainment and health, e.g. politics (yes/no), jokes (yes/no) and so on?
    Thanks in advance
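
    One common way to set this up is a single training set whose label column already carries all five topics, rather than five separate yes/no datasets; a hypothetical sketch of such data (naive Bayes then estimates one prior and one set of word likelihoods per topic, exactly as in the two-class case):

        # hypothetical multi-topic training data: one dataset, one topic label per document
        train = [
            ("the election results were announced today", "politics"),
            ("why did the chicken cross the road", "joke"),
            ("buy one get one free this weekend only", "advertisement"),
            ("the new album tops the charts this week", "entertainment"),
            ("regular exercise lowers blood pressure", "health"),
        ]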

  • @shubhamjain120
    @shubhamjain120 5 years ago

    Thanks

  • @rakshataamberker612
    @rakshataamberker612 8 years ago

    Thank you

  • @MayankKumar_DataScience
    @MayankKumar_DataScience 7 years ago

    very good explanation :)

  • @alicjasanestrzik2174
    @alicjasanestrzik2174 8 years ago

    Man, I love you for this video!

  • @nasratullahzia3668
    @nasratullahzia3668 5 years ago

    Thanks, sir, for a brief explanation.
    Can anyone help me: how do I find the result when the data are dependent on each other?

    • @matinebrahimkhani8038
      @matinebrahimkhani8038 5 years ago

      Of course, you can use different algorithms for classifying documents. Can you say how your data ("words") are dependent on each other?

  • @alialgeboory7727
    @alialgeboory7727 8 years ago

    Thank you so much

  • @rianasmaraputra
    @rianasmaraputra 5 years ago

    How do you get the value 6.03 x 10^-7?

    • @tamilselvanj
      @tamilselvanj 5 years ago

      Multiply all the values for the words by the prior.

  • @sathyacharanya.c2743
    @sathyacharanya.c2743 6 years ago

    Sir, I need the calculation for "I HATED THE POOR ACTING".
    Please explain that.

  • @Naveen_2580
    @Naveen_2580 5 years ago

    Hello sir... can you send me the PDF or PPT of this video?

  • @sawsanalshakarchi4265
    @sawsanalshakarchi4265 7 years ago

    Can I have the source code in C#?

  • @davidlanday6102
    @davidlanday6102 7 years ago

    I think n is actually equal to 13 not 14

  • @JK-sy4ym
    @JK-sy4ym 8 years ago +1

    Well explained although some minor errors. Thank you!

  • @davetetreault6795
    @davetetreault6795 5 years ago

    Super sick video, fix that damn mouse

  • @avishdev9028
    @avishdev9028 7 years ago

    can u send me your presentation pls

  • @denoevrm1083
    @denoevrm1083 6 years ago

    Bro, we don't understand English.

  • @dimar4150
    @dimar4150 6 years ago

    He trips over his words a lot, which muddles the comprehension experience.

  • @HariKrishnaReddy7696
    @HariKrishnaReddy7696 7 years ago +1

    What if some other words come into the picture that are not in the training data? Suppose the sentence is "the movie is excellent"; here 'excellent' is not in the training data, so what probability do we have to take for it?

    • @fiacobelli
      @fiacobelli  7 years ago +1

      Hari Krishna Reddy check 15:24

    • @HariKrishnaReddy7696
      @HariKrishnaReddy7696 7 years ago

      Sorry, I didn't concentrate during the last minutes. Sorry to bother you.
      Can you make a video on handling such scenarios?

    • @unboxordinary
      @unboxordinary 7 years ago

      But assigning a very small number can make the product roughly equal to 0; can't we simply neglect it?

    • @HariKrishnaReddy7696
      @HariKrishnaReddy7696 7 years ago

      Pranav Sarda I neglected the words which are not present in the training data in my sentiment analysis project.

    • @unboxordinary
      @unboxordinary 7 years ago

      ohk, coz we are ultimately going to check which product is the higher one , so not taking that word in any product won't make any difference, thanks for that :) (fo eg. 3*4 > 2*4 but simply 3>2)