Real-World Python Machine Learning Tutorial w/ Scikit Learn (sklearn basics, NLP, classifiers, etc)

  • Published: Oct 2, 2024

Comments • 315

  • @KeithGalli
    @KeithGalli  5 years ago +92

    Video outline!
    0:20 - What we will be doing!
    3:40 - Sci-Kit Learn Overview
    6:38 - How do we find training data?
    9:33 - Download data
    11:45 - Load our data into Jupyter Notebook
    16:38 - Cleaning our code a bit (building data class)
    20:13 - Using Enums
    22:50 - Converting text to numerical vectors, bag of words (BOW) explanation
    25:45 - Training/Test Split (make sure to "pip install sklearn" !)
    33:45 - Bag of words in sklearn (CountVectorizer)
    40:05 - fit_transform, fit, transform methods
    42:05 - Model Selection (SVM, Decision Tree, Naive Bayes, Logistic Regression) & Classification
    47:50 - predict method
    53:35 - Analysis & Evaluation (using clf.score() method)
    56:58 - F1 score
    1:01:01 - Improving our model (evenly distributing positive & negative examples and loading in more data)
    1:20:36 - Let's see our model in action! (qualitative testing)
    1:22:24 - Tfidf Vectorizer
    1:25:40 - GridSearchCV to automatically find the best parameters
    1:31:30 - Further NLP improvement opportunities
    1:32:50 - Saving our model (Pickle) and reloading it later
    1:36:37 - Category Classifier
    1:39:14 - Confusion Matrix
    Thank you for watching! Make sure to like & subscribe if you enjoyed :)
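
    For reference, here is a minimal sketch of the workflow this outline walks through, using a made-up mini-dataset instead of the Amazon review file from the video; the variable names only loosely mirror the notebook:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import train_test_split
    from sklearn import svm
    from sklearn.metrics import f1_score

    texts = ["great book, loved it", "terrible, waste of money",
             "really enjoyed this one", "awful writing, do not buy",
             "fantastic story", "boring and bad",
             "liked it a lot", "hated every page"]
    labels = ["POSITIVE", "NEGATIVE"] * 4

    # training/test split (25:45), stratified so both classes appear in each split
    train_x, test_x, train_y, test_y = train_test_split(
        texts, labels, test_size=0.25, random_state=42, stratify=labels)

    # bag of words (33:45): fit the vocabulary on the training text only
    vectorizer = CountVectorizer()
    train_x_vectors = vectorizer.fit_transform(train_x)
    test_x_vectors = vectorizer.transform(test_x)

    # model selection (42:05) and prediction (47:50)
    clf = svm.SVC(kernel="linear")
    clf.fit(train_x_vectors, train_y)
    preds = clf.predict(test_x_vectors)

    # evaluation (53:35 / 56:58)
    print(clf.score(test_x_vectors, test_y))
    print(f1_score(test_y, preds, average=None, labels=["POSITIVE", "NEGATIVE"]))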

    • @shagufta3247
      @shagufta3247 5 years ago +1

      Thanks so much!
      Please make a full Django Python tutorial using Visual Studio.

    • @girishvenkatachalam8793
      @girishvenkatachalam8793 4 years ago +1

      Thanks man

    • @mimididi8689
      @mimididi8689 4 years ago +1

      Is there any way I could import another random dataset into my trained model and see if it can predict the category from that other database (not the one I used to train my model)?

    • @joneandrewharris8225
      @joneandrewharris8225 4 years ago

      can you help out with my error in the comments

    • @alexvidu4517
      @alexvidu4517 3 years ago

      This is glorious, been searching for "learn tennis betting game" for a while now, and I think this has helped. Ever heard of - Aiyenjamin Prefatory Approach - (should be on google have a look ) ? It is a good one of a kind guide for discovering how to get a unique tennis betting formula minus the hard work. Ive heard some super things about it and my buddy got amazing results with it.

  • @mucahitugurlu7324
    @mucahitugurlu7324 4 years ago +50

    You're the reason that I've got an internship at a great company :) well.. I'm broke now :D but when I earn tons of money (I hope we all do :D) I'll donate to you, Keith!

    • @xyphoes345
      @xyphoes345 2 years ago

      How are you doing now, man?
      Any updates?

    • @joelrichmond6256
      @joelrichmond6256 2 years ago

      Hey man how you doing now

    • @KeithGalli
      @KeithGalli  2 years ago +6

      Doing well now finally! :). Will be back on youtube very soon

    • @uchechukwumazi6512
      @uchechukwumazi6512 2 years ago

      A quick one for those into machine learning: on a scale of 1-10, how well does this tutorial cover machine learning? I am developing skills in data analytics and wanted to add machine learning to the mix, but I don't want to dive too deep into it. Just what I'll need for the day-to-day requirements of a machine learning job.

  • @AmandeepSingh-cv5qz
    @AmandeepSingh-cv5qz 2 years ago

    Keith, you are like an elder brother teaching us how to do sums. Thanks a lot, bruh!

  • @MrTaken-tl4bw
    @MrTaken-tl4bw 3 years ago

    In the first exercise, if any of you feels like laughing a bit, do this:
    if float(review['overall']) < 2:
        print(review['reviewText'] + '\n')
    Also, great video! Didn't know I could enjoy Data Science as much as I am.

  • @ManishSharma-xq9be
    @ManishSharma-xq9be 5 years ago +168

    He not only teaches the good stuff but also teaches how to Google things and get the job done.
    Keep going, brother! You are awesome.

    • @KeithGalli
      @KeithGalli  5 years ago +32

      My goal is for you guys to be able to do this type of stuff on your own! Thanks for the support man, I appreciate it :)

    • @shuhratjonzikiryaev9685
      @shuhratjonzikiryaev9685 3 years ago +2

      Yes, I agree with you 100%. He is the only person I know on youtube that actually teaches the material so well! I hope to see this channel grow to millions of subscribers.

    • @niteshprajapat7918
      @niteshprajapat7918 3 years ago +1

      Yes, exactly... I was confused about how to use Stack Overflow, but after watching his real-world problem tutorial I learnt this skill too.

  • @BennyHarassi
    @BennyHarassi 5 years ago +55

    Please keep uploading you're one of the best tutorial channels.

    • @KeithGalli
      @KeithGalli  5 years ago +4

      Thank you!! Will do my best

  • @DataversePH
    @DataversePH 4 years ago

    This video is so underrated. It should have at least 500K views.

  • @jenn6997
    @jenn6997 4 years ago +27

    I like it when you showed us how you would use online resources, all the Googling and documentation stuff, so that we are not afraid to actually go online ourselves and explore more new functions :) Thanks Keith!! Stay healthy! :)

  • @overgeared
    @overgeared 4 years ago +1

    practical and nicely done. thanks! please do more videos on sklearn, maybe regression & clustering...

  • @khalifaothman3963
    @khalifaothman3963 4 years ago +4

    from sklearn.naive_bayes import GaussianNB
    clf_gnb = GaussianNB()
    clf_gnb.fit(train_x_vectors, train_y)
    clf_gnb.predict(test_x_vectors[0])
    what is the fault
    TypeError: A sparse matrix was passed, but dense data is required. Use X.toarray() to convert to a dense numpy array.

    • @WeAreSWAGent
      @WeAreSWAGent 4 years ago

      having the same issue

    • @WeAreSWAGent
      @WeAreSWAGent 4 years ago +5

      fixed it. clf_nb.fit(train_x_vectors.todense(),train_y)
      clf_nb.predict(test_x_vectors.todense()[0])

    • @neila.9195
      @neila.9195 4 years ago

      @@WeAreSWAGent Thanks!

  • @resh8198
    @resh8198 4 years ago +6

    Hello if your Gaussian naive Bayes keeps coming up with an error, try this:
    from sklearn.naive_bayes import GaussianNB
    clf_gnb = GaussianNB()
    clf_gnb.fit(train_x_vectors.todense(), train_y)
    clf_gnb.predict(test_x_vectors[0].todense())

  • @barbaracosta4183
    @barbaracosta4183 4 years ago +31

    Thanks for the cool tutorial! Just a quick correction: when you're classifying using Naive Bayes, you used the Decision Tree Classifier, copied from the previous case. It's not super critical, but when I tried to use what I believed to be the corrected version, I found an unexpected error, and to use a working Naive Bayes, had to convert the train vector to a dense matrix using ".todense()".
    I'm not sure if this is correct though, if you have any input on this, it would be greatly appreciated! Thanks again :)
    How I tried to do it:
    clf_gnb = GaussianNB()
    clf_gnb.fit(train_x_vectors.todense(), train_y)
    clf_gnb.predict(test_x_vectors[0].todense())
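
    For anyone landing here: CountVectorizer returns a scipy sparse matrix, while GaussianNB only accepts dense arrays, which is where the error comes from. A self-contained sketch with toy data, using .toarray() (as suggested in a reply below) since .todense() returns a numpy matrix that newer scikit-learn versions may reject:

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import GaussianNB

    train_x = ["great book", "terrible book", "loved it", "hated it"]
    train_y = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

    vectorizer = CountVectorizer()
    train_x_vectors = vectorizer.fit_transform(train_x)   # sparse CSR matrix

    clf_gnb = GaussianNB()
    clf_gnb.fit(train_x_vectors.toarray(), train_y)       # GaussianNB needs dense input
    print(clf_gnb.predict(vectorizer.transform(["really great"]).toarray()))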

    • @WhiteError37
      @WhiteError37 4 years ago

      thanks dude

    • @choutos404
      @choutos404 4 years ago +4

      Not working for me, I used .toarray() instead.

    • @extremelyfunnyvideos9052
      @extremelyfunnyvideos9052 4 years ago +1

      I also had a problem with this.
      Where should we insert the fix?
      This: clf_gnb.predict(test_x_vectors[0].todense()) .. seems too late.
      Anyone manage to get it working?
      Thanks

    • @SasanMaleki
      @SasanMaleki 4 years ago

      Thank you!

    • @timmdunker8507
      @timmdunker8507 4 years ago +1

      Thanks! :)

  • @vilasjagtap6165
    @vilasjagtap6165 4 years ago

    Great stuff Keith. Really good. Keep doing your bit for all of us. Thanks a lot.

  • @tobiasksr23
    @tobiasksr23 5 years ago +1

    Awesome. Are you planning on making more of these machine learning videos? It would be great if you could include more about the preprocessing part, maybe trying to get data from a source where it is not ordered and has a lot of outliers.

  • @deufrai1
    @deufrai1 1 year ago +5

    50-year-old software developer here.
    This is the first hands-on video I've watched on the subject of ML.
    As a first step into the subject, I'm very satisfied with the time I spent with you.
    You covered the basics, from data prep to model save and load.
    Surely a good starting point for further personal explorations.
    Also enjoying your Pandas-related content.
    Keep up the good work, and maybe use Jupyter's tab-completion, sometimes ;)

  • @FraserMyersMusic
    @FraserMyersMusic 5 years ago +30

    I was waiting for this! You sir, are a legend

    • @stickmanjournal
      @stickmanjournal 2 years ago

      @wise guy I think discrete math would help you grasp this

  • @sunritjana4573
    @sunritjana4573 3 years ago +9

    I have been doing a lot of courses for ML in scikit; I found this last week and learnt from it. And to be honest, I mastered things that the so-called "mega" courses couldn't cover. You're awesome and also really helpful!

    • @rawmetal3052
      @rawmetal3052 3 years ago +5

      This guy is like the human version of W3school, his content is simple, succinct and well thought out

  • @imadudin1522
    @imadudin1522 1 month ago +1

    Big thanks to you, Keith...!!!
    Maybe just a request for a PyTorch tutorial.

  • @hemanthshankar4520
    @hemanthshankar4520 3 years ago

    i really like the way u explain

  • @shoaibsh2872
    @shoaibsh2872 4 years ago +1

    It was a really good video. But I have one question: why didn't we change the sentiment into numbers (i.e. NEGATIVE=0, POSITIVE=1) and then fit the model? How can the model understand the string values?
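
    (scikit-learn classifiers accept string class labels directly and encode them internally, which is why "POSITIVE"/"NEGATIVE" work without a manual mapping; if you prefer numbers, LabelEncoder does the conversion. A tiny sketch with made-up data:)

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.preprocessing import LabelEncoder
    from sklearn import svm

    texts = ["great book", "awful book", "loved it", "hated it"]
    labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

    X = CountVectorizer().fit_transform(texts)
    clf = svm.SVC(kernel="linear").fit(X, labels)
    print(clf.predict(X[:1]))        # predicts the string label, e.g. ['POSITIVE']
    print(clf.classes_)              # the label strings scikit-learn stored internally

    le = LabelEncoder()
    print(le.fit_transform(labels))  # [1 0 1 0] if you really want integers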

  • @DJSEWWES
    @DJSEWWES 2 years ago

    big fan of what you are doing keep it up (y)

  • @Max-my6rk
    @Max-my6rk 4 years ago +14

    Keith Galli: "I'm going insane!" ahahah

  • @robingeorge7217
    @robingeorge7217 5 years ago +6

    Please upload a real world predictive model project

  • @borisljevar3126
    @borisljevar3126 1 year ago

    I suppose that the Bag of Words representation doesn't consider the word order in the sentence. Is that correct?
    While it generates distinct vectors for the following examples:
    "The book is good, but ..."
    "The book is bad, but ..."
    It fails to distinguish between the following two examples:
    "The book is not good, but ..."
    "The book is good, but not ..."
    Is there an alternative representation that considers the actual word order in the sentence?
    If not, I would suggest preprocessing the input data to form pairs (and/or triples) of adjacent words by concatenating them (thus treating them as a single word). This approach would capture common linguistic associations. Do you believe this could enhance the model, or might it introduce other issues?
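
    (Correct, a plain bag of words ignores word order. The pairing idea described above is essentially what CountVectorizer's ngram_range option does, so it may be the easiest thing to try; a sketch:)

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["The book is not good, but readable",
            "The book is good, but not cheap"]

    vec = CountVectorizer(ngram_range=(1, 2))   # unigrams + bigrams
    X = vec.fit_transform(docs)
    print(vec.get_feature_names_out())          # includes 'not good', 'good but', ...
                                                # (get_feature_names() on older scikit-learn)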

  • @michaelpritchard2350
    @michaelpritchard2350 1 year ago

    Keith has got it DOWN! Very instructive, thank you.

  • @dhruvmk3055
    @dhruvmk3055 4 years ago +3

    Great video, but I tested a few other algorithms on the data-set and they seemed to work even better on the data. The algorithms were: Nearest Centroid Classifier and Stochastic Gradient Descent. Thanks for the video though, really helped me.
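
    (For reference, both of those are in scikit-learn, sklearn.linear_model.SGDClassifier and sklearn.neighbors.NearestCentroid; a toy-data sketch of the SGD one, which works directly on the sparse bag-of-words vectors:)

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import SGDClassifier

    texts = ["great book", "awful book", "loved it", "hated it"]
    labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

    train_x_vectors = CountVectorizer().fit_transform(texts)

    clf_sgd = SGDClassifier(loss="hinge", random_state=42)  # a linear SVM trained with SGD
    clf_sgd.fit(train_x_vectors, labels)
    print(clf_sgd.predict(train_x_vectors[:1]))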

  • @jenn6997
    @jenn6997 4 years ago +3

    Phew, finally finished watching this one:) A lot to take in, but super helpful and interesting! Thanks, Keith! :) Gonna start your real-world task with Pandas tomorrow!

  • @chuanjiang6931
    @chuanjiang6931 4 years ago +1

    Improving accuracy by balancing test data does not mean your model is robust enough. I expected your model would have worked well even for unbalanced test data. Do you have any suggestions on how to do that?
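
    (One option, rather than balancing the test set, is to keep its natural distribution and report per-class and macro-averaged F1, which makes weakness on the rare class visible; a sketch with made-up predictions:)

    from sklearn.metrics import f1_score, classification_report

    test_y = ["POSITIVE"] * 8 + ["NEGATIVE"] * 2   # imbalanced on purpose
    preds  = ["POSITIVE"] * 9 + ["NEGATIVE"] * 1   # pretend model output

    print(f1_score(test_y, preds, average=None, labels=["POSITIVE", "NEGATIVE"]))
    print(f1_score(test_y, preds, average="macro"))
    print(classification_report(test_y, preds))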

  • @Max-my6rk
    @Max-my6rk 4 years ago +3

    I always end up being directed back to, and staying on, Keith's videos... just awesome...

  • @celina6204
    @celina6204 4 years ago +3

    Loved this video! I followed along writing my own code and it helped me put what I've learned into practice. Thank you so much for the practical advice, I can't wait to start on my own projects. Liked and subscribed! :)
    p.s. Did anyone have issues getting the output from the GridSearch portion? That was the only part that messed up for me.
    My output:
    GridSearchCV(cv=5, estimator=SVC(),
    param_grid={'C': (1, 4, 8, 16, 32), 'kernel': ('linear', 'rbf')})
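
    (Most likely nothing went wrong: since scikit-learn 0.23 the fitted estimator's printout only shows non-default parameters, so that short output is expected. The results live on the fitted object; a self-contained sketch on toy data:)

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn import svm

    texts = ["great book", "awful book", "loved it", "hated it",
             "really great", "really bad", "liked it", "disliked it"]
    labels = ["POSITIVE", "NEGATIVE"] * 4

    X = CountVectorizer().fit_transform(texts)
    parameters = {"kernel": ("linear", "rbf"), "C": (1, 4, 8, 16, 32)}
    clf = GridSearchCV(svm.SVC(), parameters, cv=2)
    clf.fit(X, labels)

    print(clf.best_params_)     # e.g. {'C': 1, 'kernel': 'linear'}
    print(clf.best_score_)      # mean cross-validated accuracy of that setting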

    • @adarshparihar3300
      @adarshparihar3300 2 years ago

      I was also having this problem, there was no result from GridSearchCV, hope you have got the solution for it.

  • @MrDaveaneo
    @MrDaveaneo 4 years ago +1

    I don't like seeing all that positive data go to waste. Is it beneficial/worthwhile to create synthetic data (?) based off of the existing negative reviews so that no positive data goes to waste? Or what about altering the training parameters to put 10x more weight into incorrect positives, a ratio that more closely resembles our data?
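
    (Another option that keeps every review is to re-weight the classes during training; several scikit-learn classifiers, including SVC and LogisticRegression, take a class_weight argument, either "balanced" or an explicit dict like the ~10x ratio described above. Toy data; names only loosely follow the video:)

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn import svm

    texts = ["great", "loved it", "fantastic", "amazing", "awesome", "terrible"]
    labels = ["POSITIVE"] * 5 + ["NEGATIVE"]        # heavily imbalanced on purpose

    X = CountVectorizer().fit_transform(texts)

    clf = svm.SVC(kernel="linear", class_weight="balanced")
    # or an explicit ratio, e.g. class_weight={"NEGATIVE": 10, "POSITIVE": 1}
    clf.fit(X, labels)
    print(clf.predict(X[5]))                        # the lone NEGATIVE example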

  • @atharvapatil7549
    @atharvapatil7549 4 years ago +3

    This was really helpful. I have been watching your videos for the last few days; they are really awesome. Subscribed. Can you please make a video on face recognition using a CNN, or suggest a link to watch?

    • @bhoomimukadam8221
      @bhoomimukadam8221 4 years ago +1

      @keithgalli even I want to learn face recognition using cnn plz make a video for that

    • @vedantvraj
      @vedantvraj 4 years ago +1

      Yes Please make a video on Face Recognition.

  • @mosesmasinya3145
    @mosesmasinya3145 4 years ago +1

    Thanks Keith. I have issues with my Jupyter notebook: it can't find the modules I have installed in the terminal. Help ASAP.

  • @hiukecil
    @hiukecil 4 years ago +1

    Hi Keith, what a great video.
    I think I know why the result of SVC is still the same: the best parameters are C=1 and kernel='linear', not 'rbf'. You can check the best parameters of the GridSearchCV by running clf.best_params_

  • @maherelouahabi6440
    @maherelouahabi6440 5 years ago +2

    Correct me if I am wrong, but I think there is a problem. The vectorized versions of the text are different between test and training because they come from different dictionaries. You can check it: if you get the length of any vectorized version of the train text and compare it with the vectorized version of the test text, they are different. I think you should get the vectorized version before the split.

    • @KeithGalli
      @KeithGalli  5 years ago +2

      Make sure you didn't "fit" or "fit_transform" to the test vectors. We want to only fit the vectorizer to the training data, then simply "transform" the test data. In the real world, our training data is the only thing we get to see. We create a dictionary on that and then use that dictionary on any incoming test vector. Some words during test won't be in the dictionary and that's life, we won't be able to handle those properly. Our hope is that our training set was big enough and included enough words that during test time we can use that same vectorizer and get good results. Hopefully this makes sense.
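
      (What Keith describes, in code: fit the vectorizer on the training text only, then reuse that fitted vocabulary to transform the test text, so both matrices have the same width. Toy data:)

      from sklearn.feature_extraction.text import CountVectorizer

      train_x = ["great book", "terrible book"]
      test_x = ["great story"]                  # "story" is not in the training vocabulary

      vectorizer = CountVectorizer()
      train_x_vectors = vectorizer.fit_transform(train_x)   # learns the vocabulary
      test_x_vectors = vectorizer.transform(test_x)         # same vocabulary, unknown words ignored

      print(train_x_vectors.shape[1] == test_x_vectors.shape[1])   # True: same number of columns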

    • @maherelouahabi6440
      @maherelouahabi6440 5 years ago

      @@KeithGalli ooooooh man! OK ok hahah, I found my error. Thank you Keith, I was fitting two times and getting two different dictionaries haha. Thank you!!

  • @saptarshisanyal4869
    @saptarshisanyal4869 2 years ago +2

    This is just one heck of a tutorial. Thanks a ton, Keith. I am a Java architect with 17 years of extensive experience, looking to shift to ML/data science. It took me 3 hours to cover this video. I must say the first hour was really easy to follow, but you probably covered a lot of things in the last 40 minutes.

  • @rarethamaren913
    @rarethamaren913 3 years ago +1

    Great tutorial Keith, you are incredible!!
    Anyway, do you have any book recommendations for studying? I'm still new to machine learning, so it would be nice to read a few books first before starting to study machine learning practically. Thanks in advance!!

  • @maherelouahabi6440
    @maherelouahabi6440 5 years ago +2

    Could you make a short video explaining what are the differences between deep learning, machine learning and AI from your point of view. Thank you and good luck

    • @andyn6053
      @andyn6053 4 years ago +2

      learn to google man

  • @mohitkishore8494
    @mohitkishore8494 3 years ago +3

    This is by far the most useful tutorial that I have ever seen. You are an amazing teacher.

  • @gregmaland5318
    @gregmaland5318 3 years ago +1

    Thank you! This was extremely helpful. (POSITIVE)

  • @haraldlons
    @haraldlons 4 years ago +9

    Just watched the video in one sitting. It was great! I learned so much, and I loved that you showed the entire process from data to evaluation of the model. Keep up the good work :)

    • @KeithGalli
      @KeithGalli  4 years ago +2

      Thank you! Glad it was helpful :)

  • @safizaidi2787
    @safizaidi2787 4 years ago +1

    Keith man, this is an awesome video. Please make some more videos like your "Solving real world data science tasks" video.

  • @briannnnnnnnnn1037
    @briannnnnnnnnn1037 4 years ago +2

    This is great! Looking forward to more ML content like regression, decision trees, SVM.

  • @matrix_root
    @matrix_root 4 years ago +2

    Basically, a very cool video, Keith!
    But for NLP I recommend using spacy.io
    Also waiting for more in-depth videos about ML.
    Can you make one about PyTorch or TensorFlow in the future, please?

  • @aligh18
    @aligh18 4 years ago +2

    Wow Keith, you're an absolute legend! I can't wait to get through your other videos and see your future work :D

  • @hollmanbaez1423
    @hollmanbaez1423 4 years ago +2

    You are so good at explaining the hardest things in plain language, making them easy to understand, even for my grandma... Thanks so much for making this simple!

  • @JMatthews
    @JMatthews 1 year ago

    phew.. that was heavy, not sure if you were rushing in with a lot of things, but then maybe there's so much to cover there anyways. You did well in spite of all anyways, but yeah there's a lot to pick up from this video.. good work

  • @shawnjames3242
    @shawnjames3242 4 years ago +1

    Is previous knowledge of machine learning required for this video to be helpful?

    • @KeithGalli
      @KeithGalli  4 years ago

      It might help, but it's not required!

    • @shawnjames3242
      @shawnjames3242 4 years ago

      @@KeithGalli Ok keith, Tnx 😊

  • @extremelyfunnyvideos9052
    @extremelyfunnyvideos9052 4 years ago

    I am not getting any recommendation from GridSearchCV
    My code:
    from sklearn.model_selection import GridSearchCV
    parameters = { 'kernel': ('linear', 'rbf'), 'C': (1,4,8,16,32)}
    svc=svm.SVC()
    clf=GridSearchCV(svc, parameters, cv=5)
    clf.fit(train_x_vectors, train_y)
    The response:
    GridSearchCV(cv=5, estimator=SVC(),
    param_grid={'C': (1, 4, 8, 16, 32), 'kernel': ('linear', 'rbf')})
    Nothing else !! Did I miss something, or has this stopped working?
    Thanks
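
    (Same situation as in an earlier thread: nothing stopped working, newer scikit-learn just prints the fitted object tersely. Beyond best_params_, the full per-combination scores are in cv_results_, which is handy to view as a DataFrame; toy data so the snippet runs on its own:)

    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.model_selection import GridSearchCV
    from sklearn import svm

    texts = ["good", "bad", "great", "awful", "nice", "poor"]
    labels = ["POSITIVE", "NEGATIVE"] * 3
    X = CountVectorizer().fit_transform(texts)

    clf = GridSearchCV(svm.SVC(), {"kernel": ("linear", "rbf"), "C": (1, 4)}, cv=2)
    clf.fit(X, labels)

    results = pd.DataFrame(clf.cv_results_)
    print(results[["param_kernel", "param_C", "mean_test_score", "rank_test_score"]])
    print(clf.best_score_)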

  • @constantineveres3731
    @constantineveres3731 4 years ago +1

    Hey Keith. Looks like the issue with relatively low score (~80%) is caused by imperfection of training data. I'm talking about conversion of Star rate to one of three classifiers: NEGATIVE, POSITIVE, NEUTRAL. The problem is that Stars are assigned by Customers but not Amazon AI Engine. People are treating say 3 star rate in very different way. Even if Customer is not really happy with product and giving fairly negative feedback, he/she still can provide 3 stars rate. So, while 5-4 stars rate is working well for POSITIVE as well as 2-1 stars - for NEGATIVE, there is a little bit uncertainty with 3 Stars rate. I think (5-4-3 stars for POSITIVE and 2-1 stars for NEGATIVE) or (5 stars for PERFECT, 4-3 for POSITIVE and 2-1 NEGATIVE) logic should give us 90-95% score. Thoughts?

    • @KeithGalli
      @KeithGalli  4 years ago +2

      Yeah you're very right with your thoughts. The meaning of the 3-star classification is pretty ambiguous and we can't reliably count on the data rated this way. Ultimately though the models that were being scored with ~80% were only classifying between NEGATIVE (1-2 star) & POSITIVE (4-5 star) so our model had more issues than just how we categorized the data. If we want to get that score up higher we will want to apply some additional processing to our text. Some ideas would be removing stop words (words like "the", "this", "that", etc), lemmatizing/stemming (converting words to a base form), and utilizing bigrams (pairs of words instead of single words). Another reason for a relatively low score is that our data is not perfect. Even some of the 5 star reviews probably have no meaningful information that conveys positive sentiment in their review text. Same goes for 1 star reviews. Potentially doing some manual review of our training data would be another way to improve the score. Hope this information is helpful!
      BTW, I'm a huge hockey fan and after noticing the hat in your profile picture I have to quickly say.... Go Bruins!!! ;)
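
      (The vectorizer-level pieces of that list are built into scikit-learn: stop-word removal and n-grams are options on CountVectorizer/TfidfVectorizer, while lemmatizing or stemming would need an extra library such as NLTK or spaCy. A small sketch:)

      from sklearn.feature_extraction.text import TfidfVectorizer

      docs = ["This book was not good at all",
              "This is the best book I have read"]

      vec = TfidfVectorizer(stop_words="english",   # drops "the", "this", "was", ...
                            ngram_range=(1, 2))     # adds bigrams such as "best book"
      X = vec.fit_transform(docs)
      print(vec.get_feature_names_out())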

  • @veronicaalejandramas6641
    @veronicaalejandramas6641 3 years ago

    I have the impression that the transformation from text to vectors has to be done before evenly distributing the training sample, because otherwise we would have a different word-to-vector transformation. Does that make sense?

  • @sonuanand77
    @sonuanand77 3 years ago

    Any idea why linear regression doesn't work?
    from sklearn.linear_model import LinearRegression
    clf_log = LinearRegression()
    clf_log.fit(train_x_vectors, train_y)
    clf_log.predict(test_x_vectors[0])
    error -->ValueError: could not convert string to float: 'POSITIVE'
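
    (The error is expected: LinearRegression is a regressor, so it needs numeric targets and fails on the string label 'POSITIVE'. For class labels use a classifier instead, e.g. LogisticRegression as in the video; toy data below:)

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    texts = ["great book", "awful book", "loved it", "hated it"]
    labels = ["POSITIVE", "NEGATIVE", "POSITIVE", "NEGATIVE"]

    train_x_vectors = CountVectorizer().fit_transform(texts)

    clf_log = LogisticRegression()
    clf_log.fit(train_x_vectors, labels)
    print(clf_log.predict(train_x_vectors[:1]))   # -> ['POSITIVE']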

  • @alexanderscott2456
    @alexanderscott2456 4 years ago +1

    43:00
    He's referring to Patrick Winston. By sheer chance I was watching one of his lectures on YT early this morning.

  • @lokeshnagarajan7495
    @lokeshnagarajan7495 4 years ago +1

    Amazing video. One won't find such a tutorial on Python and machine learning modules elsewhere. This is the very video that helped me complete my project.

  • @deepakkumarasawa9970
    @deepakkumarasawa9970 3 years ago

    Hi, sorry, I have a doubt. Why have you used DecisionTreeClassifier() in the Naive Bayes cell (in[37])? Why are we not using GaussianNB() instead?

  • @shahinshafibeyli1372
    @shahinshafibeyli1372 2 years ago

    Thanks Keith for the tutorial. I tried to do something more with the imbalanced data (well, at least I thought of it this way :) ). I tried SMOTEENN, SMOTE + Tomek, plain SMOTE, and combined over- and under-sampling; none of them showed better performance than your manual distribution. All of these I tried on the data loaded as a pandas DataFrame.
    In addition, I tried almost exactly the same things on the DataFrame that you did on the JSON; the F1 scores were 2 times lower. Now I wonder, is there any difference between JSON and a pandas DataFrame? Could you make it clear? Perhaps you also tried loading it as a DataFrame. Keen to know your opinion. Thanks

  • @taruvinga
    @taruvinga 5 years ago +2

    Great video. Thank you very much. Is there a way of importing the contents of the variable file_name into a dataframe? So instead of print(line) you direct the output into a data frame?

    • @KeithGalli
      @KeithGalli  5 years ago +1

      You're very welcome, glad you enjoyed. Check out this link: stackoverflow.com/questions/20037430/reading-multiple-json-records-into-a-pandas-dataframe
      It should answer your question! :)
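
      (For a JSON-lines file like the review data in the video, one JSON object per line, pandas can also load it directly; the path below is a placeholder:)

      import pandas as pd

      file_name = "reviews.json"                 # placeholder: your JSON-lines file
      df = pd.read_json(file_name, lines=True)   # one row per review
      print(df[["reviewText", "overall"]].head())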

  • @jamalford576
    @jamalford576 3 years ago

    I am at the 44:00 mark.
    and I keep on getting a ValueError from these lines:
        clf_svm = svm.SVC(kernel='linear')
        clf_svm.fit(train_x_vectors, train_y)
    The traceback goes through sklearn/svm/_base.py (fit -> _validate_targets) and ends with:
        ValueError: The number of classes has to be greater than one; got 1 class
    Do you mind helping me with this please?

  • @davidlee5715
    @davidlee5715 4 years ago

    thank you mate! that's amazing!

  • @JakoBueno
    @JakoBueno 4 years ago

    Hey! How are you doing? I don't know if you are going to see this, but when I run the F1 score with the 10,000 file, Jupyter says: MemoryError: Unable to allocate 1.33 GiB for an array with shape (6700, 26615) and data type int64
    I googled it but I couldn't find an answer... can you help me please? Greetings from Argentina!

  • @dikshyantthapa3367
    @dikshyantthapa3367 4 years ago +1

    You kept appearing on my thumbnail.. I didn't care at first.. Later for once i opened the data science video.. Man.. It was so useful. The application videos of machine learning, data science were awesome. Thanks Keith ❤️.

    • @KeithGalli
      @KeithGalli  4 years ago +1

      Well I'm happy that you ended up clicking on a video :). Also glad that you have found the videos useful. I appreciate the support!

  • @azrulfyz1162
    @azrulfyz1162 4 years ago +1

    Wow, that is one comprehensive tutorial. Thanks for the time and effort.

  • @jp2nyy
    @jp2nyy 4 years ago

    Hi there, any advice on using sklearn to predict using multiple csv's and manually input data (i.e. I type in some data)

  • @tak68tak
    @tak68tak 4 years ago +1

    Sooo POSITIVE. You really saved me. Thanks a lot!

  • @keensaj
    @keensaj 4 years ago

    great tutorial! thanks 😊

  • @gannoncondon1864
    @gannoncondon1864 2 years ago +1

    Another great video. Really appreciate minimal slides paired with the 'live' coding feel.

  • @msc-clk
    @msc-clk 4 years ago +1

    I am like machines, I am always learning... Haven't watched yet, but I believe you did your best.

    • @msc-clk
      @msc-clk 4 years ago

      Edit: I just finished this tutorial and I still support my first comment. NOICE. You are the real deal!

  • @jamesriri1810
    @jamesriri1810 4 years ago +1

    @Keith Galli this is really dope. Totally love how you teach the tutorial. Amazing stuff here.

  • @gisleberge4363
    @gisleberge4363 11 months ago

    Real helpful, made me realise new possibilities for how to go about text data - thanks 🙂

  • @BM-vz2nb
    @BM-vz2nb 4 years ago +2

    Very good and cool Tutorial Keith! Thanks a Ton! Loved it!

  • @andrewtupua4039
    @andrewtupua4039 3 years ago

    Started getting invalid syntax at 18. the .text kept coming back as not being listed as a tuple

  • @Nahzh-m1r
    @Nahzh-m1r 1 month ago

    The only reason I can't learn anything from your videos is because you are too cute and hot to not watch your face rather than screen!

  • @iftekharamin2486
    @iftekharamin2486 4 years ago +1

    @Keith Galli, really awesome tutorial to watch & try in parallel. however, can you please further clarify how to fit Gaussian Naive Bayes as per this video ?

    • @veronicaalejandramas6641
      @veronicaalejandramas6641 3 years ago

      from sklearn.naive_bayes import GaussianNB
      clf_gnb = GaussianNB()
      train_x_vectors_array = train_x_vectors.toarray()
      test_x_vectors_array=test_x_vectors.toarray()
      clf_gnb.fit(train_x_vectors_array, train_y)
      clf_gnb.predict(test_x_vectors_array)
      This works, but not sure the fit is correct

  • @Locke19901
    @Locke19901 4 years ago +2

    Keith, this is incredibly helpful. Your teaching style is to be commended. I look forward to more like this for ML.

  • @amankumarsingh6242
    @amankumarsingh6242 4 years ago +1

    Your videos are superb. I can see your videos and just get started applying it to my project. Thank you👍.

    • @KeithGalli
      @KeithGalli  4 years ago +2

      That's awesome! Glad you have enjoyed :)

  • @PreetiKumari-sj2rh
    @PreetiKumari-sj2rh 4 years ago

    Hi Keith, could you please help me with how to load the JSON file in Python? I am unable to do it.
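
    (Roughly what the video does around 11:45, sketched from memory; the file path is a placeholder and the keys match the ones used elsewhere in these comments:)

    import json

    file_name = "reviews.json"        # placeholder: path to the JSON-lines review file
    reviews = []
    with open(file_name) as f:
        for line in f:                # one JSON object per line
            review = json.loads(line)
            reviews.append((review["reviewText"], review["overall"]))

    print(reviews[0])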

  • @ninjaduck3534
    @ninjaduck3534 3 years ago +1

    Dude you are an excellent educator, thank you so much for this well structured, well explained video!!

  • @pangeayoutube
    @pangeayoutube 15 hours ago

    Can't import the books small data, can anyone help me?

  • @vijaykumar-od7kx
    @vijaykumar-od7kx 5 months ago

    Excellent tutorial to learn the fundamentals of SCI-Kit

  • @thecuriousguy1531
    @thecuriousguy1531 2 years ago

    Why don't you use stratify to distribute the data evenly?
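
    (Worth noting: stratify keeps the class proportions identical in the train and test splits; it does not balance an imbalanced dataset the way the evenly-distribute step in the video does. Usage sketch on toy data:)

    from sklearn.model_selection import train_test_split

    texts = ["good", "great", "nice", "fine", "super", "lovely", "solid", "decent",
             "bad", "poor", "awful", "weak"]
    labels = ["POSITIVE"] * 8 + ["NEGATIVE"] * 4

    train_x, test_x, train_y, test_y = train_test_split(
        texts, labels, test_size=0.25, random_state=42, stratify=labels)

    print(train_y.count("NEGATIVE"), test_y.count("NEGATIVE"))   # 3 and 1: same 1/3 ratio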

  • @charmz973
    @charmz973 4 years ago +1

    hey bro am getting a TypeError: '

    • @charmz973
      @charmz973 4 years ago

      TypeError Traceback (most recent call last)
      ----> clf_svm.fit(train_x_vector, train_y)
      ... sklearn\svm\base.py in fit -> _validate_targets(y)
      ... sklearn\utils\multiclass.py in check_classification_targets -> type_of_target(y)
      ... numpy\lib\arraysetops.py in _unique1d -> ar.sort()
      TypeError: '

    • @cardinalpizza2902
      @cardinalpizza2902 4 years ago

      Me too dude, have you solved it?

    • @cardinalpizza2902
      @cardinalpizza2902 4 years ago +1

      OK, I've solved it. In my case the problem was with the Enum class: I was putting a comma after the sentiment string, and that created a list of labels with commas.

  • @hassanrevel
    @hassanrevel 3 years ago

    Can you please make videos on PyTorch or TensorFlow?

  • @muliwang9771
    @muliwang9771 2 years ago

    Why does distributing the test data evenly make the F1 score better?

  • @utkarshkapil
    @utkarshkapil 3 months ago

    Relevant and super helpful in 2024 too!

  • @pradyumnarao5649
    @pradyumnarao5649 4 years ago

    At the classifiers step, while using Naive Bayes with these lines:
    from sklearn.naive_bayes import GaussianNB
    clf_gnb = GaussianNB()
    clf_gnb.fit(train_x_vectors,train_y)
    clf_gnb.predict(test_x_vectors[0])
    the third line, .fit(...), is not working; it throws a sparse-matrix error.
    Can you please check it @Keith Galli
    Thank you

  • @ninadkawade4681
    @ninadkawade4681 2 months ago

    Naive Bayes gives an error for the .fit method, how do I solve it?

  • @nabilelbilali9569
    @nabilelbilali9569 4 years ago

    I think this way it's much clearer in the Prep Data step:
    x = [t.text for t in reviews]
    y = [z.sentiment for z in reviews]
    x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3)

  • @yahyasheikhnejad
    @yahyasheikhnejad 4 years ago

    While the model is trained on equal numbers of POSITIVE and NEGATIVE labels, it shouldn't be biased toward either of them (but we see there is a meaningful difference in the F1 scores). Also, the amount of test data shouldn't affect the model's performance, since the model is already constructed and will not change with the number of test cases. Am I right?

  • @rand5858
    @rand5858 3 years ago

    I get a blank DecisionTreeClassifier() without the classifier information in my code; any ideas?
    Also, thanks, these are great videos!
    As a fellow Sigma Chi; In Hoc!

  • @TogrulKazimov
    @TogrulKazimov 4 years ago

    I think you hadn't defined test_x while working on classification. I got an error message, then I saw it in the ReviewContainer section. Did I miss something?

  • @nirmaltheprogrammer510
    @nirmaltheprogrammer510 4 years ago

    How do we use it in games like Snake, Mario, etc.? Please tell.

  • @KaranSingh-fg2nt
    @KaranSingh-fg2nt 4 years ago

    hey bro
    make a video on sql for data science ...PLZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ

  • @kccchiu
    @kccchiu 3 years ago

    Wouldn't it be easier if we just imported a resampling library like imblearn and oversampled negative reviews / undersampled positive reviews?
    That said, I learned some OOP and love this tutorial so much. I somehow got an offer as a data scientist working on NLP, so this actually gives me some confidence lol
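
    (A sketch of that idea with imbalanced-learn, which is a separate pip install imbalanced-learn; RandomOverSampler duplicates minority-class rows so nothing gets thrown away. Toy data:)

    from sklearn.feature_extraction.text import CountVectorizer
    from imblearn.over_sampling import RandomOverSampler

    texts = ["great", "loved it", "fantastic", "amazing", "awful", "bad"]
    labels = ["POSITIVE"] * 4 + ["NEGATIVE"] * 2

    X = CountVectorizer().fit_transform(texts)

    ros = RandomOverSampler(random_state=42)
    X_res, y_res = ros.fit_resample(X, labels)   # minority class gets duplicated
    print(list(y_res).count("POSITIVE"), list(y_res).count("NEGATIVE"))   # 4 4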

  • @lfmtube
    @lfmtube 3 years ago

    Very good video! New subscriber and added to my “ Perfect videos” list. Thanks for sharing your knowledge.

  • @Kris-to7vh
    @Kris-to7vh 3 years ago

    Best channel ever to learn any Python library!
    1:05 i wonder what the outcome will be for sarcasm, something like: 'beautiful restaurant that made me puke, raccomand'

  • @saikumargatla4706
    @saikumargatla4706 1 year ago

    Your videos are changing my life.

  • @hobbz4921
    @hobbz4921 4 years ago

    hey keith, can you build an app for PC that can text any cellphone, can receive replies, maybe store a contacts list, I know that plenty of websites can do this, but an app for PC that doesn't require the cell phone that you're texting to download said app?

  • @MBG-ck9ou
    @MBG-ck9ou 4 years ago

    Can anyone tell me if there is a way to predict more than one value? An example of this would be to have a studentId and be able to predict not only if he will pass, but what grades he would probably get as well. There would obviously be more data from this specific student to learn from
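
    (One scikit-learn option for predicting several targets at once, e.g. pass/fail plus a grade, is MultiOutputClassifier, which fits one classifier per target column; entirely made-up student data below:)

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multioutput import MultiOutputClassifier

    X = [[5, 70], [1, 30], [8, 90], [2, 45]]           # e.g. [study_hours, attendance_pct]
    y = [["pass", "B"], ["fail", "F"], ["pass", "A"], ["fail", "D"]]

    clf = MultiOutputClassifier(RandomForestClassifier(random_state=42))
    clf.fit(X, y)
    print(clf.predict([[6, 80]]))                      # one prediction per target column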

  • @TheAbiya
    @TheAbiya 4 years ago

    Absolutely lovely tutorials! I follow all your data science projects. Keep doing this :)
    I have encountered an issue while solving this, and have posted my error and code on StackOverflow.
    Link: stackoverflow.com/questions/62347528/trouble-fitting-my-model-on-sklearn-from-svm
    Anyone who can figure it out, please comment on the solution.

  • @yongsuhuang6195
    @yongsuhuang6195 3 years ago

    Almost exactly 2 years later, a crappy student is watching your video at 3 am, trying to finish his assignment.