Machine Learning in Python: Building a Classification Model

  • Published: 11 Dec 2024

Comments • 102

  • @DataProfessor · 4 years ago · +5

    The code for this tutorial is available as a Jupyter notebook on GitHub (link below).
    📎CODE: github.com/dataprofessor/code/tree/master/python/iris

    • @ramprasadsapkota1013 · 3 years ago

      When I open that link, save the file to my computer and then open it in Jupyter Notebook, it shows JSON instead of the Jupyter Notebook code.

    • @bruhm0ment767 · 3 years ago

      @ramprasadsapkota1013 Are you doing it through conda, or have you downloaded JupyterLab separately?

  • @forestsunrise26 · 2 years ago · +22

    I learned more from this 20-minute video than from a whole semester at university. Thank you so much!

  • @sc4tterw1nd · 5 months ago · +1

    Thank you so much! I was able to move up 17 places in this ML competition because of this. Short and to the point.

  • @siddharthachaganti5639 · 3 years ago · +1

    I never thought people who study biology were interesting, but now I've changed my opinion. You are the best!

  • @nicholflowers2077 · 3 years ago · +7

    I really appreciate your organized approach to making this video very clear and simple. Thank you Professor!

    • @DataProfessor · 3 years ago

      You're very welcome! Thanks for the kind words!

  • @friederikebauer7810 · 2 years ago · +4

    Ran my first test for my thesis! Super informative, thanks :)

  • @thomastimjensen · 1 year ago · +1

    Hi, Data Professor! Thank you so much for a very lucid and well-structured walk-through of how to build a classification model. I am a master's student in data-driven organisational change at a university in Denmark, and your course is just perfect for expanding my knowledge. Thank you!

  • @soeleos5309 · 2 years ago

    Hey prof, I'm a prof at FSU and direct an ML lab. Great tutorial; I will use this for onboarding new students. Many thanks!

  • @ullaskunder · 3 years ago · +2

    Awesome... better than our college syllabus!

  • @galnahum4349 · 4 years ago · +1

    Teacher Chanin, this video is very good. Looking forward to the next video in the series. Thank you very much 🙏🏻

    • @DataProfessor · 4 years ago

      Thanks for the comment and kind words!

  • @DataOverEverything · 1 year ago · +1

    Such a good tutorial. There aren't many that cover non-binary (multiclass) classification. Thank you.

  • @ahmadaltaweel4981 · 3 years ago

    My first project is to spell your name correctly :) Love you professor.

  • @1UniverseGames · 3 years ago · +3

    Help, please: when I run clf.fit(X, Y), the only output I get is RandomForestClassifier().
    It doesn't show all the details in my notebook. I wrote the same code as above, but this line gives a different output. Can you help me solve this? Is there any syntax to get the full, detailed output?

    • @DataProfessor · 3 years ago

      Hi, what output are you getting? It should show the parameters used for building the model as the output.
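A likely cause, assuming scikit-learn 0.23 or newer: since that release, an estimator's printed form lists only the parameters that were changed from their defaults, so a RandomForestClassifier created with no arguments prints simply as RandomForestClassifier(). The model is still built with all of its default parameters. A minimal sketch of two ways to see them all, using the Iris data as in the video:

```python
from sklearn import set_config
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, Y = iris.data, iris.target

clf = RandomForestClassifier()
clf.fit(X, Y)

# Option 1: get every parameter (defaults included) as a plain dict.
params = clf.get_params()
print(params["n_estimators"], params["criterion"])

# Option 2: make the printed representation include default values too.
set_config(print_changed_only=False)
print(clf)
set_config(print_changed_only=True)  # restore the default, concise behaviour
```

get_params() is usually the handier option when you want to read a value in code; the set_config switch only changes how estimators are printed, not how they are trained.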

    • @dhwanitrivedi5604 · 3 years ago · +1

      @DataProfessor Hello Sir, I am facing the same issue. I was able to make the prediction; however, when printing the score I got 0% accuracy. I wrote the same code as shown in the notebook. Please help. Thank you very much.

    • @DataProfessor · 3 years ago

      @dhwanitrivedi5604 Please check that the data is loaded properly and that the data variables are actually passed to the fit function. Also check that the variable names match, since if nothing is read in, the model will not produce the desired results.

  • @mankomyk · 2 years ago · +1

    I'm trying to use this guide, but I noticed that I need to choose how to import the CSV file: as a pandas DataFrame or as a NumPy array. Your instructions and code use a NumPy array. My CSV file has 35000 rows and 280 columns. The first row contains the column names. The first column holds the string target classification (y), 'good'/'bad', and all other columns are numeric features. Which should I choose?

    • @mankomyk · 2 years ago · +1

      And when I try to import it as a NumPy array with numpy.genfromtxt('file.csv',dtype=None,delimiter=';',names=True), I get a strange array shape (35000,) with the column names in the variable viewer. When I import it with numpy.genfromtxt('file.csv',dtype=None,delimiter=';'), I get an array of shape (35001,280), but the column names are imported as the first row :(

    • @DataProfessor · 2 years ago · +1

      Hi, have you tried importing it with pandas?
      import pandas as pd
      df = pd.read_csv('file.csv', sep=';')  # your file uses ';' as the delimiter
      Afterwards, you can separate df into X and y.

    • @mankomyk · 2 years ago · +1

      @DataProfessor Thanks. In the end, I did it like this:
      import pandas as pd
      data = pd.read_csv('datafile.csv', sep=';')
      data = pd.DataFrame(imp.transform(data), columns=data.columns)  # imp: an imputer fitted earlier (not shown here)
      dataArray = data.to_numpy()
      X = dataArray[:, 1:].astype('float64')  # astype() returns a new array, so assign the result
      Y = pd.to_numeric(dataArray[:, 0])
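For a file like the one described (a header row, a string target in the first column, numeric features in the remaining columns, ';' as the delimiter), a minimal pandas sketch might look like the following. The inline CSV text is hypothetical sample data standing in for file.csv, and the imputation step from the comment above is left out.

```python
import io

import pandas as pd

# Hypothetical stand-in for 'file.csv': header row, string label first,
# numeric features after it, ';' as the delimiter.
csv_text = """label;f1;f2;f3
good;1.0;2.5;3.1
bad;0.4;1.2;2.2
good;1.1;2.7;3.0
"""

# With a real file this would be pd.read_csv('file.csv', sep=';').
df = pd.read_csv(io.StringIO(csv_text), sep=";")  # first row becomes the column names

X = df.iloc[:, 1:].to_numpy(dtype="float64")  # every column except the first
y = df.iloc[:, 0].to_numpy()                  # first column: 'good'/'bad' labels

# scikit-learn classifiers accept string labels directly; an explicit 0/1
# encoding is only needed if a later step expects numbers.
y_numeric = (y == "good").astype(int)

print(X.shape)  # (3, 3)
print(df.columns.tolist())
```

Unlike numpy.genfromtxt with names=True, which returns a one-dimensional structured array (hence the (35000,) shape), pandas keeps the header as column names and the values as a two-dimensional table, so slicing off the target column is straightforward.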

  • @nicholflowers2077 · 3 years ago · +1

    Hi Professor, 1. Was "Feature Importance" added to the original dataset? How/when is it calculated for new data that has not already been modeled? 2. How does the dataset.load method know where to get your dataset from?

    • @DataProfessor · 3 years ago · +1

      Hi Nichol,
      1. Feature importance is computed after a model has been built with the random forest algorithm, which has built-in feature importance. We can retrieve the importances as follows (note the feature_importances_ attribute):
      from sklearn.ensemble import RandomForestClassifier
      clf = RandomForestClassifier()
      clf.fit(X_train, Y_train)
      clf.feature_importances_
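A self-contained version of the snippet above, assuming the Iris data and an 80/20 train/test split (the split size and random_state values are arbitrary choices, not taken from the video). The importances are impurity-based scores, one per feature, and they sum to 1.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X_train, X_test, Y_train, Y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, Y_train)

# feature_importances_ is an attribute (no parentheses), available only after fit().
importances = clf.feature_importances_

# Pair each score with its feature name, highest first.
ranked = sorted(zip(iris.feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```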

    • @DataProfessor · 3 years ago · +1

      Q: How/When is it calculated for new data that has not already been modeled?
      A: Feature importance can be calculated only for the data that was used to train the model. We can incorporate new data into the dataset, rebuild the model and recompute the feature importance.

    • @DataProfessor · 3 years ago · +1

      Q: 2. How does the dataset.load method know where to get your dataset from?
      A: The datasets.load_iris function loads the Iris dataset that ships with the scikit-learn package and assigns it to a variable of our choosing, such as one called "iris":
      from sklearn import datasets
      iris = datasets.load_iris()
      In addition, scikit-learn provides several other datasets, as listed in the scikit-learn documentation: scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
      For example, we can replace "load_iris" in "datasets.load_iris()" with load_breast_cancer, load_diabetes, load_digits or load_wine to use these other datasets. (load_boston was also available at the time, but it was removed in scikit-learn 1.2.)
      Hope this helps.
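A small sketch of swapping in the other loaders named above. All of these datasets ship inside the scikit-learn package, so nothing is downloaded; load_diabetes is a regression dataset, while the others are classification datasets.

```python
from sklearn import datasets

# Each loader returns a Bunch: .data holds the feature matrix, .target the labels.
loaders = [
    datasets.load_iris,
    datasets.load_wine,
    datasets.load_breast_cancer,
    datasets.load_digits,
    datasets.load_diabetes,  # regression: .target holds continuous values
]

shapes = {}
for loader in loaders:
    ds = loader()
    shapes[loader.__name__] = ds.data.shape
    print(loader.__name__, ds.data.shape, ds.target.shape)
```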

    • @nicholflowers2077 · 3 years ago

      @DataProfessor Oh! I didn't know that. Thanks for pointing that out.

    • @nicholflowers2077 · 3 years ago

      @DataProfessor Makes perfect sense. Thanks!

  • @sushmaramesh7902 · 6 months ago

    Oh my God!! This is great stuff!! Thank you so much!!

  • @ronaldomartins7006 · 2 months ago

    In Output 11, Y.shape comes up as (3,), but I've tried different ways and it comes up as (150,). Do you have any idea why? By the way, your video is excellent, thanks for that.

  • @thanzaw3883 · 4 years ago · +2

    Thank you for the great video; the explanation is very clear for me (a beginner). I don't have much programming background, but when I watch your video I can still understand it very well. I'm a new ML student and I need to do my first ML project for school. Could you suggest where I can get a simple dataset and which algorithm I should use? Thank you so much, Sir. I wish you lots of success and happiness in your life.

    • @DataProfessor · 4 years ago · +1

      Hi Than. Start with simple datasets whose features you understand, such as the Iris dataset for classification tasks and the Boston housing dataset for regression (note that load_boston was removed from scikit-learn in version 1.2; fetch_california_housing is a commonly used replacement). I'm sure there are others, but these two came to mind. As for the algorithm, I would recommend linear regression for regression tasks and a tree-based method such as a decision tree or random forest for classification tasks.

    • @thanzaw3883 · 4 years ago

      Dear Sir, thank you so much for your reply. I will take note of all your suggestions.

    • @bruhm0ment767 · 3 years ago

      Try the MNIST dataset; it's for handwritten digits. Or you could just make your own input vectors and expected output vectors of floats and predict those, right?

  • @Alok-lk4ql · 4 years ago · +1

    Sir, what should we do if the classes of the target variable are not balanced?

    • @DataProfessor · 4 years ago · +1

      Great question. In that case, we will have to balance the target variable by performing undersampling or oversampling.
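A minimal random-oversampling sketch using sklearn.utils.resample, on made-up toy data (8 rows of class 0, 2 rows of class 1): minority-class rows are drawn with replacement until both classes have the same count. This is only one option. Undersampling would instead draw the majority class down to the minority count (replace=False), and many scikit-learn classifiers, including RandomForestClassifier, also accept class_weight='balanced'. Resampling is normally applied to the training split only, so the test set still reflects the real class balance.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced toy data: 8 samples of class 0, 2 of class 1.
X = np.arange(20, dtype=float).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)

X_major, y_major = X[y == 0], y[y == 0]
X_minor, y_minor = X[y == 1], y[y == 1]

# Draw minority rows with replacement until they match the majority count.
X_minor_up, y_minor_up = resample(
    X_minor, y_minor, replace=True, n_samples=len(y_major), random_state=42
)

X_bal = np.vstack([X_major, X_minor_up])
y_bal = np.concatenate([y_major, y_minor_up])

print(np.bincount(y_bal))  # [8 8]
```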

  • @HimaniChauhan · 4 years ago · +1

    Sir, how do I use the k-means algorithm on a training dataset in Python?

    • @DataProfessor · 4 years ago · +1

      You'll need to import the necessary class:
      from sklearn.cluster import KMeans
      and then create a model with KMeans().

    • @HimaniChauhan · 4 years ago

      @DataProfessor Thank you, sir.
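Note that k-means is an unsupervised (clustering) algorithm: it groups the rows of X without using a target Y, so it is not trained the way the classifier in the video is. A minimal sketch on the Iris features, where n_clusters=3 is chosen only because Iris is known to have three species (with truly unlabeled data you would have to pick k another way), and n_init is set explicitly so the behaviour does not depend on the scikit-learn version's default.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data  # features only; k-means does not use the labels

# KMeans is a class: create a model, then fit it to the data.
km = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = km.fit_predict(X)  # cluster index (0, 1 or 2) for each of the 150 rows

print(labels[:10])
print(km.cluster_centers_.shape)  # (3, 4): one centre per cluster, one value per feature
```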

  • @shailabhshankar884 · 3 years ago

    Let's say I have a dataset with two features to be included in the Y set... how do we do that?
    I have a dataset with the columns: class name, X, Y, W, H, cluster number.
    I want the model to predict the class name based on X, Y, W, H and the cluster number...
    For rows with the same cluster number, I want the model to take into consideration only the X, Y, W, H of that cluster.
    The cluster numbers are actually the template numbers of invoices, X, Y, W, H are coordinates, and the class names are field names. So the problem statement is: we know the cluster number and the X, Y, W, H coordinates, and we want the system to predict which set of coordinates corresponds to which data field.
    So the model must only take into account the X, Y, W, H values for a specific cluster number rather than all X, Y, W, H values.
    Thanks in advance.

  • @carlosventura1308 · 3 years ago · +1

    Hi, thank you for the video. When I run print(clf.feature_importances_), I get a ValueError that says "Found input variables with inconsistent numbers of samples: [150, 3]". Could it be that I typed something wrong? Thank you for the help.

    • @DataProfessor · 3 years ago

      Hi, thanks for watching! feature_importances_ should work once the model has been built via model.fit(X, Y).
      Can you try again and make sure that all code cells have been run?

    • @carlosventura1308 · 3 years ago

      @DataProfessor I got it. Thank you so much for the quick reply.
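The [150, 3] in that error suggests, as an assumption, that Y held 3 values instead of 150, for example if Y was set to iris.target_names (the three species names) rather than iris.target. In that case fit() fails before any model is built, so feature_importances_ cannot be computed either. A minimal reproduction and fix:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X = iris.data

# Mistake: iris.target_names has only 3 entries (the species names),
# while X has 150 rows, so fit() rejects the pair with a ValueError.
error_message = ""
try:
    RandomForestClassifier().fit(X, iris.target_names)
except ValueError as err:
    error_message = str(err)
print(error_message)

# Fix: use iris.target, which holds one class label per row.
Y = iris.target
assert X.shape[0] == Y.shape[0]  # a quick sanity check before fitting
clf = RandomForestClassifier().fit(X, Y)
print(clf.feature_importances_)
```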

  • @blessingadeyemi1289 · 2 years ago · +1

    Thanks a lot for this tutorial. I wanted to ask why you made a prediction with an instance from the training set instead of using new data. Doesn't this cause overfitting?

    • @DataProfessor · 2 years ago

      Yes, that is correct. It was only used as an initial demo of the scikit-learn classifier; later in the video, I talk about using the train_test_split function to split the data, followed by building the model and evaluating it on the test data.
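Strictly speaking, predicting on a training instance does not cause overfitting; it just cannot reveal it, because the model has already seen that row, so the result is an optimistic estimate. A minimal sketch of the split-then-evaluate workflow mentioned above, assuming the Iris data and an 80/20 split (test_size and random_state are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

iris = load_iris()
X, Y = iris.data, iris.target

# Hold out 20% of the rows; the model never sees them during training.
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, Y_train)

Y_pred = clf.predict(X_test)  # predictions for rows the model has not seen

train_acc = clf.score(X_train, Y_train)  # usually optimistic: these rows were used for fitting
test_acc = clf.score(X_test, Y_test)     # the more honest estimate of performance on new data

print(X_train.shape, X_test.shape)  # (120, 4) (30, 4)
print(train_acc, test_acc)
```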

  • @wen-chiyeh4332 · 3 years ago · +1

    Super helpful!!! Thank you so much!!!

  • @moniquediaz674 · 4 years ago · +1

    Loved the video. You teach very well.

  • @shiyuran625 · 2 years ago

    Thank you so much for your patient explanation! I wonder what X[[0]] means?
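In the video's context X is presumably the Iris feature matrix. X[[0]] indexes it with the list [0], which is NumPy "fancy" indexing: it selects row 0 but keeps the result two-dimensional, with shape (1, n_features). That matters because scikit-learn's predict() expects a 2D array of samples, so clf.predict(X[[0]]) works, while clf.predict(X[0]) is rejected with a ValueError. A small sketch:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
X, Y = iris.data, iris.target
clf = RandomForestClassifier(random_state=0).fit(X, Y)

print(X[0].shape)    # (4,): an integer index drops the row dimension
print(X[[0]].shape)  # (1, 4): a list index keeps a 2D array holding only row 0

prediction = clf.predict(X[[0]])  # one sample in, one predicted label out

# Passing the 1D row instead is rejected, because predict() expects 2D input.
raised_for_1d = False
try:
    clf.predict(X[0])
except ValueError:
    raised_for_1d = True
print(prediction, raised_for_1d)
```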

  • @marcofestu · 4 years ago · +3

    Nice video! I've just started using Python as well, so I hope you keep uploading videos on Python as well as R 😁

    • @DataProfessor · 4 years ago

      Thanks Marco for your support! More coming up.

  • @todymaverick · 1 year ago

    great job man!

  • @ramprasadsapkota1013 · 3 years ago · +1

    Hi,
    How can I find the file on GitHub? I tried but got different files.

    • @DataProfessor · 3 years ago · +1

      Hi, the link to the Jupyter notebook file in the GitHub repo is normally in the video description. Here's the link again: github.com/dataprofessor/code/tree/master/python/iris

    • @ramprasadsapkota1013 · 3 years ago · +1

      Thanks heaps, your tutorial is awesome !!!

  • @abhipsatripathy3934 · 4 years ago · +1

    Prof, I am new to Python and follow your videos regularly. I have a problem. When I create a matrix myself in Jupyter Notebook, the "shape" command works, i.e. it shows the number of rows and columns. But when I import the iris dataset from sklearn using from sklearn.datasets import load_iris and iris = load_iris() and then use "iris.shape", an error occurs: KeyError "shape". What could be the reason? Please suggest something, because I have been stuck on this.

    • @DataProfessor · 4 years ago

      Can you try assigning X = iris.data and Y = iris.target, where iris.data contains the 4 X variables and iris.target contains the Y variable (the species class label)? Afterwards, you can run X.shape and Y.shape.

    • @abhipsatripathy3934 · 4 years ago · +1

      @DataProfessor I'll try it and let you know. I split the data as X = iris.data and Y = iris.target and then checked the shape. The shape of X comes out as (150, 4) and that of Y as (150,). Is that okay?

    • @DataProfessor · 4 years ago

      Yes, that is correct: 150 means there are 150 rows, and 4 means there are 4 columns.

    • @abhipsatripathy3934 · 4 years ago · +1

      @DataProfessor Thanks a ton, Prof.

    • @DataProfessor · 4 years ago

      @abhipsatripathy3934 You're welcome 😃
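Why iris.shape fails while X.shape works: load_iris() returns a Bunch, a dict-like container from scikit-learn, not an array, so it has no shape of its own. The arrays live inside it under keys such as data and target. A minimal sketch:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# The returned object is a dict-like Bunch, not an array.
print(type(iris).__name__)  # Bunch
print(sorted(iris.keys()))  # the available fields (exact list varies by version)

X = iris.data    # feature matrix: one row per flower, one column per measurement
Y = iris.target  # class label for each row
print(X.shape, Y.shape)  # (150, 4) (150,)
```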

  • @kusumakusuma1150 · 3 years ago · +1

    Thanks, Data Professor, clear and informative video! A question about model performance: why do we compare X_test and Y_test, since they are subsets of the data? Why not Y_test and Y_predict?

    • @DataProfessor · 3 years ago · +2

      For evaluating model performance, there are 2 ways:
      (1) via the score() function, as in model.score(X_test, Y_test), which internally predicts labels for X_test and compares them with Y_test; it automatically uses R² for regression models and accuracy for classification models.
      (2) via the r2_score(Y_test, Y_test_pred) if it is a regression model or via accuracy_score(Y_test, Y_test_pred) if it is a classification model.

    • @kusumakusuma1150 · 3 years ago · +1

      @DataProfessor Noted, thanks.
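To make the two approaches above concrete: for a classifier, score(X_test, Y_test) predicts labels from X_test and then compares them with Y_test, so it returns the same number as accuracy_score(Y_test, Y_test_pred). A minimal sketch on Iris (the split settings are arbitrary choices):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42).fit(X_train, Y_train)

# Way 1: score() predicts on X_test internally, then compares with Y_test.
acc_score_method = clf.score(X_test, Y_test)

# Way 2: predict explicitly, then compare the predictions with the true labels.
Y_test_pred = clf.predict(X_test)
acc_metric = accuracy_score(Y_test, Y_test_pred)

print(acc_score_method, acc_metric)  # the two numbers are identical
```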

  • @nguyendaominh1078 · 3 years ago · +1

    Very useful video. Thanks a lot!

  • @larrysizemore2891 · 3 years ago

    How exactly should we do this on our own?

  • @thestorm4633 · 3 years ago

    Hi Prof, I am currently working on an IoT-related project, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block that traffic.
    I would appreciate advice on how to go about it; my department does not allow the use of sklearn libraries.
    I am currently studying in Africa.
    Your candid advice will be greatly appreciated.

    • @DataProfessor · 3 years ago

      Cybersecurity isn't really my domain. I would approach the problem by asking experts in the field what the current gold-standard method for this task is. Then I would read research papers in the field, and then aggregate all the information to plan my own approach. That's how I would do it. Hope this helps.

  • @maths4you819 · 2 years ago

    You have explained it nicely. Please explain machine learning in Python using the Brain Arteries dataset...

  • @nuramirahsyahirahzainurin6151 · 4 years ago

    Can I build a classification model for burnout?

  • @dd15277 · 3 years ago

    Thank you for sharing, great explanation!!

  • @shwetaredkar734 · 4 years ago · +1

    Simply awesome.

    • @DataProfessor · 4 years ago

      Thanks again for the kind words 😃

  • @Mayglie · 4 years ago · +1

    Hi Data Professor, could you demo the CIFAR-10 dataset and also show how to save the trained model, load it back, and make predictions? Thank you!

    • @DataProfessor · 4 years ago

      Great suggestion! I will definitely take a look and consider for future videos 😃

  • @RaselAhmed-ix5ee · 2 years ago

    Hello, how can I contact you?

  • @kanimozhipanneerselvam3017 · 4 years ago · +3

    Great video, Professor, as always!! Kindly upload videos on handling sensor-collected / IoT-related time series data and model building!! Thanks in advance!! 🙂

    • @DataProfessor · 4 years ago · +1

      Thanks for the suggestion, I’ll definitely consider this for future videos 😃

    • @thestorm4633 · 3 years ago · +1

      @DataProfessor Please do. I am currently working on an IoT-related project, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block that traffic.
      I would appreciate advice on how to go about it; my department does not allow the use of sklearn libraries.
      Your candid advice will be greatly appreciated.

    • @l3gcy337 · 1 year ago

      @thestorm4633 How did your project go?

  • @viet-bacnguyen1830 · 4 years ago · +1

    Many thanks!

    • @DataProfessor · 4 years ago

      A pleasure! Thanks for watching 😃

  • @brokerkamil5773 · 1 year ago · +1

    thx😀

  • @ismaelnadaf2870 · 4 years ago · +1

    A great video, sir. I am a UG student. I want to build a machine learning web app for multi-language detection (e.g. English, French, Chinese, Japanese). Please guide me on how to do it from the basics.

    • @DataProfessor · 4 years ago · +1

      Thanks for your comment. What you need to do to make this web app is use the Shiny package in R for building the web app where you can build a ML model and plug it into the web app.
      I made several videos on this topic. Please check them out below.
      Building the Machine Learning Web Application in R | Shiny Tutorial Ep 4
      ruclips.net/video/ceg7MMQNln8/видео.html
      There are 4 other related videos and it guides you from the beginning:
      1. Building your First Web Application in R | Shiny Tutorial Ep 1
      ruclips.net/video/tfN10IUX9Lo/видео.html
      2. Build Interactive Histogram Web Application in R | Shiny Tutorial Ep 2
      ruclips.net/video/lC1Dk6gUbe0/видео.html
      3. Building Data-Driven Web Application in R | Shiny Tutorial Ep 3
      ruclips.net/video/CYXvVuklWRM/видео.html
      5. Build BMI Calculator web application in R for health monitoring | Shiny Tutorial Ep 5
      ruclips.net/video/9EQ6cwBQpvo/видео.html

  • @pramishprakash · 2 years ago

    Thanks a lot, sir.

  • @pavankalyan6927 · 9 months ago

    Thank you so much!

  • @Mohamm-ed · 4 years ago · +1

    Thanks for the video. Could you please build a signal classifier model in Python, e.g. for EEG and ECG signals, or recommend some materials for that? Thanks in advance.

    • @DataProfessor · 4 years ago · +1

      Thanks for the comment. You might want to check out this GitHub page for some repositories containing code for EEG classifiers: github.com/topics/eeg-classification

    • @Mohamm-ed · 4 years ago

      @DataProfessor Thanks so much.

  • @kodediego · 4 years ago · +1

    Great work, and I LOVE YOUR NAME

  • @T-BWT · 10 months ago

    my professor is a finger print candidate

  • @damilolamatthew9677 · 3 months ago

    Like, seriously, you don't even explain some of the figures in it 😢😢😢 I keep going over the file, more than 10 times now 😢 Your tutorial is very low