Machine Learning in Python: Building a Classification Model

  • Published: 28 Jun 2024
  • In this video, I will show you how to build a simple machine learning model in Python. Particularly, we will be using the scikit-learn package in Python to build a simple classification model (for classifying Iris flowers) using the random forest algorithm.
    🌟 Buy me a coffee: www.buymeacoffee.com/dataprof...
    📎CODE: github.com/dataprofessor/code...
    ⭕ Playlist:
    Check out our other videos in the following playlists.
    ✅ Data Science 101: bit.ly/dataprofessor-ds101
    ✅ Data Science RUclipsr Podcast: bit.ly/datascience-youtuber-p...
    ✅ Data Science Virtual Internship: bit.ly/dataprofessor-internship
    ✅ Bioinformatics: bit.ly/dataprofessor-bioinform...
    ✅ Data Science Toolbox: bit.ly/dataprofessor-datascie...
    ✅ Streamlit (Web App in Python): bit.ly/dataprofessor-streamlit
    ✅ Shiny (Web App in R): bit.ly/dataprofessor-shiny
    ✅ Google Colab Tips and Tricks: bit.ly/dataprofessor-google-c...
    ✅ Pandas Tips and Tricks: bit.ly/dataprofessor-pandas
    ✅ Python Data Science Project: bit.ly/dataprofessor-python-ds
    ✅ R Data Science Project: bit.ly/dataprofessor-r-ds
    ⭕ Subscribe:
    If you're new here, it would mean the world to me if you would consider subscribing to this channel.
    ✅ Subscribe: ruclips.net/user/dataprofessor...
    ⭕ Recommended Tools:
    Kite is a FREE AI-powered coding assistant that will help you code faster and smarter. The Kite plugin integrates with all the top editors and IDEs to give you smart completions and documentation while you’re typing. I've been using Kite and I love it!
    ✅ Check out Kite: www.kite.com/get-kite/?...
    ⭕ Recommended Books:
    ✅ Hands-On Machine Learning with Scikit-Learn: amzn.to/3hTKuTt
    ✅ Data Science from Scratch: amzn.to/3fO0JiZ
    ✅ Python Data Science Handbook: amzn.to/37Tvf8n
    ✅ R for Data Science: amzn.to/2YCPcgW
    ✅ Artificial Intelligence: The Insights You Need from Harvard Business Review: amzn.to/33jTdcv
    ✅ AI Superpowers: China, Silicon Valley, and the New World Order: amzn.to/3nghGrd
    ⭕ Stock photos, graphics and videos used on this channel:
    ✅ 1.envato.market/c/2346717/628...
    ⭕ Follow us:
    ✅ Medium: bit.ly/chanin-medium
    ✅ FaceBook: / dataprofessor
    ✅ Website: dataprofessor.org/ (Under construction)
    ✅ Twitter: / thedataprof
    ✅ Instagram: / data.professor
    ✅ LinkedIn: / chanin-nantasenamat
    ✅ GitHub 1: github.com/dataprofessor/
    ✅ GitHub 2: github.com/chaninlab/
    ⭕ Disclaimer:
    Recommended books and tools are affiliate links that give me a portion of sales at no cost to you, which will contribute to the improvement of this channel's content.
    #dataprofessor #machinelearning #datascienceproject #iris #classification #randomforest #decisiontree #python #learnpython #pythonprogramming #datascience #datamining #bigdata #datascienceworkshop #dataminingworkshop #dataminingtutorial #datasciencetutorial #ai #artificialintelligence #tutorial #dataanalytics #dataanalysis #machinelearningmodel
  • Science

Comments • 100

  • @DataProfessor
    @DataProfessor  4 years ago +5

    Code of this tutorial is available as a Jupyter notebook via GitHub (link below).
    📎CODE: github.com/dataprofessor/code/tree/master/python/iris

    • @ramprasadsapkota1013
      @ramprasadsapkota1013 3 years ago

      When I open that link and save it to my computer, then open the file in Jupyter Notebook, it shows JSON format instead of Jupyter notebook code.

    • @bruhm0ment767
      @bruhm0ment767 2 years ago

      @@ramprasadsapkota1013 Are you doing it through conda, or have you downloaded JupyterLab separately?

  • @sc4tterw1nd
    @sc4tterw1nd 6 days ago +1

    Thank you so much! I was able to move up 17 places in this ML competition because of this. Short and to the point.

  • @forestsunrise26
    @forestsunrise26 1 year ago +18

    I learn more from this 20-minute video than from one semester at university. Thank you so much!

  • @nicholflowers2077
    @nicholflowers2077 3 years ago +7

    I really appreciate your organized approach to making this video very clear and simple. Thank you Professor!

    • @DataProfessor
      @DataProfessor  3 years ago

      You're very welcome! Thanks for the kind words!

  • @friederikebauer7810
    @friederikebauer7810 1 year ago +3

    Ran my first test for my thesis! Super informative, thanks :)

  • @soeleos2846
    @soeleos2846 2 years ago

    Hey prof, I'm a prof @FSU and direct an ML lab. Great tutorial; I will use this when onboarding new students. Many thanks!

  • @thomastimjensen
    @thomastimjensen 1 year ago +1

    Hi, Data Professor! Thank you so much for a very lucid and well-structured walk-through of how to build a classification model. I am a master's student in data-driven organisational change at a university in Denmark, and your course is just perfect for expanding my knowledge. Thank you!

  • @wen-chiyeh4332
    @wen-chiyeh4332 2 years ago +1

    Super helpful!!! Thank you so much!!!

  • @galnahum4349
    @galnahum4349 4 years ago +1

    Teacher Chanin, this video is very good. Looking forward to the next video in the series. Thank you very much ⁦🙏🏻⁩

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks for the comment and kind words!

  • @dd15277
    @dd15277 3 years ago

    Thank you for sharing, great explanation!!

  • @shiyuran625
    @shiyuran625 1 year ago

    Thank you so much for your patient explanation! I wonder what X[[0]] means?

  • @siddharthachaganti5639
    @siddharthachaganti5639 3 years ago +1

    Never thought people who study biology were interesting, but now I've changed my opinion. You are the best.

  • @sushmaramesh7902
    @sushmaramesh7902 1 month ago

    Oh my God!! This is great stuff!! Thank you so much!!

  • @ullaskunder
    @ullaskunder 3 years ago +2

    Awesome... better than our college syllabus.

  • @nguyendaominh1078
    @nguyendaominh1078 3 years ago +1

    Very useful video. Thanks a lot!

  • @kusumakusuma1150
    @kusumakusuma1150 2 years ago +1

    Thanks Data Professor, clear and informative video! A question on model performance: why do we compare X_test and Y_test, since they are subsets of the data? Why not Y_test and Y_predict?

    • @DataProfessor
      @DataProfessor  2 years ago +2

      For evaluating model performance, there are 2 ways:
      (1) via the score() function, as in model.score(X_test, Y_test), which automatically uses R² for regression models and accuracy for classification models.
      (2) via r2_score(Y_test, Y_test_pred) for a regression model, or accuracy_score(Y_test, Y_test_pred) for a classification model.
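      A minimal sketch of both approaches (assuming model is the fitted classifier and X_test/Y_test come from train_test_split as in the video) might look like this:
      from sklearn.metrics import accuracy_score
      acc_1 = model.score(X_test, Y_test)           # option 1: accuracy for a classifier
      Y_test_pred = model.predict(X_test)           # option 2: predict first,
      acc_2 = accuracy_score(Y_test, Y_test_pred)   # then compare with the true labels
      print(acc_1, acc_2)                           # both report the same accuracy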

    • @kusumakusuma1150
      @kusumakusuma1150 2 years ago +1

      @@DataProfessor noted. thanks

  • @todymaverick
    @todymaverick 10 months ago

    great job man!

  • @marcofestu
    @marcofestu 4 years ago +3

    Nice video, I've just started using Python as well, so I hope you can keep updating videos on Python as well as R 😁

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks Marco for your support! More coming up.

  • @blessingadeyemi1289
    @blessingadeyemi1289 2 years ago +1

    Thanks a lot for this tutorial. I wanted to ask why you made a prediction with an instance from the training set instead of using new data. Doesn't this cause overfitting?

    • @DataProfessor
      @DataProfessor  2 years ago

      Yes, that is correct. It was used for the initial demo of the scikit-learn classifier function; later in the video I cover the use of the train_test_split function for splitting the data, followed by building the model and evaluating it on the test data.
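      As a rough sketch of that workflow (the 80/20 split ratio here is just illustrative):
      from sklearn.model_selection import train_test_split
      from sklearn.ensemble import RandomForestClassifier
      # hold out 20% of the data as an unseen test set
      X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2)
      clf = RandomForestClassifier()
      clf.fit(X_train, Y_train)            # train only on the training split
      print(clf.score(X_test, Y_test))     # evaluate on data the model has not seen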

  • @thanzaw3883
    @thanzaw3883 4 years ago +2

    Thank you for the great video, the explanation is very clear for me (a beginner). I don't have much programming background, but when I watch your videos I can still understand very well. I'm a new ML student and I need to do my first ML project for school. Could you give me any suggestions on where I can get a simple dataset and which algorithm I should use? Thank you so much, Sir. I'm wishing you lots of success and happiness in your life.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Hi Than. Start with simple datasets where you understand what the features mean, such as the iris dataset for a classification task and the Boston housing dataset for regression. I'm sure there are others, but these 2 came to mind. As for the algorithm to start with, I would recommend linear regression for your regression tasks and a tree-based method like a decision tree or random forest for your classification tasks.

    • @thanzaw3883
      @thanzaw3883 4 years ago

      Dear Sir, thank you so much for your reply. I will always take note of all your suggestions.

    • @bruhm0ment767
      @bruhm0ment767 2 years ago

      Try the MNIST dataset, it's for handwritten digits. Or you could just make your own input vectors and expected output vectors with floats and predict those, right?

  • @maths4you819
    @maths4you819 2 years ago

    You have explained it nicely. Please explain machine learning in Python using a brain arteries dataset...

  • @ahmadaltaweel4981
    @ahmadaltaweel4981 2 years ago

    My first project is to spell your name correctly :) Love you professor.

  • @DataOverEverything
    @DataOverEverything 11 months ago +1

    Such a good tutorial... there aren't many that cover non-binary classification. Thank you

  • @moniquediaz674
    @moniquediaz674 3 years ago +1

    loved the video. You teach very well

  • @pavankalyan6927
    @pavankalyan6927 3 months ago

    thank you so much

  • @shwetaredkar734
    @shwetaredkar734 4 years ago +1

    Simply awesome.

    • @DataProfessor
      @DataProfessor  4 years ago

      Thanks again for the kind words 😃

  • @viet-bacnguyen1830
    @viet-bacnguyen1830 3 years ago +1

    Many thanks!

    • @DataProfessor
      @DataProfessor  3 years ago

      A pleasure! Thanks for watching 😃

  • @kanimozhipanneerselvam3017
    @kanimozhipanneerselvam3017 3 years ago +3

    Great video Professor, as always!! Kindly upload videos on handling sensor-collected / IoT-related time series data and model building!! Thanks in advance!! 🙂

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Thanks for the suggestion, I’ll definitely consider this for future videos 😃

    • @thestorm4633
      @thestorm4633 3 years ago +1

      @@DataProfessor Please do. I am currently working on a project, it's IoT related, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block the traffic.
      If you could give advice on how to go about it, that would help; my department rejects the use of sklearn libraries.
      Your candid advice will be greatly appreciated.

    • @l3gcy337
      @l3gcy337 8 months ago

      @thestorm4633 How did your project go?

  • @nicholflowers2077
    @nicholflowers2077 3 years ago +1

    Hi Professor, 1. Was "Feature Importance" added to the original dataset? How/when is it calculated for new data that has not already been modeled? 2. How does the dataset.load method know where to get your dataset from?

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi Nichol,
      1. Feature importance is computed after a model has been built using the random forest algorithm (which has a built-in feature importance measure). We can get the importance values as follows (notice the feature_importances_ attribute):
      from sklearn.ensemble import RandomForestClassifier
      clf = RandomForestClassifier()
      clf.fit(X_train, Y_train)        # train the model first
      clf.feature_importances_         # one importance score per feature

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Q: How/When is it calculated for new data that has not already been modeled?
      A: Feature importance can be calculated only for the data that was used to train the model. We can incorporate new data into the dataset, rebuild the model and recompute the feature importance.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Q: 2. How does the dataset.load method know where to get your dataset from?
      A: The datasets.load_iris function loads the Iris dataset that ships with the Scikit-learn package, and we assign it to a variable that we specify, such as a variable called "iris".
      iris = datasets.load_iris()
      In addition, there are several other datasets provided by Scikit-learn, as specified in the Scikit-learn documentation scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
      For example, we can replace "load_iris" in "datasets.load_iris()" with load_boston, load_breast_cancer, load_diabetes, load_digits, or load_wine to use these other datasets.
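      For instance, a minimal sketch using the wine loader (only the loader name changes) could be:
      from sklearn import datasets
      wine = datasets.load_wine()
      X = wine.data        # feature matrix
      Y = wine.target      # class labels
      print(wine.feature_names)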
      Hope this helps.

    • @nicholflowers2077
      @nicholflowers2077 3 years ago

      @@DataProfessor oh! I didn't know that. thanks for pointing that out.

    • @nicholflowers2077
      @nicholflowers2077 3 years ago

      @@DataProfessor Makes perfect sense. thanks!

  • @pramishprakash
    @pramishprakash 2 years ago

    thanks a lot sir

  • @brokerkamil5773
    @brokerkamil5773 8 months ago +1

    thx😀

  • @nuramirahsyahirahzainurin6151
    @nuramirahsyahirahzainurin6151 3 years ago

    Can I build a classification model for burnout?

  • @thestorm4633
    @thestorm4633 3 years ago

    Hi Prof, I am currently working on a project, it's IoT related, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block the traffic.
    If you could give advice on how to go about it, that would help; my department rejects the use of sklearn libraries.
    I am currently studying in Africa.
    Your candid advice will be greatly appreciated.

    • @DataProfessor
      @DataProfessor  3 years ago

      Cybersecurity isn't really my domain. I would approach the problem by asking experts in the field what the current gold-standard method is for performing this task. Then I would look into research papers in the field, and then aggregate all the information to plan my own approach. That's how I would do it. Hope this helps.

  • @Alok-lk4ql
    @Alok-lk4ql 4 years ago +1

    Sir, if we don't have balanced classes for the target variable, then what should we do?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Great question. Then we will have to balance the target variable by performing undersampling or oversampling.
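      As a sketch, one way to oversample the minority class is with the separate imbalanced-learn package (an assumption here, since the video itself does not cover it; assuming X_train and Y_train from train_test_split):
      from imblearn.over_sampling import RandomOverSampler
      ros = RandomOverSampler(random_state=42)                       # duplicates minority-class rows
      X_balanced, Y_balanced = ros.fit_resample(X_train, Y_train)    # balanced training set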

  • @carlosventura1308
    @carlosventura1308 3 years ago +1

    Hi, thank you for the video. When I ran print(clf.feature_importances_) I got a ValueError that says "Found input variables with inconsistent numbers of samples: [150, 3]". Could it be that I typed something wrong? Thank you for the help.

    • @DataProfessor
      @DataProfessor  3 years ago

      Hi, thanks for watching! The feature_importances_ attribute should work after the model is built via model.fit(X, Y).
      Can you try again and make sure that all code cells were run?

    • @carlosventura1308
      @carlosventura1308 3 years ago

      @@DataProfessor I got it. Thank you so much for the quick reply

  • @Mayglie
    @Mayglie 3 years ago +1

    Hi Data Professor, could you demo the CIFAR-10 dataset, and also teach how to save the trained model, then load it and make predictions? Thank you!

    • @DataProfessor
      @DataProfessor  3 years ago

      Great suggestion! I will definitely take a look and consider for future videos 😃

  • @ismaelnadaf2870
    @ismaelnadaf2870 4 years ago +1

    A great video, sir. I am a UG student. I want to build a machine learning web app for multi-language detection (e.g. English, French, Chinese, Japanese). Please guide me on how to do this from the basics.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks for your comment. What you need to do to make this web app is use the Shiny package in R to build the web app, where you can build an ML model and plug it into the app.
      I made several videos on this topic. Please check them out below.
      Building the Machine Learning Web Application in R | Shiny Tutorial Ep 4
      ruclips.net/video/ceg7MMQNln8/видео.html
      There are 4 other related videos, and they guide you from the beginning:
      1. Building your First Web Application in R | Shiny Tutorial Ep 1
      ruclips.net/video/tfN10IUX9Lo/видео.html
      2. Build Interactive Histogram Web Application in R | Shiny Tutorial Ep 2
      ruclips.net/video/lC1Dk6gUbe0/видео.html
      3. Building Data-Driven Web Application in R | Shiny Tutorial Ep 3
      ruclips.net/video/CYXvVuklWRM/видео.html
      5. Build BMI Calculator web application in R for health monitoring | Shiny Tutorial Ep 5
      ruclips.net/video/9EQ6cwBQpvo/видео.html

  • @larrysizemore2891
    @larrysizemore2891 2 years ago

    How exactly should we do this on our own?

  • @HimaniChauhan
    @HimaniChauhan 4 years ago +1

    Sir, how do we use the k-means algorithm on a training dataset in Python?

    • @DataProfessor
      @DataProfessor  4 years ago +1

      You'll need to import the necessary library
      from sklearn.cluster import KMeans
      and use the KMeans() function
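      A minimal sketch (assuming X is the feature matrix from the tutorial; the choice of 3 clusters is just illustrative):
      from sklearn.cluster import KMeans
      kmeans = KMeans(n_clusters=3, random_state=42)
      cluster_labels = kmeans.fit_predict(X)     # cluster assignment for each row of X
      print(cluster_labels[:10])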

    • @HimaniChauhan
      @HimaniChauhan 4 years ago

      @@DataProfessor Thank you, sir.

  • @1UniverseGames
    @1UniverseGames 3 years ago +3

    Help please: when I type clf.fit(X, Y), I only get an output like: RandomForestClassifier()
    It is not showing the full details in my notebook. I wrote the same code as above, but this line shows a different output. Can you help solve this? Is there any syntax to get the full detailed output?

    • @DataProfessor
      @DataProfessor  3 years ago

      Hi, what output are you getting? It should show the parameters used for building the model as the output.

    • @dhwanitrivedi5604
      @dhwanitrivedi5604 2 years ago +1

      @@DataProfessor Hello Sir, I am facing the same issue. I was able to make the prediction; however, when printing the score I got 0% accuracy. I have written the same code as shown in the notebook. Please help. Thank you very much.

    • @DataProfessor
      @DataProfessor  2 years ago

      @@dhwanitrivedi5604 Please check that the data is loaded properly and that the data variables are passed into the fit function. Also check that the variable names match, since if nothing is read in, it will not produce the desired results.

  • @shailabhshankar884
    @shailabhshankar884 3 years ago

    Let's say I have a dataset that has two features to be included in the Y set... how do we do that?
    I have a dataset with columns: class name, X, Y, W, H, cluster number.
    I want the model to predict the class name based on X, Y, W, H and the cluster number.
    For rows with the same cluster number, I want the model to take into consideration only the X, Y, W, H of that respective cluster number.
    The cluster numbers are actually the template numbers of invoices, X, Y, W, H are coordinates, and the class names are field names. So the problem statement is: we know the cluster number and the X, Y, W, H coordinates, and we want the system to predict which set of coordinates corresponds to which data field.
    So the model must only take into account the X, Y, W, H for a specific cluster number rather than taking all X, Y, W, H into account.
    Thanks in advance.

  • @Mohamm-ed
    @Mohamm-ed 4 years ago +1

    Thanks for the video. Could you please build a signal classifier model in Python, for example for EEG and ECG signals, or recommend some materials for that? Thanks in advance.

    • @DataProfessor
      @DataProfessor  4 years ago +1

      Thanks for the comment. You might want to check out this GitHub page for some repositories that have code for EEG classifiers: github.com/topics/eeg-classification

    • @Mohamm-ed
      @Mohamm-ed 4 years ago

      @@DataProfessor thanks so much

  • @mankomyk
    @mankomyk 1 year ago +1

    Trying to use this guide, but I noticed that I need to choose how to import the CSV file: as a pandas DataFrame or as a NumPy array. Your instructions and code are for the NumPy array. My CSV file has 35000 rows and 280 columns. The first row is the column names. The first column has the string target classification (y), 'good'/'bad', and all other columns are numeric features. What should I choose?

    • @mankomyk
      @mankomyk 1 year ago +1

      And when I try to import it as a NumPy array with numpy.genfromtxt('file.csv',dtype=None,delimiter=';',names=True) I get a strange array shape (35000,) with the column names in the variables viewer. When I import with numpy.genfromtxt('file.csv',dtype=None,delimiter=';') I get array shape (35001,280), but the column names are imported as the first row :(

    • @DataProfessor
      @DataProfessor  1 year ago +1

      Hi, have you tried importing using pandas?
      import pandas as pd
      df = pd.read_csv('file.csv')
      Afterwards you can separate df into X and y.
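      A rough sketch of that separation, assuming the layout described above (target in the first column, ';' separator):
      import pandas as pd
      df = pd.read_csv('file.csv', sep=';')
      X = df.iloc[:, 1:]    # all numeric feature columns
      y = df.iloc[:, 0]     # first column holds the 'good'/'bad' target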

    • @mankomyk
      @mankomyk 1 year ago +1

      @@DataProfessor Thanks. In the end I did it like this:
      data = pd.read_csv('datafile.csv', sep=';')
      # imp is an imputer fitted earlier (not shown in this comment)
      data = pd.DataFrame(imp.transform(data), columns=data.columns)
      dataArray = data.to_numpy()
      X = dataArray[:, 1:]          # feature columns
      X = X.astype('float64')       # astype returns a new array, so assign it back
      Y = dataArray[:, 0]           # first column is the target
      Y = pd.to_numeric(Y)          # convert the target values to numeric

  • @abhipsatripathy3934
    @abhipsatripathy3934 4 years ago +1

    Prof, I am new to Python and following your videos regularly. I have a problem. When I create a matrix myself in a Jupyter notebook, the "shape" command works, i.e. it shows the number of rows and the number of columns. But when I import the iris dataset from sklearn using the code from sklearn.datasets import load_iris, iris = load_iris() and then use "iris.shape", an ERROR occurs. It shows KeyError "shape". What can be the reason? Please suggest something, because I have been stuck on this.

    • @DataProfessor
      @DataProfessor  4 years ago

      Can you try assigning X = iris.data and Y = iris.target, where iris.data contains the 4 X variables and iris.target contains the Y variable (the species class label)? Afterwards you can run X.shape and Y.shape.
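      For reference, a minimal sketch of those steps:
      from sklearn.datasets import load_iris
      iris = load_iris()       # a Bunch object, which has no .shape of its own
      X = iris.data            # (150, 4) feature matrix
      Y = iris.target          # (150,) species labels
      print(X.shape, Y.shape)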

    • @abhipsatripathy3934
      @abhipsatripathy3934 4 years ago +1

      @@DataProfessor I'll try and tell you. I split the data like X = iris.data & Y = iris.target, and then tried to find the shape. The shape of X comes out to (150,4) and that of Y is (150,). Is that okay?

    • @DataProfessor
      @DataProfessor  4 years ago

      Yes, that is correct: 150 means that there are 150 rows and 4 means that there are 4 columns.

    • @abhipsatripathy3934
      @abhipsatripathy3934 4 years ago +1

      @@DataProfessor Thanks a ton Prof.

    • @DataProfessor
      @DataProfessor  4 years ago

      @@abhipsatripathy3934 You're welcome 😃

  • @kelvinedozieobed4899
    @kelvinedozieobed4899 3 years ago +1

    great work and I LOVE YOUR NAME

  • @ramprasadsapkota1013
    @ramprasadsapkota1013 3 years ago +1

    Hi,
    How can I find the file on GitHub? I tried but got different files.

    • @DataProfessor
      @DataProfessor  3 years ago +1

      Hi, the link to the Jupyter notebook file on the GitHub repo is normally in the video description; here's the link again: github.com/dataprofessor/code/tree/master/python/iris

    • @ramprasadsapkota1013
      @ramprasadsapkota1013 3 years ago +1

      Thanks heaps, your tutorial is awesome !!!

  • @RaselAhmed-ix5ee
    @RaselAhmed-ix5ee 2 years ago

    Hello, how can I contact you?

  • @user-ic6ow5wo9e
    @user-ic6ow5wo9e 4 months ago

    my professor is a finger print candidate