Machine Learning in R: Building a Classification Model

  • Published: 11 Dec 2024

Comments • 104

  • @DataProfessor 5 years ago +5

    QUESTION OF THE DAY: Prior to watching this video, have you built a machine learning model before or intend to build one soon? If so, what interesting problem do you intend to tackle with data science? Comments down below! 😃

    • @walterv2769 5 years ago +1

      Never, but I went to a course in Buenos Aires and saw some algorithms applied to an eye disease database. Loved the conclusions. After that experience I'm trying to learn how to apply this to predicting effective treatments for patients, and in other fields such as marketing and business reports. I work at a small clinic and I began building a small database for each case. But I don't know how much data would be enough or which algorithm to choose for each case. Thank you for helping us start.

    • @DataProfessor 5 years ago +2

      Thanks @walterv2769 for your comment and support, and for sharing your start in data science with us. I will make a future video on "how much data would be enough or which algorithm to choose". Wish you success in your data science endeavors.

    • @MasterStroke. 5 years ago

      Good morning. How do I install R version 3.6.2 (Dark and Stormy Night), 2019-12-12, on Ubuntu 14.04 using the terminal? My computer is old and runs a 32-bit system.

    • @DataProfessor 5 years ago

      @MasterStroke. Have you tried installing using:
      sudo apt-get install r-base

    • @MasterStroke. 5 years ago

      @DataProfessor Yes, I did, and it gives a version from 2015 even though I ran the update and upgrade commands.

  • @saicharanbogasamudram4566 3 years ago +4

    The concept is very well explained and easy to understand, even for people like me who are new to the programming world.

    • @DataProfessor 3 years ago

      Glad to hear that, thanks for watching! 😊

  • @optimusprime1317 6 months ago +2

    The first ever ML code that I ran successfully.
    AlhamduLlillah

  • @arlenehetherington421 1 year ago +1

    Hi Data Professor! I am new to R and just trying to learn how to do analyses with R, thank you for the video!

  • @alfredowaltertincodomingue1625 5 years ago +1

    Hello from Peru! I really liked this video, it was very simple to understand. I'll wait for more videos! thanks

    • @DataProfessor 5 years ago

      Alfredo Walter Tinco Domínguez, thanks for your support 😄

  • @curiouswanderer793 3 years ago +1

    Thanks Sir. I'm reviewing my knowledge. The explanation is very lucid and it is easy to match my understanding.

    • @DataProfessor 3 years ago

      Thanks for the kind words, glad it was helpful! :)

    • @curiouswanderer793 3 years ago

      @DataProfessor Is there a Twitter ID or other social media ID where I can contact you, Sir? I just need the code that you used to create the box plot shown in the plot window of R in the video.

    • @SimoBenziane 3 years ago +1

      @curiouswanderer793 He literally went over it word by word. Just copy it.

  • @WallStreetNewscast 5 years ago +2

    Really enjoy your videos; they are helpful for a newbie just getting into the field.

    • @DataProfessor 5 years ago

      @WallStreetNewscast Thanks for your support and comment; more practical tutorials are coming up 😃

  • @imtiazahmad4456 2 years ago

    Thanks, sir. Your explanation at every step of the code is very good. You have good teaching skills.

  • @gassolo 5 years ago +1

    I found your channel from facebook! Thank you for sharing this knowledge!

    • @DataProfessor 5 years ago

      Thanks for your comment, glad to have your support!

  • @biologicalstatistics3320 10 months ago

    I prefer splitting the data and doing much of the pre-processing without additional packages, and performing traditional exploratory data analysis (EDA) and statistical tests before building ML models. It teaches students much about the concepts, especially when it involves cross-validation, and also improves programming muscle memory.

  • @theforester_ 2 years ago +1

    Thanks for sharing the knowledge. Greetings from Brazil.

  • @Insipidityy 5 years ago +2

    Thanks for the video :) The author of Caret is now transitioning to full-time development on Parsnip, which is designed to be a more robust and tidy meta engine. Parsnip, along with other ML-related packages, forms a group of packages called "Tidymodels", much like "Tidyverse". Would love it if you could dedicate a future video to tidymodels :)

    • @DataProfessor 5 years ago +2

      Thanks for the comment and suggestion! I will definitely look into parsnip and tidymodels and make a video about them.

  • @NAME3ify 3 years ago +1

    Wow very informative video, thank you Data Professor

    • @DataProfessor 3 years ago

      Glad it was helpful!

    • @NAME3ify 3 years ago

      @DataProfessor I'm trying to apply this analytical concept to my data, but I wasn't sure how applicable it would be. I am trying to characterize the performance of a set of maize hybrids under organic farming systems. I collected categorical variables for different agronomic management practices, for example weed control, planting density, cover crop type, type of manure applied, rate of manure application, etc. So I am trying to regress the performance of each hybrid on the different management practices. Is there a way I can use CART algorithms to predict the yield of a hybrid given a set of management practices?

  • @ajayvaidya7 3 years ago +3

    Hi! I have a question regarding the training model: when you write the code "Species ~ . ,", why do you use the tilde and the comma after Species? Thank you for this video.

    • @whysolow 3 years ago +1

      It's a formula: Species is the variable to predict, and the . means all other variables are used for prediction. You can use Species ~ Sepal.Length if you want to use only that variable for prediction.
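
To see what the dot expands to, the formula can be inspected directly. This is a minimal sketch, assuming R's built-in iris data frame; the names f_all, f_one and predictors are illustrative only and do not come from the video.

```r
# Built-in iris data: four numeric measurements plus the Species factor.
data(iris)

# `~` separates the response (left) from the predictors (right);
# `.` stands for "every other column in `data`".
f_all <- Species ~ .
predictors <- attr(terms(f_all, data = iris), "term.labels")
print(predictors)

# A formula naming a single predictor instead of the dot.
f_one <- Species ~ Sepal.Length
print(all.vars(f_one))
```

In a call such as train(Species ~ ., data = TrainingSet, ...), the comma after the dot is not part of the formula; it only separates the formula from the next argument of the function.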

  • @stretch8390 3 years ago +1

    These videos really are so informative; thank you!

    • @DataProfessor 3 years ago

      You're so welcome!

    • @stretch8390 3 years ago +1

      @DataProfessor Also, for the homework around the ~5 min mark, I think the distribution is roughly the same between the 80% and 20% subsets!
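
One way to check this numerically, as a sketch: split iris 80/20 within each species using base R and compare class proportions and feature means. The base-R sampling below is a stand-in for the caret split used in the video, not the same function, and the seed and the name idx are assumptions made for illustration.

```r
data(iris)
set.seed(100)  # arbitrary seed, only for reproducibility

# Stratified 80/20 split in base R: sample 80% of the row indices
# within each species so both subsets keep the class balance.
idx <- unlist(lapply(split(seq_len(nrow(iris)), iris$Species),
                     function(rows) sample(rows, size = round(0.8 * length(rows)))))
TrainingSet <- iris[idx, ]
TestingSet  <- iris[-idx, ]

# Class proportions in each subset.
print(prop.table(table(TrainingSet$Species)))
print(prop.table(table(TestingSet$Species)))

# Per-column means of the numeric features, one row per subset.
print(rbind(train = colMeans(TrainingSet[, 1:4]),
            test  = colMeans(TestingSet[, 1:4])))
```

Because the sampling is done per species, both subsets hold each class in equal proportion, and the feature means can then be compared row by row.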

  • @andizaky5563 1 year ago

    I use this for my homework:
    # Create plots of TrainingSet and TestingSet
    p1

  • @norbertlabonne7280 4 years ago +1

    What an absolutely amazing video!

    • @DataProfessor 4 years ago

      Glad you enjoyed it! 😊

    • @norbertlabonne7280 4 years ago

      @DataProfessor All your videos are great to watch! On this one, however, I'm getting "Error: `data` and `reference` should be factors with the same levels." when I reach your "Model performance (Displays confusion matrix and statistics)" section :( Would you know why? Thanks!
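
This error is commonly reported when one of the two vectors passed to caret's confusionMatrix is a character vector rather than a factor, for example when Species was read from a CSV file in R 4.0 or later, where strings are no longer converted to factors by default. Whether that is the cause here cannot be told from the comment alone. A minimal base-R sketch of the usual fix, using made-up vectors named predicted and actual (not the video's objects):

```r
# Hypothetical data: predictions are a factor, the reference labels are
# plain character strings (as read.csv returns them in R >= 4.0).
predicted <- factor(c("setosa", "virginica", "setosa"),
                    levels = c("setosa", "versicolor", "virginica"))
actual <- c("setosa", "virginica", "versicolor")

print(is.factor(actual))  # shows whether the reference is a factor yet

# Convert the reference to a factor that shares the prediction's levels.
actual <- factor(actual, levels = levels(predicted))

print(identical(levels(predicted), levels(actual)))
print(table(Predicted = predicted, Actual = actual))
```

Once both arguments are factors on the same levels, a call of the form confusionMatrix(predicted, actual) should no longer raise this particular error.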

  • @alexandrekouakou8385 7 months ago

    Thanks, Sir, for your explanation. Very good pedagogy. I want to know which books you read or which courses you took to master all these concepts. I really appreciate how comfortable you are with this terminology, and I want to have this confidence too.

  • @SaiiNattacha 3 years ago +1

    Thank you very much, professor. This helped a lot with my project.

  • @alirezahabibi7708 2 years ago

    Many thanks for the video.
    How can I work with "prob" and calculate probabilities in the SVM model?
    Thanks

  • @JavierHernandez-nq7gg 2 years ago

    Hello, I got an error. It said that R could not find the function train. I have already loaded the caret package. What could be the problem?
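
This message usually means the package is not attached in the current session, either because library(caret) was not run after R was restarted or because the installation did not complete. A small diagnostic sketch; has_caret is an illustrative name, not something from the video:

```r
# TRUE if caret is installed and loadable, FALSE otherwise. This loads
# the namespace without attaching it to the search path.
has_caret <- requireNamespace("caret", quietly = TRUE)

if (has_caret) {
  # Installed: check that the namespace really defines train().
  # Running library(caret) makes plain train() available.
  print(exists("train", envir = asNamespace("caret"), inherits = FALSE))
} else {
  message("caret is not installed; try install.packages(\"caret\") ",
          "and read the install log for errors.")
}
```

If the check reports that caret is installed, running library(caret) again in the same session before calling train() is the first thing to try.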

  • @jeyasheelamj2350 3 years ago

    Clear explanation, professor.

  • @MohamedFediBelaid 7 months ago

    Thanks so much, professor.

  • @OnlineGreg 3 years ago +1

    Thanks for this great video. But how do I actually use my classifier now on data that is unknown to my computer, without a label to compare against?

    • @DataProfessor 3 years ago

      Thanks. You can do that by using the new data as the test set: apply the model to make predictions on it using model.predict(X_test), where model is the instantiated model, such as model = RandomForestClassifier(max_depth=2, random_state=42) (scikit-learn syntax).
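
The reply above uses Python's scikit-learn syntax; in R the corresponding step is the generic predict() function with a newdata argument. A sketch using a decision tree from the rpart package as a stand-in for the SVM trained in the video; new_flowers is a made-up, unlabeled data frame and fit is an illustrative name:

```r
library(rpart)  # recommended package shipped with standard R installs

data(iris)

# Fit a classification tree on the labelled data (stand-in model,
# not the SVM from the video).
fit <- rpart(Species ~ ., data = iris, method = "class")

# Hypothetical unlabeled observations: same feature columns, no Species.
new_flowers <- data.frame(
  Sepal.Length = c(5.0, 6.7),
  Sepal.Width  = c(3.4, 3.1),
  Petal.Length = c(1.5, 5.6),
  Petal.Width  = c(0.2, 2.4)
)

# type = "class" returns one predicted label per row.
new_flowers$Species <- predict(fit, newdata = new_flowers, type = "class")
print(new_flowers)
```

With a model object trained through caret, the call has the same shape, predict(Model, newdata = new_flowers), and returns class labels by default for a classification model.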

    • @OnlineGreg 3 years ago

      @DataProfessor Thank you very much!!

  • @joeyng7366 3 years ago +1

    Thank you so much!!

  • @jonasmuller6725 3 years ago +1

    Great content very helpful 🙏

  • @kf_oreginalshorts6882 1 year ago

    This is great. Is there another video discussing the parameters and tuning as a continuation of this one? In R, of course.

  • @jeyasheelamj2350 3 years ago

    Professor, would you share the code for the scatter plots of the training and testing sets?

  • @TimmRAH 4 months ago

    Love your video, but your shirt as well haha! :) Greetings from Germany

  • @yeongjaekim2511 3 years ago

    Dear professor! Can I use this algorithm to classify a continuous dependent variable, and to generate a feature importance figure?

  • @xanboyyy 3 years ago +1

    For the homework:
    I used this code:
    plot(TrainingSet, main="TrainingSet")
    plot(TestingSet, main="TestingSet")
    The distribution was roughly the same. I was wondering, though: how do you put two plots next to each other? With this one I used the arrows to go between plots. Is there a way to put them side by side?

    • @DataProfessor 3 years ago +1

      Hi, you can use the cowplot library to combine ggplot plots into a multi-plot figure.

    • @whysolow 3 years ago +1

      You can also use the grid.arrange function from the gridExtra package, after assigning each plot to a variable.
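
For base graphics, a third option is par(mfrow = c(1, 2)), which places two plots in one row. Note that plot() on a whole data frame draws a scatterplot matrix that takes over the entire device, so this sketch uses single-column histograms instead. The random 80/20 split, the seed and the names idx and old_par are assumptions standing in for the video's objects.

```r
data(iris)
set.seed(100)  # arbitrary seed, only for reproducibility

# Simple random 80/20 split (placeholder for the video's split).
idx <- sample(nrow(iris), size = 0.8 * nrow(iris))
TrainingSet <- iris[idx, ]
TestingSet  <- iris[-idx, ]

# One row, two columns; keep the old settings so they can be restored.
old_par <- par(mfrow = c(1, 2))
hist(TrainingSet$Sepal.Length, main = "TrainingSet", xlab = "Sepal.Length")
hist(TestingSet$Sepal.Length,  main = "TestingSet",  xlab = "Sepal.Length")
par(old_par)  # back to one plot per page
```

The same pattern works for boxplot() or for scatter plots of two chosen columns; only plots that manage their own layout, such as a scatterplot matrix, do not fit into an mfrow grid.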

  • @jehushaphat 3 years ago

    Is the cross-validation model never applied to the testing set?

  • @fid8488 3 years ago

    Is it just a coincidence that Model.training and Model.cv give the same accuracy? I might have misunderstood, but why do we predict on the TrainingSet with Model.cv? In this case we don't work with the TestingSet at all, right?

  • @ruhinehri5607 3 years ago

    Hi there... I'm stuck developing an SVM for a college admission dataset in R... Will you help me with this?

  • @ahmethaci8158 3 years ago +1

    Hi, thanks for your great explanation. When I try to split the dataset, it says it could not find function "creatDataPartition". Which package should I install?

    • @ahmethaci8158 3 years ago

      By the way, I have downloaded the Caret package. Are there any other packages that should be downloaded?
      Thank you in advance

    • @DataProfessor 3 years ago +1

      Hi, it's part of Caret and to split the dataset that's all you need.

    • @ahmethaci8158 3 years ago +1

      @DataProfessor OK, I will try again. Thanks a lot.

    • @whysolow 3 years ago +1

      You can also use sample.split

  • @claudiocrespo4703 1 year ago

    I have problems installing "caret". At the end of the installation process I get: "The downloaded source packages are in
    ‘/private/var/folders/y1/0vypd8ps5pz3vtz4pctv7xf80000gn/T/RtmpB0tlRX/downloaded_packages’", but the package then fails to load with library: Error in library("caret") : there is no package called ‘caret’. I am using RStudio Version 1.3.1093 on macOS 10.12.6.

  • @kanintarntira6450 5 years ago +1

    Hi Chanin, I have several questions about the SVM polynomial kernel model:
    1. Did you choose the svmPoly model because it suits the Iris dataset, or did you just pick this model as an example?
    2. Since I am not familiar with this model, I would like to know what kind of dataset works well with it.
    Good video, by the way. Thanks.

    • @DataProfessor 5 years ago +2

      Thanks for your support and comment.
      1. The polynomial kernel of SVM was chosen arbitrarily as an example.
      2. Every dataset is unique in its own right, so it is difficult to suggest the best ML algorithm in advance, prior to model building. Furthermore, to yield the best possible performance, it is recommended that hyperparameter optimization be performed for the selected ML algorithm.
      That said, regarding the second question, there is a paper published by one of our colleagues that suggests a generally good starting point for the parameter values when using SVM. Please check it out at pubs.acs.org/doi/full/10.1021/ci500344v

    • @kanintarntira6450 5 years ago +1

      @DataProfessor Thanks a lot for the answer and the additional resource; surely it will be helpful for me one day.
      Hope to see your next content soon.
      :)

    • @DataProfessor 5 years ago +1

      @kanintarntira6450 It's a pleasure; the next video should be out in the next 1-2 days. Please don't forget to subscribe, hit the notification bell, and smash the like button. Thanks again for your comment.

  • @hatho1965 3 years ago

    Hello! Amazing video! Unfortunately i have an error : "Error in match.arg(norm, c("none", "overall", "average")) :
    'arg' must be NULL or a character vector" when i try to run "fit.training.confusion

  • @minhaaj 4 years ago +1

    Also, please write guidelines for interpreting the results in the code comments, so that beginners can understand and remember them while running the code on their end. It would help newbies.

    • @DataProfessor 4 years ago +1

      Thanks for the comment. I discussed ML interpretation in another video, ruclips.net/video/34yBgah8Uyg/видео.html, and also made an infographic about it here: github.com/dataprofessor/infographic/blob/master/05-Interpretability-of-Data-Science-Models.JPG

    • @minhaaj 4 years ago

      @DataProfessor Man, that's brilliant! Thank you so much! Do you have a Patreon or something?

    • @techwithmuchiri5921 4 years ago

      Is anyone else getting this error when running the code under "Displays confusion matrix and statistics"? "Error: 'data' and 'reference' should be factors with the same levels"

  • @faristambas59 3 years ago

    Many thanks for this very informative video! I have a question about sensitivity. How can I fix the sensitivity at 100%? I want to have a model with a sensitivity of 100%; is there a way to do that?

  • @masciacca1 1 year ago

    This was a wonderful tutorial! Does this model assume independence between variables (i.e. petal length, petal width, etc.)? If so, how might I go about building a non-parametric model similarly?

  • @seshendravemuri 5 years ago +1

    Hi, I am from India. I have been learning machine learning in R for the past 4 months. I want to know how a machine learning model gets implemented in the real world. Please explain or make a video.

    • @DataProfessor 5 years ago +1

      @seshendravemuri Hi Seshendra, thanks for your comment; that makes a great idea for a video. Let me answer this in a future video.

  • @johngallop5063 10 months ago

    Hi,
    I am super new but I am getting an error that says 'TrainingIndex' not found. Any idea what I am doing wrong?

    • @johngallop5063 10 months ago

      Never mind, I just didn't load the caret package, I think. Got it! Thanks!

  • @19B81 4 years ago

    Thanks for the thorough explanation.
    I have 2 questions, please:
    - How would you calculate the ROC curve for the classification?
    - To present the model (for example, in a publication), we mainly need the ROC curve for the test set, correct? I mean, the AUC for the training set can be misleadingly high?
    Thanks again for taking the time to prepare this useful material.

    • @DataProfessor 4 years ago

      To calculate the ROC curve, check out this video: ruclips.net/video/uVJXPPrWRJ0/видео.html
      The ROC AUC for the training set is expected to be high, but it can be used for comparison with the cross-validation set and also with a test set.

    • @19B81 4 years ago

      @DataProfessor Excellent video. I am planning on learning Python (hopefully soon). When you have time, can you create a video explaining ROC curves using R? Also, there is a new package called MLeval that appears to do this job in a convenient way, but I'm not sure how accurate it is. If you could also comment on it in your video, that would be great. Thanks again!

  • @cebo_sa1952 1 year ago

    Great video, as always. I just want to ask. I've created a random forest model in R and I am happy with the Error rate when predicting both training and test data. What I want to know is how do I now apply my model in the classification of non-classified data. Essentially, I want my model to look at new data which is like the data it was trained on, predict and fill an empty class column for me with the correct classification.

    • @DataProfessor 1 year ago

      You could try applying K-nearest neighbor or PCA to see whether the new data is similar to the original training data, if it is then the trained model would be applicable for the new data.

  • @salmaasghar9099 2 years ago

    Where are you from? Thailand? Is there tonal variation in Thai names as well?

  • @Okwach_Kich 2 years ago +3

    didn't catch your name

  • @neilloveminecraft 4 years ago

    Are you from Thailand?