Code of this tutorial is available as a Jupyter notebook via GitHub (link below).
📎CODE: github.com/dataprofessor/code/tree/master/python/iris
When I open that link and save the file to my computer, then open it in Jupyter Notebook, it shows JSON format instead of Jupyter notebook code.
@@ramprasadsapkota1013 Are you doing it through conda, or have you downloaded JupyterLab separately?
I learned more from this 20 min video than in a whole semester at university. Thank you so much!
Glad to hear that :)
Thank you so much! I was able to move up 17 places in this ML competition because of this. Short and to the point.
I never thought people who study biology were interesting, but you have changed my opinion. You are the best.
Thanks!
I really appreciate your organized approach to making this video very clear and simple. Thank you Professor!
You're very welcome! Thanks for the kind words!
Ran my first test for my thesis! Super informative, thanks :)
Hi, Data Professor! Thank you so much for a very lucid and well structured walk-through of how to build a classification model. I am a master student in data driven organisational change at a university in Denmark, and your course is just perfect to expand my knowledge. Thank you!
Thanks Thomas for the kind words!
Hey prof, I'm a prof @FSU and direct an ML lab. Great tutorial, I will use this when onboarding new students. Many thanks!
Awesome... better than our college syllabus.
Thanks, awesome!
Teacher Chanin, this video is very good. Looking forward to the next video in the series. Thank you very much 🙏🏻
Thanks for the comment and kind words!
Such a good tutorial.. there aren't many that cover non binary classification. Thank you
You're very welcome!
My first project is to spell your name correctly :) Love you professor.
Help please: When I type clf.fit(X, Y) I only get an output like RandomForestClassifier().
It is not showing the full details in my notebook. I wrote the same code as above and this line shows a different output. Can you help solve this? Is there any syntax to get the full detailed output?
Hi, what output are you getting? It should show the parameters used for building the model as the output.
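If you are on a newer scikit-learn (0.23 or later), the short output is expected, since the default repr only shows parameters changed from their defaults. A minimal sketch of two ways to see everything:
from sklearn import set_config
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
# option 1: list every parameter (defaults included) as a dict
print(clf.get_params())
# option 2: make the repr print all parameters rather than only non-defaults
set_config(print_changed_only=False)
print(clf)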
@@DataProfessor Hello Sir, I am facing the same issue. I was able to do the prediction, however while printing the score I got 0% accuracy. I wrote the same code as shown in the notebook. Please help. Thank you very much.
@@dhwanitrivedi5604 Please check that the data is loaded properly and that the data variables are passed into the fit function. Also check that the variable names match, since if nothing is read in it will not produce the desired results.
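For example, a couple of quick checks (a minimal sketch, assuming the X, Y and clf variables from the iris notebook):
print(X.shape, Y.shape)   # should be (150, 4) and (150,) for the iris data
print(clf.fit(X, Y))      # fit raises an error if X and Y are empty or mismatched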
Trying to use this guide, but I noticed that I need to choose how to import the CSV file: as a pandas DataFrame or as a NumPy array. Your instructions and code are for the NumPy array. My CSV file has 35000 rows and 280 columns. The first row contains the column names. The first column has the string target classification (y), 'good'/'bad', and all other columns are numeric features. What should I choose?
And when I try to import it as a NumPy array with numpy.genfromtxt('file.csv', dtype=None, delimiter=';', names=True) I get a strange array shape (35000,) with the column names in the variables viewer. Trying to import as a NumPy array with numpy.genfromtxt('file.csv', dtype=None, delimiter=';') gives array shape (35001, 280), but the column names are imported as the first row :(
Hi, have you tried importing using pandas?
import pandas as pd
df = pd.read_csv('file.csv', sep=';')
Afterwards you can separate df into X and y.
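A minimal sketch of that split, assuming the first column holds the 'good'/'bad' target as described above:
X = df.iloc[:, 1:]        # the 279 numeric feature columns
y = df.iloc[:, 0]         # the first column: the 'good'/'bad' labels
print(X.shape, y.shape)   # expect (35000, 279) and (35000,)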
@@DataProfessor Thanks. Finally I've done it like this:
data = pd.read_csv('datafile.csv', sep=';')
# imp is an imputer (e.g. a fitted sklearn SimpleImputer) defined earlier
data = pd.DataFrame(imp.transform(data), columns=data.columns)
dataArray = data.to_numpy()
X = dataArray[:, 1:].astype('float64')   # astype returns a new array, so assign its result
Y = pd.to_numeric(dataArray[:, 0])
Hi Professor, Was "Feature Importance" added to the original dataset? How/When is it calculated for new data that has not already been modeled? 2. How does the dataset.load method know where to get your dataset from?
Hi Nichol,
1. Feature importance is computed after a model has been built using the random forest algorithm (which has built-in feature importance). We can get these feature importances as follows (notice the feature_importances_ attribute):
clf = RandomForestClassifier()
clf.fit(X_train, Y_train)
clf.feature_importances_
Q: How/When is it calculated for new data that has not already been modeled?
A: Feature importance can be calculated only for the data that was used to train the model. We can incorporate new data into the dataset, rebuild the model and recompute the feature importance.
Q: 2. How does the dataset.load method know where to get your dataset from?
A: The datasets.load_iris function loads the Iris dataset from the Scikit-learn package, and we assign it to a variable that we specify, for example a variable called "iris".
iris = datasets.load_iris()
In addition, there are several other datasets provided by Scikit-learn, as listed in the Scikit-learn documentation: scikit-learn.org/stable/modules/classes.html#module-sklearn.datasets
For example, we can replace "load_iris" in "datasets.load_iris()" with load_boston, load_breast_cancer, load_diabetes, load_digits or load_wine to use these other datasets.
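A minimal sketch of swapping in the wine dataset:
from sklearn import datasets
wine = datasets.load_wine()
X = wine.data      # 13 numeric features
Y = wine.target    # 3 wine cultivar classes
print(X.shape, Y.shape)   # (178, 13) and (178,)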
Hope this helps.
@@DataProfessor oh! I didn't know that. thanks for pointing that out.
@@DataProfessor Makes perfect sense. thanks!
Oh my God !! This is great stuff !! Thankyou so much !!
On Output 11 -> Y.shape comes up as (3,), but I've tried different ways and it comes up as (150,). Would you have any idea? Btw your video is excellent, thanks for that
Thank you for the great video, the explanation is very clear for me (a beginner). I don't have much programming background, but when I watch your video I can still understand very well. I'm a new ML student and I need to do my first ML project for school. Could you give me any suggestion on where I can get
a simple dataset and which algorithm I should use? Thank you so much Sir. I'm wishing you lots of success and happiness in your life.
Hi Than. Start with simple datasets where you understand what the features mean, such as the iris dataset for a classification task and the Boston housing dataset for regression. I'm sure there are others, but these 2 came to mind. As for the algorithm to start with, I would recommend linear regression for your regression tasks and a tree-based method like decision tree or random forest for your classification tasks.
Dear Sir, thank you so much for your reply. I will take note of all of your suggestions.
Try the MNIST dataset, it's for handwritten digits. Or you could just make your own input vectors and expected output vectors with floats and predict those, right?
Sir, if we don't have balanced classes for the target variable, then what should we do?
Great question. Then we will have to balance the target variable by performing undersampling or oversampling.
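For example, a minimal sketch of random oversampling with scikit-learn's resample utility (the DataFrame df and its 'target' column with 'good'/'bad' labels are illustrative assumptions):
import pandas as pd
from sklearn.utils import resample
# assume df has a 'target' column with an imbalanced 'good'/'bad' split
majority = df[df['target'] == 'good']
minority = df[df['target'] == 'bad']
# oversample the minority class up to the majority class size
minority_upsampled = resample(minority, replace=True, n_samples=len(majority), random_state=42)
balanced = pd.concat([majority, minority_upsampled])
print(balanced['target'].value_counts())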
Sir, how do I use the k-means algorithm on a training dataset in Python?
You'll need to import the necessary library
from sklearn.cluster import KMeans
and use the KMeans() function
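A minimal sketch of fitting it on the iris features (k-means is unsupervised, so only X is used; 3 clusters is an assumption based on the 3 iris species):
from sklearn import datasets
from sklearn.cluster import KMeans
iris = datasets.load_iris()
X = iris.data
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(X)
print(kmeans.labels_[:10])        # cluster assignment for the first 10 rows
print(kmeans.cluster_centers_)    # coordinates of the 3 cluster centers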
@@DataProfessor thanku sir
Let's say I have a dataset that has two features to be included in the Y set... how do we do that?
I have a dataset with the columns: class name, X, Y, W, H, cluster number.
I want the model to predict the class name based on X, Y, W, H and the cluster number.
For all rows with the same cluster number, I want the model to take into consideration only the X, Y, W, H of that respective cluster number.
The cluster numbers are actually the template numbers of invoices, X, Y, W, H are coordinates, and the class names are field names. So the problem statement is: we know the cluster number and the X, Y, W, H coordinates, and we want the system to predict which set of coordinates corresponds to which data field.
So the model must only take into account the X, Y, W, H for a specific cluster number rather than taking all X, Y, W, H into account.
Thanks in advance.
Hi, thank you for the video. When I run print(clf.feature_importances_) I get a ValueError that says found input variables with inconsistent numbers of samples: [150, 3]. Could it be that I typed something wrong? Thank you for the help
Hi, thanks for watching! The feature_importances_ attribute should work after the model is built via model.fit(X, Y).
Can you try again and make sure that all code cells were run?
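For reference, a minimal sketch of that order of operations (note that iris.target_names holds only the 3 species names, whereas Y should be the 150 labels in iris.target):
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
iris = datasets.load_iris()
X = iris.data        # shape (150, 4)
Y = iris.target      # shape (150,)
clf = RandomForestClassifier()
clf.fit(X, Y)                      # the model must be fit first
print(clf.feature_importances_)    # one importance value per feature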
@@DataProfessor I got it. Thank you so much for the quick reply
Thanks a lot for this tutorial. I wanted to ask why you made a prediction with an instance from the training set instead of using new data. Doesn't this cause overfitting?
Yes, that is correct. It was used for the initial demo of the scikit-learn classifier function; later in the video I cover the use of the train_test_split function for performing data splitting, followed by model building and evaluation on the test data.
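A minimal sketch of that split-then-evaluate workflow (the 80/20 split ratio is an assumption):
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
iris = datasets.load_iris()
X, Y = iris.data, iris.target
# hold out 20% of the data as an unseen test set
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
clf = RandomForestClassifier()
clf.fit(X_train, Y_train)
print(clf.score(X_test, Y_test))   # accuracy on data the model has never seen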
Super helpful!!! Thank you so much!!!
loved the video. You teach very well
Thank you 😊
Thank you so much for your patient explanation! I wonder what X[[0]] means?
Nice video, I've just started using Python as well, so I hope you can keep updating videos on Python as well as R 😁
Thanks Marco for your support! More coming up.
great job man!
Hi,
How can I find the file on GitHub? I tried but got different files.
Hi the link to the Jupyter notebook file on GitHub repo is normally in the video descriptions, here’s the link again github.com/dataprofessor/code/tree/master/python/iris
Thanks heaps, your tutorial is awesome !!!
Prof, I am new to Python and following your videos regularly. I have a problem. When I create a matrix by myself in a Jupyter notebook, the "shape" command works, i.e. it shows the number of rows and the number of columns. But when I import the iris dataset from sklearn using the code from sklearn.datasets import load_iris and iris = load_iris(), and then use "iris.shape", an error occurs. It shows KeyError "shape". What can be the reason? Please suggest something, because I have been stuck on this.
The load_iris() function returns a Bunch object (a dictionary-like container), not an array, so it has no shape, which is why you get the KeyError. Can you try assigning X = iris.data and Y = iris.target, where iris.data contains the 4 X variables and iris.target contains the Y variable (the species class label)? Afterwards you can run X.shape and Y.shape.
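Putting that into code (a minimal sketch):
from sklearn.datasets import load_iris
iris = load_iris()   # a Bunch object, not an array
X = iris.data        # the 4 feature columns
Y = iris.target      # the species class label
print(X.shape)       # (150, 4)
print(Y.shape)       # (150,)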
@@DataProfessor I'll try and tell you. I split the data like X = iris.data & Y = iris.target and then tried to find the shape. The shape of X comes out to be (150,4) and that of Y is (150,). Is that okay?
Yes, that is correct. 150 means that there are 150 rows and 4 means there are 4 columns.
@@DataProfessor Thanks a ton Prof.
@@abhipsatripathy3934 You're welcome 😃
Thanks Data Professor, clear and informative video! Question in the model performance : why we compare X_test and Y_test? since they are subset of the data? why not the Y_test and Y_predict?
For evaluating model performance, there are 2 ways:
(1) via the score() function as in model.score(X_test, Y_test) which will automatically use r2 for regression models and accuracy for classification models.
(2) via the r2_score(Y_test, Y_test_pred) if it is a regression model or via accuracy_score(Y_test, Y_test_pred) if it is a classification model.
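A minimal sketch showing that both routes give the same number for a classifier (assuming the clf, X_test and Y_test variables from the notebook):
from sklearn.metrics import accuracy_score
# route 1: the built-in score() method, which reports accuracy for classifiers
print(clf.score(X_test, Y_test))
# route 2: predict first, then compare the predictions with the true labels
Y_test_pred = clf.predict(X_test)
print(accuracy_score(Y_test, Y_test_pred))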
@@DataProfessor noted. thanks
Very useful video. Thanks a lot!
Glad to hear that!
How exactly should we do this on our own?
Hi Prof, I am currently working on a project. It's IoT-related, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block the traffic.
If you can give advice on how to go about it, please do; note that my department rejects the use of sklearn libraries.
I am currently studying in Africa.
Your candid advice will be greatly appreciated.
Cybersecurity isn't really my domain. I would approach the problem by asking experts in the field what the current gold-standard method for this task is. Then I would review research papers in the field. Then I would aggregate all the information to plan my own approach. That's how I would do it. Hope this helps.
You have explained it nicely. Please explain machine learning in Python using a brain arteries dataset.
Can I build a classification model on burnout?
Thank you for sharing, great explanation!!
Simply awesome.
Thanks again for the kind words 😃
Hi Data Professor, could you demo the CIFAR-10 dataset and also teach how to save a trained model, load it back, and make predictions? Thank you!
Great suggestion! I will definitely take a look and consider for future videos 😃
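On the saving and loading part, a minimal sketch using joblib (the filename is arbitrary):
import joblib
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
iris = datasets.load_iris()
clf = RandomForestClassifier()
clf.fit(iris.data, iris.target)
joblib.dump(clf, 'iris_clf.joblib')          # save the trained model to disk
loaded_clf = joblib.load('iris_clf.joblib')  # load it back later
print(loaded_clf.predict(iris.data[[0]]))    # make a prediction with the loaded model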
Hello, how can I contact you?
Great video Professor, as always!! Kindly upload videos on handling sensor-collected / IoT-related time series data & model building!! Thanks in advance!! 🙂
Thanks for the suggestion, I’ll definitely consider this for future videos 😃
@@DataProfessor Please do; I am currently working on a project. It's IoT-related, and I need to use machine learning to detect DDoS traffic and then use other algorithms to block the traffic.
If you can give advice on how to go about it, please note that my department rejects the use of sklearn libraries.
Your candid advice will be greatly appreciated.
@thestorm4633 how did your project go?
Many thanks!
A pleasure! Thanks for watching 😃
thx😀
A great video, sir. I am a UG student. I want to build a machine learning web app for multi-language detection (e.g. English, French, Chinese, Japanese). Please guide me on how to do this from the basics.
Thanks for your comment. To make this web app, you can use the Shiny package in R for building the web app, where you can build an ML model and plug it into the web app.
I made several videos on this topic. Please check them out below.
Building the Machine Learning Web Application in R | Shiny Tutorial Ep 4
ruclips.net/video/ceg7MMQNln8/видео.html
There are 4 other related videos that guide you from the beginning:
1. Building your First Web Application in R | Shiny Tutorial Ep 1
ruclips.net/video/tfN10IUX9Lo/видео.html
2. Build Interactive Histogram Web Application in R | Shiny Tutorial Ep 2
ruclips.net/video/lC1Dk6gUbe0/видео.html
3. Building Data-Driven Web Application in R | Shiny Tutorial Ep 3
ruclips.net/video/CYXvVuklWRM/видео.html
5. Build BMI Calculator web application in R for health monitoring | Shiny Tutorial Ep 5
ruclips.net/video/9EQ6cwBQpvo/видео.html
thanks a lot sir
thank you so much
Thanks for the video. Could you please build a signal classifier model in Python, such as for EEG and ECG signals, or recommend some materials for that? Thanks in advance.
Thanks for the comment. You might want to check out this GitHub page for repositories that have code for EEG classifiers: github.com/topics/eeg-classification
@@DataProfessor thanks so much
great work and I LOVE YOUR NAME
Thanks for watching!
my professor is a finger print candidate
Like seriously, you don't even explain some figures in it 😢😢😢 I keep replaying the video over 10 times 😢 your tutorial is very low