Your teaching skills are damn good, man. Keep it up, lots of respect.
Another easy way to do the bivariate plot at 11:20 is sns.scatterplot(data=df, x='sepal_length', y='sepal_width', hue='species')
The best explanation about these variates ...
One small correction. That Hue is pronounced "Hiu" instead of "Hui". You are making absolutely great content. Love them all. Keep growing. (Y)
But I like how he pronounced 'HUII' :D
slave mindset ?
Just one tiny correction: in the univariate plot, the x label should be Sepal Length. Everything else is good. Thanks, Krish.
Thank you so much for this. I don't know why I was unable to understand this concept before. Thanks for this.
Great job. Your sincerity shows. Wonderful effort.
The x label should have been 'Sepal Length' instead of 'Petal Length'.
I came to the comment box to check the same thing.
I love when krish calls Hue as Huiii
So here are the objectives you can achieve with this statistical method:
1) Which features have a good impact on your model
2) Which type of algorithm you should choose
Wow, what a nice explanation! 👌 👋
You are great, sir. I am really grateful for your videos. Thank you so much, sir.
Should univariate, bivariate and multivariate analysis be done before or after data pre-processing? Please reply.
after
Really helpful. Thanks
Thank you
Thanks for the tutorial. Please arrange related tutorials in proper sequence.
Pretty badass :) Thanks!
@Quincy Sebastian please provide me an account :/
Thank you so much sir . Great explanation
Maybe I am wrong, but should the xlabel be "sepal length" instead of "petal length", based on the variable you plotted for the univariate analysis?
Yes, it's sepal length. There may be a mistake.
You need to have the x label as sepal length in the univariate analysis.
Hi, I have a doubt. These plots are fine for small datasets and interesting while learning, but do these graphs help when handling real-time data or working on real data science projects?
Hello Sir, could you please help me out with multivariate correlation through SPSS??
Thanks for the excellent tutorial!
But this works well for classification problems. How should we perform a similar analysis for a regression problem?
Question: is it possible to use categorical features to make predictions for a numerical target variable?
The code on line 17 needs to be modified as follows:
sns.FacetGrid(df,hue="species").map(plt.scatter,"petal_length","sepal_width").add_legend();
plt.show()
Thanks Sir!
Sir, I think it should be 'sepal length' instead of 'petal length' in the xlabel. Am I right or wrong?
Isn't multivariate analysis a consolidated representation of bivariate analysis, where all possible combinations of bivariate analysis are represented together?
So if the multivariate plots show some graphs with overlapping variables, like sepal length and sepal width, can we ignore one of them in any further analysis? Please help here.
Interesting method to plot univariate, I generally create scatterplots to make similar deductions in terms of what kind of classifier will make sense.
Here's some sample code:
import matplotlib.pyplot as plt
from sklearn import datasets
iris = datasets.load_iris()
X = iris.data            # feature matrix, shape (150, 4)
y = iris.target          # class labels 0, 1, 2
F = iris.feature_names
# One narrow scatter panel per feature: feature value on x, class on y.
fig, ax = plt.subplots(1, len(F), figsize=(15, 2))
for i, f in enumerate(F):
    ax[i].scatter(X[:, i], y, c=y)
    ax[i].set(xlabel=f)
    ax[i].get_yaxis().set_visible(False)
plt.show()
Use DataExplorer package in r
Sir, can you make a video on EDA using only Python? I mean, what are the necessary steps in EDA?
Wow...
can you also include link to dataset used
Sir, can you please make a video on univariate, bivariate and multivariate analysis using SPSS?
Just use the graph node and plot your histograms and scatter plots for all the variables you require.
Hi Krish, what would the R code for the same analysis be?
In the uni-variate analysis, why do you put all data points on the same level? By putting them onto different levels, e.g. by setting np.zeros_like()+0, np.zeros_like()+1 and np.zeros_like()+2, it will be very clear that these 3 data sets overlap very heavily as opposed to what you say @9:00 (unless I have misunderstood what you said there). Otherwise great lectures, thanks a lot!
great suggestion!
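A minimal sketch of the suggestion above, assuming the iris data bundled with scikit-learn rather than the CSV used in the video. Each species is drawn at its own y level (0, 1, 2) via np.zeros_like(...) + level, so overlapping ranges stay visible instead of being painted over. The choice of feature (sepal length) and the variable names are illustrative only.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
sepal_length = iris.data[:, 0]  # column 0 is sepal length (cm)

fig, ax = plt.subplots(figsize=(8, 2.5))
levels = {}
for level, name in enumerate(iris.target_names):
    values = sepal_length[iris.target == level]
    # Same zeros trick as in the video, shifted up by `level` for each species.
    y = np.zeros_like(values) + level
    levels[str(name)] = y
    ax.plot(values, y, "o", alpha=0.5, label=str(name))

ax.set_yticks(range(len(iris.target_names)))
ax.set_yticklabels(iris.target_names)
ax.set_xlabel("sepal length (cm)")
# Call plt.show() or fig.savefig(...) to view the figure.
```

With the three rows stacked like this, it is easier to judge how far the versicolor and virginica ranges run into each other along a single feature.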
If we have more than 10 or 20 features, how can we do multivariate analysis? Will it be clearly visible in a pairplot?
Why not just plot histograms for every feature for univariate analysis?
Sir, what web address are you using, and is it free or paid? Please give some details about that too.
Are those 4 plots along with the diagonal density plots?
After executing the same code for univariate analysis my output is not color distributed as shown in video. can anyone help
sir can you provide some practice dataset
What if the number of dimensions is in the order of hundreds?
Sir, what are virginica and versicolor?
How do we do EDA when we have many features, say 20+, and none of them are correlated?
Hi Krish, why are you keeping the y-axis at 0? It wasn't explained in the previous lecture either. In the graph you just kept it at 0.
Please reply.
Hey, he's just trying to visualize the dependency of the output feature on that particular feature, i.e. "petal_width", so there is no need for a y axis. If you want, you can put x = 0 and plot it on the y axis, and we end up with a vertical stack :)
How did the orange and green colours come into the picture, since we didn't pass any color parameters like palette or colour?
Colors are automatically assigned if you don't mention them in the parameters
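To see where the blue, orange and green come from, one can inspect Matplotlib's stock color cycle. This is only a sketch: it assumes Matplotlib 2.0 or newer, and it assumes no seaborn theme or palette was set in the video's notebook, in which case seaborn's hue colors follow this same cycle.

```python
import matplotlib

# The stock property cycle supplies colors when none are passed explicitly.
# rcParamsDefault holds the built-in defaults, independent of any local matplotlibrc.
default_colors = matplotlib.rcParamsDefault["axes.prop_cycle"].by_key()["color"]

# The first three entries are the blue, orange and green seen in the plots.
first_three = default_colors[:3]
print(first_three)
```

Passing palette=... to seaborn or color=... to Matplotlib overrides these defaults.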
why put semicolons after your lines of code?
Hi Krish, when I execute plt.plot(df_setosa['Sepal.Length'], np.zeros(df_setosa['Sepal.Length']), 'o') it raises a ValueError that reads 'sequence too large; cannot be greater than 32'. How did you run it without getting this error, and how do I resolve it?
You haven't written _like after np.zeros. It should be np.zeros_like.
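A small sketch of why the fix above works, using a made-up NumPy array as a stand-in for df_setosa['Sepal.Length'] (the values are hypothetical). np.zeros interprets its argument as a shape, so handing it the data itself fails, while np.zeros_like returns one zero per data point.

```python
import numpy as np

# Hypothetical stand-in for df_setosa['Sepal.Length']; values are made up.
sepal_length = np.array([5.1, 4.9, 4.7, 4.6, 5.0])

# np.zeros(shape) reads its argument as a shape, so passing the data fails.
try:
    np.zeros(sepal_length)
    zeros_failed = False
except (TypeError, ValueError):
    zeros_failed = True

# np.zeros_like(a) builds zeros with the same shape as a: one y value per point.
y = np.zeros_like(sepal_length)
```

After this, plt.plot(sepal_length, y, 'o') has matching x and y lengths and draws the points along a horizontal line.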
❤❤❤❤❤❤❤❤❤❤
Sir, how can we get the data?
Thank you very much for your great videos.
However, this is the first video of your playlist that I could not understand. The dataset was not clear and you did not explain much.
Hello sir, how do we find the categories of the given data in Python? E.g., here we want to know the species categories.
If you are talking about getting the unique values in species, then the following code will help:
for the number of unique species - iris_data['Species'].nunique()
for the names of those unique species - iris_data['Species'].unique()
Hello sir, huge fan, I'm following your ML playlist and I'm getting an error with StringIO. I also watched a YouTube video but I'm not able to solve the error; it says "No module" something. Can you please guide me? I'm stuck in your 7th playlist. Please let me know, sir, it would be helpful.
When I import iris in Python, no commands work. I am getting the errors "AttributeError: info" and "AttributeError: describe". Please help: why am I getting these errors?
Sir, every time I run the code an error message comes up saying "name df is not defined". Can you please help me?
try to load the data once again
How are you reading a URL or internet file in pandas? It seems impossible for me to do. Please tell me how.
Switching on your internet connection should make it work.
Sir, how much do I need to know to get a job in data science (are there any bounds)?
My personal recommendation would be to start with Python, the basics of SQL and a couple of ML algorithms, e.g. regression. It all comes down to how many projects you have actually created. Good luck 👍
In the univariate analysis, you have taken sepal length and labelled it as petal length. Can you explain that?
It's a mistake.
coaching institutes just looted me
taught nothing like this
I can't believe you pronounced it as hueee....😂😂
Thank you