Hello, can you share a link to a copy of the raw data, please?
Why do you perform label encoding instead of one-hot encoding? The former will implicitly make your model interpret an ordinal relationship between the labeled categories.
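To illustrate the difference raised in this question, here is a minimal sketch in pandas. The `Contract` column and its values are made up for illustration and are not taken from the video's notebook.

```python
import pandas as pd

# Toy stand-in for a nominal (unordered) column; the values are illustrative only.
df = pd.DataFrame({"Contract": ["Month-to-month", "One year", "Two year", "One year"]})

# Label-style encoding: each category becomes an integer code (alphabetical order here).
# A linear model would treat these codes as an ordered, evenly spaced quantity.
label_codes = df["Contract"].astype("category").cat.codes

# One-hot encoding: one 0/1 column per category, with no implied order.
one_hot = pd.get_dummies(df["Contract"], prefix="Contract", dtype=int)

print(label_codes.tolist())
print(one_hot)
```

Tree-based models are usually less sensitive to integer codes, so whether this matters depends on the model being used.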
Great work honestly
Excellent exploration!
Excellent tutorial. I have one question. Why did you use logistic regression? Could you have used other prediction models?
Logistic regression is good for a binary output like churn (yes or no). You could use an SVM as well for this type of problem.
It seems there are so many videos on building the model, but not on how to deploy it. How do you then use this to predict churn?
Verrrrrrry nice. I skimmed. I'm gonna watch the whole vid now. Thanks for your work, it is very appreciated.
Thanks for the nice comment!
Hey man, great vid!!! You have a more current vid on sentiment analysis... how can I get the data set you use?
What if the Churn value is not provided in the data set? How can we insert the column into the data set using the customer ID?
Same question.
You can preprocess in Excel using the MATCH and INDEX functions. Or create an extra column in your dataframe and use an if statement and a for loop to check the customer ID and copy the value from the data containing the churn parameters into your previous data set, which does not have churn data.
If both data sets contain the same customers and are sorted in order of customer ID number, then
df['Churn'] = data should create a new column with the churn data.
Hey, can you upload your Excel file as well as the code to GitHub?
How can we improve our accuracy and recall scores?
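If the two data sets are not in the same order, or one is missing some customers, a join on the ID column is safer than plain assignment. A minimal sketch with pandas, assuming both tables share a `customerID` column (the frames and values below are invented for illustration):

```python
import pandas as pd

# Hypothetical customer table without churn labels.
customers = pd.DataFrame({
    "customerID": ["A1", "B2", "C3"],
    "MonthlyCharges": [29.85, 56.95, 53.85],
})

# Hypothetical churn table: different order, and one customer missing.
churn = pd.DataFrame({
    "customerID": ["C3", "A1"],
    "Churn": ["Yes", "No"],
})

# A left join keeps every customer row and matches labels by ID, not by position.
# Customers with no churn record get NaN in the Churn column.
merged = customers.merge(churn, on="customerID", how="left")
print(merged)
```

This does the same job as INDEX/MATCH in Excel, and it avoids the explicit for loop.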
Thanks, simple and informative video, it was very helpful. I have two questions to ask though:
1. How do you define churn? E.g., for a gaming company, if a user does not log in for more than 1 week they say the user churned, but for companies like Airbnb the churn window would be a few months or years. Is there a general way to define churn?
2. What other models are useful in churn prediction other than logistic regression?
+1
Nice video! Thank you! Sir, can you upload a video that uses two different datasets for train and test, just like hackathons? And also a video on time series forecasting?
Wtf man, u did almost all that I wanted to do
As always, something interesting thx! :)
Glad you enjoyed it !
Please, I am new to machine learning. As I was practicing your tutorial I got this error: ValueError: Number of labels=1409 does not match number of samples=5634. I'd like you to help me correct it.
How do you prevent a zero division error?
Thanks a lot for the work. Sir, can you please provide the link to the CSV file that you loaded in the second cell?
It's from Kaggle, you can Google it.
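One common cause of this kind of mismatch (only a guess without seeing the code) is unpacking the result of `train_test_split` in the wrong order, so the model is fitted on the training features with the test labels. A minimal sketch of the expected order, using synthetic data with 7043 rows (the sum of the two numbers in the error message) in place of the real dataset:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the real features and labels come from the tutorial's dataframe.
rng = np.random.default_rng(0)
X = rng.normal(size=(7043, 3))
y = (X[:, 0] > 0).astype(int)

# The return order is: X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Features and labels passed to fit() must have the same number of rows.
print(X_train.shape[0], y_train.shape[0])  # training rows
print(X_test.shape[0], y_test.shape[0])    # test rows

model = LogisticRegression().fit(X_train, y_train)
```

Printing the shapes before calling `fit` is a quick way to spot which pair got swapped.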
Good stuff....
Thanks!
Can you please tell me what's wrong with this?
Yes no for churn how you mentioned this?
Using LabelEncoder and StandardScaler before splitting the data into training and testing sets lets information from the test set leak into preprocessing, which may lead to overfitting, most likely in linear models.
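One way to avoid this is to split first and let a scikit-learn pipeline fit the scaler on the training rows only. A minimal sketch on synthetic data (the features and labels below are invented, not the churn dataset):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for two numeric features and a binary label.
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 100).astype(int)

# Split first, so the test rows never influence preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# fit() learns the scaler's mean and variance from X_train only;
# score() reuses those training statistics to transform X_test.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
test_accuracy = pipe.score(X_test, y_test)
print(test_accuracy)
```

The same pipeline can be passed to cross-validation, where the scaler is refitted inside each fold.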
File "", line 2
fig,ax = plt.subplots(1,2 figsize=(28,8))
^
SyntaxError: invalid syntax
You’re missing the comma after the 2. It should be: fig, ax = plt.subplots(1, 2, figsize=(28, 8))
What is churn? Could anyone explain it clearly?
If a customer churns, it means that they stop doing business with the company. For example, if a Netflix customer churns, it means that they canceled their subscription.
Thanks, but TotalCharges contains 11 empty values, which can be replaced with NumPy NaN, and the classes are also imbalanced.
You can find empty values with
df.loc[df.TotalCharges == " " , "TotalCharges"]
And you can replace them with
df.loc[df.TotalCharges == " " , "TotalCharges"] = np.nan
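After that replacement the column still holds strings, so it likely needs converting to a numeric type before modelling. A minimal sketch on a toy frame (the values are invented; only the column name comes from the comment above):

```python
import numpy as np
import pandas as pd

# Toy stand-in: TotalCharges read as text, with blank entries stored as " ".
df = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", " "]})

# Find the blank entries and replace them with NaN, as in the comment above.
blank = df["TotalCharges"] == " "
print(int(blank.sum()), "blank values")
df.loc[blank, "TotalCharges"] = np.nan

# Convert the remaining strings to floats; errors="coerce" turns anything
# unparseable into NaN instead of raising.
df["TotalCharges"] = pd.to_numeric(df["TotalCharges"], errors="coerce")
print(df["TotalCharges"].dtype)
```

The NaN rows can then be dropped or imputed, depending on how many there are.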