Files, Codes & Data: github.com/pik1989/EDA (Don't forget to provide stars in the Github repo)
The churn_modelling file has no missing values in the codebase, can you please update the file.
The dataset you provided for handling missing values shows no null values.
@@pdivyanshupandey104 you have to use the CustomerChurn.csv file
@@SatyajitPattnaik None of the CSV files in the data folder has missing values
I want to ask: do I need to build any model, such as linear regression, in the EDA part?
This is the only proper EDA video available on YouTube, very simple and clear explanation. Thank you for your hard work.
✌️
1:19:25 OUTLIER TREATMENT
1:28:51 OUTLIER TREATMENT PRACTICAL
1:42:40 INVALID DATA
1:47:35 TYPES OF DATA
1:58:38 7 TYPES OF ANALYSIS
2:02:26 BIVARIATE ANALYSIS
2:07:48 MULTIVARIATE ANALYSIS
2:09:20 NUMERICAL ANALYSIS
2:13:12 PRACTICALS
2:43:38 DERIVED METRICS
2:48:21 FEATURE BINNING THEORY
2:55:10 PRACTICALS
3:06:06 FEATURE ENCODING
3:16:03 PRACTICALS
3:37:44 CASESTUDY
Never saw a proper explanation of EDA on a YouTube channel. Great content 👍 and thanks for sharing it.
*Time Stamps*
00:02:36 - *Agenda Overview*
00:05:14 - *EDA, DA, and DS Process*
00:11:55 - *Introduction to EDA*
00:15:10 - *Data Visualization*
00:20:05 - *Steps in EDA*
00:24:47 - *Data Cleaning*
00:29:05 - *Handling Missing Values (Theory)*
00:35:37 - *Handling Missing Values (Practical)*
00:47:19 - *Feature Scaling Overview*
00:58:30 - *Standardization (Example)*
01:02:49 - *Normalization (Example)*
01:05:25 - *Feature Scaling (Practicals)*
01:19:22 - *Outlier Treatment (Theory)*
01:28:50 - *Outlier Treatment (Practical)*
01:42:38 - *Handling Invalid Data*
01:47:31 - *Types of Data*
01:50:36 - *7 Types of Analysis*
02:02:24 - *Bivariate Analysis*
02:07:46 - *Multivariate Analysis*
02:09:13 - *Numerical Analysis*
02:13:17 - *Numerical Analysis (Practicals)*
02:43:37 - *Derived Metrics*
02:48:17 - *Feature Binning (Theory)*
03:06:03 - *Feature Encoding (Theory)*
03:16:00 - *Feature Encoding (Practicals)*
03:37:40 - *Case Study*
03:56:03 - *Data Exploration*
04:15:24 - *Data Cleaning*
04:25:17 - *Univariate Analysis*
04:39:40 - *Numerical Analysis*
04:54:59 - *Bivariate Analysis*
05:03:43 - *EDA Report*
"Perfect" is too small a word for this video... Absolutely... What to say, brother... Amazing video. You kept the structure excellent. Love from Pakistan
This gave me a Clear cut Understanding of EDA that many Paid courses failed to Deliver. Thanks a lot Satyajit sir. Lots of Love
@@Arunchangalva Glad you liked it 😀
Not sure why this video has so few views. This is one of the most comprehensive videos for learning a lot of important concepts along with practical Python and pandas implementation, which is crucial for a data scientist.
Sincerely I must confess, this has to be the most explanatory video I’ve come across
No one can teach EDA better than youuu for sure 👍
🔥🔥
Thank you Satyajit... The only video on YouTube that explains EDA in depth. Thank you so much for your efforts!
So nice of you
This is the best video on EDA to date. So much depth and clarity. Thank you @Satyajit
Thanks, does that deserve a shoutout on Linkedin via a post 😝
Brother hats off to you man.... best video for EDA i would say.... all the very best inshallah you will get millions of subscribers soon
I wish God fulfils your request ✌️🤣
Definitely, definitely 😂😇 @@SatyajitPattnaik
Thanks for this video! It was really helpful in my learning process.
Great job! You give the bigger picture, which makes this topic easier to understand.
3:04:07 check out the bar chart. The 0-20 group doesn't have 6146.
Hello Satyajeet, I have found so many videos regarding EDA but no one explains it correctly. I asked many data scientists but they did not explain it properly. Thank you very much for explaining it so smoothly.
Best ever EDA lecture on YouTube... Mark my words.
This video helped me a lot. Thank you so much for such a deep explanation.
Outstanding and Neat and Simple way of teaching with Details..Love From Pakistan....
Thank you so much! Really needed an in-depth, but concise overview of EDA and this was just the video :)). Much thanks.
Great content, never seen such content on YouTube from anyone.
You are just simply amazing... I didn't get bored while watching this entire video. I completed this video within 2 days. You can see how amazing your whole video is. Looking forward to watching more videos from your channel👍
Soon you will get the recognition for your talent and your effective teaching and presenting skills. This channel will reach 500k subs soon, for sure.
☺️
Thanks man, this one-shot video is bang on... It cleared all my EDA concepts.
This video is fantastic! The concepts are explained so clearly, covering all the basic topics in such an easy-to-understand manner. I really enjoyed it and found it extremely helpful. Thank you for such a great lecture!
Welcome 💪
The video was posted one year ago but it is still totally worth watching.
Great work, thank you very much for the simple and clear explanation.
Watched full video. What a work mahnn!!!.
Thanks 🙏
The video is insightful and I learned a good number of topics. But a heads-up to people looking into this video: before checking it out, please have a good knowledge of scikit-learn (sklearn) along with Matplotlib and Seaborn. Overall, thank you for your efforts and contribution to the data community's betterment.
The best ever EDA video that I have seen on YouTube. Thanks a lot, sir. Really appreciate your hard work...🫡
NEAT CLEAN SIMPLE UNDERSTANDING OF EDA
trust me guys this video is the best way to learn about EDA. dont forget to take notes .🤩
💪
Thanks for this video. Please try to do entire video on excel, sql separately and a video on inferential statistics and hypothesis testing
Thanks, Satyajit. The video has been a blessing to me. It is so practical and beginner-friendly
Glad to hear that
Really awesome content as well explanation
very informative.... thank you sir!.....👌👍
Succinct and very helpful. Please be regular. I would like to see you cover all the topics to become a data Scientist.
write
very simple and clear explanation. Thank you
Thank you so much for your hard work 💗
I am just 15 minutes in and many of the concepts have been cleared. In most of the courses they didn’t tell why we do EDA. Hats off to this instructor….
Thanks 💪
@@SatyajitPattnaik my pleasure
@@fiza_Aslam Is the initial 15-20 minutes enough to do the EDA, I'm too impatient, can't complete this 5 hour video😢
@@shivam586gupta I didn't say that you can be a master in 15 minutes. I just said that after watching the first 15 minutes I got to know the main reasons why we do EDA, which I think is the most important thing. You will have to take the whole course to learn complete EDA.
@@fiza_Aslam I understood, but still, man, 5 hours 😵💫. I even finish movies within an hour 😅.
So I just wanted to know the content; otherwise a profiling report is also an option.
How did you think I could be a master? 🤣🤣
Very informative and helpful
Awesome lecture on point with examples too
I am glad you liked it 😀
big fan sir 😍
Amazing work done .
Thanks buddy, very informative 👌👌
Very informative video about EDA. Looking forward to the dataset.
Very well explained. Thanks
Brother, very nice 😊
I want an EDA on a Bank Financial Loan project. Please upload this type of EDA. Thank you ❤❤❤❤, and thank you for this video ❤❤❤
Thank you so much, thanks a lotttt🎉
You're welcome 😊
Gem for datascience Aspirant
Stop using this fucked up word aspirant for everything
Thank you very much sir 🎉
Thank You Sir❣
nice video ... keep doing these kinds of video
Awesome ❤
Great effort sir❤
Hi Sir, your teaching style is Excellent 👏. I would like to ask one question about the outliers in the 3-SIGMA method. At the end of the method, you mentioned that anything below 40 or above 80 is considered an outlier. How do you predict it? I don't understand.
Very informative ✨
Glad it was helpful!
Very insightful ! Thank you.
yes its true
It was very helpful
Love from odisha 🎉❤
Can you please tell me, did you get access to the code?
5:03:00 done ✔️
Thank you very much, sir
@ 5:09:00 a row should be deleted if it has an insignificant number of missing values? Is this right? Shouldn't a row be deleted if it has a significant number of missing values?
Sir, there are no missing values for Gender and Age in the churn modeling file. Can you provide the updated file?
Use the Data/CustomerChurn.csv file in the Github repository
1:38 hr - anything below 40 and greater than 80 is considered an outlier. What are 40 and 80 here? Are these the length found from the anomaly function?
Based on ±3 standard deviations from the mean, we wrote that anything below 40 or above 80 is an outlier
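For readers asking where limits like 40 and 80 come from, here is a minimal sketch of the three-sigma rule, assuming the limits are the column mean minus and plus three standard deviations. The sample values and function names are made up for illustration and are not the video's data or code. Whether the video uses the population or the sample standard deviation is not stated; this sketch uses the population version.

```python
from statistics import mean, pstdev


def three_sigma_limits(values):
    """Return (lower, upper) = mean -/+ 3 population standard deviations."""
    mu = mean(values)
    sigma = pstdev(values)
    return mu - 3 * sigma, mu + 3 * sigma


def find_outliers(values):
    """Return the values that fall outside the three-sigma limits."""
    lower, upper = three_sigma_limits(values)
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: twenty values near 60 plus one extreme value.
ages = [58, 60, 62, 59, 61, 60, 57, 63, 60, 61,
        59, 60, 62, 58, 60, 61, 59, 60, 62, 58, 150]

lower, upper = three_sigma_limits(ages)
print(f"lower limit = {lower:.2f}, upper limit = {upper:.2f}")
print("outliers:", find_outliers(ages))
```

The two limits are not predicted; they are computed from the column itself, so a different column (or a different dataset) gives different numbers.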
@satyajitPattnaik please give a dataset which has missing values, as neither the churn_modelling CSV nor CustomerChurn has a missing value
Amazing! I am also a Data Science student and was facing issues in EDA. But your videos cleared all my doubts.
Thank you very much.
One more thing: any source to practice EDA by yourself?
😀
There are already a few EDA projects on my channel, else just search EDA datasets on Google, you will find tons of Kaggle repos
3:04:27 there is an error on the x-axis. Please tell how to resolve it: should we change the labels according to the value counts, or how?
1:38:09 what is 40 and 80? Where did these numbers come from? You did not explain this.
2 months and no reply. Yet Satyajit has time to heart the more recent comments.
for that column, those are upper and lower limits for outliers
which dataset is to be used for handling null values part? the dataset you provide doesn't have any null value in them
@@johnxina7496 let me cross check again
Great content. I am facing one issue: the uniplot func is showing the same data for all columns. I would appreciate it if anyone could help.
It was an amazing EDA video. Can you please suggest some great and unique data science projects which I can build to create my POC for resume selections?
Hi Anurag, I was looking for the same too. Do you have any? Want to discuss?
There can be many project ideas. If you are looking for NLP: resume parsers, text-to-speech, speech-to-text, text summarization, subtitle generation etc. can be really good projects.
In DL, you can work on any image classification, voice classification or image captioning projects :)
Hi @SatyajitPattnaik the ChurnModelling.csv dataset you provided on your github has no null values please upload corrected Dataset.
Use the Data/CustomerChurn.csv file in the Github repository
Hi @@SatyajitPattnaik, in the CustomerChurn.csv file there are also no null values, and the columns are different from ChurnModelling.csv. Please upload ChurnModelling.csv with null values. Thanks!
@@AmbarGharat It's the same dataset. If you feel it's not, just open it, create some null values and practice
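One way to follow that suggestion is sketched below, assuming pandas and NumPy are installed. The helper name `inject_missing`, the fraction, and the small stand-in DataFrame are all hypothetical; in practice the frame would come from something like `pd.read_csv("Data/CustomerChurn.csv")`, and the column names would need to match that file.

```python
import numpy as np
import pandas as pd


def inject_missing(df, columns, frac=0.1, seed=0):
    """Return a copy of df with about `frac` of the rows set to NaN in each listed column."""
    out = df.copy()
    rng = np.random.default_rng(seed)
    n_missing = int(len(out) * frac)
    for col in columns:
        # Pick row labels at random, then blank them out for this column only.
        idx = rng.choice(out.index.to_numpy(), size=n_missing, replace=False)
        out[col] = out[col].mask(out.index.isin(idx))
    return out


# Hypothetical stand-in for the churn data.
df = pd.DataFrame({
    "Gender": ["Male", "Female"] * 10,
    "Age": list(range(30, 50)),
})

practice = inject_missing(df, ["Gender", "Age"], frac=0.2)
print(practice.isnull().sum())
```

The original frame is left untouched, so the same file can be reused to compare results before and after imputation.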
Why are we using log scale at 1:36:30? pls explain
Can you please drop a csv file for handling data which is scratch one.
I opened the GitHub repo and loaded the file in pandas, and it didn't have any columns with missing values
Very informative compilation of the complete EDA process. Thank you. Please do upload data files & python file so that one can practise while watching videos.
200 likes and 50 comments, just 13 comments away 😀
Just started watching this video. How do I perform EDA in supply chain data set? Any resources?
@@055srinivaspatnaik2 go through this video and you can get some ideas
Sir, I am getting an error like "File not found" after running the dataset, even though I saved the file and used copy as path.
In the feature scaling part, the standard deviation is 7874.6 for the income and if its 9643 .65 . How to get the number
Where is the dataset having missing values for Churn Modelling? All the values in the given dataset are filled.
Thank you. I benefited very much from the video.
sir Do you have any courses on other topics that I can learn from?
Yes, i have bunch of courses, pls reach out to me over whatsapp: +91 8237040802
Sir do you have EDA and Data Visualization notebooks done for any other different dataset. If yes please provide the link.
At 1:38 hr, what are the values 35 and 75? Please explain
Is it 1:38 or 1:38:00?
@@SatyajitPattnaik 1:38:00
The video is perfect just remove the background music. The music is very distracting.
Thank you so much for this video
Welcome :)
can i get the notes of this session
@@SatyajitPattnaik
@@rajaryan6792 check pinned comments, files and codes are given
Sir, at timestamp 3:05:44 I saw that the graph is wrong: the 0-20 range has 87, but it displays the 21-40 age count. Can you check it?
@@SatyajitPattnaik
Thanks for your EDA videos. Where did you learn Data Analysis course? Can you suggest me a good online platform to learn EDA?
Yes, i have an end to end DA program, pls ping me on whatsapp: +918237040802
Seen EDA videos. Great. Do you have any online EDA course? Interested.
Yes, i have an end to end DA program, pls ping me on whatsapp: +918237040802
Can you keep the link of data which is before performing EDA?
I want to ask: do I need to build any model, such as linear regression or KNN, in the EDA part?
No
Bro, there are no missing values in any of the three CSV files you provided
"Churn" has multiple meanings depending on the context. Here are the two most common interpretations:
1. Customer Churn: In business, churn typically refers to the rate at which customers stop using a company's services or products. This can be measured in various ways, such as the percentage of customers who cancel subscriptions, close accounts, or stop making purchases within a specific period.
A high churn rate can be alarming for a business, as it indicates lost revenue and potentially dissatisfied customers. Businesses often analyze churn data to identify reasons why customers are leaving and implement strategies to reduce churn and retain customers.
2. Employee Churn: Churn can also refer to the rate at which employees leave a company. This metric is often used to assess employee satisfaction and company culture. High employee churn can be costly for businesses due to training new employees and the loss of institutional knowledge.
Both your interpretations have the same meaning 😀
nice video
Hi sir, where can I find a similar dataset on churn which is not used by many people? Most people have used easily available datasets, and to look different in this competition as a fresher I wanted to work on less-used datasets
You won't find one :)
If I send a dataset, 100s will see and use it :)
Do your own research on this, or else you can use ChatGPT to generate CSV files for practice
Brother, I am learning Python, Excel, SQL, Power BI and ADF. Kindly help me get a job in these fields, and kindly give some suggestions.
Hi, thanks for the video. Can you tell me when we should actually remove the rows having null values in a column? Or is it necessary that we always replace missing values with some value like the average or SD?
Also I am working on www.kaggle.com/datasets/claytonmiller/cubems-smart-building-energy-and-iaq-data but am unable to find out the relation between temperature, relative humidity and light with energy consumption. If you have worked on it, can you please provide some insights? Thanks in advance
38:57 i can't find any null values
Is learning time series analysis important for a data analyst?
@@aamirgaming4475 not mandatory
How to download the pdf ??
In my opinion you shouldn't perform feature scaling during data cleaning, because if you do that you will scale all the data, and that will lead to data leakage and poor model performance
@@mospher9253 I have just explained feature scaling, but of course it is something that is performed before ML model building; it's not suitable as a step while doing EDA
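The leakage concern in this exchange can be illustrated with a small sketch: learn the scaling parameters from the training split only, then reuse them on the test split. The income values and the function names below are hypothetical, and standardization is used here only as one example of feature scaling.

```python
from statistics import mean, pstdev


def fit_standardizer(train_values):
    """Learn mean and standard deviation from the training values only."""
    return mean(train_values), pstdev(train_values)


def standardize(values, mu, sigma):
    """Apply z = (x - mu) / sigma using parameters learned elsewhere."""
    return [(v - mu) / sigma for v in values]


# Hypothetical income values, already split into train and test.
train = [30000, 40000, 50000, 60000, 70000]
test = [45000, 90000]

mu, sigma = fit_standardizer(train)          # the test data is never seen here
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)   # reuses the training mu and sigma

print(train_scaled)
print(test_scaled)
```

scikit-learn's `StandardScaler` follows the same pattern when `fit` is called on the training data and `transform` on the test data.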
Sir, I am not able to run the CSV after saving it. It shows me an error. Sir, please help with how to download it on a laptop and run the file to practice
@@sugandhakashyap9672 whats the error?
@@SatyajitPattnaik Actually I am practicing in Jupyter for Missing Values, at the very beginning. When I save the folder and try to copy it as path, it shows me that no file exists
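The cause of this error is not stated in the thread, so the following is only a guess at a common one: on Windows, "Copy as path" wraps the path in double quotes and uses backslashes, which can break a plain Python string. A minimal sketch, with a made-up path and a hypothetical helper name, that cleans the pasted text and checks the file before loading it:

```python
from pathlib import Path


def clean_copied_path(raw):
    """Strip whitespace and the surrounding quotes that 'Copy as path' adds."""
    return Path(raw.strip().strip('"').strip("'"))


# Hypothetical pasted path; the r prefix keeps the backslashes literal.
raw = r'"C:\Users\me\Downloads\CustomerChurn.csv"'
csv_path = clean_copied_path(raw)

if csv_path.exists():
    print("found:", csv_path)
else:
    # Relative paths are resolved against this folder, so it is worth checking.
    print("not found; current working directory is", Path.cwd())
```

If the file is found, the same `csv_path` can be passed straight to `pd.read_csv`.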
Hi Satyajit,
I got an error while working with the dataset.
In Types of Analysis - numerical analysis, while doing corr() for the dataset, I got an error saying "string can't convert into float".
That dataset has the column "Surname".
How do I rectify it?
Thank you ❤
Drop that column before doing corr()
convert categorical columns to numerical
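Both replies can be sketched in a few lines, assuming pandas is installed. The rows below are invented stand-ins for the churn data; only the `Surname` column name comes from the question above.

```python
import pandas as pd

# Hypothetical rows standing in for the churn dataset.
df = pd.DataFrame({
    "Surname": ["Hargrave", "Hill", "Onio", "Boni"],
    "CreditScore": [619, 608, 502, 699],
    "Age": [42, 41, 42, 39],
    "Exited": [1, 0, 1, 0],
})

# Option 1: drop the text column explicitly before computing correlations.
corr_dropped = df.drop(columns=["Surname"]).corr()

# Option 2: keep only numeric columns, whatever the text columns are called.
corr_numeric = df.select_dtypes(include="number").corr()

print(corr_numeric)
```

Encoding a column to numbers makes sense for categories with few distinct values; a near-unique column such as a surname is usually just dropped.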