Very well explained, Aman, thanks! I have some doubts; could you please explain?
1. Why does the data have missing values and outliers? What could cause them to occur in the data?
2. So, whenever outliers are present in the data, do we always replace them with the 99th percentile?
3. In the formula for finding outliers, why do we use the value 1.5 and not some other value?
+ramesh mamilla Hi Ramesh, all these are good questions. Thanks for asking.
1. There can be any number of reasons for missing data, outliers, etc. To give an example, say you capture data from a sensor or IoT device. Such a device might not work properly at times, so the captured data can be missing or wrong. Another example is a missing entry in the system itself: when you are given a form to fill, you only fill in the mandatory fields :)
2. Not necessarily. It can be the 95th and 5th percentiles or other values as well, depending on the distribution, etc.
3. 1.5 is a boundary defined by statisticians (it comes from Tukey's boxplot rule: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged) and is widely accepted for outlier detection.
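Answer 3 refers to the standard 1.5 × IQR rule. A minimal pandas sketch of how the limits are computed, assuming a numeric column; the helper name `iqr_limits` and the sample ages are hypothetical, not the video's code or data:

```python
import pandas as pd


def iqr_limits(s, k=1.5):
    """Hypothetical helper: return (lower, upper) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1 = s.quantile(0.25)  # first quartile (25th percentile)
    q3 = s.quantile(0.75)  # third quartile (75th percentile)
    iqr = q3 - q1          # interquartile range
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical ages with one extreme value.
ages = pd.Series([22, 25, 27, 30, 31, 35, 38, 41, 45, 125])
lower, upper = iqr_limits(ages)

# Values outside the limits are flagged as outliers.
outliers = ages[(ages < lower) | (ages > upper)]
print(lower, upper, outliers.tolist())
```

Passing a larger `k` (3 is a common alternative) widens the limits, so fewer points are flagged.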
Yeah, agreed, Aman! Thank you for putting your time into this.
Very nice and clear explanation of model building, Aman. Waiting for the next video :-D Thank you. :)
+Mili Sneha Thank you, Sneha. Yes, the next steps of model building and deployment are planned.
Continued thanks from my side.....
very good
Aman - I am eagerly looking forward to the questions you raised at the end of the video. Thanks!
+Abhishek Gautam Thanks, Abhishek. Keep me posted with your doubts as well.
@@UnfoldDataScience Absolutely, Aman.
Great explanation
Glad it was helpful Kartik.
Useful information with a nice presentation...
+Gopi Kumar Thanks a lot Gopi :)
Good show and all the best, Aman... awaiting the next video
+GOPI RAMAN KUMAR Thanks a lot. Yes, the next video is on the way :)
Useful information. Thank you.
+Sudeep Labh Thanks for your feedback.
Could you please add all your videos to their respective playlists? That would help us a lot. Thanks for the good work :)
Sure, please visit the Playlists tab on my channel and you will find video playlists under various topics. Let me know if you are looking for anything more specific. Thank you.
@@UnfoldDataScience Yes, but there are videos which are not linked to any of the playlists, so I'm not sure about their order.
Very useful video
+Sudeep Kumar Thanks Sudeep:)
Thank you so much, sir
Most welcome. Stay safe, take care.
Great content again! 👌👌
+Prerana Tiwary Thanks a lot for your great feedback always.
Great things Aman.
Please cover deployment as well, even a simple deployment. Small request.
+Nikhil Reddy Hi Nikhil, yes, next come the model-building steps and then deployment of the same model. Stay tuned.
Hi Aman, at 6:43 you said the distance in age between the 75th and 100th percentiles is high. Isn't the same true for expense, which also doubles?
Hi Kirti, good observation. Yes, if it doubles, then it is relatively large. We should also look at the other percentiles, for example the 25th, 50th, etc.
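Kirti's check can be done numerically for every column at once. A small sketch, assuming a pandas DataFrame; the column names and values are hypothetical:

```python
import pandas as pd

# Hypothetical data; column names only mirror the age/expense discussion.
df = pd.DataFrame({
    "age": [21, 24, 28, 33, 36, 40, 44, 52, 60, 125],
    "expense": [200, 220, 250, 260, 300, 320, 350, 400, 450, 900],
})

# Key percentiles for every numeric column, side by side (1.0 is the maximum).
summary = df.quantile([0.25, 0.50, 0.75, 1.0])

# Ratio of the maximum to the 75th percentile: a large ratio hints at a long upper tail.
tail_ratio = summary.loc[1.0] / summary.loc[0.75]
print(summary)
print(tail_ratio)
```

Looking at the 25th and 50th percentiles in the same table helps judge whether a jump in the upper tail is unusual for that column.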
It is easy to understand. Could you tell me which Python libraries are mostly used in machine learning, like numpy?
Sure, you will use packages like pandas, numpy, matplotlib, seaborn, tensorflow, pytorch, sklearn, scipy, etc.
It's an excellent lecture. Can you give a tutorial on implementing a geographically weighted regression model in Python? It would be a great lecture.
Sure. Thanks.
Hi Sir,
Firstly, you are doing a great job of creating and delivering apt content. I am definitely going to recommend this channel to my peers.
A few queries:
1. Are NaNs always replaced by median values? If not, please discuss different scenarios and how to choose the best way to treat NAs.
2. Discovering an outlier just by eyeballing the boxplot and then following the process of finding outlier boundaries seems an unreliable approach. So shouldn't we just blindly pick each predictor column, apply the whole process of finding the upper and lower limits (UL and LL), and replace outliers with the 99th/1st percentile? There is no harm in that anyway: if there is an outlier, it will get replaced; if not, the data will remain unchanged.
3. Why didn't we check for outliers in the Income and Expense columns? I plotted the boxplot and found an outlier there too.
Kindly answer these so that I can scaffold my learning in data science. Thanks!
Love when you ask questions. Thanks a lot.
1. The median is just one way; there are many ways to impute missing values. Search for "missing value imputation unfold data science" on YouTube.
2. Your approach may work; here, however, I plotted a boxplot for demonstration.
3. That was just to keep the video short; the idea was to show an approach.
We should check those columns too and do the necessary treatment.
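As the reply says, the median is only one imputation choice. A minimal pandas sketch comparing median, mean and mode filling; the DataFrame, column names and values are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical data: one skewed numeric column and one categorical column.
df = pd.DataFrame({
    "income": [30_000, 45_000, np.nan, 52_000, 1_000_000],
    "city": ["Pune", None, "Delhi", "Pune", "Mumbai"],
})

# Median fill: robust to the extreme income value.
income_median = df["income"].fillna(df["income"].median())

# Mean fill: pulled upward by the extreme value, shown for comparison.
income_mean = df["income"].fillna(df["income"].mean())

# Mode fill (most frequent value): a common choice for categorical columns.
city_mode = df["city"].fillna(df["city"].mode().iloc[0])
```

With the extreme income present, the mean fill (281,750) is far above most incomes while the median fill (48,500) is not, which is one reason the median is a common default for skewed numeric columns. scikit-learn's `SimpleImputer` supports the same `mean`, `median` and `most_frequent` strategies.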
Thank you Sir for replying.
1. OK
2. Okay, so does that mean we should make outlier treatment a practice for every predictor?
3. OK
Thanks again!
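For the follow-up about treating every predictor, one middle ground between eyeballing a boxplot and capping every column blindly is to compute the IQR limits for every numeric column and look at the counts before choosing a treatment. A sketch under that assumption; `outlier_report` and the data are hypothetical:

```python
import pandas as pd


def outlier_report(df, k=1.5):
    """Hypothetical helper: count values outside Q1 - k*IQR / Q3 + k*IQR per numeric column."""
    counts = {}
    # Only numeric columns have quartiles; text columns are skipped.
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask = (df[col] < q1 - k * iqr) | (df[col] > q3 + k * iqr)
        counts[col] = int(mask.sum())
    return pd.Series(counts, name="n_outliers")


# Hypothetical data with an outlier in age and expense but not in income.
df = pd.DataFrame({
    "age": [22, 25, 27, 30, 31, 35, 38, 41, 45, 125],
    "income": [30, 32, 35, 36, 38, 40, 41, 43, 45, 47],
    "expense": [10, 11, 12, 12, 13, 13, 14, 15, 16, 60],
    "city": ["A", "B", "C", "A", "B", "C", "A", "B", "C", "A"],
})
report = outlier_report(df)
print(report)
```

A column with a non-zero count can then be inspected (or capped) deliberately, rather than changed by default.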
Very well explained, sir, thanks. Are there any examples of how to find out whether a problem is classification or regression?
Thanks Sumeet.
Please make a video on the minimum system requirements for implementing machine learning programs.
Please include hardware configuration, operating system, visualisation software, the best Python framework, the best Python library, etc.
I will be very grateful to you, Aman sir.
Noted, thanks for suggestion.
Nice explanation, Aman.....
+sadhna rai Thanks a lot for your feedback.
Hi Aman, Thanks for the Video. Very well explained...
1. When we talk about the 10th, 25th, 50th percentile, etc. of the data, how is this calculated? Are we taking the max age, in this case 125, dividing it into 100 parts and calculating the percentile?
Got it. Thanks.
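Percentiles are computed from the sorted values, not by dividing the maximum into 100 parts. A small sketch with hypothetical ages, assuming pandas' default `interpolation="linear"`, which places the p-th quantile at position p × (n - 1) of the sorted data and interpolates between neighbours:

```python
import pandas as pd

# Hypothetical ages, already sorted for readability (n = 9, so n - 1 = 8).
ages = pd.Series([18, 20, 22, 25, 30, 35, 40, 60, 125])

p10 = ages.quantile(0.10)  # position 0.8: between 18 and 20
p25 = ages.quantile(0.25)  # position 2.0: exactly the third value
p50 = ages.quantile(0.50)  # position 4.0: the median

# The idea from the question ("divide the max into 100 parts"), for comparison.
naive_p25 = 125 * 0.25
print(p10, p25, p50, naive_p25)
```

Here the 25th percentile is 22 rather than 125 × 0.25 = 31.25. Other tools may use slightly different interpolation rules, so results can differ a little on small samples.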
Thanks, Aman, for such a useful video; it is really easy to understand.
Could we also have some function for cleaning the data?
Hello, for data cleaning we need to follow different steps for different types of data. I have explained some of these in my model-training videos. Please watch the end-to-end implementation videos in my playlist.
Hello sir, how do we fill missing values if a column has a string datatype? How can we apply the median?
You can change the datatype to float (when the strings hold numbers) and then find the median.
@@UnfoldDataScience Okay sir, got it! Thanks.
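A minimal sketch of the reply's suggestion, assuming the strings actually hold numbers; the Series is hypothetical. `pd.to_numeric(..., errors="coerce")` turns unparsable text into NaN, whereas a plain `astype(float)` would fail on text like "n/a":

```python
import pandas as pd

# Hypothetical column of numbers stored as strings, with gaps.
s = pd.Series(["42", "38", None, "51", "n/a", "47"])

# Convert to numbers; anything that cannot be parsed (e.g. "n/a") becomes NaN.
nums = pd.to_numeric(s, errors="coerce")

# Now the median is defined and can fill the gaps.
filled = nums.fillna(nums.median())
print(filled.tolist())
```

If the column holds real categories (city names, for example), a median does not exist; filling with the most frequent value (the mode) is the usual alternative.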
Hello Aman, it's a great video explanation. It cleared up so many of my doubts.
It would be great if you could share this code here.
Thanks Ganga.
drive.google.com/drive/folders/1XdPbyAc9iWml0fPPNX91Yq3BRwkZAG2M
Nice video. Could you please explain what determines whether we should choose the 99th or 95th percentile to remove outliers? Any examples?
Thanks, Richa, there is no hard rule for it. It depends, case by case, on what works well for your model.
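Since there is no hard rule, one practical approach is to make the percentiles a parameter and compare how many values each choice changes (and, ultimately, how the model performs). A sketch of percentile capping (often called winsorizing); `cap_at_percentiles` and the data are hypothetical:

```python
import pandas as pd


def cap_at_percentiles(s, lower_q=0.01, upper_q=0.99):
    """Hypothetical helper: clip values below/above the chosen percentiles."""
    lo, hi = s.quantile([lower_q, upper_q])
    return s.clip(lower=lo, upper=hi)


s = pd.Series(range(1, 101))  # hypothetical values 1..100

capped_99 = cap_at_percentiles(s, 0.01, 0.99)
capped_95 = cap_at_percentiles(s, 0.05, 0.95)

# How many values each choice changes.
print((capped_99 != s).sum(), (capped_95 != s).sum())
```

On these 100 values, 1st/99th capping changes 2 values and 5th/95th capping changes 10.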
Will we get all the algorithms of supervised and unsupervised learning?
Yes. I am creating everything sequentially.
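Hello sir, you have explained very well,
but I have one doubt:
1. Why do you take 0.75 and 0.25 as the quantiles? Can we take other percentiles instead?
Not for the IQR: it is defined as the distance between the 25th percentile (Q1) and the 75th percentile (Q3). Other percentile ranges can be computed, but they are not the IQR.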
Why is the count 14 ? Could you please explain?
Shouldn't you replace the outliers with the IQR upper and lower limit?
We should, as standard ML practice. Here I probably missed it, as I wanted to show more things in Python in limited time.
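A minimal sketch of capping at the IQR limits themselves, using pandas' `Series.clip`; the helper name `cap_at_iqr_limits` and the sample values are hypothetical:

```python
import pandas as pd


def cap_at_iqr_limits(s, k=1.5):
    """Hypothetical helper: replace values beyond Q1 - k*IQR / Q3 + k*IQR with those limits."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return s.clip(lower=q1 - k * iqr, upper=q3 + k * iqr)


# Hypothetical ages with one extreme value.
ages = pd.Series([22, 25, 27, 30, 31, 35, 38, 41, 45, 125])
capped = cap_at_iqr_limits(ages)
print(capped.tolist())
```

Here only 125 changes: it is replaced by the upper limit, 59.0.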
Unable to find path to my desktop
Please post the link to the next part of this video in the reply section; I can't find it. Thank you
ruclips.net/video/8PFt4Jin7B0/видео.html.
Hi Karishma, you can go to the playlist section and start watching.
Please share the link to the next video after this one! I am unable to find it, sir.
Please check here:
drive.google.com/drive/folders/1XdPbyAc9iWml0fPPNX91Yq3BRwkZAG2M
@@UnfoldDataScience thanks a ton!
Sir, I didn't understand the "finding and treating outliers, both upper and lower end" part!! :(
Number of hours spent by person X on Netflix every week over the last year:
8, 9, 7, 10, 8, 7, 9, 10, 8, 0, 8, 20, ... (52 numbers here)
Here 0 is a lower-end outlier and 20 is an upper-end outlier.
@@UnfoldDataScience oh got it thanks
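The reply's example can be checked with the 1.5 × IQR rule. The sketch uses only the twelve values written out above (the full series has 52 weekly numbers, which are not given), so the limits are illustrative:

```python
import pandas as pd

# Only the values written out in the reply; the remaining weeks are not given.
hours = pd.Series([8, 9, 7, 10, 8, 7, 9, 10, 8, 0, 8, 20])

q1, q3 = hours.quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

low_outliers = hours[hours < lower].tolist()   # lower-end outliers
high_outliers = hours[hours > upper].tolist()  # upper-end outliers
print(lower, upper, low_outliers, high_outliers)
```

With these values, Q1 = 7.75 and Q3 = 9.25, so the limits are 5.5 and 11.5, and 0 and 20 fall outside them.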
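Python code and dataset?
Can you share the code too?
On GitHub.
Why 0.25 and 0.75 for the IQR, and not anything else?
These numbers have been defined by statisticians: the IQR is, by definition, the distance between the first quartile (25th percentile) and the third quartile (75th percentile).
Can you please share the code?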
drive.google.com/drive/folders/1XdPbyAc9iWml0fPPNX91Yq3BRwkZAG2M