How do I handle missing values in pandas?
- Published: 11 Sep 2024
- Most datasets contain "missing values", meaning that the data is incomplete. Deciding how to handle missing values can be challenging! In this video, I'll cover all of the basics: how missing values are represented in pandas, how to locate them, and options for how to drop them or fill them in.
SUBSCRIBE to learn data science with Python:
www.youtube.co...
JOIN the "Data School Insiders" community and receive exclusive rewards:
/ dataschool
== RESOURCES ==
GitHub repository for the series: github.com/jus...
"read_csv" documentation: pandas.pydata.o...
"isnull" documentation: pandas.pydata.o...
"notnull" documentation: pandas.pydata.o...
"dropna" documentation: pandas.pydata.o...
"value_counts" documentation: pandas.pydata.o...
"fillna" documentation: pandas.pydata.o...
Working with missing data: pandas.pydata.o...
== LET'S CONNECT! ==
Newsletter: www.dataschool...
Twitter: / justmarkham
Facebook: / datascienceschool
LinkedIn: / justmarkham
In pandas version 0.21 (released October 2017), they added 'isna' and 'notna' as aliases for 'isnull' and 'notnull'. Learn more in my latest video, "5 new changes in pandas you need to know about": ruclips.net/video/te5JrSCW-LY/видео.html
Even at the end of 2019, your material from 2016 is still incredibly helpful.
I'm certain Data School will keep being a success and helping people.
Excellent job, Kevin Markham. Thanks.
Why is the count different after replacing NA with *VARIOUS*?
The count of VARIOUS should equal the earlier NA count, which was 2644.
@@Taranggpt6 Hi, that's because there is already a category named "VARIOUS" in the dataset, so the newly filled-in data gets added to the existing count of "VARIOUS".
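A tiny sketch of why the two counts differ. The Series below is made up for illustration (it is not the actual UFO data): one pre-existing "VARIOUS" value plus two missing values.

```python
import pandas as pd
import numpy as np

# Hypothetical miniature of a "Shape Reported"-style column
shapes = pd.Series(["CIRCLE", "VARIOUS", np.nan, np.nan])

before = shapes.isnull().sum()           # 2 missing values
filled = shapes.fillna(value="VARIOUS")
after = (filled == "VARIOUS").sum()      # 3: the 2 fills + 1 pre-existing

print(before, after)  # 2 3
```

So in the video, 2644 NaN values plus the 333 rows that already said "VARIOUS" give the 2977 total.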
Can we get a video on how to handle missing values in datetime-related datasets, maybe sensor values or other sensitive values? Covering multiple ways of handling missing values would be very useful.
@@bragattemas I must say that even in 2021 it is still completely up to date.
I like his way of teaching; he doesn't assume that the audience knows everything by default. He breaks down the explanation piece by piece. It is a great learning experience, with concise and clearly stated lectures, as always! Thanks!
You're very welcome! Thanks for your kind words!
@@dataschool First-time watcher here. Those positive comments are true. Thanks a lot!
I never leave this place unsatisfied or without answers, total treasure.
Thank you so much!
Now I am in love with pandas just from seeing a couple of your videos. Shukran jazeelan (thank you very much)!
That's awesome! Thanks for sharing!
Six years have passed since this video was released, and I'm watching it now. It still made me fall in love with the series... every concept is beautifully explained in detail.
Thank you so much! 🙏
Awesome, you are gifted. Your explanation and content are clean and effective.
Thanks very much for your kind words!
The most amazing python tutorial I've watched so far. Fell in love with python.
I have been watching Kevin's videos; needless to say, he is an awesome instructor. His explanation in all of his videos is conceptual and in-depth, and he breaks down any complex topic in the easiest way.
Thanks Kevin for your great Work!!!
It would be great if you could make videos on visualization using Matplotlib & Seaborn.
Thanks for your kind words, and for your suggestions! :)
You saved my life twice today, your videos are great and the way you explain is really good. Thank you!
Thank you!
This is actually the most clearly explained video on DataFrames that I have ever come across. Glad I found you. Thank you so much.
Glad it was helpful!
Great videos covering the basics. I enjoy how the additional parameters within the functions are covered, e.g. axis, etc.
Glad it was helpful!
You are doing an excellent job. You are called to do this for sure. Cheers
Wow, thank you so much for your comment! I really appreciate it.
I am really loving your videos. Explored your channel just 2 days back!! Earlier I had no idea about pandas but after watching your video, I feel that I will be able to work on my assignment. Great Work! Thank you!
Great videos. I love how all the CSVs are available online.
Thanks! 😄
I cannot tell you how much you have helped me, with all sorts of problems! You have the clearest way of explaining things, thank you so much!
You're so very welcome, thanks for your kind words! 🙏
You are superb. I took a paid course, but they were not able to explain these things to me as easily as you did. Thanks a lot.
You are very welcome! Thanks so much for your kind comment!
Your way of teaching makes learning Data Analysis very interesting to me. I really appreciate and wish you success.
Thank you!
Thanks for your videos. most of the python online course i took... i just couldn't get into. Something about your cadence, data sets, and or approach just clicks with me. Thanks for the content.
That's awesome! Thanks so much for sharing!
This was awesome. Clear, concise, incredibly easy to follow. Your explanations (and bonus) were exactly what I was looking for.
Excellent! I'm glad the video was helpful to you!
Exceptional would be a single word to describe your tutorial. Looking forward to binging on your videos lol. Thank you for such clear explanation.
Thank you! 🙏
Thanks so much for making this video. You spoke slowly, clearly, and very concisely. With other videos I have to rewind and watch over, but I don't have to do that here. Looking forward to watching others.
That's awesome to hear! Thanks for watching my videos 👍
Awesome as usual !
Thanks!
Good video. Learned a lot in a short and crisp way.
Thanks!
Fantastic explanation! However, at the end it would be good to mention that there are more ways to fill in values, e.g. with the mean of all the other values, and not just merging the null column with another column. Cheers!
Thanks!
The best pandas tutorial ever. Hands down.
Wow! Thank you so much for your kind comment!
Really clear and amazing tutorial
Glad it was helpful!
The content you shared is Gold!!
Thanks!
I rarely leave YouTube comments, but thank you!! If it weren't for your video I wouldn't understand how to do my assignment at all. You did a great job explaining!
This young gentleman is simply amazing.
Thank you! I'm actually 40 years old now 😊
And by the way, I love the way you teach; it's just perfect.
Thank you!
Thanks a lot, your course is really helpful and very detailed. You are a great teacher!
Thank you!
You are a lifesaver, man... I was struggling with errors caused by only 2 missing values in 1000 rows of data.
Glad to hear I could be of help!
Great video series. I always fall back here whenever I'm stuck. Thanks for making them so informative... cheers!
Thanks very much for your kind words!
Excellent video Data School, very helpful, your explanations are clear and objective. Thank you !
Really helpful. This means that if one needs to find the rows with 1 or more null values, the code should look like dataframe[dataframe.isnull().sum(axis=1) > 0].
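The commenter's expression does work. A minimal sketch on a made-up DataFrame (the names here are illustrative, not from the video), with an equivalent idiom using any():

```python
import pandas as pd
import numpy as np

# Hypothetical DataFrame with some missing values
df = pd.DataFrame({"A": [1, np.nan, 3], "B": [np.nan, np.nan, 6]})

# The commenter's approach: keep rows whose null count is greater than 0
rows_with_nulls = df[df.isnull().sum(axis=1) > 0]

# Equivalent and slightly more idiomatic: any() along the columns
rows_with_nulls_alt = df[df.isnull().any(axis=1)]

print(len(rows_with_nulls))  # 2
```

Wrapping either expression in len() gives the number of such rows rather than the rows themselves.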
Amazing, clear, precise and I got it working as well :)
Great to hear!
Thanks for all your well-made videos! I got to know you and your classes from Datacamp. As a beginner in the ML field, please allow me to ask a silly question. So if we have categorical features with missing values, do we need to handle missing values first then do categorical feature transformation using encoders? Or the order doesn't matter? Thanks!
Great question! Previous to scikit-learn 0.24, missing values need to be handled first if you are going to one-hot encode them. Starting in 0.24, OneHotEncoder can handle missing values itself. Hope that helps!
This video is quite helpful and easy to understand. Thanks a lot!
You're welcome!
In the last part of the video, why doesn't the number of "VARIOUS" values made by fillna match the previous NA count?
Great video and explanation as always!
Thanks!
Thank you so much. It is always clearer to listen to you!!
You are so welcome!
This is super helpful, thank you!!!!!
Glad it was helpful!
Great explanation. This was a huge help. Thanks so much!
You're very welcome!
Thank you, you made learning pandas a cakewalk.
Awesome, that's great to hear!
Could you please make a video on how to handle missing values across multiple sheets in pandas? Or do you have any recommended source that I can read about it?
Thanks in advance
Thanks for your suggestion!
Very nice tutorial; your style of teaching is awesome, like an amazing opera singer.
What a compliment, thanks! :)
Thank you for simple and detailed explanation including the use of features.
You're very welcome!
What about displaying the rows where columns 'A' and 'B' both have missing values?
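For the question above, a sketch on a made-up DataFrame (column names 'A' and 'B' are from the question; the data is invented): combine the two isnull() masks with & for "both missing", or use any() for "either missing".

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({"A": [np.nan, np.nan, 3], "B": [np.nan, 5, np.nan]})

# Rows where BOTH 'A' and 'B' are missing
both_missing = df[df["A"].isnull() & df["B"].isnull()]

# Rows where EITHER 'A' or 'B' is missing
any_missing = df[df[["A", "B"]].isnull().any(axis=1)]

print(len(both_missing), len(any_missing))  # 1 3
```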
Doubt: Sir, if I want to assign NaN to a value, say 5 (meaning wherever 5 is present in a DataFrame it will be replaced by NaN), how should I proceed?
Thanks
df['column_name'] = df['column_name'].replace(5, np.nan)
Check that the values were replaced with df.info()
Nice!
Thank you so much! you are amazing as always ! I really appreciate it ! Please don't stop making these videos !
Thank you!
How can we count how many rows we drop, rather than counting the number of NaN values?
Dude, it's just an awesome video. Forgive me for saying this, but turning the playback speed to 1.25 feels more normal haha. Love ya, and I appreciate your effort in teaching piece by piece!!!!
Thank you!
Thank you so much. Very useful. I have no words to express my appreciation. I liked your way of teaching very much; you have become my ideal teacher.
You're very welcome! Thanks so much for your kind words!
I have 1 column with 100 rows. After dropping 4 rows with null values, the new column has 96 rows. How do I write code that tells me which 4 rows were dropped?
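One way to answer the question above: compare the index before and after dropna() with Index.difference(). A minimal sketch on invented data (a 5-row Series instead of the commenter's 100-row column):

```python
import pandas as pd
import numpy as np

s = pd.Series([1, np.nan, 3, np.nan, 5])
cleaned = s.dropna()

# Index labels present before but not after dropna() are the dropped rows
dropped_index = s.index.difference(cleaned.index)
print(list(dropped_index))  # [1, 3]
```

This works because dropna() keeps the original index labels rather than renumbering the rows.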
I actually have a question. I have a DataFrame grouped by month and country. Some of those countries don't have a value for a certain month, which is causing anomalies in the visualization. I want to generate a record with zero for the month and country whenever no record is found. How can I achieve that?
Thanks in advance
All the content in the video is presented clearly!!! Thanks very much!! We love you!!
And we love you! 😉
You explained how to handle NaN values, but what if there are other placeholder values such as "Not Provided"?
How do we ignore them?
Excellent answer! 👏
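For the "Not Provided" question above, a sketch of two common approaches, using a made-up two-row CSV rather than the actual dataset: tell read_csv which strings mean "missing" via na_values, or replace the placeholder after loading.

```python
import pandas as pd
import numpy as np
from io import StringIO

# Hypothetical CSV with a "Not Provided" placeholder
csv_data = "city,shape\nIthaca,Not Provided\nWillingboro,OVAL\n"

# Option 1: tell read_csv which strings denote missing values up front
df = pd.read_csv(StringIO(csv_data), na_values=["Not Provided"])

# Option 2: replace the placeholder with real NaN after loading
df2 = pd.read_csv(StringIO(csv_data))
df2 = df2.replace("Not Provided", np.nan)

print(df["shape"].isnull().sum())  # 1
```

Once the placeholders are real NaN values, all of the isnull/dropna/fillna techniques from the video apply to them.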
What inspires a down vote on any of these videos?? Always great content!
Thanks Paul! :)
Your explanation technique is great. I want to thank you for sharing your knowledge. Great videos!
You are good. Your explanation really made it simple.
Thanks!
What a beautiful video and such great explanation. Beautiful. Keep it up
Thank you! 🙏
Truly amazing videos. Can you do a series on Matplotlib and Seaborn
Thanks for your kind words and suggestion!
Hi, let's say I accidentally changed a value, like the one in line 19 where NaN was changed to VARIOUS. Can I reverse the change?
No, changes made through assignment (or inplace operations) are permanent!
Great tutor, and a great way of making us understand... so easy and intuitive.
Thanks!
Great video! I learned a lot! I just wish you had talked about non-discrete values as well. I'm having some trouble replacing missing numerical data, and I don't want to replace it with zero because that would bias my dataset. My goal is to replace the missing data with the mean of the data that I have.
The only problem is that I don't know how to do that (yet).
Glad you liked the videos! You can do something like this: df.fillna(df.mean())
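A runnable sketch of the df.fillna(df.mean()) suggestion, on invented data. The numeric_only=True flag is worth adding so that mean() skips any non-numeric columns:

```python
import pandas as pd
import numpy as np

# Hypothetical numeric column with one missing value
df = pd.DataFrame({"temp": [10.0, np.nan, 30.0]})

# Fill each column's missing values with that column's mean
filled = df.fillna(df.mean(numeric_only=True))
print(filled["temp"].tolist())  # [10.0, 20.0, 30.0]
```

Mean imputation keeps the column's average unchanged, but note that it shrinks the variance, which can matter for downstream modeling.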
Thanks for the nice lecture. We love you, sir!
You're welcome!
Hi, when I change NA to NaN in my DataFrame, all the integers become floats...
At first no null values were shown, but now there are null values.
In some scenarios, instead of NaN we will have zero. How do you handle those, or how do you count the number of zeros?
df['column_name'] = df['column_name'].replace(0, np.nan)
Thanks for sharing!
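The exchange above covers replacing zeros but not counting them. A sketch on made-up data: a boolean comparison summed gives the zero count, and replace() turns the zeros into real missing values.

```python
import pandas as pd
import numpy as np

s = pd.Series([0, 5, 0, 7])

# Count the zeros: the boolean mask sums True values as 1
zero_count = (s == 0).sum()   # 2

# Treat zeros as missing so isnull/dropna/fillna apply to them
s = s.replace(0, np.nan)
print(zero_count, s.isnull().sum())  # 2 2
```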
Thank you for the clear and quick explanation. Very helpful !!
You're welcome!
Hi Kevin!
How can I fill na based on a condition? Say I want to fill NA for all missing cities, but only if the color is red.
Great question! ufo.loc[(ufo.City.isnull()) & (ufo['Colors Reported']=='RED'), 'City'] = 'New value'
Man thx btw😁
In this tutorial, to find the missing city names we used the syntax ufo[ufo.City.isnull()], but what if I have to find missing "Shape Reported" values? The syntax ufo[ufo.Shape Reported.isnull()] is not working. How do I handle the space?
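The answer to the question above is bracket notation: dot notation only works for column names that are valid Python identifiers (no spaces). A sketch on an invented two-row frame:

```python
import pandas as pd

# Hypothetical stand-in for the UFO data
ufo = pd.DataFrame({"City": ["Ithaca", None],
                    "Shape Reported": [None, "OVAL"]})

# ufo.Shape Reported is a syntax error; use brackets for names with spaces
missing_shape = ufo[ufo["Shape Reported"].isnull()]
print(len(missing_shape))  # 1
```

Bracket notation also works for every other column, so some people use it consistently to avoid the special case.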
You are simply awesome :) .thank you for making such wonderful videos
That's so nice of you to say - thank you!
Sir, is what you explained in the video applicable only to numbers? What should be done for string values?
If you like, you can impute missing string values with the most common values using scikit-learn's SimpleImputer.
Superb video! Thanks a lot, it helps a lot!
Glad it helped!
Markham, make America Great Again...You're the Boss..
Ha! Thanks very much :)
I used the drop command to drop a column which has 10,000 identical entries out of 50,000, but df.dropna(how='any').shape is deleting all the rows. What do I do?
Awesome: simple and straight to the point, with code. This is what I have been looking for for weeks. Thank you so much. Do you have any video where you have used the NSL-KDD or KDD 99 dataset to demonstrate data pre-processing? This is driving me nuts.
I'm sorry, I don't have a video like that... good luck!
Can you please post a complete project from scratch including pandas, matplotlib, scikit-learn, and seaborn?
I thought we said that no values were missing from the City or Shape Reported columns. Why do we see rows dropped at @11:07?
Values are missing from both the City and Shape Reported columns.
thank you so much the explanation is very clear
Glad it was helpful!
Do you handle missing data before splitting the data set (training set and test set) ?
Fantastic video Sir.Your work is really commendable.It would be great if you can make a video about imputing the missing values in Python.
Thanks for your suggestion, and your kind comments!
That's exactly what I was looking for!
Great to hear!
Very useful and easily explained.
Thanks!
Thank you!!!! All the good stuff, all in the same place...love it!
Great! :)
Hey, how can we replace a NaN value with the previous value in a DataFrame? For example, in the UFO shapes, instead of VARIOUS you could fill in RECTANGLE if that was the shape right before the NaN value.
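What the question above describes is a forward fill: propagate the last non-missing value forward. A sketch on invented shape data:

```python
import pandas as pd
import numpy as np

shapes = pd.Series(["CIRCLE", np.nan, "RECTANGLE", np.nan])

# ffill() carries the previous non-missing value forward;
# bfill() would do the opposite (fill from the next value)
filled = shapes.ffill()
print(filled.tolist())  # ['CIRCLE', 'CIRCLE', 'RECTANGLE', 'RECTANGLE']
```

This is most appropriate when the rows are ordered (e.g. by time) so that "the previous value" is actually meaningful.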
Great video btw. Just a quick question. I am trying to build a benchmark, would it be okay to make the data standardized before creating it or?
Hi, how can you handle "ValueError: arrays must all be same length" when df.transpose() is not an option?
As always, your videos are very helpful!
Thanks very much for your kind words!
what happens if you have missing values while training the model, e.g. xgboost?
Thank you! Great instructions!
You're welcome!
Bro, how do you do data cleaning in pandas? What methods are used for it? Please reply.
Super. I understood everything. Nice explanation
Thanks!
I watched your first video, you are legend!
Ha! Thanks :)
Thank you so much for explaining in such a simple language! You are doing a great job. God bless you!
You are so welcome!
Lots of thanks from NEPAL✌✌✌
Most welcome 😊
Excellent! I still have one doubt: how do I create a third column (dummy variable) based on two other columns (dummy variables), considering that they have missing values? I don't want to lose information; in other words, I want to treat the pair (NaN, 1) or (0, NaN) as 1 or 0.
I think it would be great if you could make a lecture about handling missing values for machine learning.
Thanks for your suggestion!
Dear Sir, a small question here: I want to replace "..." in a specific column named "Energy Supply". What I am doing is
en1['Energy Supply'] = en1['Energy Supply'].str.replace("...", "NaN")
but this disturbs all the other values that are correct, turning them into NaN.
My objective here is to replace "..." with NaN.
I would not advise using the text "NaN" to denote missing values. Rather, you should set those values to np.nan from the NumPy library. An example is shown in this video: ruclips.net/video/4R4WsDJ-KVc/видео.html
Does that help?
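There is likely a second bug in the snippet above worth flagging: in older pandas versions, str.replace() treated its pattern as a regular expression by default, and in regex terms "..." matches ANY three characters, which would explain the correct values being clobbered (since pandas 2.0 the default is regex=False). A sketch of a safer approach on invented data, using Series.replace(), which matches whole cell values exactly:

```python
import pandas as pd
import numpy as np

# Hypothetical column mixing real values with "..." placeholders
en1 = pd.DataFrame({"Energy Supply": ["...", "321", "..."]})

# Series.replace() with plain (non-regex) arguments matches the entire
# cell value, so only literal "..." cells become NaN
en1["Energy Supply"] = en1["Energy Supply"].replace("...", np.nan)

print(en1["Energy Supply"].isnull().sum())  # 2
```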
Hey, first of all thanks a lot for your videos!
One question regarding the fillna() method you use. At the end of the video, when you check the NAs in Shape Reported, it says there are 2644 NaN values. However, after you use the fillna() method, it appears that there are 2977 VARIOUS. I don't understand why there are more VARIOUS than NaN?
Thanks in advance
Okay, never mind: there was already a group called VARIOUS with 333 observations (2644 + 333 = 2977).