1. Context and Relevance
2. Data quality
3. Data structures and types
4. Outliers
5. Data distribution and summary statistics
Please don't stop dropping videos like this.
These videos are so simple, keep them coming.
Hello Mo, I work in the same area as you, except in the airline industry. As I work alone, from home, with no collaborators or QC, your videos help me a lot, both by giving me new perspectives on how to approach the analysis and by showing me where my approach to the data and my thought process align with yours. Please keep this type of content coming. Thank you!
Thanks for sharing!
I love this Mo! Very simple and understandable.
Glad it was helpful!
Awesome tutorial, I truly love data cleaning.
Data quality checks are crucial, and in my opinion they should include not just missing values but all the areas you have covered (duplicates, outliers, distribution, data source and data types).
You could have included checking for inconsistent values in text variables using value counts (cardinality, or categories/factor levels). It is also fundamental to check the dataset's shape or layout, i.e. whether it meets the tidy, tabular long format (as per Hadley Wickham). The columns should store variables, not values, and the rows should hold records.
As for the tools, Power Query shines over Excel with its Column Profile feature.
Pandas is fantastic, especially if you define a function that picks all the numeric columns and iterates over them to output the summary statistics in one go.
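For anyone who wants to try the pandas idea from this comment, here is a minimal sketch, not the commenter's or Mo's actual code. The function names `numeric_summary` and `text_value_counts` and the sample data are hypothetical, made up for illustration. It summarises every numeric column in one call and lists value counts for the remaining text columns so inconsistent spellings stand out.

```python
import pandas as pd


def numeric_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row of summary statistics per numeric column."""
    numeric = df.select_dtypes(include="number")
    # describe() gives count, mean, std, min, quartiles and max;
    # transposing puts each column of the dataset on its own row.
    summary = numeric.describe().T
    summary["missing"] = numeric.isna().sum()
    return summary


def text_value_counts(df: pd.DataFrame) -> dict:
    """Return value counts (missing values included) per non-numeric column."""
    # Assumption: anything that is not numeric, boolean or datetime is text.
    text = df.select_dtypes(exclude=["number", "bool", "datetime"])
    return {col: text[col].value_counts(dropna=False) for col in text.columns}


# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "city": ["Sydney", "sydney", "Melbourne", None],
    "sales": [100.0, 250.0, None, 400.0],
    "units": [1, 2, 3, 4],
})

print(numeric_summary(df))
for col, counts in text_value_counts(df).items():
    print(f"\n{col}:")
    print(counts)
```

In the sample, "Sydney" and "sydney" show up as two separate categories, which is the kind of inconsistency the value counts are meant to expose.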
Thanks Mo, super helpful and to the point
Wonderful video, thank you Mo!
Hi, thanks for the video, very informative. I would have loved it if you had gone further and explained your next actions after these data quality checks, e.g. removing duplicates, replacing or removing nulls, etc.
Please check out my portfolio project playlist where I dive into everything in a lot more detail :)
Oh, thank you so, so much, I’ll definitely check it out
unsurprisingly lol I really enjoy this lol
You should also make a video on the gear you use
Haha maybe one day!
What video interface do you use for your videos? They are amazing.
OBS
Chen, where can I find databases for practicing creating my own projects with Excel or Power BI?
Kaggle is a great place to start! Or you can also check out my Ultimate Excel Projects at mochen.info/ where I give you the dataset
Do you remove erroneous data?
You can. Whether or not you remove it will depend on the dataset you have.
now teach us how to use the AI