How to Clean Data Like a Pro: Pandas for Data Scientists and Analysts
- Published: Jul 25, 2024
- In this video, we will explore data cleaning techniques in Python with Pandas specifically tailored for data scientists and analysts. Whether you are a beginner or an experienced professional, these techniques will help you streamline your data cleaning process and enhance the accuracy of your analysis.
📖CHAPTERS
00:00 Intro
00:39 Data Walkthrough
03:48 Dropping Data
07:19 Dropping Duplicates
09:30 Cleaning String Data
18:29 Imputing Numeric Data
26:29 Imputing Categorical Data
31:53 Key Principle in Data Cleaning
35:18 Outro and Thanks!
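For readers skimming the chapter list, here is a minimal sketch of the techniques the chapters name, in the same order. It is not the code from the video: the toy DataFrame and its column names (`name`, `age`, `city`, `notes`) are assumptions made up for illustration, and median/mode imputation is just one common choice.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; NOT the dataset used in the video.
raw = pd.DataFrame(
    {
        "name": ["  alice ", "BOB", "BOB", "carol", None, "dave"],
        "age": [30.0, 35.0, 35.0, np.nan, 25.0, 41.0],
        "city": ["Paris", "Lima", "Lima", "Paris", "Paris", None],
        "notes": [None, None, None, None, None, None],
    }
)

# Dropping data: remove an empty column and rows missing a key field.
df = raw.drop(columns=["notes"]).dropna(subset=["name"])

# Dropping duplicates: keep the first of any fully identical rows.
df = df.drop_duplicates().reset_index(drop=True)

# Cleaning string data: trim whitespace and normalize capitalization.
df["name"] = df["name"].str.strip().str.title()

# Imputing numeric data: fill missing ages with the median of observed ages.
df["age"] = df["age"].fillna(df["age"].median())

# Imputing categorical data: fill missing cities with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])

print(df)
```

Whether to drop or impute, and which statistic to impute with, depends on the dataset and the analysis; treat the choices above as placeholders.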
UP NEXT:
- More Advanced Data Cleaning: • Master Missing Data wi...
🔗LINKS
- Data on Github: github.com/trentpark8800/pyth...
💵AFFILIATE LINKS (HELP SUPPORT THE CHANNEL)
- O'Reilly Media (Books, courses and more): oreillymedia.pxf.io/python-fo...
wow you are the real goat
the best video so far
please more videos like this
This is the first video I've watched that actually shows the Python libraries in action.
Thank you for this.
You're very welcome! I'm excited to hear about what you will build with them 🙂
Keep doing good work. Big help
I appreciate the kind words 🙏 thanks for the support!
What are some data cleaning techniques that you have used? 🤔
Just like Thor said: "Another"
Cool, thanks. Is Polars making much of an impact in your world? I've used it a bit and I think I prefer the more explicit syntax - besides the potential for enormous performance gains it brings.
Hi tmb8807 :) I have followed a couple of tutorials on polars, but never used it on anything in a professional setting as of yet 🤔
I'll test it out more extensively.
Any good tutorials you'd recommend?
Typically, when I've worked on projects that needed high performance I've used Apache Spark - but Polars could be a nice middle ground between pandas and Spark?
Thanks for the support!
Hi! Thanks for this video. I wanted to know, as a data scientist/analyst, why did you choose to use Jupyter and a .ipynb cleaning file? Why not use PyCharm and a .py file, for example? Is that just a matter of personal preference? Sorry, I am new to Python, proficient in Stata but trying to make a shift
Hi @kikiboy2545 🙂 thank you for your question.
TL;DR - I chose to use Jupyter as it is easier for me to demo and record the video with.
To your point on creating a .py file - I would recommend this if you are creating cleaning logic that is going to be re-used and shipped to 'production' as it is easier to test and maintain a straight Python script IMO.
That being said, there is increasing support for the use of notebooks as the preferred environment - as examples, Snowflake, Databricks, Azure Synapse and more all support the use of reusable notebooks to contain all of your logic. I've worked in teams where notebooks are preferred for all data pipeline code due to how intuitive and approachable they are - but as I say my personal preference is: use notebooks for exploration, and .py scripts for your production code 🙂
No need to apologize! I am glad to be part of your learning journey - keep pushing man! 😎
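To illustrate the point above about .py scripts being easier to test, here is a rough sketch of cleaning logic written as a plain function with a pytest-style test next to it. The function name `clean_customers` and the `email` column are hypothetical, chosen only for the example; they do not come from the video.

```python
import pandas as pd


def clean_customers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of df; the input frame is left untouched."""
    out = df.copy()
    # Normalize emails so that trivially different spellings compare equal.
    out["email"] = out["email"].str.strip().str.lower()
    # Keep the first row for each normalized email.
    out = out.drop_duplicates(subset=["email"]).reset_index(drop=True)
    return out


def test_clean_customers_normalizes_and_dedupes() -> None:
    raw = pd.DataFrame({"email": [" A@x.com", "a@x.com ", "b@x.com"]})
    cleaned = clean_customers(raw)
    assert cleaned["email"].tolist() == ["a@x.com", "b@x.com"]
    # The original frame must not be mutated by the cleaning step.
    assert raw["email"].tolist() == [" A@x.com", "a@x.com ", "b@x.com"]
```

A function like this can be imported into a notebook for exploration and also run under a test runner, which is one way to get the benefits of both workflows.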
You misspelled Tidyverse 😮
🤣