How to Clean Data Like a Pro: Pandas for Data Scientists and Analysts

  • Published: 25 Jul 2024
  • In this video, we will explore data cleaning techniques in Python with Pandas specifically tailored for data scientists and analysts. Whether you are a beginner or an experienced professional, these techniques will help you streamline your data cleaning process and enhance the accuracy of your analysis.
    📖CHAPTERS
    00:00 Intro
    00:39 Data Walkthrough
    03:48 Dropping Data
    07:19 Dropping Duplicates
    09:30 Cleaning String Data
    18:29 Imputing Numeric Data
    26:29 Imputing Categorical Data
    31:53 Key Principle in Data Cleaning
    35:18 Outro and Thanks!
    UP NEXT:
    - More Advanced Data Cleaning: • Master Missing Data wi...
    🔗LINKS
    - Data on Github: github.com/trentpark8800/pyth...
    💵AFFILIATE LINKS (HELP SUPPORT THE CHANNEL)
    - O'Reilly Media (Books, courses and more): oreillymedia.pxf.io/python-fo...
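The chapter list above names five cleaning steps: dropping data, dropping duplicates, cleaning strings, imputing numeric data, and imputing categorical data. The following is a minimal sketch of what each step can look like in pandas. It is not the code from the video: the DataFrame, its column names (`name`, `age`, `city`, `notes`), and the choice of median and mode for imputation are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: column names and values are illustrative,
# not taken from the video's dataset.
df = pd.DataFrame(
    {
        "name": ["  Alice ", "BOB", "BOB", "carol", None, "dave"],
        "age": [30.0, 35.0, 35.0, np.nan, 25.0, 41.0],
        "city": ["Paris", "Lyon", "Lyon", "Paris", "Paris", None],
        "notes": [None, None, None, None, None, None],
    }
)

# Dropping data: remove an empty column and rows missing a key field.
df = df.drop(columns=["notes"]).dropna(subset=["name"])

# Dropping duplicates: keep the first of each set of identical rows.
df = df.drop_duplicates().reset_index(drop=True)

# Cleaning string data: trim whitespace and normalise capitalisation.
df["name"] = df["name"].str.strip().str.title()

# Imputing numeric data: fill missing ages with the median observed age
# (the median is an assumed strategy; the mean or a model are alternatives).
df["age"] = df["age"].fillna(df["age"].median())

# Imputing categorical data: fill missing cities with the most frequent value.
df["city"] = df["city"].fillna(df["city"].mode().iloc[0])

print(df)
```

The steps follow the chapter order. In a real dataset it is often worth cleaning strings before dropping duplicates, since values such as "BOB" and " bob " only match once they are normalised.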

Comments • 13

  • @ChukwuemekaAmblessedchinenye  2 days ago

    wow you are the real goat
    the best video so far
    please more videos like this

  • @israsuazo3345  16 days ago

    This is the 1st video I've watched that actually shows the Python libraries in action.
    Thank you for this.

    • @trentdoesmath  15 days ago

      You're very welcome! I'm excited to hear about what you will build with them 🙂

  • @LivingG6170  16 days ago

    Keep doing good work. Big help

    • @trentdoesmath  15 days ago

      I appreciate the kind words 🙏 thanks for the support!

  • @trentdoesmath  18 days ago

    What are some data cleaning techniques that you have used? 🤔

  • @totoarifiyanto8679  7 days ago

    Just like Thor said: "Another"

  • @tmb8807  17 hours ago

    Cool, thanks. Is Polars making much of an impact in your world? I've used it a bit and I think I prefer the more explicit syntax - besides the potential for enormous performance gains it brings.

    • @trentdoesmath  14 hours ago

      Hi tmb8807 :) I have followed a couple of tutorials on Polars, but have not used it in a professional setting yet 🤔
      I'll test it out more extensively.
      Any good tutorials you'd recommend?
      Typically, when I've worked on projects that needed high performance I've used Apache Spark - but Polars could be a nice in-between for pandas and Spark?
      Thanks for the support!

  • @kikiboy2545  16 days ago

    Hi! Thanks for this video. I wanted to know, as a data scientist/analyst, why did you choose to use Jupyter and a .ipynb cleaning file? Why not use PyCharm and a .py file, for example? Is that just a matter of personal preference? Sorry, I am new to Python; I'm proficient in Stata but trying to make a shift.

    • @trentdoesmath  15 days ago +1

      Hi @kikiboy2545 🙂 thank you for your question.
      TL;DR - I chose to use Jupyter as it is easier for me to demo and record the video with.
      To your point on creating a .py file - I would recommend this if you are creating cleaning logic that is going to be reused and shipped to 'production', as it is easier to test and maintain a straight Python script IMO.
      That being said, there is increasing support for notebooks as the preferred environment - as examples, Snowflake, Databricks, Azure Synapse and more all support reusable notebooks to contain all of your logic. I've worked in teams where notebooks are preferred for all data pipeline code due to how intuitive and approachable they are - but as I say, my personal preference is: use notebooks for exploration, and .py scripts for your production code 🙂
      No need to apologize! I am glad to be part of your learning journey - keep pushing man! 😎
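To illustrate the point above about .py scripts being easier to test, here is a minimal sketch, not taken from the video. It assumes the cleaning logic is wrapped in a plain function that a test can call directly. The function name `clean_names`, the `name` column, and the test data are all hypothetical.

```python
import pandas as pd


def clean_names(df: pd.DataFrame, column: str = "name") -> pd.DataFrame:
    """Return a copy with whitespace trimmed, casing normalised, duplicates dropped."""
    out = df.copy()  # work on a copy so the caller's frame is not modified
    out[column] = out[column].str.strip().str.title()
    return out.drop_duplicates().reset_index(drop=True)


def test_clean_names() -> None:
    # Hypothetical input: two spellings of the same name plus one other name.
    raw = pd.DataFrame({"name": [" alice", "ALICE ", "bob"]})
    cleaned = clean_names(raw)
    assert cleaned["name"].tolist() == ["Alice", "Bob"]
    # The input frame is left untouched.
    assert raw["name"].tolist() == [" alice", "ALICE ", "bob"]


test_clean_names()
```

In a project, the function would typically live in a module that a notebook imports for exploration, while a test runner such as pytest collects `test_clean_names` automatically.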

  • @CaribouDataScience  1 day ago

    You misspelled Tidyverse 😮