How I Approach New Datasets (5 THINGS TO LOOK OUT FOR)

  • Published: 27 Jan 2025

Comments • 25

  • @goldenyv1
    2 months ago +12

    1. Context and Relevance
    2. Data quality
    3. Data structures and types
    4. Outliers
    5. Data distribution and summary statistics
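Checks 2 to 5 in that list can be scripted as a first pass. The sketch below is plain Python (standard library only) and assumes the rows are already loaded as a list of dicts that share the same keys and hold scalar values. The helper name `profile_rows`, the sample rows, and the 1.5 × IQR outlier rule are illustrative assumptions, not the method shown in the video. Check 1, context and relevance, still needs a human.

```python
from collections import Counter
import statistics


def profile_rows(rows):
    """First-pass profile of a list of dict rows (hypothetical helper).

    Assumes every row has the same keys and scalar (hashable) values.
    """
    columns = list(rows[0].keys()) if rows else []
    report = {"duplicate_rows": 0, "columns": {}}

    # Data quality: count exact duplicate rows (extra copies only)
    row_counts = Counter(tuple(sorted(row.items())) for row in rows)
    report["duplicate_rows"] = sum(n - 1 for n in row_counts.values())

    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None and v != ""]
        info = {"missing": len(values) - len(present)}

        # Structure and types: which Python types actually occur
        info["types"] = sorted({type(v).__name__ for v in present})

        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        # Only summarise columns that are entirely numeric (4+ values)
        if len(numeric) == len(present) and len(numeric) >= 4:
            # Distribution and summary statistics
            info["mean"] = statistics.fmean(numeric)
            info["median"] = statistics.median(numeric)
            info["stdev"] = statistics.stdev(numeric)
            # Outliers: values outside the 1.5 * IQR fences
            q1, _, q3 = statistics.quantiles(numeric, n=4)
            iqr = q3 - q1
            low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
            info["outliers"] = [v for v in numeric if v < low or v > high]

        report["columns"][col] = info
    return report


# Made-up sample data for illustration
rows = [
    {"id": 1, "price": 10.0, "city": "Paris"},
    {"id": 2, "price": 11.0, "city": "paris"},
    {"id": 2, "price": 11.0, "city": "paris"},   # exact duplicate
    {"id": 3, "price": 12.0, "city": None},      # missing city
    {"id": 4, "price": 13.0, "city": "Lyon"},
    {"id": 5, "price": 14.0, "city": "Lyon"},
    {"id": 6, "price": 15.0, "city": "Nice"},
    {"id": 7, "price": 1000.0, "city": "Nice"},  # suspicious price
]

report = profile_rows(rows)
print(report["duplicate_rows"])
print(report["columns"]["price"]["outliers"])
```

Note that the `Paris`/`paris` split is not flagged by this sketch; catching inconsistent text labels is a separate check (value counts on text columns).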

  • @loneboy2349
    2 months ago +7

    Please don't stop dropping videos like this

  • @carlpalacios
    2 months ago +4

    These videos are so simple. Keep them coming.

  • @quirogan
    1 month ago

    Hello Mo, I work in the same area as you, except in the airline industry. Since I work alone, from home, with no collaborators or QC, your videos help me a lot: they give me new perspectives on how to frame the analysis, and they show me where my approach to the data and my thought process align with yours. Please keep this type of content coming. Thank you!

    • @mo-chen
      1 month ago

      Thanks for sharing!

  • @akinsinasegun2379
    2 months ago

    I love this Mo! Very simple and understandable.

    • @mo-chen
      2 months ago

      Glad it was helpful!

  • @osoriomatucurane9511
    2 months ago +1

    Awesome tutorial, I truly love data cleaning.
    Data quality checks are crucial, and in my opinion they should include not just missing values but all the areas you covered (duplicates, outliers, distribution, data source and data types).
    You could have included checking for inconsistent values in text variables using value counts (cardinality, or categories/factor levels). It is also fundamental to check the dataset's shape or layout: does it meet the tidy tabular long format (as per Hadley Wickham)? Columns should store variables, not values, and rows should hold records.
    As for tools, Power Query shines over Excel with its Column Profile feature.
    Pandas is fantastic, especially if you define a function that picks all the numeric columns and iterates over them to output the summary statistics in one go.
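The three suggestions in this comment (value counts on text columns, a tidy long layout, and a function that summarises every numeric column in one go) can be sketched in pandas. The helpers `inconsistent_labels` and `summarize_numeric` and all the sample data are hypothetical; treating labels as inconsistent when they match after trimming and lower-casing is just one simple rule.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "city": ["Paris", "paris ", "Lyon", None, "LYON", "Nice"],
    "price": [10.0, 12.0, 11.0, None, 13.0, 950.0],
    "qty": [1, 2, 1, 3, 2, 1],
})

# 1. Value counts expose near-duplicate categories ("Paris" vs "paris ")
print(df["city"].value_counts(dropna=False))


def inconsistent_labels(series: pd.Series) -> dict:
    """Map each normalised label to the raw spellings that collapse onto it."""
    labels = series.dropna().astype(str)
    normalised = labels.str.strip().str.lower()
    groups = labels.groupby(normalised).unique()
    return {key: sorted(raw) for key, raw in groups.items() if len(raw) > 1}


# 2. Tidy layout: month names were values stored as column headers;
#    melt turns them into a "month" variable, one row per observation
wide = pd.DataFrame({"store": ["A", "B"], "jan": [100, 80], "feb": [120, 90]})
long_df = wide.melt(id_vars="store", var_name="month", value_name="sales")


# 3. Summary statistics for every numeric column in one go
def summarize_numeric(frame: pd.DataFrame) -> pd.DataFrame:
    numeric = frame.select_dtypes(include="number")
    stats = {}
    for col in numeric.columns:
        s = numeric[col]
        stats[col] = {
            "count": s.count(),        # non-missing values
            "missing": s.isna().sum(),
            "mean": s.mean(),
            "std": s.std(),
            "min": s.min(),
            "median": s.median(),
            "max": s.max(),
        }
    # One row per numeric column, one column per statistic
    return pd.DataFrame.from_dict(stats, orient="index")


print(inconsistent_labels(df["city"]))
print(long_df)
print(summarize_numeric(df))
```

pandas' built-in `DataFrame.describe()` already reports count, mean, std, min, quartiles and max for numeric columns; an explicit loop like this is mainly useful when you want extra columns such as a missing-value count.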

  • @karenaWarner144
    2 months ago +1

    Thanks Mo, super helpful and to the point

  • @nordoonAI
    2 months ago +1

    Wonderful video, thank you Mo!

  • @olubukola107
    2 months ago +1

    Hi, thanks for the video, very informative. I would have loved it if you had gone further to explain your next actions after these data quality checks, e.g. removing duplicates, replacing or removing nulls, etc.

    • @mo-chen
      2 months ago

      Please check out my portfolio project playlist where I dive into everything in a lot more detail :)

    • @olubukola107
      2 months ago

      Oh, thank you so, so much! I’ll definitely check it out.
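The follow-up steps asked about in this thread (removing duplicates, then dropping or filling nulls) might look like this in pandas. This is a hypothetical sketch on made-up data, not the approach from the video or the playlist; the median fill and the `"Unknown"` placeholder are illustrative choices, and whether to drop or fill depends on the dataset.

```python
import pandas as pd

# Made-up sample data for illustration
df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount": [20.0, 35.0, 35.0, None, 50.0],
    "region": ["North", "South", "South", None, "East"],
})

# 1. Remove exact duplicate rows, keeping the first occurrence
deduped = df.drop_duplicates(keep="first")

# 2a. Option A: drop rows where a required field is missing
dropped = deduped.dropna(subset=["amount"])

# 2b. Option B: fill the gaps instead (median for numbers, a label for text)
filled = deduped.fillna({
    "amount": deduped["amount"].median(),
    "region": "Unknown",
})

print(dropped)
print(filled)
```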

  • @mstoneise9249
    2 months ago

    unsurprisingly lol I really enjoy this lol

  • @rumeeburu3648
    2 months ago

    You should make a video on the gear you use, too

    • @mo-chen
      2 months ago

      Haha maybe one day!

  • @rumeeburu3648
    2 months ago +1

    What video interface do you use for your videos, sir? They are amazing.

  • @zALEXzRu
    2 months ago

    Chen, where can I find databases for practicing on my own projects in Excel or Power BI?

    • @mo-chen
      2 months ago

      Kaggle is a great place to start! You can also check out my Ultimate Excel Projects at mochen.info/, where I give you the dataset.

  • @Eatsomemore
    2 months ago

    Do you remove erroneous data?

    • @mo-chen
      2 months ago

      You can. Whether or not you remove them will depend on the dataset you have.

  • @chenjoya7792
    2 months ago +2

    now teach us how to use the AI