How to Do Data Exploration (step-by-step tutorial on real-life dataset)

  • Published: Jul 15, 2024
  • 🐼 All you need to know about Pandas in one place! Download my Pandas Cheat Sheet (free) - misraturp.gumroad.com/l/pandascs
    👇Learn how to complete your first real-world data science project
    Hands-on Data Science course
    www.misraturp.com/hods
    In this video we learn how to explore a real-life dataset from NYC using Python and Pandas. We will dive deep into the data, identify potential problems and issues to fix, and extract insights from it.
    If you’d like to follow along, find the data here: data.cityofnewyork.us/Environ...
    NYC Open Data - opendata.cityofnewyork.us/
    00:00 Welcome
    00:34 Some notes on data exploration
    01:15 Dataset explanations
    05:26 First look into our dataset
    06:58 Understanding columns
    09:22 Filtering out the unnecessary columns
    11:45 Missing value check
    12:49 Numerical values check
    15:08 Outliers check
    19:57 Categorical values check
    24:57 Explore distribution of binary columns
    26:40 Summary
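
The chapters above can be sketched in Pandas roughly as follows. This is a minimal illustration, not the code from the video: the small DataFrame, its column names (`tree_id`, `tree_dbh`, `health`, `spc_common`, `root_stone`, `borough`) and all values are made-up assumptions, and the name `tree_census_subset` is borrowed from a viewer comment.

```python
import pandas as pd

# Tiny made-up stand-in for the tree census data.
# Column names and values are assumptions for illustration only.
tree_census = pd.DataFrame({
    "tree_id": [1, 2, 3, 4, 5, 6],
    "tree_dbh": [3, 12, 0, 450, 8, 10],  # trunk diameter, assumed to be in inches
    "health": ["Good", "Fair", None, "Good", "Poor", "Good"],
    "spc_common": ["red maple", "pin oak", None, "pin oak", "ginkgo", "red maple"],
    "root_stone": ["No", "Yes", "No", "No", "Yes", "No"],
    "borough": ["Queens"] * 6,  # stands in for a column we decide not to keep
})

# First look: size, a few rows, and the column types.
print(tree_census.shape)
print(tree_census.head())
print(tree_census.dtypes)

# Filter out unnecessary columns by keeping an explicit list.
columns_to_keep = ["tree_id", "tree_dbh", "health", "spc_common", "root_stone"]
tree_census_subset = tree_census[columns_to_keep]

# Missing value check: number of missing entries per column.
missing_counts = tree_census_subset.isna().sum()
print(missing_counts)

# Numerical values check: summary statistics for the diameter column.
print(tree_census_subset["tree_dbh"].describe())

# Outliers check: look at the largest values before deciding anything.
largest = tree_census_subset["tree_dbh"].nlargest(3)
print(largest)

# Categorical values check: frequency of each category, including missing.
health_counts = tree_census_subset["health"].value_counts(dropna=False)
print(health_counts)

# Distribution of a binary column: share of each of the two values.
root_stone_share = tree_census_subset["root_stone"].value_counts(normalize=True)
print(root_stone_share)
```

Keeping an explicit list of columns, rather than dropping the unwanted ones, makes it easier to see at a glance what the subset contains.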
    👋 Keep in touch?
    ==========================
    🐥 Twitter - / misraturp
    🔗 LinkedIn - / misraturp
    📹 RUclips - / @misraturp
    🌎 Website - misraturp.com/
    Courses & resources
    ============================
    📙 Fundamentals of Deep Learning in 25 pages
    misraturp.gumroad.com/l/fdl
    👩‍💻 Hands-on Data Science: Complete your first portfolio project
    www.misraturp.com/hods
    📥 Streamlit template
    misraturp.gumroad.com/l/stemp
    🤖 Deep Learning 101 with Python and Keras (FREE)
    • 50 Days of Deep Learning
    🏃‍♀️ Data Science Kick-starter mini-course (FREE)
    misraturp.gumroad.com/l/kick-...
    🐼 Pandas cheat sheet (FREE)
    misraturp.gumroad.com/l/pandascs
    📝 NNs hyperparameters cheat sheet (FREE)
    misraturp.gumroad.com/l/hcs

Comments • 148

  • @prinjeshshah9128 1 year ago +24

    I was searching for a video where someone explains their reasoning in data exploration for the last 15 days. This video has now become a bookmark for my future data exploration work. Thank you very much.

    • @misraturp 1 year ago +2

      That's so nice to hear! Keep up the good work Prinjesh!

  • @santdayalverma8255 1 year ago

    Hi Misra, I love your videos. The way you explain topics in a simple manner is really helpful. Thank you so much!

  • @SFW7 1 year ago +1

    Thank you so much for making this video! I'm just starting out with using python for data analysis and this video is so informative and inspiring. :)

  • @user-up8uc9zn4x 8 months ago

    Honestly you're the most honest and humble trader on RUclips!!

  • @stephanielynn9808 1 year ago +2

    Found you as I start to work on a project for my MS in analytics! This has definitely helped me make better progress than before.

  • @T4ngerin0 3 years ago

    It‘s a pleasure to watch your Videos. Thank you Misra! 🙏🏽

    • @misraturp 3 years ago +1

      That's very nice to hear, thank you!

  • @lianyan1412 1 year ago +1

    Really like how you have explained the thought process when it comes to data understanding and exploration. Learnt a lot!

  • @djmd808 1 year ago +6

    Just wanted to let you know that your videos breathed new life into my Udacity Nanodegree experience. I have really been stalling on the last two classes, but this video and the cleaning one (along with others) were so much easier to watch and understand than the Udacity instruction provided in the course. Thank you!

    • @misraturp 1 year ago

      Wow, thank you. That's great to hear! Best of luck with your courses. :)

  • @siddhantnaik 2 years ago

    Very nicely done!
    Looking forward to more data cleaning videos.
    Keep the good work going on.
    Thanks!!

  • @smartlymade 1 year ago

    thank you for this video! i was very overwhelmed by a dataset that i was looking at and didn't know how to start. great video

  • @swadhikarc7858 7 months ago

    Very useful indeed.. learnt how to think like a data analyst as a beginner

  • @swapnildeshpande21 2 years ago

    Best explanation and video to kickstart thinking about datasets...want more quality video like this... Keep it up

  • @Ol16511 2 months ago

    Wonderful explanation, thank you!

  • @abhishekjadhav7340 2 years ago

    Thank you Misra, I personally like your videos a lot. You really teach Good .

  • @jbruges 2 years ago

    this video was very helpful, I'm going to watch all of them

  • @massoudkadivar8758 1 year ago

    I love the way you teach!
    Thank you

  • @leonbeler2711 2 years ago +1

    Thank you so much for making this video! I am just getting started in the field and your video has given me lots of tips and tricks that would've taken me months to figure out by myself. Also: yay for more women in Data-Science.

    • @misraturp 2 years ago

      Thank you! And I'm glad it was helpful. :)

  • @aureliensimon8685 9 months ago

    Thank you so much for this great video !

  • @misraturp 3 years ago +11

    👉 Get real world data science experience by doing hands-on work
    www.misraturp.com/hods

  • @gotitgotya 1 year ago

    its very informative ... thank you so much for uploading this❣

  • @ExtraKanin 1 year ago +2

    Hi Misra. I'm 18 minutes into the video and I'm still able to follow. thank you for this! Most of the articles I see on Google just jump straight into data cleaning and I don't even know how they detected the data errors they're trying to clean in the first place.

  • @lara-rosetadman3499 3 years ago +19

    Hi Misra, this video was so helpful! I'm starting my master's in Data Science this September and you've honestly been such a role model for me! Thank you and keep up the great work!

    • @misraturp 2 years ago +1

      That's so nice to hear Lara-Rose, thank you! Best of luck in your master's. I'm sure you'll do great!

    • @zeinomadikizela4783 2 years ago

      Which University

    • @lara-rosetadman3499 2 years ago

      @@zeinomadikizela4783 University of the West of England (Bristol)

    • @Ha-mb4yy 1 year ago

      @@lara-rosetadman3499 im at cdf

  • @ramvenkatachalam8153 1 year ago

    Wow. Great dataset, just what I was looking for. I was searching for a video where someone explains all the important things in data science. This video has now become a bookmark for my future data exploration work. Thank you very much, Misra. Your channel is the best in the world.

    • @misraturp 1 year ago +2

      Thank you Ram! That's very nice to hear. :)

    • @ramvenkatachalam8153 1 year ago +1

      Hi Misra . Plz put more videos like this to help everyone in their career growth . Thanks a lot .

  • @udaykumar4B3 2 years ago

    Thank you so much 🙏 your video helped me a lot, keep doing this🤞

    • @misraturp 2 years ago

      You're very welcome :)

  • @gunhild1951 2 years ago +6

    Great video! Very informative and helpful 😃 I would love if you can upload more videos like this where you go through the steps of exploratory data analysis. I love that you really bring up every little thing of how it could be when doing EDA. Like for example where you talked about the data explanation and how it could be in the real life. Just the thing about documentation of the dataset fields and how it sometimes is not so obvious and that you therefore need to talk to someone responsible etcetera etcetera. This kind of information that you brought up helps me to picture how it could be. Maybe it's silly but a simple thing like that helped me big time. Overall, a perfect how-to-do-video. More videos like this would be appreciated! 😄

    • @misraturp 2 years ago

      This is super useful feedback for me Gunhild thank you! Personally, I also really like little details like that that can really give one a feeling for what to expect from a job so nice to hear you thought it was helpful. I will prioritize similar videos in the future!

    • @gunhild1951 2 years ago

      ​@@misraturp Hi again,
      Thanks for replying Misra! :) Well of course, I have to show my appreciation when the content is so good. Exactly, it's the feeling and to be able to visualize things that helps the most when you're to trying to familiarize with a new subject matter. Thanks, that would be perfect if you could upload more :)

  • @erfanmoosavi9428 1 year ago

    Thank you so much Mısra! You explain so gooood!

  • @sergiopellitero4136 11 months ago

    This is the kind of videos that I am looking for. Let's see if it is interesting.

  • @mehdismaeili3743 2 years ago

    Hi , you are a good teacher.thanks for your useful videos.

    • @misraturp 2 years ago

      Thank you, that's great to hear!

  • @cloudnatives9105 1 year ago

    Excellent video.. thanks for sharing! Learned a lot on the data exploration.
    Ms. Misra.. Do you have a video on just plotting data while doing the exploration?

  • @Olumasei 2 years ago

    You teach better than my MSc lecturers.
    Thank you

    • @misraturp 2 years ago

      You are very welcome Samuel. That's nice to hear that you like the videos. :)

  • @Labbsatr1 2 years ago

    It was a very productive video, Mısra, thank you very much

  • @mr.abu-baker8bp460 2 years ago

    So helpful, thanks a lot!!

  • @felixisnr1 2 years ago

    really helpful. big thanks!

    • @misraturp 2 years ago

      You're welcome! Glad it helped. :)

  • @KeithEmsAndrew-pf2pn 4 months ago

    thanks to your videos

  • @arpangoyal7337 2 years ago

    Hi Misra, just came across your channel and absolutely loved this video, so crisp and informative!
    I have just 1 suggestion: it would be so much more helpful if the video was like "Raw" (maybe a separate, longer video which includes the difficulties you came across and how you solved those?). That said, subscribed to your channel and hoping to learn more about Analytics!

    • @misraturp 2 years ago

      Hey Arpan, thank you for your nice words and also taking the time to give your feedback. :) That actually is a good idea. I might do a blind data exploration with a dataset I've never seen before soon!

  • @onurerdogan2236 3 years ago

    Very useful video.Thanks for sharing

  • @dpratte 1 year ago

    Very nice job! In my experience over many years in data warehousing, good luck finding the 'data dictionary' type document you describe. Most orgs don't have the discipline to maintain that. So you'll need to develop relationships with the people who can help you! Thanks Again!

  • @arnabroy4870 1 year ago

    Mam, First And Foremost you are very Beautiful,and Secondly your tutorial is awesome, it gave me a lot of insights about how to do data cleaning..
    Thank You Mam,Lots of Love and Respect from India❤️

  • @pliniado 2 years ago

    Thanks Misra, I'm Python student (intermediate level, I think) and this video was just what I was lookin for.

    • @misraturp 2 years ago

      That's great to hear Pablo!

  • @jeevan1409 1 year ago

    You saved my project ❤️

  • @andrespino8552 3 years ago

    Awesome. Thank you so much :)

    • @misraturp 3 years ago

      You're very welcome!

  • @diegomartins7214 8 months ago

    Thank you!

  • @pimpirisnais 2 years ago

    Thanks, very practical

    • @misraturp 2 years ago

      You're very welcome!

  • @harikishan437 2 years ago

    First of all, thanks a lot. I took coaching in data science but I never had an exact idea of what we need to do in pre-processing, what things we need to look at, and what is enough for us... After watching this I got clarity on preprocessing and what I should do in that step. Again, thanks a lot @Misra Turp 🤝🤝🤝🤝❣❣❣

    • @misraturp 2 years ago

      You are very welcome Hari! I'm glad it was helpful!

  • @axelrasmussen5365 2 years ago

    Great video. Thanks

  • @Kristina_Tsoy 1 year ago

    Great video, thanks a lot!

  • 10 months ago

    Hi Misra, thank for your helpful video, you're so good at Data but you may gain some about tree 😄

  • @govindant8360 3 years ago

    Awesome Mam 👌👌

  • @hrithiksingh5131 2 years ago

    thanks, great informative video. Please make videos on creating portfolio projects using python pandas,matplotlib

    • @misraturp 2 years ago

      Great suggestion! If you're interested I have a course where I teach how to build a portfolio project: www.soyouwanttobeadatascientist.com/hods

  • @froylanrodriguez7624 1 year ago

    I am a year late but this video is GREAT!! Thanks a bunch

  • @arindammaji4685 1 year ago

    a definite bookmark video for data analysts

  • @KapitanAliRidho 2 years ago

    Thanks for the video Misra. Done subs

  • @ueto1985 2 years ago

    It worked!
    Thank you..

    • @misraturp 2 years ago

      You're very welcome :)

  • @rangabharath4253 3 years ago

    Awesome

  • @ziaurrahman626 6 months ago

    Thanks for ur nice explanation. Could u plz share dataset and code?

  • @aleh3627 7 months ago

    We have very wide trees in Argentina. Easily over 3 meters in diameter. They are not officially classified as trees, but the trunk is definitely of that diameter. The roots are also huge and tend to extend to the surface. Usually they become playgrounds for kids. I used to play around them as a kid all of the time.

    • @misraturp 7 months ago

      That's crazy. What are those trees called?

    • @aleh3627 7 months ago

      @@misraturp we call them "gomero". Not sure about the proper name. If you Google "gomero argentina", you can find them.

  • @HarishKumar-qt3mr 1 year ago

    23:49 to 23 : 51 just listen it really 😍

  • @superawesomecaptainmcfluff9506 2 years ago +1

    Actually a really funny video. At around 14:39, you said you didn't know what an inch was. I was amused and then I remembered that you were Dutch haha. Great video! Love it! P.S. Imperial units suck!

    • @misraturp 2 years ago +1

      Have to say I agree. Can't really see any reason to use the imperial system. :D

    • @superawesomecaptainmcfluff9506 2 years ago

      @@misraturp Oh wow! Having you reply is so cool. I did a project on the NYC Collisions dataset using your Streamlit templates.
      Love it. Thanks for the help! I can't wait to see your channel grow to 100k+ subs especially with DS/ML being a rapidly growing field.
      I have one Q if you don't mind answering: Do you have any tips on making my GitHub profile more attractive to recruiters, and really making sure the projects done properly showcase my skills?

    • @misraturp 2 years ago +1

      Hey ​@@superawesomecaptainmcfluff9506 ,
      Thanks a lot for your support!
      For the github account, I would make sure to include a bunch of things:
      * in the readme mention all libraries, programming language, technologies, ML algorithms you used for that project. recruiters are looking for keywords, give them as many keywords as you can.
      * Have a lessons learned, or future work kind of analysis of your work. Doesn't have to be long. This will serve to show that you are aware of the shortcomings and what can be done better in projects.
      * Make sure you have headings, comments and small notes that structure and explain your code.
      This should be a good start. Just dumping code in github unfortunately doesn't work. Data science is more about understanding and explaining your code than the code itself.
      This question inspired me to write this up a bit longer though. So I will go send my email subscribers an email about this now. :D

    • @superawesomecaptainmcfluff9506 2 years ago

      @@misraturp Wow, you've been so incredibly helpful! I follow your newsletter a lot so hope to see your email soon!
      Thanks for all the tips, I especially liked the one about "lessons learned" and improving upon that. Thanks again!

    • @misraturp 2 years ago +1

      @@superawesomecaptainmcfluff9506 You are very welcome! And thank you for the question. I love your username by the way, is it okay if I mention it in the email?

  • @ziaurrahman626 3 months ago

    Thank u so much. Plz share the code and dataset..

  • @shayp20 1 year ago

    Hi Misra. Thank you for this video. Is there a chance you can post the code you write in this video?

    • @misraturp 1 year ago

      I do not have the code I developed in this video yet but I might make a similar video again soon and share the code. Stay tuned!

  • @billionairepodcast 4 months ago

    Hi @misraturp, why did you say ID is a categorical value? And why did you say it's continuous when it's a whole number, i.e. discrete? Thank you

  • @peaceandlove8862 5 months ago

    Can you please help how to get the data set thanks

  • @chauhermione137 2 years ago

    thank you so much! Your video is really helpful

    • @misraturp 2 years ago

      You are very welcome! 🤗

  • @CaribouDataScience 2 years ago

    What playlist is this video part of?

    • @misraturp 2 years ago

      Here it is: ruclips.net/video/qxpKCBV60U4/видео.html

  • @yunisahmed4772 1 year ago

    a good Teacher and a beautiful girl ! i love you

  • @marypazcuessy3004 1 month ago

    tree_census_subset isn't loading after I remove all the unneeded columns, not sure what I'm doing wrong

  • @janewade5619 3 years ago

    Also for outliers, max sidewalk width in NYC is 30 ft (360 inches). So the width of a tree max would be a quarter of that (90 inches)?

    • @misraturp 3 years ago

      Hey, I haven't even thought about including that information, that's a great idea! Kudos!

    • @csanchez9536 2 years ago

      Actually the biggest known tree diameter is like 24 meters or something like that. 450 inches is very reasonable
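
The back-of-the-envelope bound from this thread (30 ft of sidewalk is 360 inches, and a quarter of that is 90 inches) could be turned into a simple flag like the one below. This is only a sketch: the 30 ft figure and the one-quarter rule come from the comment above and are not verified here, the diameters are made up, and, as the reply points out, a value above the bound is a reason for a second look rather than proof of an error.

```python
import pandas as pd

# Figures quoted in the comment thread; treated as assumptions, not verified facts.
SIDEWALK_MAX_FT = 30
INCHES_PER_FOOT = 12
max_plausible_in = SIDEWALK_MAX_FT * INCHES_PER_FOOT / 4  # 360 / 4 = 90.0 inches

# Made-up trunk diameters in inches, standing in for a diameter column.
diameters = pd.Series([4, 11, 25, 90, 120, 450], name="tree_dbh")

# Flag values strictly above the bound for manual review.
suspicious = diameters[diameters > max_plausible_in]
print(suspicious)
```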

  • @rajsonawane8607 2 years ago

    Hi Misra, Thank you so much for this video. Can you please demonstrate using JSON instead of CSV from the same website?

    • @misraturp 2 years ago

      Honestly, if I were working with JSON files, I would first make them into dataframes and then do the data exploration. So the approach would not change. :)
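
As a rough illustration of that reply, JSON records can be turned into a DataFrame first, after which the same exploration steps apply. The sample records and field names below are made up for the example; `pd.json_normalize` flattens nested objects into dotted column names.

```python
import json

import pandas as pd

# Made-up JSON payload; the field names are assumptions for illustration only.
raw_json = """
[
  {"tree_id": 1, "health": "Good", "location": {"borough": "Queens"}},
  {"tree_id": 2, "health": "Fair", "location": {"borough": "Bronx"}}
]
"""

# Parse the JSON text into a list of Python dictionaries.
records = json.loads(raw_json)

# Flatten the records into a table; the nested "location" object
# becomes a "location.borough" column.
trees = pd.json_normalize(records)

# From here on, exploration works as with a CSV-based DataFrame.
print(trees.head())
print(trees.isna().sum())
```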

  • @paragandozdroch3791 1 year ago

    Can someone please let me know , how the search bar drop down on 23:47 min? thank you

  • @nagasai5243 10 months ago

    Pls tell me how to unnest

  • @lubin2764 3 years ago

    Any idea what is om.datasets ?

    • @misraturp 3 years ago

      Could you elaborate Lubin about what you mean by om.datasets?

  • @parshavkarani8331 1 year ago

    8:00

  • @jpgunlukleri 1 year ago

    Do you also have a channel where you upload videos in Turkish?

    • @misraturp 1 year ago

      Not for now, unfortunately

  • @asjadnaeem2557 2 years ago

    Hey! Can i get a copy of code?

    • @misraturp 2 years ago

      I don't have it uploaded anywhere unfortunately.

  • @ramsri992 10 months ago

    You look so beautiful mam and nice explanation

  • @trend758 2 years ago

    How many of you watching this video for mentor in video 😂😂😂 I am watching for her.

  • @mahammad4051 1 year ago

    Please make it in Turkish too

  • @travisfubu9053 2 years ago +1

    I kind of like the way you explain things in the tutorial, but I think if you worked on income data or maybe some cancer research data it would have been simpler. I mean, trees are kinda not interesting enough to fully engage with the set lol

    • @misraturp 2 years ago +1

      Fair enough. It's just sometimes a bit tricky to find datasets that allow one to work on it publicly like this. In the latest videos I've been using a dataset on open positions in New York. Maybe that'd be better suited.

    • @travisfubu9053 1 year ago +1

      @@misraturp yes definitely. Maybe it's just me I'm an economics graduate so a set on trees kind made me like "meh why trees"

    • @misraturp 1 year ago

      @@travisfubu9053 Hahah alright, next time something a bit more real-life-like for you!

  • @ataires 1 year ago

    You are so pretty and so good at explaining the process. Thumbs up!

  • @lysedeborah2179 11 months ago

    You are so beautiful

  • @magicmedia7950 3 months ago

    Great video . But she moves too fast.!

  • @dimakapranov7634 1 year ago

    en.wikipedia.org/wiki/Sequoiadendron_giganteumac