Data Cleaning using Python pandas in Jupyter Notebook - How to clean CSV data in Jupyter Notebook?

  • Published: Sep 8, 2024
  • This video answers the following questions:
    How to clean data in CSV using Python? How to clean data using Pandas?
    How to clean data using Python? How to clean data in Jupyter Notebook?
    The dataset used in this video can be accessed here: drive.google.c...
    In case of enquiry, you can use the comment box.
    If you enjoy the video, kindly like and subscribe to the channel.

Comments • 95

  • @michaelchapisa3709
    @michaelchapisa3709 1 month ago +1

    Your video was profound. The background noise was irritating at first 😅😅 but after some time I could only hear you and your coding. Thank you so much, you have no idea how much you've helped me. Very clear instructions 🤝

    • @sammysttheanalyst
      @sammysttheanalyst  1 month ago

      You're welcome.😊
      Kindly subscribe to the channel, and turn on the notification button for more interesting videos.
      That's the way to encourage me to do more.

  • @fksons4161
    @fksons4161 1 year ago +6

    I must commend you for your patience and your style of teaching Python from the foundation. I will share your link with a number of would-be learners. Well done.

  • @UzukwuMichael
    @UzukwuMichael 4 months ago +1

    This video is one of the best I have watched on data cleaning. I must say well done to you. I will save this to refer to it from time to time, especially the "class" and "outliers" parts. Great job. I am definitely subscribing to your channel.

    • @sammysttheanalyst
      @sammysttheanalyst  1 month ago

      Thank you for subscribing. Kindly turn on the notification button to see more videos when I post them.
      I hope to create more interesting videos moving forward.

  • @kannana1665
    @kannana1665 5 months ago +2

    Very crisp, hands-on dataset cleaning. Thank you, Sammy. Can you post videos on machine learning prediction of diabetes based on your dataset?

    • @sammysttheanalyst
      @sammysttheanalyst  5 months ago

      Glad you found my videos amazing. Here is a link to a machine learning prediction on diabetes data:
      ruclips.net/video/1tfs37aSgDA/видео.htmlsi=OLkDPG4n5JvZPYNF

  • @adeyemioshodi3846
    @adeyemioshodi3846 1 year ago +4

    This is a very detailed explanation on how to clean data in Jupyter notebook.
    I like the slow pace used throughout the teaching. I could practice the code along with the instructor.
    Great Job, Well done. 👏👏👏

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Thank you for the comment. Please do share the video and subscribe to the channel for more upcoming videos. @adeyemioshodi3846

  • @mustafayusefi6647
    @mustafayusefi6647 16 days ago +1

    It was very useful. Thank you!!

    • @sammysttheanalyst
      @sammysttheanalyst  16 days ago

      Thank you.
      Kindly subscribe to my channel and turn on the notification button to receive more amazing videos.

  • @AnthonyDaniel-zu3il
    @AnthonyDaniel-zu3il 4 months ago

    You are a great teacher!! I'd like to know why the value 0.995 was used to calculate the maximum quantile for the column "Cr".
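
The thread does not say why 0.995 was picked, and a cutoff like this is generally a judgment call rather than a fixed rule. As a rough sketch of what such a quantile cutoff does, assuming made-up values in place of the real "Cr" column:

```python
import pandas as pd

# Hypothetical values standing in for the "Cr" column; not the video's data.
df = pd.DataFrame({"Cr": [40, 55, 60, 62, 70, 75, 80, 90, 95, 800]})

# Value below which 99.5% of the observations fall.
upper = df["Cr"].quantile(0.995)

# Keep only the rows below that cutoff; the extreme value is dropped.
trimmed = df[df["Cr"] < upper]

print(upper)
print(len(df), len(trimmed))
```

A lower quantile such as 0.95 would remove more rows, so the choice depends on how extreme the largest values look in the actual data.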

  • @dpratte
    @dpratte 1 year ago +2

    Thank you, Sir! Content and delivery are just right for my level.

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Glad you enjoyed the video. Please, do well to share with others.

  • @kpieceemmry3387
    @kpieceemmry3387 7 months ago

    That was a nice delivery.
    Please 🙏 do another video on working with a db file using SQL, etc.
    I am an ALX student, and I look forward to sharing your page with my data science community.
    I have data that I currently want to clean up, for example by looking out for swapped column names. Because the dataset is large, I don’t know 🤷‍♀️ which approach is best for spotting the swapped columns.

    • @sammysttheanalyst
      @sammysttheanalyst  7 months ago +1

      It's alright. I will look into it. Currently we are working on Python and Excel simultaneously.
      Have you subscribed to the channel yet?

  • @boyianthony4971
    @boyianthony4971 7 months ago +1

    @sammysttheanalyst, I always run back to your video for clarity while working on a dataset. Good job, sir. A quick one: did you choose "class" to work on because it's of the object type? Do we have to check everything?

    • @sammysttheanalyst
      @sammysttheanalyst  7 months ago

      Yes, it's very important that you check everything.
      Choosing a class to work on may later affect your analysis negatively.
      To guard against that, do well to check everything. Thank you.
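
One way to "check everything" is to loop over all columns rather than inspecting a single suspect. This is only a sketch on an invented three-column table; the column names and values are assumptions, not the video's dataset:

```python
import pandas as pd

# Hypothetical sample; column names and values are invented for illustration.
df = pd.DataFrame({
    "Gender": ["M", "F", "f", None],
    "AGE": [50, 26, 33, 45],
    "CLASS": ["N", "Y", "Y ", "N"],
})

# Missing values per column.
missing = df.isnull().sum()
print(missing)

# Data type of each column.
print(df.dtypes)

# Distinct values per column; odd spellings or stray spaces stand out here.
for col in df.columns:
    print(col, df[col].nunique(), df[col].unique())
```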

  • @phiwayinkosilanga1758
    @phiwayinkosilanga1758 6 months ago +1

    Thank you very much, Sir. Well explained.

  • @I_am_a_dataanaylst
    @I_am_a_dataanaylst 2 months ago +1

    Sir, I don't understand the part about indicating a unique value for the class, given that there are several N and Y values assigned to other units.
    Another thing: do we also use the "df1.groupby" function to check whether there are duplicates in the other columns? Thank you, sir. Eagerly awaiting your reply.

    • @sammysttheanalyst
      @sammysttheanalyst  16 days ago

      For question No. 1, I don't clearly understand it. Could you elaborate on the question?
      For question No. 2, YES you can. When you use the groupby function, the result shows the discrepancies in values. It shows the duplicate values of that particular column.
      But this method would be tedious, especially when you're working on a large dataset.
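
A sketch of the groupby check described in this reply, assuming a "CLASS" column whose labels contain stray spaces and mixed case (those variants are invented here for illustration):

```python
import pandas as pd

# Hypothetical rows; the trailing spaces and lowercase "y" are assumed variants.
df1 = pd.DataFrame({"CLASS": ["N", "N ", "Y", "Y", "y", "N", "Y "]})

# Rows per distinct label; variants of one label appear as separate groups.
raw_counts = df1.groupby("CLASS").size()
print(raw_counts)

# value_counts() gives the same counts without an explicit groupby.
print(df1["CLASS"].value_counts())

# Normalise the labels so the variants collapse into N and Y.
df1["CLASS"] = df1["CLASS"].str.strip().str.upper()
print(df1["CLASS"].unique())
```

Running value_counts() on each column in a loop is one way to make this less tedious on wider data.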

  • @Bryan-rt1qc
    @Bryan-rt1qc 1 year ago +1

    Thanks for feeding my mind bro. I enjoyed it with all the noise👊

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Thank you very much for your comment... I get it, bro.
      I promise to work on the noise soonest... Kindly share the video with others.

  • @myounis
    @myounis 9 months ago +1

    Great explanation, thanks for sharing.

    • @sammysttheanalyst
      @sammysttheanalyst  9 months ago

      You're welcome. Kindly subscribe and turn on your notification button, so as to know when I upload new videos.

  • @ItsMePeterB
    @ItsMePeterB 5 months ago +1

    Thank you for the tutorial!

  • @aniketgaidhane2729
    @aniketgaidhane2729 5 months ago +1

    Nicely explained☺

  • @chq012
    @chq012 1 year ago

    Great video!!
    Well explained.
    Thank you. 🎉🎉

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      You're welcome. Glad you enjoyed it. Please, do well to subscribe and share with others.

  • @andyofori9285
    @andyofori9285 1 year ago +2

    Great video. Learned a lot!

  • @truthdeeds3612
    @truthdeeds3612 1 year ago

    Nice. You are a good teacher.

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Thank you very much... If you enjoy the video, do well to share the video with others...

  • @francisftaylor4680
    @francisftaylor4680 1 year ago

    Thanks for the lesson, well explained.

  • @I_am_a_dataanaylst
    @I_am_a_dataanaylst 2 months ago

    Sir, please can I ask a question?
    When dropping the missing values in the other columns, why did you use df1 = instead of the df that you had been using initially?

    • @sammysttheanalyst
      @sammysttheanalyst  2 months ago

      Hello Amaka,
      I used df1 = because I had made a major change to the original dataframe (df).
      The major change was deleting some columns.
      Thus, in order for the new dataframe to function properly, I needed to rename it and carry on with the new name (df1).
      Hope that helps.

    • @I_am_a_dataanaylst
      @I_am_a_dataanaylst 2 months ago

      @@sammysttheanalyst thank you so much sir
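
In pandas, drop() and dropna() return a new DataFrame by default, so assigning the result to a new name such as df1 also leaves the original df available for comparison. A small sketch of that pattern, with invented column names and values:

```python
import pandas as pd

# Hypothetical data; column names and values are invented for illustration.
df = pd.DataFrame({
    "ID": [1, 2, 3],
    "Gender": ["M", None, "F"],
    "AGE": [50, 26, None],
})

# drop() returns a new DataFrame; df itself is not modified.
df1 = df.drop(columns=["ID"])

# dropna() also returns a new DataFrame, so the result is reassigned to df1.
df1 = df1.dropna()

print(df.shape)   # the original still has every row and column
print(df1.shape)  # the cleaned copy has fewer of both
```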

  • @I_am_a_dataanaylst
    @I_am_a_dataanaylst 2 months ago

    Thank you so much 🙏

    • @sammysttheanalyst
      @sammysttheanalyst  2 months ago

      You're welcome 😊.
      Please, do well to subscribe to the channel. Like and share the video as well.

    • @I_am_a_dataanaylst
      @I_am_a_dataanaylst 2 months ago

      Sir, please can I ask a question?
      When dropping the missing values in the other columns, why did you use df1 = instead of the df that you had been using initially?

    • @I_am_a_dataanaylst
      @I_am_a_dataanaylst 2 months ago

      13:51

  • @dilaraesmer
    @dilaraesmer 1 year ago +1

    Thank you for all your effort :)

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago +1

      I appreciate your comment... Thank you too... Please share the video...

    • @dilaraesmer
      @dilaraesmer 1 year ago

      @@sammysttheanalyst I'll share with my network :)

  • @basmathalishahir1341
    @basmathalishahir1341 5 months ago

    Is there a way to save the changes made in Jupyter to the Excel or CSV file on the computer?

    • @sammysttheanalyst
      @sammysttheanalyst  5 months ago

      Yes, there is a way.
      Simply use df.to_csv("new_filename.csv") or
      df.to_excel("new_filename.xlsx").
      With a bare filename like these, the file is stored in the same directory as the Jupyter Notebook.
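
A minimal sketch of saving and reloading a cleaned DataFrame. The data is invented, and a temporary folder is used here only so the example does not leave files behind; in a notebook a plain filename would do:

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned data; values are invented for illustration.
df = pd.DataFrame({"AGE": [50, 26, 33], "CLASS": ["N", "Y", "N"]})

with tempfile.TemporaryDirectory() as out_dir:
    csv_path = os.path.join(out_dir, "new_filename.csv")

    # index=False stops pandas from writing the row index as an extra column.
    df.to_csv(csv_path, index=False)

    # Read the file back to confirm what was written.
    reloaded = pd.read_csv(csv_path)

print(reloaded.head())

# df.to_excel("new_filename.xlsx", index=False) follows the same pattern,
# but needs an Excel writer package such as openpyxl to be installed.
```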

  • @user-jk6th7vi2n
    @user-jk6th7vi2n 3 months ago +1

    Can we somehow see the code to copy and paste, or view it without picking out timestamps in this video?

    • @sammysttheanalyst
      @sammysttheanalyst  1 month ago

      As a data professional, it is recommended to type the code yourself, so you can understand what results it produces.
      Nonetheless, I have attached the GitHub link to the description.

  • @aamerhafeez
    @aamerhafeez 1 year ago +1

    Very good video.

  • @pelarsolution1288
    @pelarsolution1288 1 year ago

    Nice one, bro. Clear explanation, thanks!!!

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Glad you found it awesome. Do subscribe to the channel and share with others.

  • @siyakholwakhiphi1556
    @siyakholwakhiphi1556 1 year ago +1

    Great Content Sir

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Thank you for the encouraging words... I hope to do greater videos.

  • @kpieceemmry3387
    @kpieceemmry3387 7 months ago

    Can you please 🙏 tell me why you suspected CLASS? Does it mean that you checked the data manually beforehand? It's not possible to just assume that CLASS is faulty.

    • @sammysttheanalyst
      @sammysttheanalyst  7 months ago +1

      Yeah. I had previously assessed the column critically. You can check out my video on how to assess data critically here:
      ruclips.net/video/n6UgprrvGQI/видео.html

    • @MaryJtee
      @MaryJtee 6 months ago

      The video is not clear, please. I can't see what you are doing.

  • @user-sw6mh5in8y
    @user-sw6mh5in8y 1 year ago

    Thanks from Greece

  • @CaribouDataScience
    @CaribouDataScience 1 year ago

    Say, do you have a description for the columns?

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago +1

      Not at all. I didn't find a column description on the website I downloaded the data from.
      Thank you for the observation 🙏

  • @hmikraminfo7019
    @hmikraminfo7019 1 year ago +3

    Full of learning. But filled with noise

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago +2

      Thank you for your comment. I appreciate it.
      In regards to the noise, I am really sorry for your experience. I am working towards purchasing a noise-cancellation microphone to enable good sound.
      Thank you once again for your contribution.

  • @LeonidasParigoris
    @LeonidasParigoris 1 year ago +2

    Is that a parrot in the background?

  • @Tekionemission
    @Tekionemission 1 year ago

    (36:52) yes! Outliers

  • @user-if7cf9cx4f
    @user-if7cf9cx4f 7 months ago

    Please tell me how to import a CSV file. I tried, but it gives me a syntax error.

    • @sammysttheanalyst
      @sammysttheanalyst  7 months ago

      It's because the syntax isn't right. Try using df = pd.read_csv('file_path').
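
The commenter's exact error is not shown, so this is only a sketch of a working pd.read_csv call. The file name and contents are invented, and the Windows path in the comment is a hypothetical example of one common cause of a SyntaxError:

```python
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Write a tiny hypothetical CSV so the example is self-contained.
    file_path = os.path.join(folder, "sample.csv")
    with open(file_path, "w") as f:
        f.write("AGE,CLASS\n50,N\n26,Y\n")

    # The path must be a quoted string. On Windows, a raw string such as
    # r"C:\Users\me\data.csv" (hypothetical path) or forward slashes avoids
    # the SyntaxError that "\U" triggers inside a normal string literal.
    df = pd.read_csv(file_path)

print(df.shape)
print(df.head())
```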

  • @jececdept.9548
    @jececdept.9548 11 months ago

    Please tell me when to use astype to convert columns?

    • @sammysttheanalyst
      @sammysttheanalyst  10 months ago

      Hello @jececdept.9548, can you elaborate more on converting columns? How do you mean? Is it the datatypes of the columns or what?
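
If the question is about column data types, astype is typically used when a column was read with the wrong type, for example numbers stored as text. A sketch on invented data (the column names and values are assumptions):

```python
import pandas as pd

# Hypothetical data: AGE was read as text, CLASS is a repeated label.
df = pd.DataFrame({"AGE": ["50", "26", "33"], "CLASS": ["N", "Y", "N"]})
print(df.dtypes)

# Convert numeric text to integers so arithmetic works on the column.
df["AGE"] = df["AGE"].astype(int)

# Convert a column with few repeated labels to the category dtype.
df["CLASS"] = df["CLASS"].astype("category")

print(df.dtypes)
print(df["AGE"].mean())
```

astype(int) raises an error if any value cannot be parsed as a number, so it is worth cleaning stray text or missing values first.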

  • @theophilus4723
    @theophilus4723 1 year ago

    Good job, keep it up

  • @leandrov07013
    @leandrov07013 1 year ago

    Awesome

  • @IkramChebbi
    @IkramChebbi 1 year ago

    Great video, thanks. Can I have the unclean dataset?

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      You're welcome... Glad you enjoyed the video... The unclean dataset can be found in the video description...

  • @BelloIsmail-l8l
    @BelloIsmail-l8l 1 month ago

    I must be honest with the producer of this video: it is very educative, but the video and audio are not clear. The video is plumt and there was background noise, like the sound of a generator/engine set.

    • @sammysttheanalyst
      @sammysttheanalyst  1 month ago

      Thank you for your comment.
      Actually, the background noise was the sound of the laptop I was using then. But currently, I use a microphone, so the background noise is gone.
      Please, do well to subscribe to the channel, and turn on your notification to see more of my videos.

  • @danlewis8782
    @danlewis8782 1 year ago

    Where did you get the dataset? I want to follow along.

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Hello... You can access the unclean diabetes dataset through the Google drive link in the video description...
      Thank you for your comment... Kindly subscribe and share the video...

  • @abdullahal-omair8892
    @abdullahal-omair8892 9 months ago +1

    Can we have the notebook, please?

    • @sammysttheanalyst
      @sammysttheanalyst  9 months ago +1

      Hello Abdullahal.
      You can access the notebook here:
      github.com/SammystTheAnalyst/Data-Science-Projects/blob/main/Data%20Cleaning%20on%20Diabetes%20Data.ipynb

  • @PP-cr6fy
    @PP-cr6fy 11 months ago

    How do I download that dataset, sir?

    • @sammysttheanalyst
      @sammysttheanalyst  11 months ago +1

      Hello. Kindly check the video description. The dataset can be found there.

    • @PP-cr6fy
      @PP-cr6fy 11 months ago

      @@sammysttheanalyst Thank you. I really like the way you explain each step, do it, and explain why.

  • @preazz
    @preazz 1 year ago +2

    Clean your mic first

  • @vishalmane3139
    @vishalmane3139 1 year ago

    Painful background noise

    • @sammysttheanalyst
      @sammysttheanalyst  1 year ago

      Yeah, I'm sorry if it affected you in any way. I will work on that, thank you.

  • @user-ql5hb5ci6w
    @user-ql5hb5ci6w 6 months ago

    What a horrible sound (((

  • @namashaggarwal7430
    @namashaggarwal7430 11 months ago

    First clean your background noise!

    • @sammysttheanalyst
      @sammysttheanalyst  10 months ago

      Please, do not ever comment on my YouTube videos with a link to your channel.
      Next time, I will have to report you and your channel to YouTube.
      Kindly take note. Thank you!