How to Detect and Remove Outliers in the Data | Python

  • Published: Jan 22, 2025

Comments • 70

  • @pankajgoikar4158 · 2 years ago · +10

    You are amazing, bro. I don't have words to thank you. You have cleared up many of my concepts. Lots of love from the UK, and God bless you. 😊

    • @HackersRealm · 2 years ago · +2

      Thank you so much for your kind words ❤️

    • @BLG120 · 3 months ago

      You are Indian.

  • @asadnaeem123 · 6 months ago · +3

    Amazing tutorial. Bro, you made my day. Lots of love from Pakistan.

    • @HackersRealm · 6 months ago

      Glad to hear that!!!

    • @YousafSulaiman · 5 months ago · +1

      You are from Pakistan!! Amazing! 😀

  • @leekhon108 · 3 months ago · +1

    Thank you very much for the tutorial; it is easy to understand and well explained. ☺☺

  • @Hollysattic · 1 month ago · +1

    Your videos helped me so much! Thanks a lot 🎉

  • @kennylouries410 · 2 months ago · +1

    Thank you, sir. Clear-cut explanation.

  • @negusuworku2375 · 11 months ago · +1

    This is very helpful. Excellent.

  • @codename_Rahul · 11 months ago · +1

    This video helped me a lot. Thanks!

  • @durgak2587 · 4 months ago · +1

    Thank you so much ❤️.. this was very helpful 🤜✨

    • @HackersRealm · 4 months ago

      @durgak2587 Glad you liked it!!

  • @ArniFuentes · 6 months ago · +1

    Thank you so much!!! A question: for what types of distributions can the box plot be used? For example, if the data follows a uniform distribution, does it make sense to look for outliers? What do you recommend?

    • @HackersRealm · 6 months ago · +1

      You can use a box plot to check for outliers in any distribution. If there are outliers, process them; if not, leave the data as it is.

    • @ArniFuentes · 6 months ago · +1

      @HackersRealm Thanks for your answer.
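
A minimal sketch of the reply's advice, assuming pandas/NumPy and the usual 1.5 × IQR box-plot rule. On uniformly distributed data the fences land outside the range of the data, so the box plot flags nothing and there is nothing to process. The helper name `iqr_outliers` and the random data are hypothetical, not from the video.

```python
import numpy as np
import pandas as pd


def iqr_outliers(s: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values beyond the box-plot fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s < q1 - k * iqr) | (s > q3 + k * iqr)]


rng = np.random.default_rng(42)
uniform = pd.Series(rng.uniform(0.0, 1.0, size=1000))

# For U(0, 1): Q1 ~ 0.25 and Q3 ~ 0.75, so the fences sit near -0.5 and 1.5,
# well outside [0, 1]; no sample can fall beyond them.
print(len(iqr_outliers(uniform)))  # 0
```

The same helper still flags a genuinely extreme value, e.g. `iqr_outliers(pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 50]))` returns only the 50.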

  • @debangshubarua5345 · 1 year ago · +2

    Good video... Do I need to check all the numeric columns one by one and perform the capping operation?

    • @HackersRealm · 1 year ago · +1

      You can use a loop to do it for all numeric columns at once.
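
A minimal sketch of that loop, assuming pandas and IQR-based capping (the video's exact code is not shown here). `select_dtypes(include="number")` picks the numeric columns, and `clip` replaces values beyond each column's fences with the fence values. The function name `cap_outliers_iqr` and the sample data are hypothetical.

```python
import pandas as pd


def cap_outliers_iqr(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Return a copy of df with every numeric column clipped to its IQR fences."""
    capped = df.copy()
    for col in capped.select_dtypes(include="number").columns:
        q1, q3 = capped[col].quantile(0.25), capped[col].quantile(0.75)
        iqr = q3 - q1
        # Values below/above the fences are replaced by the fence values
        capped[col] = capped[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return capped


df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0],
    "b": [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0, 18.0, 19.0],
    "label": list("xyxyxyxyxy"),  # non-numeric column is left untouched
})
capped = cap_outliers_iqr(df)
print(capped)
```

Capping keeps every row, so the number of samples does not change; only the extreme values in column `a` are pulled in to its upper fence.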

  • @DJnaidu22 · 9 months ago · +1

    Really a great explanation.

  • @DJnaidu22 · 9 months ago · +3

    Bruh, I have a doubt... please explain briefly. These three techniques are used for trimming or capping outliers in the dataset. But why don't we use only the z-score to find outliers? Then what's the difference between these three techniques?
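
The thread doesn't name the three techniques. Assuming they are the z-score, IQR and percentile methods, the sketch below shows that the difference lies in how each one computes its lower and upper bounds. The thresholds (3 standard deviations, 1.5 × IQR, 1st/99th percentiles) are common defaults, not values taken from the video, and the function names are hypothetical.

```python
import numpy as np
import pandas as pd


def bounds_zscore(s: pd.Series, z: float = 3.0) -> tuple:
    # Built on mean and std, which the outliers themselves can inflate;
    # best suited to roughly normal data
    return s.mean() - z * s.std(), s.mean() + z * s.std()


def bounds_iqr(s: pd.Series, k: float = 1.5) -> tuple:
    # Built on quartiles, which a few extreme values barely move
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    return q1 - k * (q3 - q1), q3 + k * (q3 - q1)


def bounds_percentile(s: pd.Series, lo: float = 0.01, hi: float = 0.99) -> tuple:
    # Always cuts a fixed share of the data, whether or not it is anomalous
    return s.quantile(lo), s.quantile(hi)


rng = np.random.default_rng(0)
s = pd.Series(rng.normal(50, 10, size=1000))

counts = {}
for name, fn in [("z-score", bounds_zscore), ("IQR", bounds_iqr),
                 ("percentile", bounds_percentile)]:
    low, high = fn(s)
    counts[name] = int(((s < low) | (s > high)).sum())
    print(f"{name:10s} bounds=({low:.2f}, {high:.2f}) flagged={counts[name]}")
```

On clean normal data the percentile method still flags about 2% of the rows, while the z-score and IQR rules flag only a handful; that is one reason the choice depends on the data rather than on a fixed preference.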

  • @massoudkadivar8758 · 1 year ago

    Thank you so much.
    I have a question: do we need to do this process for each column one by one?

    • @HackersRealm · 1 year ago

      Yes, that's correct; you can use loops to automate this.

  • @titi-cu8dx · 1 year ago · +1

    What about dealing with categorical columns in the context of outliers?

    • @HackersRealm · 1 year ago

      I don't think there will be outliers in categories.

  • @SylvanAnugrahSyahPutra · 5 months ago

    Hi.
    If the data distribution is not normal, is it okay to use the z-score, or should we use IQR?
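
A small illustration, not from the video, of why the IQR rule is often the safer default when data are skewed or the sample is small: the z-score uses the mean and standard deviation, which the extreme value itself drags toward it, while the quartiles barely move. The six values are made up.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5, 100], dtype=float)

# z-score rule: flag |x - mean| / std > 3
z = (s - s.mean()) / s.std()
z_flagged = s[z.abs() > 3]

# IQR rule: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = s.quantile(0.25), s.quantile(0.75)
iqr = q3 - q1
iqr_flagged = s[(s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)]

print(z_flagged.tolist())    # [] : 100 only reaches z of about 2.04
print(iqr_flagged.tolist())  # [100.0]
```

With only six points, no value can reach |z| > 3 at all, because the sample standard deviation grows with the very deviation it is measuring. The quartile-based fences (here 2.25 and 4.75, upper fence 8.5) are unaffected by how extreme the 100 is.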

  • @Fighter_Believer_Achiever · 8 months ago · +1

    Thank you very much sir!!

  • @sushmitarawat6438 · 1 year ago

    Too good... and simple, thanks a lot ☺️🙏🏼

    • @HackersRealm · 1 year ago · +1

      Glad you like it, Sushmita!!!

    • @sushmitarawat6438 · 1 year ago

      @HackersRealm Could you suggest a paid internship I can start next month?

    • @HackersRealm · 1 year ago · +1

      @sushmitarawat6438 For an ML-based internship, it's better to compete in hackathons or contests to get one. You could check out HackerEarth, TechGig, etc., for that.

    • @sushmitarawat6438 · 1 year ago

      @HackersRealm OK

  • @vietttt0104 · 2 years ago · +1

    Great tutorial!! Thanks a lot!! I have a question: how could I do it for the whole dataset, not just a single column?

    • @HackersRealm · 2 years ago

      You can iterate over the columns and process the whole dataset.

    • @aniketlode4808 · 2 years ago

      @HackersRealm So to iterate, we would use a for loop, passing each column name as i?

    • @HackersRealm · 2 years ago

      @aniketlode4808 Yeah.

  • @ocraking · 7 months ago · +1

    What an amazing video!

  • @ShubhamPlays · 1 month ago · +1

    Good video.

  • @nishah4058 · 3 months ago

    Can you please tell what the outliers could be in textual data, like comments, and how we can remove those outliers from textual data?

    • @HackersRealm · 3 months ago

      You could compute text embeddings and cluster them; anything that lies far from the cluster might be an outlier.

    • @nishah4058 · 3 months ago

      @HackersRealm I'm not getting you. Can you please elaborate?
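
A minimal sketch of the reply's idea, assuming NumPy and a single cluster. The `embeddings` array is hypothetical: in practice each row would come from a text-embedding model applied to one comment; here the rows are toy 2-D vectors. Each comment's cosine distance to the mean direction is computed, and the same 1.5 × IQR rule from the video is applied to those distances. With several topics, you would instead measure the distance to the nearest cluster centre (e.g. from k-means).

```python
import numpy as np


def embedding_outliers(embeddings: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Indices of rows whose cosine distance to the centroid is an IQR outlier."""
    # L2-normalise each row so only the direction (meaning) matters, not the length
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    dist = 1.0 - unit @ centroid  # cosine distance of each row to the centroid
    q1, q3 = np.quantile(dist, [0.25, 0.75])
    return np.flatnonzero(dist > q3 + k * (q3 - q1))


# Hypothetical toy "embeddings": five similar comments and one unrelated one
embeddings = np.array([
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.95, 0.05], [0.9, 0.0], [0.0, 1.0],
])
print(embedding_outliers(embeddings))  # [5]
```

Only the last comment, which points in a completely different direction, is flagged; whether to drop it is still a judgement call about the text itself.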

  • @СулейманК-г8ы · 5 months ago

    Can somebody please explain where the 1.5 in the IQR method comes from? Why exactly 1.5?
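
The 1.5 is a convention from John Tukey's box plot rather than the result of a derivation. One way to see why it is a reasonable choice, assuming the data are roughly normal with mean μ and standard deviation σ:

```latex
Q_1 = \mu - 0.6745\,\sigma, \qquad Q_3 = \mu + 0.6745\,\sigma, \qquad
\mathrm{IQR} = Q_3 - Q_1 = 1.349\,\sigma

\text{upper fence} = Q_3 + 1.5\,\mathrm{IQR}
  = \mu + (0.6745 + 1.5 \times 1.349)\,\sigma \approx \mu + 2.698\,\sigma

P(|Z| > 2.698) \approx 0.007
```

So for normal data only about 0.7% of points fall outside the fences: 1.5 flags genuinely unusual values without flagging too many. A larger multiplier, such as 3, is sometimes used to flag only the "far out" values.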

  • @yvkvlogs · 2 months ago

    Is using any one method enough to remove outliers? 😊

  • @adityachoudhari3596 · 2 years ago · +2

    Yo bro, I'm also learning AI and ML concepts. I just need to work on some project or get training in this.
    Please tell me if you can help.

    • @HackersRealm · 2 years ago · +1

      Check the Iris dataset analysis project in the playlist to start.

  • @mohamads9759 · 9 months ago · +1

    Very great.

  • @Serene__Soul98 · 2 years ago

    Hi.. my dataset has 19 columns, and at least 10 columns show outliers.
    So do I have to perform this process for every column each time?

    • @HackersRealm · 2 years ago

      Yes, it's better to do the process in a loop and fix it for better results.

    • @avashchand9623 · 2 years ago

      @HackersRealm Can you kindly show this process too? I've been searching for it everywhere and can't find it.

    • @HackersRealm · 2 years ago

      @avashchand9623 Which process are you referring to?

    • @aniketlode4808 · 2 years ago

      @HackersRealm I think he is asking for the process of looping over the columns.

    • @nihalkausar2215 · 9 months ago

      Please, after I have handled the outliers in each column, how do I save the result, and which DataFrame should I continue using?
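
One way to loop over the columns and keep the result, assuming pandas and that you want to delete (trim) outlier rows rather than cap them. A single boolean mask is built across all numeric columns, the filtered result is assigned to a new DataFrame, and that DataFrame is the one to keep using. The function name, sample data and CSV file name are hypothetical.

```python
import pandas as pd


def remove_outlier_rows(df: pd.DataFrame, k: float = 1.5) -> pd.DataFrame:
    """Drop every row that is an IQR outlier in at least one numeric column."""
    keep = pd.Series(True, index=df.index)
    for col in df.select_dtypes(include="number").columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        # Fences are computed on the original data, not on already-filtered rows
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[keep]


df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6, 7, 8, 9, 50],       # row 9 is extreme in "a"
    "b": [-40, 11, 12, 13, 14, 15, 16, 17, 18, 19],  # row 0 is extreme in "b"
})
df_clean = remove_outlier_rows(df)  # continue the analysis with df_clean
df_clean.to_csv("data_no_outliers.csv", index=False)  # hypothetical file name
print(len(df), len(df_clean))  # 10 8
```

Computing every column's fences on the original `df` means the order of the columns does not change which rows are dropped.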

  • @santoryuu989 · 2 years ago

    What do you think is the best method out of these three?

    • @HackersRealm · 2 years ago

      You can use any method, as they produce similar results, but instead of deleting samples, cap the values within the range.

  • @Niyati_11 · 1 year ago · +1

    My DataFrame is empty when I try to find the outliers. Any idea why?
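
Without seeing the code, two causes are plausible: the column really has no values outside the bounds, or the two conditions were combined with `&` (and) instead of `|` (or). No value can be below the lower fence and above the upper fence at the same time, so the `&` version is always empty. A minimal sketch of both filters, assuming pandas and made-up data:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]})
q1, q3 = df["x"].quantile(0.25), df["x"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Wrong: a value cannot be below the lower fence AND above the upper fence
wrong = df[(df["x"] < lower) & (df["x"] > upper)]

# Right: an outlier is below the lower fence OR above the upper fence
outliers = df[(df["x"] < lower) | (df["x"] > upper)]

print(len(wrong), len(outliers))  # 0 1
```

Each comparison also needs its own parentheses, because `&` and `|` bind more tightly than `<` and `>` in Python.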

  • @karthika8610 · 1 year ago

    Which method is the most preferred?

    • @HackersRealm · 1 year ago · +2

      It's not about preference; it depends on the use case you're trying to solve.

    • @madhulikasuman2803 · 9 months ago · +1

      @HackersRealm What if 40% of the data are outliers?

    • @HackersRealm · 9 months ago

      @madhulikasuman2803 It depends on the nature of the data. You need to understand the domain and see why this is the case. You could apply a data transformation, such as a log transformation, to change the distribution.
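
A minimal sketch of the log-transformation idea, assuming pandas/NumPy and synthetic right-skewed (log-normal) data rather than the video's dataset. On the raw scale the long right tail produces many IQR "outliers"; after taking the log, the distribution is roughly symmetric and far fewer points are flagged. Whether this fits a real case with 40% outliers depends on why the values are extreme, as the reply says.

```python
import numpy as np
import pandas as pd


def count_iqr_outliers(s: pd.Series, k: float = 1.5) -> int:
    """Number of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return int(((s < q1 - k * iqr) | (s > q3 + k * iqr)).sum())


rng = np.random.default_rng(0)
# Synthetic, strictly positive, right-skewed data
s = pd.Series(rng.lognormal(mean=0.0, sigma=1.0, size=1000))

# np.log needs x > 0; use np.log1p instead if the column contains zeros
logged = np.log(s)

print(round(s.skew(), 2), count_iqr_outliers(s))
print(round(logged.skew(), 2), count_iqr_outliers(logged))
```

Remember that any model trained on the transformed column produces predictions on the log scale; `np.exp` (or `np.expm1` after `np.log1p`) maps them back.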

  • @ricesweat9951 · 1 year ago

    Why did you decide to use residual sugar as the column to find outliers? Any tips and tricks on which columns should be used to find outliers within the dataset?

    • @HackersRealm · 1 year ago · +1

      We can use a box plot or violin plot to find the outliers. The dots drawn beyond the whiskers can be considered outliers.
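
To see exactly which points a box plot draws as dots, a sketch assuming matplotlib: `Axes.boxplot` returns a dict whose `"fliers"` entry holds the plotted outlier markers. With the default `whis=1.5`, these are the same values the 1.5 × IQR rule flags. The numbers below are hypothetical stand-ins for a column such as residual sugar.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for a column such as "residual sugar"
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 50.0])

fig, ax = plt.subplots()
parts = ax.boxplot(s.to_numpy(), whis=1.5)  # whiskers reach at most 1.5 * IQR past the box
fliers = parts["fliers"][0].get_ydata()     # the dots drawn beyond the whiskers
print(fliers.tolist())  # [50.0]
plt.close(fig)
```

Running the same check on every numeric column is a quick way to decide which columns need outlier handling at all.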

  • @Sachinnani019 · 1 year ago

    8:35 outliers=26

  • @vlog_fiast · 6 months ago