How to Identify and Treat Outliers in Stata | Stata Tutorial

Published: 22 Aug 2024

Comments • 57

  • @Mat-mt8pk 3 years ago +20

    Methods of finding outliers
    1:14 #1. Sorting
    2:52 #2. Box Plot
    6:04 #3. Extremes
    10:05 #4. Histogram
    10:50 #5. Spike Plot
    11:42 #6. Zscore
    Treatment
    13:07 #1. Keep outliers
    13:42 #2. Correct error
    14:23 #3. Winsorization
    19:06 #4. Trimming

  • @gonout8402 2 years ago +3

    In just 20 minutes you have explained everything that my professor taught me in 2 months, and it is much more understandable and useful. Thank you very much.

  • @rouniktalukdar872 2 years ago +1

    Among the nicest video lectures that I have come across on this topic. Thanks a lot. Please keep uploading more content on Stata.

  • @wilsonahinful5127 2 years ago +1

    This is all that I have been looking for, thanks very much indeed

  • @addisugetahun1441 2 years ago

    Thank you for your nice and clear lecture in identifying and treating outliers.

  • @jemalhassen2841 18 days ago

    It is a very helpful video. Thank you!

  • @tomaxow 2 years ago +1

    Really well done and explained

  • @alphadie2012 3 years ago

    Clear and concise explanation. Thank you

  • @jibrilyero2263 6 months ago +1

    Great job 🎉

  • @danishjunaid1659 2 years ago +1

    Very well explained

  • @isaacasante4060 1 year ago +3

    Awesome video. Could you please do a similar one using panel data?

  • @korneliuslanggason5477 3 years ago

    thank you for the explanation.

  • @yilebesaddisu5314 3 years ago

    Thank you dear, very helpful!!

  • @shafiqullahyousafzai15 3 years ago

    Thanks from Afghanistan

  • @lottet1945 3 years ago +1

    Thank you for this clear explanation!
    Do you have a video on Cook's distance and Mahalanobis distance in Stata by any chance?

    • @thedatahall 3 years ago +1

      Thanks for watching the video. Unfortunately, I don't currently have a video on this. I will see if I can add one in the future. If you are interested in SPSS, though, there are videos on RUclips.

  • @aibannongspung1765 2 years ago

    Thank you so much for this insightful video! Suppose I want to trim the top and bottom 0.1% of the distribution. How do I write the command?

    • @thedatahall 2 years ago

      I have never tried it with decimals, but the command would look like this: winsor2 variablename, trim cuts(0.1 99.9)

    • @thedatahall 2 years ago

      Let me know if it works
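
A minimal sketch of the trimming command discussed in this thread, assuming the community-contributed winsor2 package is installed (ssc install winsor2) and using Stata's bundled auto dataset purely for illustration. The variable price and the suffix _tr are choices made here, not taken from the video. Whether any observation is actually trimmed depends on the sample size, since in a small sample the 0.1th and 99.9th percentiles can coincide with the minimum and maximum.

```stata
* Illustration on the shipped auto data; the variable choice is an assumption
sysuse auto, clear

* Set values below the 0.1th or above the 99.9th percentile to missing,
* storing the result in a new variable called price_tr
winsor2 price, trim cuts(0.1 99.9) suffix(_tr)

* Count how many observations were trimmed
count if missing(price_tr) & !missing(price)
```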

  • @atiyaabdulkarim716 3 years ago

    A quick question: if we use the sort function, will it align the observations in the other variables? For example, suppose we sort by price but also have variables for age, education and ID number.
    After sorting by price, would Stata keep age and education matched to the right ID, or would only the one variable be sorted and not the others? That could create problems, no?

    • @thedatahall 3 years ago

      In Stata the sort command keeps track of all variables and sorts them simultaneously. The whole row moves, not just the price column.

    • @thedatahall 3 years ago

      sort only sorts in ascending order. There is another command, gsort: gsort -price sorts in descending order.
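
The two replies above can be checked with a short sketch, here using Stata's bundled auto dataset as a stand-in (the dataset and the listed variables are assumptions for illustration). Listing several variables after each sort shows that whole rows move together.

```stata
* Load the example data shipped with Stata
sysuse auto, clear

* Ascending sort: the five cheapest cars, with their other variables intact
sort price
list make price mpg in 1/5

* Descending sort: the five most expensive cars
gsort -price
list make price mpg in 1/5
```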

  • @AhaNYS 3 years ago

    Thank you for the video! I have a question, I want to use ssc extremes among subcategories. How can I apply this extremes for every subcategory??

    • @thedatahall 3 years ago +1

      You can try the bys prefix: bys category: extremes, followed by the rest of your command.

  • @shrinjoy1234 3 years ago

    How do we use the winsor command if we want to replace outliers with Q3 + 1.5 × IQR?
    Can we use the winsor command to handle outliers in multiple columns in one go? Please advise.

    • @thedatahall 3 years ago

      That is not possible with the winsor or winsor2 command; you will have to write code for it. One way is to create a variable that stores the value of Q3 + 1.5 × IQR and then use it to replace values in your main variable.
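
One way the hand-written approach described in the reply might look, sketched on Stata's bundled auto dataset. The variable list (price, mpg, weight) and the _cap suffix are assumptions for illustration. The loop covers several variables in one go, and the fence is held in a local macro rather than a stored variable, which is a small departure from the reply's suggestion.

```stata
sysuse auto, clear

foreach v of varlist price mpg weight {
    * summarize, detail stores the quartiles in r(p25) and r(p75)
    quietly summarize `v', detail
    local upper = r(p75) + 1.5*(r(p75) - r(p25))

    * Cap values above the upper fence; missing values stay missing
    generate `v'_cap = min(`v', `upper') if !missing(`v')
}

* Compare original and capped versions
summarize price price_cap mpg mpg_cap weight weight_cap
```

The same pattern with r(p25) - 1.5*(r(p75) - r(p25)) and max() would handle the lower fence.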

  • @badiahahmed2085 3 years ago

    Thank you for your great video. I have a question, please: after winsorizing, can I take the logarithm of some variables? Thank you.

    • @thedatahall 3 years ago +1

      Yes, you can take the log after winsorization. Be advised, though, that after taking the log the interpretation of the coefficient changes to a percent change. I am soon going to make a video on functional forms, so if you are not familiar with the interpretation after taking logs, that video will help.

    • @badiahahmed2085 3 years ago

      @thedatahall Thank you for your response, that will be great. MANY THANKS
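
A brief sketch of the order of operations discussed in this thread, assuming winsor2 is installed (ssc install winsor2) and using the bundled auto dataset. The 1st and 99th percentile cutoffs and the variable names are illustrative choices, not taken from the video.

```stata
sysuse auto, clear

* Winsorize first: values beyond the 1st/99th percentiles are pulled in
winsor2 price, cuts(1 99) suffix(_w)

* Then take the natural log of the winsorized variable
* (ln() returns missing for zero or negative values)
generate ln_price_w = ln(price_w)

summarize price price_w ln_price_w
```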

  • @user-zn4rv5wb3k 6 months ago

    Hi, hope you are doing great. Can you share the link to the video on multivariate outliers? I am not able to find it.

    • @thedatahall 6 months ago +1

      Thanks for your kind words. Unfortunately, we haven't made a video on multivariate outliers. I will add that to my to-do list.

    • @user-zn4rv5wb3k 6 months ago

      @thedatahall It would be highly appreciated.

  • @atiyaabdulkarim716 3 years ago

    Can you take us through the calculator functions in Stata (the syntax for exponents and complex functions)?

    • @thedatahall 3 years ago

      Sure. Do you want me to make a video on arithmetic and similar functions in Stata?

    • @atiyaabdulkarim716 3 years ago

      @thedatahall Thank you for getting back to me. I am a medical student and I have to use Stata's calculation functions to generate a new variable. My problem is that some components enter the formula as exponents. If you look at the MDRD equation used to define chronic kidney disease, or the CKD-EPI equation, you will see that serum creatinine level and age are entered in the formula. My specific question is how to use this information from variables in my dataset. I tried the exponent function, but my calculations appear to be incorrect and it seems I am not following the right steps. I would highly appreciate it if you could make a video or give me some feedback.

    • @thedatahall 3 years ago

      What command did you use? If you used the exp() function, that inverts the log (it computes e raised to the given value). If you email me the equation at info@thedatahall.com, perhaps with some sample data or the command you used, I will look into it. If you want to raise a variable to a power, e.g. square it, use gen newvariable=oldvariable^2

    • @thedatahall 3 years ago

      I searched for the MDRD equation, but I am not sure I found the right one.

    • @atiyaabdulkarim716 3 years ago

      @thedatahall Thank you for getting back to me. Here is the link: patient.info/doctor/estimated-glomerular-filtration-rate-gfr-calculator
      Normal creatinine values range from 0.6 to 1.2 mg/dL, so one can use values at the higher end, or perhaps an older age, and see what the filtration rate is.
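
To make the power-versus-exp() distinction in this thread concrete, here is a hedged sketch using the ^ operator with negative exponents. The coefficients follow a commonly cited form of the four-variable MDRD equation (175 × creatinine^-1.154 × age^-0.203, times 0.742 for women) and should be checked against your own source before use. Other adjustment factors in the published equation are left out, and the variable names scr, age and female plus the two data rows are made up for illustration.

```stata
clear
* Hypothetical data: serum creatinine (mg/dL), age (years), female indicator
input scr age female
0.8 40 0
1.2 70 1
end

* Use ^ for powers and wrap negative exponents in parentheses;
* exp(x) would instead compute e^x, which is not what the formula needs
generate egfr = 175 * scr^(-1.154) * age^(-0.203)
replace egfr = egfr * 0.742 if female == 1

list
```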

  • @tranglephuong1896 1 year ago

    Can you share the dataset you used in the video?

    • @thedatahall 1 year ago

      Unfortunately, I have misplaced the data and do-file for this specific video.

  • @alfinasintiya7477 2 years ago

    I cannot use "extremes". Is there a solution?

    • @thedatahall 2 years ago

      I just used extremes and it is working fine for me. What error are you getting?