Stata - How to winsorize your data

  • Published: 11 Sep 2024
  • Welcome to my classroom!
    This video is part of my Stata series, in which I help you learn how to use Stata. In this video, we look at how to use the winsor command to winsorize your data/variables, which is often a better alternative to dropping observations.
    Note: What I show here is my take on the topic. I would be happy to receive comments!
    Useful links:
    ►Twitch: / steffens_classroom
    ►Twitter: / steff5001
    ►Workpage: www.rug.nl/sta...
    ►Subscribe: cutt.ly/Qfu9cmV

Comments • 35

  • @SteffensClassroom
    @SteffensClassroom  3 years ago

    Happy my brother had some fun editing this one -.-

  • @vartuhitonoyan4923
    @vartuhitonoyan4923 2 years ago +2

    Awesome job, brief and to the point. Thank you so much.

  • @whyosrs9863
    @whyosrs9863 2 years ago +1

    steffen legend

  • @vanisher49
    @vanisher49 2 years ago

    Thank you this was very helpful!

  • @sitinabilahmohdshaari6771
    @sitinabilahmohdshaari6771 1 year ago +1

    many thanks

  • @failedomelet9978
    @failedomelet9978 3 years ago +1

    Pretty good video my dude
    Had a bad time near the end though

  • @juhokim6149
    @juhokim6149 2 years ago

    Thank you

  • @bellisma77
    @bellisma77 1 year ago +1

    Thanks for your videos. Are there specific criteria for choosing the cut-off points here?

    • @SteffensClassroom
      @SteffensClassroom  1 year ago +1

      Hi!
      Often, 5 or 10 percent is used. I need to look more into the literature, but always justify your choices.
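
      To make percentile-based cut-offs concrete, here is a minimal sketch using only built-in commands. The auto example dataset, the 5th/95th percentile choice, and the names p_low, p_high, and price_w are assumptions for illustration, not taken from the video.

      ```stata
      * Sketch: winsorize price at the 5th and 95th percentiles (assumed cut-offs)
      sysuse auto, clear
      * _pctile stores the requested percentiles in r(r1) and r(r2)
      _pctile price, percentiles(5 95)
      scalar p_low  = r(r1)
      scalar p_high = r(r2)
      * clamp values outside the bounds; missing values stay missing
      gen double price_w = cond(price < scalar(p_low), scalar(p_low), cond(price > scalar(p_high) & price < ., scalar(p_high), price))
      * compare the original and the winsorized variable
      summarize price price_w
      ```

      Changing the numbers in percentiles() changes the cut-offs; whichever values you pick still need to be justified.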

    • @bellisma77
      @bellisma77 1 year ago

      I wanted to apply winsor2 to my thesis data and am afraid it won’t be suitable.

    • @SteffensClassroom
      @SteffensClassroom  1 year ago +1

      @bellisma77 Well, it depends on how the data looks. Do you have a lot of outliers? Is it skewed? In any case, it is a good idea to conduct your analysis with and without applying winsorizing.
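
      One way to run the analysis with and without winsorizing is sketched below. The auto example dataset, the regression of price on weight, the 5th/95th percentile cut-offs, and the names price_w, m_raw, and m_wins are all assumptions for illustration.

      ```stata
      * Sketch: same regression on the raw and on the winsorized outcome
      sysuse auto, clear
      _pctile price, percentiles(5 95)
      * clamp price to [r(r1), r(r2)]; the if condition keeps missing values missing
      gen double price_w = min(max(price, r(r1)), r(r2)) if price < .
      regress price weight
      estimates store m_raw
      regress price_w weight
      estimates store m_wins
      * coefficients and standard errors side by side
      estimates table m_raw m_wins, b se
      ```

      If the two columns lead to the same conclusion, the outliers are probably not driving the result; if they differ, that is worth reporting.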

    • @bellisma77
      @bellisma77 1 year ago

      @SteffensClassroom Not that many outliers, and I want to replace them by winsorizing at +/- 3 sigma.

    • @SteffensClassroom
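      @SteffensClassroom  1 year ago +1

      @bellisma77 You just need to do it yourself, which fortunately is not very difficult in this case.
      * the bounds are the mean minus/plus 3 standard deviations of x
      summarize x
      scalar low = r(mean) - 3 * r(sd)
      scalar high = r(mean) + 3 * r(sd)
      * x < . keeps missing values missing; scalar() avoids clashes with variables named low or high
      gen x2 = cond(x < scalar(low), scalar(low), cond(x > scalar(high) & x < ., scalar(high), x))
      This should do it.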

  • @FemkeHuisman
    @FemkeHuisman 3 years ago +1

    If you should winsorize all continuously scaled variables, does that mean variables with 'real numbers' (stata: 'float' and 'double'), or also integer variables (in stata named 'byte', 'int', 'long')?

    • @SteffensClassroom
      @SteffensClassroom  3 years ago

      Be sure what you mean by continuous here; you may be mixing things up. I typically would only consider string vs numeric when it comes to winsorizing, as winsorizing works the same way for all numeric types in Stata (to the best of my knowledge).

    • @FemkeHuisman
      @FemkeHuisman 3 years ago +1

      @SteffensClassroom Thanks!!

  • @Kerrmunism
    @Kerrmunism 3 years ago

    Wait but why are one of their eyes blue tho?

  • @darvidtorres
    @darvidtorres 1 year ago +1

    How to winsorize for panel data?

    • @SteffensClassroom
      @SteffensClassroom  1 year ago +1

      Afaik, same strategy. You are winsorizing prior to estimation.
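
      A minimal sketch of that order of operations on panel data follows. The simulated panel, the planted outlier, the 1st/99th percentile cut-offs, the fixed-effects model, and all variable names are assumptions for illustration.

      ```stata
      * Sketch: winsorize first, then declare the panel and estimate
      clear
      set seed 12345
      set obs 200
      * 20 units observed for 10 periods each
      gen id = ceil(_n / 10)
      bysort id: gen t = _n
      gen x = rnormal()
      gen y = 1 + 2 * x + rnormal()
      * plant one artificial outlier
      replace y = 50 in 5
      * pooled cut-offs over all observations
      _pctile y, percentiles(1 99)
      gen double y_w = min(max(y, r(r1)), r(r2)) if y < .
      xtset id t
      xtreg y_w x, fe
      ```

      Because winsorizing recodes extreme values instead of dropping them, every unit keeps all of its periods and the panel stays balanced.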

  • @aminaahmedalibelal5676
    @aminaahmedalibelal5676 1 year ago

    Thanks for your simple and clear explanation. Is it possible to replace the values with, say, the mean?
    winsor2 x1 if z ==1,replace(mean)? I want to replace x1 where z = 1 with the mean value. Is this correct?

    • @SteffensClassroom
      @SteffensClassroom  1 year ago

      Honestly, never tried. However, what is your reasoning for wanting to replace your values with the mean? It doesn't sound like a good idea.

    • @aminaahmedalibelal5676
      @aminaahmedalibelal5676 1 year ago

      @SteffensClassroom I have panel data and I detected outliers using Cook's distance and leverage, and I want to replace these points with the mean or median value. I guess using the replace command will be suitable, right? Thank you, Dr., for your reply.

    • @SteffensClassroom
      @SteffensClassroom  1 year ago +1

      @aminaahmedalibelal5676 Hi again. The replace command would of course work. My question was more about why you would like to replace the values with the mean/median. There must be a reason I am missing :)
      Good luck with your project!

    • @aminaahmedalibelal5676
      @aminaahmedalibelal5676 1 year ago

      @SteffensClassroom Hi. 2.30% of my observations are outliers, and I do not want to trim them, so that my panel data will not become unbalanced. Thanks for your wishes. I hope you upload more videos on Stata.

  • @Josobro
    @Josobro 3 years ago +1

    snans blundertale.

  • @JetEriksen
    @JetEriksen 3 years ago

    Funny Skeleton

  • @xtra3678
    @xtra3678 3 years ago

    Sans is ness confirmation

  • @crusaddy982
    @crusaddy982 3 years ago

    Ketchup

  • @kittentamer2164
    @kittentamer2164 3 years ago

    Funnybones

  • @mitchellroberts9626
    @mitchellroberts9626 3 years ago

    doot

  • @Suicutie
    @Suicutie 3 years ago

    undersnail

  • @GrandleJams
    @GrandleJams 3 years ago

    Haha, Megalovania

  • @abujad4226
    @abujad4226 2 years ago

    Thank you