A Two-Step Transformation to Normality in SPSS

  • Published: 22 Aug 2024
  • This video shows how to transform continuous variables toward normality in SPSS. This approach retains the original series mean and standard deviation to improve the interpretation of results. Use of this method can be justified by citing the following published paper:
    Templeton, G. F. (2011). A Two-Step Approach for Transforming Continuous Variables to Normal: Implications and Recommendations for IS Research. Communications of the Association for Information Systems, 28, pp-pp. doi.org/10.177...
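For readers replicating the procedure outside SPSS, here is a minimal sketch of the two steps in Python. This is an illustration only, assuming numpy/scipy; the function name is mine, and the built-in 1-(1/n) clamp comes from Dr. Templeton's advice in the comments below:

```python
import numpy as np
from scipy import stats

def two_step_normalize(x, mean=None, sd=None):
    """Two-step transformation: fractional rank, then inverse normal CDF."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: fractional rank (rank / n), analogous to SPSS's fractional RANK
    frac = stats.rankdata(x) / n
    # Step 2 is undefined at 1.0, so clamp the maximum to 1 - 1/n
    # (the fix Dr. Templeton recommends to avoid losing one case)
    frac = np.where(frac >= 1.0, 1.0 - 1.0 / n, frac)
    # Step 2: inverse normal CDF with the imposed mean and SD;
    # defaults to the original series' mean and SD to keep original units
    mean = x.mean() if mean is None else mean
    sd = x.std(ddof=1) if sd is None else sd
    return stats.norm.ppf(frac, loc=mean, scale=sd)
```

Passing `mean=0, sd=1` instead yields the standardized (z-score) variant discussed in the comments.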

Comments • 267

  • @mikemmoon
    @mikemmoon 9 years ago +11

    Wow! I have been trying every transformation under the sun for several of my variables for 2 straight weeks with no luck. This is like magic. Now I just might finish my PhD dissertation by the end of the summer after all. Many thanks!

  • @abigailfulton9185
    @abigailfulton9185 5 years ago +4

    For anyone who doesn't know: you use the series mean and standard deviation that he uses in the video. IT WORKS AND HE SAVED MY LIFE!

  • @gftempleton
    @gftempleton  8 years ago +11

    People often ask why their sample size is reduced by 1 when using this technique. This happens because after the first step, the values range from 1/n to 1. All values must be a fraction strictly less than one for Step 2 to work, so it skips over the 1 (associated with the biggest value). To fix this, replace the missing value (the result of applying Step 2 to the 1) with 1-(1/n).
    For example, if you start with a sample of 1,000, the Two-Step will likely result in a sample of 999. To use the missing record, you'll need to find it (it's the "1" value resulting from the first step among all cases). Replace the 1 with 1-(1/1000), or 1-.001, or .999.
    This won't change results much, but will ensure every case is used.
    Of course, I'd put a small note in the paper about any transformation step needed.

    • @Marie-sh6zm
      @Marie-sh6zm 7 years ago

      Hi, I would like to ask: how many times am I allowed to normalize the same data? Thanks in advance!

    • @gftempleton
      @gftempleton  7 years ago

      In my opinion, there are no rules as long as you report exactly what you have done. Let the reviewers or advisors help you.

    • @amadeo3844
      @amadeo3844 6 years ago

      Is there an automated way to do this replacement? I have over one hundred columns of ranked data, and each one has a value of "1.00". Do I need to manually go in, find the 1.00 in each column, and change it to .999?
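One way to automate this across many columns (a sketch, assuming pandas; the function name is mine, not from the video):

```python
import pandas as pd

def fix_max_ranks(ranks: pd.DataFrame) -> pd.DataFrame:
    """Replace the 1.0 produced by Step 1 with 1 - 1/n in every column,
    so Step 2 (the inverse normal CDF) is defined for every case."""
    n = len(ranks)
    # where() keeps values satisfying the condition and substitutes the rest
    return ranks.where(ranks < 1.0, 1.0 - 1.0 / n)
```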

  • @gftempleton
    @gftempleton  9 years ago +33

    When using this technique in research, it may help in the peer review process to cite the published article referenced at the end of the video:
    Templeton, G.F. 2011. "A Two-Step Approach for Transforming Continuous Variables to Normal: Implications and Recommendations for IS Research," Communications of the AIS, Vol. 28, Article 4.

    • @charlottekik6475
      @charlottekik6475 8 years ago

      Thank you so much!

    • @gftempleton
      @gftempleton  8 years ago

      You're welcome, Charlotte!

    • @melissacagle3707
      @melissacagle3707 5 years ago

      Hi Gary! Should I transform my variables like this for conducting a Principal Component Analysis in order to form an index? Thank you for your video!

    • @veronicawong9023
      @veronicawong9023 3 years ago +2

      Thank you so much... it's 2021 now and your video saved my life!

    • @davine1301
      @davine1301 3 years ago

      @@veronicawong9023 Where are you from?

  • @chukwuemekaemenekwe746
    @chukwuemekaemenekwe746 9 years ago +2

    Gary, your tutorial just saved my day. I'd been struggling with different transformation techniques; seeing yours just brightened my day. Thanks!!!

  • @oscaronam7862
    @oscaronam7862 8 years ago +3

    Thanks a lot, Gary. I had been struggling to normalize my skewed data, but when I used the two steps in your paper and video, which you explain clearly, my data became normal - confirmed by Kolmogorov-Smirnov and Shapiro-Wilk tests. Very helpful video!

    • @gftempleton
      @gftempleton  8 years ago

      +Oscar Onam That's great, Oscar. Good luck.

    • @aristrolltle8580
      @aristrolltle8580 2 years ago

      Sometimes the KS test and SW test refute the normality hypothesis even though the skewness and kurtosis values are OK.

  • @arturogarcialomeli5745
    @arturogarcialomeli5745 3 years ago +2

    You saved my day, thanks a lot. I think if you want to obtain the mean and the standard deviation, you need to process your data before applying this method. You are going to be cited in my thesis!!!!!

    • @asmaae1993
      @asmaae1993 3 years ago

      Hi Mr. Arturo,
      Do you have any idea how we can transform the data back from this form when we want to report the results?
      Thank you

  • @shahmahmood3908
    @shahmahmood3908 9 years ago +53

    My question is: where did you get the values for the second question mark (?) and the third question mark (?)? You didn't show where you got the series mean and standard deviation that you copied and pasted from Notepad. I would really appreciate it if you could help me in this regard.
    IDF.NORMAL(RDistanc,?,?)

    • @norwegianresearchtraininginsti
      @norwegianresearchtraininginsti 4 years ago

      Did you find out the answer to this question? If so, could you share it with me? He mentioned he copied them from Notepad, that is what I heard.

    • @juliabachmann639
      @juliabachmann639 4 years ago +3

      @@norwegianresearchtraininginsti you can find it in his paper, it says:
      To accomplish Step 2 in Excel, use the NORMINV() function, having the following syntax:
      NORMINV(Step 1 result, imposed mean, imposed standard deviation)
      Where,
      Step 1 result = the result of Step 1, which must be in probability form
      Imposed mean = mean of the variable resulting from the transformation (!!)
      Imposed standard deviation = standard deviation of the resulting variable (!!)
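For anyone cross-checking this outside Excel, the same call exists in Python (an assumption on my part, using scipy rather than Excel; NORMINV, SPSS's IDF.NORMAL, and scipy's `norm.ppf` are all the inverse normal CDF):

```python
from scipy.stats import norm

# Excel NORMINV(p, mean, sd) corresponds to norm.ppf(p, loc=mean, scale=sd).
# At p = 0.5 the inverse CDF returns the imposed mean itself:
value = norm.ppf(0.5, loc=100.0, scale=15.0)
print(value)  # -> 100.0
```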

    • @norwegianresearchtraininginsti
      @norwegianresearchtraininginsti 4 years ago

      @@juliabachmann639 I will check his paper, I am interested in that method.

    • @doancongthanh93
      @doancongthanh93 3 years ago +1

      @@juliabachmann639 I think it's the mean of the variable that has been transformed. You can see these values in the histogram chart at 2:33.

    • @juliabachmann639
      @juliabachmann639 3 years ago

      @@doancongthanh93 Ahh okay, I see. Thank you - that's very helpful!!

  • @khalidrehman4202
    @khalidrehman4202 6 years ago +1

    Thank you so much... this video is very informative. My data were not normally distributed, but after watching this video I applied this procedure, and now my data are normal.

  • @marhan2757
    @marhan2757 9 years ago +3

    I guess the interpretation does not change much because of this transformation, i.e. standard deviations and means stay the same, while kurtosis and skewness significantly improve. Also, this technique solves the problem with apparent outliers (values that actually are not).
    Thanks a lot for such a great solution!

  • @learningwithms8293
    @learningwithms8293 2 years ago

    Thanks a lot, Dr. Templeton. It is really helpful. I used the process mentioned and found it to be useful not only for me but also for my entire department.

  • @jaimeb.384
    @jaimeb.384 7 years ago +9

    What mean and standard deviation are you using? It is not clear in the video.

  • @ellieking4132
    @ellieking4132 4 years ago

    I was really struggling to work out how to make my data normally distributed in order to do my analysis, and this video has saved me. Thank you so much for taking the time to share this method with us, and to answer our queries! I really appreciate it :)

    • @riderho1
      @riderho1 4 years ago

      Ellie King, where did he get the second and third values? (,?,?)

    • @ellieking4132
      @ellieking4132 4 years ago

      @@riderho1 My understanding (and what I used) is the second and third values are the mean and SD of the variable that you are transforming.

  • @aaljumaili
    @aaljumaili 4 years ago +1

    This video has great value. Thank you so much, Gary, for saving my day.

  • @alexjimenez9940
    @alexjimenez9940 3 years ago +1

    What a great solution! Thank you very much Gary for your help!

  • @jjabb
    @jjabb 4 years ago

    Thank you. This saved my life. I had been struggling with numerous transformations, but they did not work. This one worked for me. I also used the 1-(1/n). You get a citation from me.

  • @tarignassr6690
    @tarignassr6690 6 years ago +8

    Hi, you didn't show where you got the series mean and standard deviation that you copied and pasted from Notepad.

  • @FooodConfusion
    @FooodConfusion 3 years ago +1

    Wow, so easy. Loved your way of explanation: simple and to the point.

  • @kamalpreetrakhra8071
    @kamalpreetrakhra8071 8 years ago +7

    Hello Dr. Templeton,
    I am a PhD student and have found that using the technique mentioned greatly improves the skewness and kurtosis values. However, the data is still not normally distributed. I have also tried log10 and loge transformations. Is there anything else that I can use? I don't have the option of dichotomizing the data. Please may I have a copy of your article for further details?

    • @tsedesiree
      @tsedesiree 2 years ago +1

      Hi Kamalpreet, I have the same problem. I applied Dr. Templeton's technique but my data is still skewed. Did you figure out how to transform the data into a normal distribution? I want to perform a 3-way ANOVA, so I can't just use the KW test. I'd appreciate it if you could get back to me, thanks :)

    • @duckhunterforex7577
      @duckhunterforex7577 2 years ago

      @@tsedesiree Same here. I hope you've already found the answer and will share it with us all. Looking forward to it, thank you in advance.

    • @10VGomez
      @10VGomez 2 years ago +1

      Hi, I'm also interested in normalizing a variable. I have used ln(x), log10(x), 1/x, sqrt(x) and this method, but nothing works. I have heard about the Johnson transformation method. I haven't tried it yet, but it is said to work almost always, since it finds an optimal function that normalizes your data. Let me try it and I will tell you. If somebody knows how to use this method in SPSS, please share the info =)

  • @smrutimokal7452
    @smrutimokal7452 4 years ago

    Thanks, Gary, for the wonderful video and the article. I always have trouble when normalizing data since transformations like log don't usually work... but this is great... simply wonderful. Thank you again.

    • @gftempleton
      @gftempleton  4 years ago

      I'm glad it helped, Smruti.

    • @gftempleton
      @gftempleton  4 years ago

      Awesome to hear that it worked for you, Smruti. Good luck on your research.

  • @sasali6727
    @sasali6727 7 years ago

    Gary,
    I ran the procedure several times on both SPSS and Excel using the same data set. Apparently, the outputs are inconsistent. I'm not sure what might cause the difference. I double-checked the formula as described in your paper.
    Here is the Excel formula:
    To get the percent rank: =IF(B4="","",IF(PERCENTRANK(B$2:B$50,B4)=1,0.9999,IF(PERCENTRANK(B$2:B$50,B4)=0,0.0001,PERCENTRANK(B$2:B$50,B4))))
    To get the inverse of the cumulative normal distribution: =IF(B115="","",NORMINV(B115,0,1))
    Running the data set with outliers replaced by the mean versus on the original data produces some significant changes. So replacing outliers with means doesn't look like a reliable method to apply.
    Now I am thinking of Winsorizing my original data. Do you have any recommendation so as not to miss a single outlier?
    My data is both hugely negatively skewed and has outliers, which makes it hard to figure out the best way to proceed. I am thinking about robust statistics as well, given my data. Any thoughts on that?
    Huge thanks.

  • @sasali6727
    @sasali6727 7 years ago +1

    Wow, thanks Gary. This is a great method. I used log10 and square root transformations to normalize the distribution of my data. Neither of them worked, and my data was still negatively skewed. I used this two-step transformation and it worked great. At first I had a hard time finding the corresponding mean and SD of the series. After looking at the referenced paper, I found out that you have two choices. You can either put 0 as the mean and 1 as the SD for the arguments of the function (e.g. IDF.NORMAL(a new ranked series,0,1)), or put the mean and SD of the original series to maintain the units of the data.
    When I put in the original mean and SD of the series, it worked just fine and the data looked normal. However, when I replaced the parameters with 0 and 1, SPSS returned no value. Not sure why.
    I also noticed that my sample size didn't reduce by 1 after the transformation. My sample size is small (50). Are you supposed to lose one case after the transformation? Am I doing it wrong and just not seeing this result?

    • @gftempleton
      @gftempleton  7 years ago +1

      So you went from standardized normal (0 and 1 parameters) to normalized (original mean and SD) and back to standardized? Did you save variables with the same names? That may be the problem. I've used the technique en masse and have never heard of that. It seems like you need to make sure you use unique variable names.
      A low sample size would help retain all records. I don't know what the threshold is. Do you know what the fix is if it does become a problem? Replace the max value (1) in the results of Step 1 with 1-(1/n).

    • @sasali6727
      @sasali6727 7 years ago

      Given the problems with SPSS, I actually ended up using the Excel formula you provided in the paper, which I found much easier for data with many variables (33 in my case). It worked just fine, and all my variables except one are normally distributed now (that non-normal variable looked normal to me, but the Shapiro-Wilk test was significant after all). I was able to retain my whole sample (50) using the Excel formula. What do you think about that?
      I also went ahead and ran outlier testing with a g value of 2.2. To my surprise, most variables have at least two outliers. That's surprising, as I replaced all outliers with the mean from the original data before running the transformation in Excel. A g factor of 2.2 is usually considered pretty large, retaining almost all data. It's very interesting to see such a pattern. But I still trust this data with outliers more than my original non-normally distributed data, unless I am missing something significant here.

    • @gftempleton
      @gftempleton  7 years ago +1

      I perform the Two-Step in Excel often - there is no difference as far as I can tell - the same two steps are readily available.
      I would never replace an outlier with the mean! You're changing the data and may be suppressing or hiding results. If your data is sufficiently normal, don't worry about outliers. And, your data doesn't have to be perfectly normal. If it is terribly non-normal and you use non-parametrics, your data isn't normal anyway. For example, if you use Spearman's rank, the data is transformed to uniform (not normal).

    • @sasali6727
      @sasali6727 7 years ago

      Thanks for taking the time to provide these great pointers. The scale of my data is continuous. I don't remember where I saw it, but replacing outliers with the mean was described as a reliable method.
      Unfortunately, my data is pretty skewed, with a skewness value of .9 or so. My sample size is also pretty small (45), so not missing even one case is important.
      What I'll do is put all the outliers back in their place and run everything again. That's a pain, but I am curious to see the difference.


  • @l.briant3537
    @l.briant3537 6 years ago +2

    Great video, Gary, thank you. Just like below, some of my variables produce an "out of range" error:
    >At least one of the arguments to the IDF.NORMAL function is out of range. The
    >first argument (probability) must be positive and less than one. The third
    >argument must be positive. The result has been set to the system-missing
    >value.
    Why is this the case?

    • @sepiahell1417
      @sepiahell1417 2 years ago +1

      same :( can anyone help plsssssss

  • @MrFantastic161
    @MrFantastic161 8 years ago

    Thank you so much for this! I'm currently doing my dissertation and the non-normal data kind of shot me in the foot for the proposed analytical methods. Much appreciated!

  • @kanya1998
    @kanya1998 8 years ago +1

    Much appreciated, Mr. Gary, it works perfectly well!

    • @gftempleton
      @gftempleton  8 years ago

      +mahirwe anthony
      Great and good luck!

  • @fatosakbulut3171
    @fatosakbulut3171 4 years ago +1

    Hi Mr. Templeton. I have 8 groups of data to analyze. Some of them are normal and some of them are not. Should I apply your method to all of the groups in order to compare them in a one-way ANOVA analysis? Please please help...

  • @saro4761
    @saro4761 7 years ago

    Thanks Gary for this absolutely great video.

  • @dashama
    @dashama 9 years ago

    Loved your video!
    Blessings and Love,
    Dashama

  • @Afra.Rezagholizadeh
    @Afra.Rezagholizadeh 4 years ago +2

    I did all the steps several times and used my own data series' mean & standard deviation, his numbers, and 0,1, but every time this error appears:
    >At least one of the arguments to the IDF.NORMAL function is out of range. The
    >first argument (probability) must be positive and less than one. The third
    >argument must be positive. The result has been set to the system-missing
    >value.
    What should I do?
    When I used 0,1 and my numbers, despite this error a series of new data appeared as normalized data, but I don't know whether it is reliable or not...

    • @gftempleton
      @gftempleton  4 years ago

      Did you use three arguments? The syntax for the second step requires 1) the result of Step 1 (this is in fractional or probability form), 2) mean, 3) standard deviation.

    • @Afra.Rezagholizadeh
      @Afra.Rezagholizadeh 4 years ago

      @@gftempleton Thanks for your answer. Yes, I did. For both data series this error came up, but new columns were also added to my SPSS worksheet! I'm going to use them, but considering these errors I don't know how reliable they are...

  • @franciscosanchez-narvaez9474
    @franciscosanchez-narvaez9474 5 years ago

    Thank you Gary, your tutorial is very clear and helpful.

  • @aishasebunya2675
    @aishasebunya2675 4 years ago

    Gary, thank you so much. This is awesome. All other methods failed for my work. I really appreciate this and will of course cite you :)

    • @gftempleton
      @gftempleton  4 years ago

      I'm glad it worked for you, Aisha. Good luck on your research.

    • @alice-nckucsielee8265
      @alice-nckucsielee8265 3 years ago

      @@gftempleton Thank you so much, but a lot of people are asking about the true mean and SD mystery XD

    • @gftempleton
      @gftempleton  3 years ago

      @@alice-nckucsielee8265 I'm not sure I understand your question. Units are interpreted as "normalized x." I hope that helps.

    • @alice-nckucsielee8265
      @alice-nckucsielee8265 3 years ago

      @@gftempleton Hi Gary, thanks for replying. I meant the mean and the standard deviation we have to put into the formula at 1:38. What should we put into it? The original mean and SD, or the mean and SD after the Step 1 transformation? :)

  • @pauls1571
    @pauls1571 7 years ago +2

    Hi Gary. First, thanks for your informative video. I was dealing with a few very non-normal distributions, and this method worked wonderfully in normalizing the data. That said, I have one question for you, the answer to which I cannot seem to figure out. Namely, in the video description, you note that "This approach retains the original series mean and standard deviation to improve the interpretation of results." However, I have not found this to be the case. Although the means and SDs for the transformed variables are quite similar to the original series' means and SDs, they are not perfectly retained. At least, this was true in cases where a value of 1 was generated after completing the first (fractional rank-order) step, even when I used the formula you mentioned in your response to a comment below to replace the 1 (i.e., replace the 1 with 1-(1/n)). Any clarification here would be much appreciated. Thanks again for your informative video.

    • @gftempleton
      @gftempleton  7 years ago

      Another reason they aren't exactly the original mean and standard deviation is inflated frequencies (stacks of the same value) that are some distance from the mean. If there were no "same values" in the dataset, the resulting mean and standard deviation would be exactly the mean and standard deviation of the original set. The approach "tries" to do that, at least.

    • @gftempleton
      @gftempleton  7 years ago

      1-(1/n) is a close approximation that allows researchers to avoid losing a record. Consider it part of the procedure - just like the first two steps. You may be right that it may cause the mean and standard deviation parameters to vary slightly. I don't think it would affect interpretations much. Sample size is a bigger issue in a lot of cases. This should be up to the researcher to decide.

  • @lanuit9733
    @lanuit9733 4 years ago

    Thank you so much for this video!!! You saved my life! Thank you. Thanks again!

  • @oliviapenaramirez4379
    @oliviapenaramirez4379 4 years ago +1

    Thanks Gary. But every time I run the transformation, this error appears:
    >At least one of the arguments to the IDF.NORMAL function is out of range. The
    >first argument (probability) must be positive and less than one. The third
    >argument must be positive. The result has been set to the system-missing
    >value.
    What should I do?
    Where do I copy the mean and the standard deviation from?
    Thanks in advance

    • @gftempleton
      @gftempleton  4 years ago +1

      Step 2 will not work on 0's or 1's. If the problem is a 0, convert it to 1/n as an estimate. If the problem is a 1, convert it using 1-(1/n) to estimate.
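In code form, the clamping rule described above looks like this (a sketch, assuming Python; the function name is mine, not from the paper):

```python
def clamp_fractional_rank(p: float, n: int) -> float:
    """Keep a Step 1 fractional rank strictly inside (0, 1) so the
    inverse-normal Step 2 is defined: 0 -> 1/n and 1 -> 1 - (1/n)."""
    if p <= 0.0:
        return 1.0 / n
    if p >= 1.0:
        return 1.0 - 1.0 / n
    return p
```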

  • @monicaalas4421
    @monicaalas4421 5 years ago +1

    Once we have normality, how can I run a regression with the original data (taking into account the normalized data) so that I can use it in my predictive model?

    • @riderho1
      @riderho1 4 years ago

      Monica Alas, hi Monica, would you mind sharing how you obtained the predictive model? And the criteria taken into consideration, like the correlation matrix, etc.?

  • @289993
    @289993 8 years ago +4

    By series mean and standard deviation, do you mean the mean and SD of our original variable, or of the fractional rank? Sorry if this is a silly question! I am a bit lost and am running out of time. Thank you!

    • @gftempleton
      @gftempleton  8 years ago +6

      Original variable - this will retain the original units of analysis.
      You can do that or use 0 for the mean and 1 for the standard deviation. This will standardize all your variables so they have equal weighting.
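The two options described above can be illustrated as follows (a sketch, assuming scipy; `frac` stands in for the Step 1 fractional ranks of 99 hypothetical cases):

```python
import numpy as np
from scipy.stats import norm

# Step 1 result for n = 99 cases: fractional ranks strictly between 0 and 1
n = 99
frac = np.arange(1, n + 1) / (n + 1)

# Option 1: impose the original mean and SD (keeps the original units;
# 50.0 and 10.0 are made-up values standing in for the original series)
original_units = norm.ppf(frac, loc=50.0, scale=10.0)

# Option 2: impose mean 0 and SD 1 (standardized, equal weighting)
standardized = norm.ppf(frac, loc=0.0, scale=1.0)
```

The first variant yields a roughly normal series centered on the original mean; the second yields z-scores centered on 0.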

    • @genniebaello1712
      @genniebaello1712 5 years ago

      @@gftempleton Hi! Would it greatly affect the ANOVA if I use the original units versus just 0 & 1 for the mean and SD?

  • @mohammedkhalid9799
    @mohammedkhalid9799 6 years ago

    THANK YOU, Mr. Gary

  • @abigailfulton9185
    @abigailfulton9185 5 years ago +1

    Where do you get the series mean and the standard deviation from? Please, can anyone help!

    • @gftempleton
      @gftempleton  5 years ago +2

      Calculate the mean (average) and standard deviation from the original data. Use those in the second step if you want to approximate original units.

  • @NoeWanKenobi
    @NoeWanKenobi 7 years ago

    Thank you so much for your easy and helpful explanation! You really saved my life (and thesis, which are the same thing right now) :P

  • @linyuliao3417
    @linyuliao3417 6 years ago +1

    Thank you for sharing this video. Can I ask a question? How is the first step related to the second step?

  • @mostafajerari7560
    @mostafajerari7560 2 years ago

    Thank you for your effort.
    I would like to know how to achieve normality for several variables at once (not one by one).
    Thanks again.

  • @amalhussein9960
    @amalhussein9960 4 years ago

    Thanks, Gary Templeton, for this informative video. After doing the two steps, how can we interpret the output of the regression analysis?

    • @gftempleton
      @gftempleton  4 years ago +1

      Not original units, but normalized units.
      Example: if you transform assets to normal and put it in an equation, it is interpreted as normalized assets.
      It's the same as with any transformation.

  • @TheEmanuelDaniel
    @TheEmanuelDaniel 4 years ago

    Thank you! Very useful and clearly explained.

  • @fatimabezbiq4212
    @fatimabezbiq4212 1 year ago +1

    Do we transform only the dependent variable, or all the variables of our model?

    • @gftempleton
      @gftempleton  1 year ago

      There is no rule when you are trying to satisfy the assumptions of the test. You just have to report all procedures.

    • @fatimabezbiq4212
      @fatimabezbiq4212 1 year ago

      @@gftempleton Thank you

  • @ellieseager589
    @ellieseager589 2 years ago

    You just saved me. Thank you!

  • @madiharazzam1098
    @madiharazzam1098 6 years ago +1

    This is not for Likert-scale data. How do I transform Likert-scale data? Please help.

  • @andrifadillahmartin8074
    @andrifadillahmartin8074 9 years ago

    Thanks Gary.. Very much appreciated

  • @rraj3167
    @rraj3167 3 months ago

    This is great!!

  • @OmisileKehindeOlugbenga
    @OmisileKehindeOlugbenga 7 years ago

    Thanks a lot for saving my day. But you did not mention initially that I would need a 1-(1/n) transformation before the final inversion. Thanks all the same.

  • @steven-el3sw
    @steven-el3sw 4 years ago

    Thank you for this video Gary, it was very helpful.
    Just to clarify: we are using the mean and standard deviation from the original, non-normal data...correct?

    • @gftempleton
      @gftempleton  4 years ago +1

      That is one option; the other is to standardize, using mean=0 and sd=1

  • @fizzaabidi.3094
    @fizzaabidi.3094 10 months ago

    Everything is OK, but after using this method it gives me outliers. What should I do?

  • @lllv1989
    @lllv1989 8 years ago

    This is an amazing method. I'm wondering if there's added value in Winsorizing or otherwise capping variables before the transformation. Some of my clinical variables have a case or two with extreme outliers and are also non-normal. Using the means and standard deviations for these variables seems a little weird to me because the Ms and SDs before Winsorizing don't seem within the range of values usually seen in my patient population. If I Winsorize before transforming, the Ms and SDs seem a little more representative... Am I completely off here?

  • @mahanesti3990
    @mahanesti3990 5 years ago

    Hi Mr. Templeton, thank you for your transformation-to-normality method. What should we call this transformation method?

  • @DrJolly
    @DrJolly 9 years ago +1

    Thanks, extremely helpful!

  • @gershomhabile7215
    @gershomhabile7215 2 years ago

    Very useful information, but I'm getting lost where you copy the mean and standard deviation, so I'm stuck. Kindly help with where to copy the mean and the standard deviation from. You only mentioned that you copied them from your Notepad, but what about me, where do I copy them from? I'm stuck, someone help ASAP please.

  • @zeric_raiz
    @zeric_raiz 7 years ago

    Thank you very much for such an instructive article and for the follow-up video. I've been able to normalize my data following your method but still have a doubt.
    I have a non-normally distributed variable "SOM" (metric) which takes values for the years 2002 and 2012 (a nominal variable named SAMPLE assumes the value '1' for 2002 and '2' for 2012 collected samples).
    I'm now able to 'globally' normalize my SOM variable with the 'Two-Step Transformation', BUT when I then do ANALYZE --> DESCRIPTIVE STATISTICS --> EXPLORE with a split file or a 'Factor List' by the variable SAMPLE and re-run the normality tests, I find that only the 2012 samples are normally distributed, with the 2002 samples not normally distributed, and I don't know how to resolve this. I'm stating this particular case, but I also need to split my variable SOM even further (i.e. "date collected AND soil type" or "date collected AND soil type AND cultivation system", etc.). Or is this a non-issue because the 'global' SOM variable is now normally distributed?
    I'm having this issue repeatedly and simply cannot find the answer to this problem. If you find the time to enlighten me on the issue, it will help a lot. Thanks anyway for such a great transformation.

  • @user-xk4ho7ph3o
    @user-xk4ho7ph3o 1 year ago

    Hello, I followed the steps above, and one of the fractional rank values was 1, which became missing data (no data shown) after the transformation.
    I don't know how to solve the problem 😅
    Looking forward to your reply!

    • @user-xk4ho7ph3o
      @user-xk4ho7ph3o 1 year ago +1

      I got the answer in a previous comment! Thanks!

  • @radina3737
    @radina3737 6 years ago

    First of all, thank you very much; this approach saves me a lot of time and effort.
    My question is: I have a dependent variable measuring "click intention", which can range from 0 to 100. After normalizing the data, however, I get 3 negative results and 2 above 100. Is it acceptable to keep it this way?
    Thank you very much!

  • @haziziesa4534
    @haziziesa4534 2 years ago

    Dr. Templeton - where do the mean and SD come from? Do you get them from the original (non-normal) data?

    • @gftempleton
      @gftempleton  2 years ago

      Yes - both the mean and SD come from the original data.

  • @fong9615
    @fong9615 9 years ago

    Wow, nice technique! Thanks! I have one question: can I use this on already-normal data? I am using a paired-samples t test, and I think this should be applied to both data sets for an equal comparison. And one more: is this technique suitable for observational data? Because it seems so perfect. Thanks!

  • @Belcebub69
    @Belcebub69 7 years ago

    Thanks, Gary, what you have done is very helpful. Are you perhaps aware of any critical peer reviews or papers out there regarding this method of yours?
    Thanks for the answer.

    • @gftempleton
      @gftempleton  7 years ago +1

      This paper has been peer reviewed. It will be published in print in early August:
      aaajournals.org/doi/abs/10.2308/isys-51510?code=aaan-site

  • @elijahd.spragueph.d8905
    @elijahd.spragueph.d8905 8 months ago

    Can these steps be used after taking ordinal questions and converting them to scale in SPSS?

  • @j-m.s.6646
    @j-m.s.6646 6 years ago

    Would using this method make the regression a linear-linear model, a log-log model, or something else? Also, would transforming the variables back after processing to, say, a scale from one to ten be considered good practice in terms of easing interpretation?

  • @abhigilli
    @abhigilli 5 years ago

    Thanks a ton for this! It was of great help! I have a quick question, and I would be really grateful if you could help me out.
    I tried this on 7 of my variables. 4 of them were successfully transformed, but 3 of them still aren't normal. Does this mean that they cannot be transformed, or is there another way to do this? The sample size is 50, and I am using the Q-Q plot and Shapiro-Wilk to test for normality.
    Thanks in advance, any help would be greatly appreciated. Thanks again!

  • @risausa4796
    @risausa4796 2 years ago

    Hi Gary!
    Thanks for this video.
    Where did you get the value for the MEAN and STANDARD DEVIATION?

    • @gftempleton
      @gftempleton  2 years ago +1

      Two options: 1) the original variable mean and standard deviation or 2) 0 for mean, 1 for standard deviation (z-scores).

  • @schummanr
    @schummanr 8 years ago +1

    Thanks for the video and the reference, Prof Templeton. When computing the fractional rank of some of my variables, I end up with a value = 1 (the highest value on that variable), which then creates an "out of range" error in the IDF.Normal function, as the range of values it accepts is 0 to less than 1. This does not happen with all variables, just some. Any hints as to why this happens, and how to address it in this transformation to normality, would be appreciated. Thanks

    • @amadeo3844
      @amadeo3844 6 years ago

      I too would like to know how to correct this problem, as this results in missingness in the data that I would like to avoid.

    • @l.briant3537
      @l.briant3537 6 years ago +1

      Hi Fidel Vila,
      I think I've worked it out. I think there must be some rounding errors, which means that the probability (first argument) ends up being interpreted as being out of range (it has to be within 0 and 1).
      I'm not sure how this happens (perhaps the calculations for the mean and SD need to be to more significant places, but I've tried this and it doesn't remove the error), but I have worked out a fix which is a bit of a "fudge":
      Say you have a variable X to be normalised, with mean MEAN and standard deviation SD. Let's suppose you have conducted the fractional rank and made a variable RX. You then do the following:
      >COMPUTE X_norm=IDF.NORMAL(RX/1.001,MEAN,SD).
      >EXECUTE.
      Dividing RX by 1.001 ensures that the variable is kept within the allowed range. (Although I repeat: I am not sure why it is interpreted as being out of range - as far as I can see, my variables all fall into 0 and 1, so it must be to do with rounding errors for the mean and SD calculations).
      Hope this helps!

    • @ellieking4132
      @ellieking4132 4 years ago

      @@l.briant3537 Thank you SO SO much for this! I was genuinely despairing about the method not working, and your solution worked perfectly!!!

  • @afaquehussain7678
    @afaquehussain7678 4 years ago

    Q1:
    I have 3 dependent variables. Two of them are within the normal range of skewness, i.e., +1 to -1, and have kurtosis within +3 to -3, but the third dependent variable is outside the normal range of skewness and kurtosis. I want to transform that variable with a square-root transform to run parametric tests. So the question is: can I transform that one variable only and run parametric tests, or should I transform all three variables before testing? Should I transform all three variables even though two of them are already normally distributed? Will it create problems to transform only the one non-normal variable?
    Q2:
    Can I infer and interpret my data for normality on the basis of skewness and kurtosis only, rather than going for the Shapiro-Wilk test?

  • @farhaniqbal3421
    @farhaniqbal3421 4 years ago

    Thanks Gary. One question is about the Shapiro test. Once I transformed my variables, all improved in terms of skewness and kurtosis. However, the Shapiro-Wilk test still shows a non-normal distribution (p

    • @gftempleton
      @gftempleton  4 years ago

      Feel free to try other transformations (e.g., natural log, power transformations, truncating, winsorizing). However, that is time consuming. If you can find a statistical package that uses Box-Cox, which tests many different power options, that may be a good use of time.
      However, reviewers of your work may also tolerate that you attempted a normality transformation that improved the situation. Worst case, you'll have to use non-parametric procedures (which, coincidentally, utilize transformations - usually ranking).

  • @pungozeng3860
    @pungozeng3860 8 years ago

    Thank you. Could you confirm that the series mean and std. dev. are those of the original variable (i.e., the market cap)?

  • @davine1301
    @davine1301 3 years ago

    I'm 6 years late, but I love you, sir

  • @naghmesheibani3877
    @naghmesheibani3877 2 years ago

    awesome awesome awesome

  • @ibrahimsaid28
    @ibrahimsaid28 6 years ago

    Thank you for posting it. Which method is better for normalizing data? And what if all methods (log, ln, sqrt, trunc) fail to normalize my data?

  • @ker329
    @ker329 1 year ago

    Hi Gary. I followed all the steps, but I got Warning 4940: at least one of the arguments in the IDF.Normal function is out of range. Would you know why?

  • @linduchyable
    @linduchyable 8 years ago

    I saw this video before and it was helpful. I followed your steps, and I posted the results as they occurred in my first comment to you.
    I also posted the results on your friend James Gaskin's channel, and he recommended this video to me. Please help, I don't know what to do :(

  • @farahjuana
    @farahjuana 2 years ago

    Hi, I just found your video and it's very helpful for me. May I know if this technique has any pros or cons? Is the method suitable for any continuous data?

    • @gftempleton
      @gftempleton  2 years ago

      It is applicable to any numeric data. Of course, its efficacy will vary like any transformation. It attempts to achieve statistical normality, unlike other transformations.

  • @hidaghasemi6538
    @hidaghasemi6538 1 year ago

    Very helpful, thank you

  • @sameeral-abdi6870
    @sameeral-abdi6870 9 years ago

    Thanks Gary
    It is just awesome. I have one inquiry: how can I transform back from step 2 to the original data? For example, after I did the two steps I got a mean of (-0.007). So how am I going to report that?!

  • @mouradelhanafii8198
    @mouradelhanafii8198 4 years ago

    In case we need to describe this procedure in data analysis or results, what should we mention exactly along with the reference?

    • @gftempleton
      @gftempleton  4 years ago +1

      If you are asking about units, just say the results are in normalized units. Of course, you would explain you used the Two-Step in the methods. Models using original and transformed (e.g., natural log or Two-Step) are separate models (i.e., different error terms) and should be interpreted differently (this is not so obvious to some but models are commonly treated distinctly).
      Regarding the method, step 1 is simply a fractional rank and step 2 is the application of the inverse normal function applied to the results of step 1.
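      Outside SPSS, those two steps map onto standard statistical library calls. Below is a minimal Python sketch (my own illustration, not from the video or paper; the function name `two_step` and the use of scipy are my choices) of the fractional rank followed by the inverse normal:

```python
# Sketch of the Two-Step transformation described above (not SPSS syntax).
# Step 1 mirrors SPSS fractional ranking; Step 2 mirrors IDF.NORMAL.
import numpy as np
from scipy import stats

def two_step(x, mean=0.0, sd=1.0):
    """Transform x toward normality, targeting the given mean and SD."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Step 1: fractional rank in (0, 1]; ties receive their mean rank
    frac = stats.rankdata(x, method="average") / n
    # The maximum gets exactly 1, which the inverse normal cannot accept;
    # impute 1 - 1/n, as recommended elsewhere in this thread
    frac[frac >= 1.0] = 1.0 - 1.0 / n
    # Step 2: inverse normal CDF (the IDF.NORMAL equivalent)
    return stats.norm.ppf(frac, loc=mean, scale=sd)
```

      Passing `mean=x.mean(), sd=x.std()` reproduces the original-units option shown in the video; leaving the defaults (0, 1) gives the standardized variant.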

    • @mouradelhanafii8198
      @mouradelhanafii8198 4 years ago +1

      Gary Templeton THANK YOU SO MUCH, indeed.

  • @SHSATNewYork
    @SHSATNewYork 5 years ago

    This two-step process has three major issues.
    One: by using the 'Function group', 'Inverse DF', and data from Notepad such as the series mean and SD, you transformed the data. Can we use 0 and 1 as the series mean and SD, as you claim in your paper?
    Two: after transforming, you will reduce the sample size by 1. That means if you have five
    Three: we have another major issue with this transformation. Now you can do statistical tests on the transformed data, and here is the big problem: reporting the mean and standard deviation in the 'transformed unit' is not the purpose of almost any research.
    How do you back-transform your result after this 'two-step transformation' to explain the results in the original data? If you cannot back-transform the results, this is not acceptable in research. Please explain how to deal with all three issues, thank you.

    • @gftempleton
      @gftempleton  5 years ago +1

      The three items you list are in no way "major issues": 1) that option simply standardizes the data; the other option is to use the original series mean and standard deviation. 2) There is a simple fix - impute the missing value using 1-(1/n). 3) This is an issue with every transformation, including the natural log and probabilities; the Two-Step is the only approach that allows the researcher to use the original mean and standard deviation as arguments so the result will emulate the original units.

  • @rockfortpete
    @rockfortpete 8 years ago +3

    What do you mean by series mean?

    • @gftempleton
      @gftempleton  8 years ago

      +pete entjade Mean=average - the average of all values in a variable

    • @Davao420
      @Davao420 6 years ago +1

      the original Marketvalue or the fractionally ranked one?

  • @m.roussel1757
    @m.roussel1757 8 years ago

    Thank you for this helpful video.
    I was wondering... what is the use of the bootstrap option in Amos? Is it better to perform a transformation, or is the bootstrap sufficient when performing a CFA?

    • @gftempleton
      @gftempleton  8 years ago

      +Miriam Roussel This would make the subject of a good research paper. I believe bootstrapping has its own weaknesses, as does any transformation. I prefer using normalized, real data.

  • @porscheboddicker1443
    @porscheboddicker1443 8 years ago

    Why do we need to do 2 steps? Can't I just use the fractional rank? For example, my BMI variable was skewed and we wanted to run a GEE with it. Can't I just use the fractional rank as my new BMI?

  • @aminfarzaneh8142
    @aminfarzaneh8142 8 years ago

    Thank you so much for the method. I cannot normalize my data. Can I have the data set you used?

  • @nicolasvanderlinden8569
    @nicolasvanderlinden8569 4 years ago

    Hi, thanks for this video. I do have a clarification question: I'm not sure what to use as the second and third arguments of the Idf.Normal function. In your 2011 paper, I read that, by default, one can use 0 and 1, respectively, for the desired mean and standard deviation, but in the video you mention a series' mean and standard deviation. Is it the mean and standard deviation of the (fractional-rank) transformed variable?

    • @nicolasvanderlinden8569
      @nicolasvanderlinden8569 4 years ago

      OK, got it. I found the answer in the video at 2:34.

    • @Afra.Rezagholizadeh
      @Afra.Rezagholizadeh 4 years ago

      @@nicolasvanderlinden8569 So should we use the numbers he uses instead of our own mean and standard deviation? What about 0 & 1? I'm so confused...

  • @jayaprakashsalian1804
    @jayaprakashsalian1804 6 years ago

    How do we get the values back from the transformed data? I.e., after I perform the transformation, I run a regression using the normalized values; after getting the result, I need to know how to recover the actual data from the transformed data. For a logarithmic transform we use the base to get the value back - how do we do it here?

  • @ensarifadi457
    @ensarifadi457 6 years ago

    Dear Gary,
    Can you please tell me what the implications of using this technique on a Likert scale are? For instance, I have used a Likert scale in which 1 is strongly disagree and 7 is strongly agree. Does it invert the relationship, or what? Thanks

  • @minaorang5094
    @minaorang5094 5 years ago

    Thank you for the video and your paper! I used the two-step method on my non-normal data, and everything turned into normal distributions! My only remaining concern is whether I am allowed to use this method for my data, which are drawn from 4- and 5-point Likert scales. I read in your article that this method is intended mostly for higher numbers of levels (up to 100)! I would appreciate it if you could tell me whether I can use this method for 4- and 5-point Likert scales or not! Thanks in advance!

  • @eda1976bdy
    @eda1976bdy 8 years ago

    Yes, it is a great explanation, but I have tried it on my own data and am unable to get normality. I've used log10 and sqrt; the results stay the same - small changes, but no change in normality. What should I do? Please advise, thank you.

  • @mouradelhanafii8198
    @mouradelhanafii8198 4 years ago

    Great technique, indeed! Thanks a lot. Is that the mean and std. deviation of the original variable?

    • @gftempleton
      @gftempleton  4 years ago +2

      That is an option, Maurad. The other basic option is mean=0, sd=1 to make the variable standard-normal.

    • @mouradelhanafii8198
      @mouradelhanafii8198 4 years ago

      Gary Templeton Thank you.

  • @RusiantiSugio
    @RusiantiSugio 5 years ago

    Thank you for your nice video

  • @chenlin920
    @chenlin920 7 years ago

    Very helpful video!

  • @oumelkhirmoulay1416
    @oumelkhirmoulay1416 1 year ago

    Thank you very much for this video. Please, I have a question about the nature of this transformation: I want to write a sentence explaining the transformation method, like "the data were arcsin transformed". If I use this method, RV.Normal, what should I say?

    • @gftempleton
      @gftempleton  1 year ago

      Just say it was transformed to normal using a two-step procedure described in Templeton (2011).
      The full reference is at the end of the video.

  • @mosun6390
    @mosun6390 3 years ago

    Great one!!!

  • @fatihaelagri7753
    @fatihaelagri7753 3 years ago

    Hello
    My question is: I have a series whose distribution does not follow the normal distribution. I tried the logarithmic transformation in EViews, but the p-value of Jarque-Bera is always lower than 0.05.
    So what transformation should I do in EViews?

    • @gftempleton
      @gftempleton  3 years ago

      Eviews has each step. The first is a fractional rank (rank represented in proportions) and the second is a normal inverse function. It appears "Normal (Gaussian)" is showing you this here:
      www.eviews.com/help/helpintro.html#page/content/mathapp-Statistical_Distribution_Functions.html

  • @diananazaryan
    @diananazaryan 9 years ago

    Thank you soooooooo much! Particularly for the decent reference. I couldn't download the article, though; are there any other sites I can get it from?

  • @engysaeed1379
    @engysaeed1379 5 years ago

    Thank you so much, Professor Gary, it's an amazing video, but I want to ask a question to make sure before using this method in my paper. I fitted a gamma model and extracted the residuals, and I want to use these residuals, but they are not normally distributed. If I follow the steps in this video, which change the values of the data, is it acceptable to use the result in my paper as if they were the same residuals?

    • @gftempleton
      @gftempleton  5 years ago

      Why not? Use it as you would any other transformation.

  • @researchory
    @researchory 7 years ago

    Thanks Gary. How should I interpret coefficients after converting the dependent variable using the IDF.Normal function? For example, for a one-unit increase in the independent variable, how does it affect the dependent variable?
    Thanks,

    • @gftempleton
      @gftempleton  7 years ago +2

      Assuming you transform using the series mean and standard deviation, interpret exactly the same as you would original units. I would note that you normalized the original units.
      Alternatively, you can transform using mean=0 and sd=1 and interpret as standardized normal original units.

  • @jayaprakashksalian
    @jayaprakashksalian 6 years ago +1

    Hi,
    I just have one doubt: if we want to convert back to the absolute value, how can we do that? For example, I have a regression model and I converted the dependent variable, and now I want to see what the absolute value of y will be.

    • @gftempleton
      @gftempleton  6 years ago +1

      It depends on the units you use. There are two basic uses of the Two-Step: 1) convert to standardized units (use mean=0 and sd=1 in the second step) or 2) convert to normalized original units (use original series mean and sd). So, the interpretation depends on usage. If you use the second step, there is no reason to convert back as you are in the original units (just normalized).

    • @jayaprakashksalian
      @jayaprakashksalian 6 years ago

      Gary Templeton thanks a lot

    • @jayaprakashksalian
      @jayaprakashksalian 6 years ago

      Just one more question to be sure: after applying this method for transformation, the regression equation remains the same, i.e., y = b0 + b1*x1 + b2*x2, and it doesn't change the way it does when we do a log transformation?

    • @gftempleton
      @gftempleton  6 years ago

      Reverting back isn't necessary if you transform using the original series mean and standard deviation. You are already in the original units.
      Also, remember that using the exponential function to revert back to original units from logged units is problematic when some original values are negative. In that case, the natural log would produce missing values. To avoid this, researchers will shift the values so none of them are negative, then do the natural log transformation. This means reverting back using the exponential is useless, unless the preconditioning is reversed appropriately. The natural log has many flaws and is inferior to the Two-Step in achieving normality and achieving significant results. See:
      Templeton, G.F. and Burney, L. 2017. "Using a Two-Step Transformation to Address Non-Normality from a Business Value of Information Technology Perspective," Journal of Information Systems, Vol. 31, No. 2, pp. 149-164.

    • @jayaprakashksalian
      @jayaprakashksalian 6 years ago

      I am really thankful for your response. Thanks a lot

  • @alice-nckucsielee8265
    @alice-nckucsielee8265 3 years ago +1

    I found that when my variable is transformed through the fractional rank, the values that became 1 are blank after the IDF.Normal transformation. Has anyone encountered the same situation?

    • @alice-nckucsielee8265
      @alice-nckucsielee8265 3 years ago

      I found that those are the top (the biggest) values in their group.

    • @gftempleton
      @gftempleton  3 years ago +2

      Convert any missing value resulting from any Step 1 result of "1" to 1-1/n. Then, apply Step 2. Report this in your methods.
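      In code terms, the imputation could be sketched as follows (a hypothetical scipy illustration of the fix, not SPSS syntax; the variable names are my own):

```python
# Demonstrating the 1 - 1/n fix for the largest value (not SPSS syntax).
import numpy as np
from scipy import stats

x = np.array([2.0, 5.0, 9.0, 1.0, 7.0])       # toy data
n = len(x)
frac = stats.rankdata(x) / n                   # Step 1: the max value gets exactly 1
frac[frac == 1.0] = 1.0 - 1.0 / n              # impute 1 -> 1 - 1/n before Step 2
z = stats.norm.ppf(frac, loc=x.mean(), scale=x.std())  # Step 2: no missing values
```

      Without the imputation line, `norm.ppf(1.0)` returns infinity, which corresponds to the missing value SPSS produces for the largest case.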

    • @alice-nckucsielee8265
      @alice-nckucsielee8265 3 years ago

      @@gftempleton I think I got it!!!! Thanks for your kindness in replying so fast; I needed help quite urgently.
      You are awesome!!!!

  • @spz145
    @spz145 8 years ago

    Hi, thanks for the share. I tried the method, and it works to normalize the dataset. However, why is the sample size reduced after the procedure? For example, why did the sample size drop from 6843 to 6842 above? Would that affect the conclusions?

    • @alauddinmohammad1517
      @alauddinmohammad1517 6 years ago

      Please give us a reasonable answer for why the sample size decreases after the two-step process. Why does the missing value appear? How can we explain this problem in a research paper? Thank you in advance.