Test of normality and data transformation in SPSS

  • Published: 11 Sep 2024
  • A short video demonstrating how to test whether a collection of data can be statistically distinguished from a sample drawn from a normal distribution. This test is carried out in SPSS. The video also shows the user how to log transform their data and then test whether this increases the degree to which these data approximate a sample from a normal distribution. Why do this? Well, a lot of commonly used statistical tests require the dependent variable to approximate a sample from a normal distribution.
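As a rough illustration of the idea in the description, here is a minimal Python sketch (Python rather than SPSS, and made-up data, both assumptions). It log10-transforms a right-skewed sample and compares skewness before and after. Skewness is only a crude stand-in for the Shapiro-Wilk test that the video runs in SPSS, and the helper name `skewness` is hypothetical.

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness; close to 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed measurements (not the data used in the video).
raw = [1.2, 1.5, 1.9, 2.4, 3.1, 4.0, 5.6, 8.3, 13.0, 27.5]

# Log10 transform; every value must be strictly positive for this to work.
logged = [math.log10(v) for v in raw]

skew_raw = skewness(raw)
skew_log = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log10: {skew_log:.2f}")
```

A smaller absolute skewness after the transform suggests the data moved toward symmetry, but it does not replace a formal normality test.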

Comments • 101

  • @DrBSchamp
    @DrBSchamp  11 years ago

    If you are testing whether B1 and B2 are significant predictors of Y with a multiple regression, it is required that Y approximate a sample from a normal distribution. So you would be interested in testing Y to see if it approximates a normal distribution, and then, if it is not normal but an ln transformation improves this, you would test LnY=B1+B2. I hope that helps.

  • @DrBSchamp
    @DrBSchamp  11 years ago

    Yes, this was a mistake on my part when I made the video a long time ago. I'm working on fixing this. Thanks for letting me know. The value should indeed be greater than 0.05 to be accepted as representing a sample from a normal distribution.
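The decision rule in this reply can be written as a small Python sketch (Python instead of SPSS is an assumption, and `looks_normal` is a hypothetical helper name). The 0.05 threshold comes from the reply; 0.012 is the post-transformation value that other comments quote from the video.

```python
def looks_normal(p_value, alpha=0.05):
    """True when the normality test fails to reject normality at level alpha."""
    return p_value > alpha


# Value that commenters report from the video after the log transform.
video_result = looks_normal(0.012)

# A value above the threshold, for comparison.
comparison_result = looks_normal(0.12)

print(video_result, comparison_result)
```

A p-value above 0.05 only means the test found no evidence against normality; it is not proof that the data are normal.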

  • @DrBSchamp
    @DrBSchamp  11 years ago +1

    Hi, yes, this has come to my attention. I must have missed that when I made the video. This process doesn't actually solve the non-normality of these data. You do want a value of greater than 0.05.

  • @DrBSchamp
    @DrBSchamp  12 years ago

    Hi there: I'm glad you found it helpful. I'm slowly building my set of videos on this, but I have to admit that PCA isn't high on my list though. Good luck!

  • @DrBSchamp
    @DrBSchamp  12 years ago +1

    In this case, log transforming your data did not make it more closely approximate a sample from a normal distribution. As such, you can try a different transformation, or you can use a non-parametric test that doesn't require your dependent variable to approximate a sample from a normal distribution. Log transforming your data does appear to be improving things. I would consider using a log transformation that is not base 10. Maybe a higher base. Good luck.

  • @DrBSchamp
    @DrBSchamp  13 years ago

    It certainly should, although it's not always possible to transform data to make it approximate a sample from a normal distribution. For the data used in the example, that's the case.

  • @HealthbeautyluckyshahBlogspot
    @HealthbeautyluckyshahBlogspot 5 years ago

    I have this unique problem: whatever test I apply, my skewness decreases to an acceptable range of less than 2 and kurtosis to less than 7, but the Shapiro-Wilk and Kolmogorov-Smirnov tests (tests of normality) still come out significant. Can I transform the scores and report this range of skewness and kurtosis as evidence of normality, rather than the tests of normality?

  • @bisquicko
    @bisquicko 12 years ago

    Thanks, DrBSchamp, you cleared up a lot of confusion!

  • @DrBSchamp
    @DrBSchamp  11 years ago

    It absolutely should. If you look through the comments, you'll see this is a common refrain. This was a mistake I made when I made the video some time ago. The method is correct, but it's true, this particular transformation did not make these data sufficiently normal.

  • @DrBSchamp
    @DrBSchamp  11 years ago

    Try a square root transformation. You may have to subtract each score from a constant: NEW X = SQRT(K - X), where K is a constant that each score is subtracted from so that the minimum score is 1. This constant is typically the largest number in the data set plus 1. I hope that helps.
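The reflect-and-square-root step described in this reply can be sketched in Python as follows (Python rather than SPSS is an assumption, the scores are made up, and `reflect_sqrt` is a hypothetical name). K is taken as the largest value plus 1, as the reply suggests.

```python
import math


def reflect_sqrt(values):
    """Reflect each score about K = max + 1, then take the square root."""
    k = max(values) + 1  # constant: largest value in the data set plus 1
    return [math.sqrt(k - v) for v in values]


# Hypothetical left-skewed scores (illustrative only).
scores = [2, 7, 8, 9, 9, 10, 10, 10]
transformed = reflect_sqrt(scores)
print(transformed)
```

Note that reflection reverses the ordering: the largest original score becomes the smallest transformed value, so the direction of any effect has to be read the other way round.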

  • @marvntreyhd
    @marvntreyhd 11 years ago

    Hi, I have encountered a problem with my data set. I am looking at curvature values, and some of these are negative, and most read -0.74 or -0.456. I have already run a test for normality, and the values were not normally distributed.
    When trying to transform them towards normality, I am having trouble: I keep getting a message in SPSS, "The argument for the log base 10 function is less than or equal to zero on the indicated command." Help!

  • @DrBSchamp
    @DrBSchamp  11 years ago

    My apologies, I didn't understand your message and thought you were talking about your own data. I see now that you're talking about my video. Yes, this procedure didn't normalize my data set sufficiently. I mistook the 0.012 value for 0.12. You most definitely want the P-value to be above 0.05.

    • @mariayactayofelix
      @mariayactayofelix 5 years ago

      Hey Mr. Brandon! What about the function IDF.NORMAL(prob, mean, stddev) to create normalized data? Or what would your advice be if Log10 does not work?

  • @DrBSchamp
    @DrBSchamp  11 years ago

    I'm glad it was helpful. I'll try to do more videos this year.

  • @kamalpreetrakhra8071
    @kamalpreetrakhra8071 9 years ago +5

    Hi,
    What else can be done if neither log10 nor loge transformations return normally distributed data?
    Thanks

    • @OfisLab
      @OfisLab 8 years ago +1

      Try a square root transformation.

  • @DrBSchamp
    @DrBSchamp  12 years ago

    Thanks for the comment. You can make it clearer by clicking on the gear on the bottom right of the video and selecting a higher level of clarity. The video can even be viewed in HD at full screen. It's much clearer that way. I hope that's helpful!

  • @ProTech_2023
    @ProTech_2023 1 year ago

    Suppose you transformed the pretest data and the posttest data are already normally distributed. To run the analysis, should we transform the posttest as well? Or can we use the transformed pretest values and the actual posttest values? I am confused. Please help.

  • @theuniverse8937
    @theuniverse8937 3 years ago

    Excellent!

  • @ProTech_2023
    @ProTech_2023 1 year ago

    After the transformation, the sig. for Shapiro-Wilk is still 0.03, which is less than 0.05. How can we agree that the data are now normally distributed?

  • @bansarichawada123
    @bansarichawada123 11 years ago

    Thank you for this. Very helpful.

  • @Irishguitar65
    @Irishguitar65 13 years ago +1

    Hello DrBSchamp,
    thanks for the great introduction to data transformation in SPSS. I just have one question. Shouldn't the significance value be above 0.05 after transformation? In your example it is still 0.012.
    Thanks

    • @letslearn3034
      @letslearn3034 2 years ago

      I am wondering the same. It is not correct, unless merely approaching 0.05 is enough (and I think 0.012 is still quite far from it).

  • @claireel-jor4149
    @claireel-jor4149 6 years ago

    My skewness and kurtosis values are between 1 and -1, but the Shapiro-Wilk and Kolmogorov-Smirnov tests are significant... help?!

  • @bansarichawada123
    @bansarichawada123 11 years ago

    Can we do the same for binary logistic regression, where we have only two outcomes?

  • @DrBSchamp
    @DrBSchamp  11 years ago

    I'm not sure what to recommend here, other than some reading. I found lots on Logistic regression models for ordinal variables. That said, I think there's a way to do SEM with non-parametric response variables. Good luck!

  • @manojillangasooriya3590
    @manojillangasooriya3590 11 years ago

    Hi DrBSchamp,
    all the best

  • @DrBSchamp
    @DrBSchamp  11 years ago

    If you cannot transform your data to make it approximate a normal distribution, you can use a non-parametric test. Which test you use will depend on what you are trying to test with your data. Can you elaborate on what you're trying to do?

  • @firatkurt4289
    @firatkurt4289 8 years ago

    I have two main plots, with and without sulphur. Each main plot has sub-plots consisting of treatments with four different boron doses. When I run a normality test on all the data (treating all the data as one independent variable), the data are normally distributed. However, when I take only the sub-treatments (boron treatments) into consideration, the data are not normally distributed. Also, when I consider only the main plots for the normality test, one half of the data is normally distributed (with S) while the other is not.
    What should I do? How should I test normality for data consisting of different sub-categories of independent variables, as explained above? Thanks in advance for your help.

  • @Artyom109Zinchenko
    @Artyom109Zinchenko 11 years ago

    Hi Dr. Schamp, just a quick question. Is there a particular reason why you chose to input the data from Excel manually? I usually import it right into SPSS, and the first row in Excel becomes a header in SPSS. It saves a lot of time.

  • @i4tur
    @i4tur 11 years ago

    @brandon Schamp Thank you so much for posting this. I truly am grateful :)

  • @waniey6833
    @waniey6833 9 years ago +4

    Hi, I don't understand how to transform negative values for financial ratio analysis using log10. A log10 transformation needs values above zero, right? Thanks.

    • @DrBSchamp
      @DrBSchamp  9 years ago +2

      Hi, one solution to this is to simply add a constant to all your values so they are positive, and then log transform them. That's not uncommon and doesn't change the relationship between values. Good luck.

    • @waniey6833
      @waniey6833 9 years ago

      Okay, thanks. By the way, could you show me a video or tutorial on how to add a constant so that all values become positive? Thanks a lot for helping me :-)

    • @DrBSchamp
      @DrBSchamp  9 years ago +1

      Ayunie Zuhairah
      I don't think this warrants a video. 1. Find the smallest number in your data (most negative); 2. Let's say that number is -7; 3. Add 8 to all your data points so they're all greater than zero. 4. Take the log of all your data points. Good luck.

    • @waniey6833
      @waniey6833 9 years ago

      Sorry to trouble you... What if the data include a large negative value, let's say -454?

    • @DrBSchamp
      @DrBSchamp  9 years ago +1

      Ayunie Zuhairah
      Then you would simply add 455 to all your data points.
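The four steps in this thread (find the most negative value, add a constant so every value is positive, then take the log) can be sketched in Python. Python rather than SPSS is an assumption, the sample values are made up, and `shift_and_log10` is a hypothetical name. The shift is chosen so the minimum becomes 1, which matches the examples given above (-7 gives 8, -454 gives 455).

```python
import math


def shift_and_log10(values):
    """Add a constant so the minimum becomes 1, then take log10."""
    smallest = min(values)
    # Only shift when the data contain zero or negative values,
    # e.g. a minimum of -7 gives a shift of 8.
    shift = 1 - smallest if smallest <= 0 else 0
    return shift, [math.log10(v + shift) for v in values]


# Hypothetical data whose most negative value is -7.
shift, logged = shift_and_log10([-7, -3, 0, 5, 92])
print(shift, logged)

# The second case from the thread: a minimum of -454.
big_shift, _ = shift_and_log10([-454, 10])
print(big_shift)
```

Adding a constant keeps the ordering of the values and the differences between them, but the size of the constant does affect the shape of the logged distribution, so it is worth reporting which constant was used.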

  • @lindseylilly790
    @lindseylilly790 5 years ago

    I am running a MANCOVA and have 4 dependent variables. Do I need to transform all variables before running my test, or can I just transform the 2 that need it?

  • @jun189
    @jun189 11 years ago

    Hi Brandon,
    Could I ask a stupid question? What do I do after the transformation? For instance, previously the model was Y = B1 + B2 * X. After transformation (say Ln), should it be Y = B1 + B2 * lnX?
    Thank you

  • @HMadhvapaty
    @HMadhvapaty 11 years ago

    Just to confirm - we are looking for a sig. value above 0.05?
    My test shows all values ranging from .300 - .750
    Does that mean they are normally distributed?
    And if we get a negative sig. value - what would that imply?
    Thanks in advance!

  • @faizimalik5520
    @faizimalik5520 4 years ago

    Sir, my data are not normally distributed. What can I do to normalize my data in SPSS?

  • @akshaypunde3968
    @akshaypunde3968 7 years ago

    Hi. What if I have a categorical variable? Do I need to transform the data then?

  • @marcgarvida532
    @marcgarvida532 6 years ago

    Hello, once I have transformed my data into a normally distributed data set, how will I be able to use the transformed data in data analysis? For instance, in multiple regression analysis, where an equation model will be created, what will happen to the independent variables? Can the dependent variable be predicted using the original values of the independent variables?

  • @ishaui2416
    @ishaui2416 6 years ago +1

    After the transformation, the p-values were still below 0.05. Why did you then assume they are normal?

    • @DrBSchamp
      @DrBSchamp  6 years ago

      Yes, I added a comment to the video about that. It is true that the transformation did not alter the data so that it approximates a sample from a normal distribution. Sometimes it is not possible to normalize data by transformation, and sometimes a different transformation is needed.

  • @culichi865
    @culichi865 12 years ago

    Hello DrBSchamp, thanks for the explanation, it was helpful! Would it be possible for you to explain or do a tutorial on principal component analysis for biological data? Thanks

  • @deepikaladdu5042
    @deepikaladdu5042 9 years ago

    Hi, great video. I am having the same problem as you showed. I have many dependent variables (each will be analyzed independently). With half of the dependent variables, the log transformation took care of the normality issue. However, with the other half, a log transformation didn't help. I would like to stay consistent in the transformation method I am using. Do you recommend using an alternative transformation, i.e., natural log, or just leaving it as is?

    • @DrBSchamp
      @DrBSchamp  9 years ago +2

      Hi there, yes, I would try log transformations with different bases. Hopefully one will normalize all your dependent variables. However, for a given dependent variable, as long as you're running one test per dependent variable, it really doesn't matter if you use the same transformation. For example, you may log transform one dependent variable and log(2) transform another, and that won't influence your two tests on these. Good luck.
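One check that may be worth running before trying many log bases: since log_b(x) = ln(x) / ln(b), changing the base only rescales the transformed values by a constant, so scale-free shape measures such as skewness (and scale-invariant tests such as Shapiro-Wilk) should come out the same for every base. A minimal Python sketch of that check, with made-up data and a hypothetical `skewness` helper (Python rather than SPSS is an assumption):

```python
import math
import statistics


def skewness(values):
    """Moment-based skewness; unchanged when values are multiplied by a positive constant."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


# Hypothetical right-skewed, strictly positive data.
data = [0.8, 1.1, 1.7, 2.2, 3.5, 4.9, 7.4, 12.0, 21.0, 55.0]

# Skewness of the log-transformed data for three different bases.
skew_by_base = {
    name: skewness([math.log(v, base) for v in data])
    for name, base in (("log2", 2), ("ln", math.e), ("log10", 10))
}
print(skew_by_base)
```

If the three values agree to within rounding error, a different kind of transformation (square root, reciprocal, and so on) is more likely to change the outcome than a different log base.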

  • @manodemetate
    @manodemetate 11 years ago

    Hi Dr Brandon,
    thanks for the video. Wouldn't angiosperms-gymnosperms be nominal data? I thought ordinal variables represented an order of some sort.
    cheers

  • @TokenFun105
    @TokenFun105 11 years ago

    Do you have a video on D'Agostino's test? I have read that it is far superior. Can it be done in SPSS? Many thanks

  • @MrDaryl602
    @MrDaryl602 11 years ago

    I have one variable that is positively skewed and one that is negatively skewed. When I transform the data, this only works for the variable that is positively skewed. Is there something different I should be doing for negatively skewed data?

  • @Irishguitar65
    @Irishguitar65 12 years ago

    @DrBSchamp
    Thanks for the answer and for your great videos!

  • @DrBSchamp
    @DrBSchamp  12 years ago

    @bisquicko Yup, there's one on my channel, although it uses a freeware stats program called KyPlot (there's a link to the download for this small stats and graphing program with the video). Even if you're working in SPSS this might still be helpful

  • @gustavoalmeida8344
    @gustavoalmeida8344 11 years ago

    One question. Can the transformation be applied to a scale item (5-point Likert scale)? I am getting non-normality when I aggregate the items in SEM (AMOS).

  • @teacheryenny
    @teacheryenny 9 years ago

    I also came across some approaches stating that, before multiple regression, one of the assumptions is having normally distributed dependent and independent variables. I am confused about this matter. Is it true that both the IV and DV must be normally distributed, or is it enough to make sure the dependent variable is normally distributed?

    • @DrBSchamp
      @DrBSchamp  9 years ago

      There are no assumptions about the independent variables approximating a sample from a normal distribution for multiple regression. Only the dependent variable.

    • @teacheryenny
      @teacheryenny 9 years ago

      Brandon Schamp Dear Dr. Schamp, I read on Laerd's website, statistics.laerd.com/spss-tutorials/testing-for-normality-using-spss-statistics-2.php,
      that the IVs are entered under the factor list. I am not sure why the IVs are entered in the factor list. Could you please explain?

  • @bisquicko
    @bisquicko 12 years ago

    DrBSchamp, do you have any clips on linear regression?

  • @DrBSchamp
    @DrBSchamp  12 years ago

    @bisquicko Glad you found it helpful!

  • @mandym7502
    @mandym7502 8 years ago +2

    The video should have included the "further transformations"

    • @DrBSchamp
      @DrBSchamp  8 years ago +2

      Boy do I wish it did! I made this a long time ago for a class I was teaching, and didn't notice that I used the wrong sample distribution. I had a log-normal distribution prepared, but alas... Anyway, I hope it at least gives viewers an understanding of how to use SPSS to do these transformations.

  • @DrBSchamp
    @DrBSchamp  11 years ago

    You're very welcome.

  • @ennaiclud
    @ennaiclud 10 years ago

    Wow, you saved my final-year thesis.

  • @i4tur
    @i4tur 11 years ago

    I forgot to ask you. I have log transformed my 'skewed' data, and when I run tests of normality I'm still getting a significance of .000 for both Kolmogorov-Smirnov and Shapiro-Wilk. Why?! Please help. Thanks.

  • @cgat0018
    @cgat0018 11 years ago

    What kind of non-parametric tests do you suggest, please?

  • @bisquicko
    @bisquicko 12 years ago

    Thanks DrBSchamp, I'll look it up

  • @marathapoornachander7550
    @marathapoornachander7550 10 years ago +1

    Hi Brandon, a quick query. I have log transformed my dependent variables and attained a normal distribution. However, I neither checked my independent variables for normality nor transformed them. Now I want to run a regression analysis on my dependent and independent variables. I was wondering if I can regress the log-transformed dependent variable against the non-transformed independent variables.
    Thanks in advance

    • @DrBSchamp
      @DrBSchamp  10 years ago

      I believe you can. Normality requirements in regression are for the dependent variable.

    • @marathapoornachander7550
      @marathapoornachander7550 10 years ago

      Brandon Schamp So we can perform regression for a normalized dependent variable vs. a non-normalized independent variable. But normalized values are different from raw values, and if you regress a normalized dependent variable on a non-normalized independent variable, doesn't it affect the R / R square value?
      Thanks Brandon

    • @DrBSchamp
      @DrBSchamp  10 years ago

      Maratha Poornachander
      Yes. And you would certainly hope that normalizing the dependent variable would increase the R square value.

    • @marathapoornachander7550
      @marathapoornachander7550 10 years ago

      Brandon Schamp Thanks a lot for the clarifications Brandon. Cheers

    • @teacheryenny
      @teacheryenny 9 years ago

      Brandon Schamp I also came across some approaches stating that, before multiple regression, one of the assumptions is having normally distributed dependent and independent variables. I am confused about this matter. Is it true that both the IV and DV must be normally distributed, or is it enough to make sure the dependent variable is normally distributed?

  • @ihssanehistane1419
    @ihssanehistane1419 9 years ago

    Hi, I would like to know if I have to remove the outliers before testing the normality or after ?

    • @teacheryenny
      @teacheryenny 9 years ago

      Ihssane HISTANE I think when you do the normality test, you will spot the outliers.

  • @peggotty23
    @peggotty23 11 years ago

    despite that...very very helpful. thank you :-)

  • @DrBSchamp
    @DrBSchamp  11 years ago

    If you're going to use a T-test, you'll need to transform both variables. Happy transforming.

  • @saharsarhan
    @saharsarhan 7 years ago

    Hi, I did all the steps, but Kolmogorov-Smirnov and Shapiro-Wilk are still significant, so the data are still not normal. How can I solve this problem, please? The sample size is 193.

    • @DrBSchamp
      @DrBSchamp  7 years ago

      Hi there. You'll need to do other transformations on the data to try to normalize them. Different transformations will work for different data sets. Also, not all data can be normalized.

  • @DrBSchamp
    @DrBSchamp  11 years ago

    Sorry, I have no idea! Good luck to you both!

  • @sarvenazpetroudi7341
    @sarvenazpetroudi7341 7 years ago

    Hi, I am a bit confused. The assumption for linear regression is that the residuals should show a normal distribution. My residuals do not show a normal distribution, so I used this transformation on the DEPENDENT variable (the book says it should be done on the y-variable). After doing so, I ran the linear regression again, and when I save the residuals, they are still not normal.
    What can I do?
    Would be great if someone could help me out!
    Best regards,
    sarv

    • @DrBSchamp
      @DrBSchamp  7 years ago +1

      Hi there. There's no "right" transformation that makes any data normal. Other transformations may be necessary, and in some cases, transformation can't normalize your data. Good luck.

    • @sarvenazpetroudi7341
      @sarvenazpetroudi7341 7 years ago

      Brandon Schamp Wow, thanks for the quick answer! So should I apply some more log or square transformations to the new variable I got after the first transformation? And if 2 or 3 more transformations do not work out, write this in the limitations of my thesis?
      Couldn't find any information about that in the literature ... :/

    • @DrBSchamp
      @DrBSchamp  7 years ago +1

      Hi there, unfortunately, it is not always possible to normalize your dependent variable. If you can't find a transformation that works, and you are interested in examining how that dependent variable changes with variation in another, you could use rank correlation. Good luck!
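For the rank correlation mentioned in this reply, a minimal Python sketch of Spearman's rho is given below, computed as the Pearson correlation of the ranks. Python rather than SPSS is an assumption, the data are made up, and the names `average_ranks` and `spearman_rho` are hypothetical. Because it works on ranks, it does not require the dependent variable to be normally distributed.

```python
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j share this rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical data: y rises with x but is strongly right-skewed.
x = [1, 2, 3, 4, 5]
y = [2, 4, 9, 30, 200]
rho = spearman_rho(x, y)
print(rho)
```

The sketch only returns the coefficient; a significance test for rho would need a statistics package.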

  • @eminememinemful
    @eminememinemful 7 years ago

    Can I do this with ordinal data (Likert scale)? Thanks in advance for your answer.

    • @DrBSchamp
      @DrBSchamp  7 years ago

      Hey there. Normalizing the data is typically done to use it in a linear model. I wouldn't. The ordinal data are essentially ranks, so you'd likely want to use something non-parametric to analyze them. That said, I don't have a lot of experience with that. This page might provide some help: davidmlane.com/hyperstat/viswanathan/Kolmogorov.html

  • @Sghrn
    @Sghrn 11 years ago

    I'd like to see the Q-Q plot after transformation ;)

  • @bookreader1474
    @bookreader1474 12 years ago

    Maybe you can make your screen more visible, thanks.

  • @DrBSchamp
    @DrBSchamp  11 years ago

    I'm afraid I don't know the answer to that. Sorry!

  • @DrBSchamp
    @DrBSchamp  11 years ago

    That's right

  • @ibrahimsaid28
    @ibrahimsaid28 6 years ago

    Could you please send me an amended video in reply? 0.012 is still significant.

    • @DrBSchamp
      @DrBSchamp  6 years ago +1

      If you read the below comments, you will see my response to this. A different transformation may help, but this will depend on your particular data set. If your data are right/positively skewed, log (try several bases), root, or reciprocal transformations may help. Good luck to you!

    • @ibrahimsaid28
      @ibrahimsaid28 6 years ago +1

      Thank you for your polite cooperation.

  • @andreaely1899
    @andreaely1899 11 years ago

    I have the same doubt!!!!