Normal Probability Plots Explained (OpenIntro textbook supplement)

  • Published: Dec 16, 2016
  • Our accompanying textbooks are on openintro.org/books, and all of them are free to download. Hard copies are also priced to be affordable for students. (We price our books in a way that we hope ensures all students can get a hard copy if they want one.)
    Topics covered in this video:
    - Probability basics
    - Disjoint / mutually exclusive
    - Probability Distributions
    - Complement
    - Independence and probability
    Video author, voice, and editor: David Diez.

Comments • 40

  • @Mahmoud-li2xn 3 years ago +9

    Best explanation on YouTube for this topic, thank you.

  • @navjotsingh2251 1 year ago +4

    I loved this video. A nice follow-up would be a video where you go much deeper into the theory and explain the math behind these kinds of plots. Thank you.

  • @rffairchild 3 years ago +1

    I agree with the other comments. This is the best explanation of this topic on YouTube.

  • @Outlier_G 1 year ago

    I can't explain how good the video was. Thanks 😊

  • @Valerie-ws3zr 3 years ago

    just what I was searching for ....... Nice job !!

  • @riccardomattea1240 4 years ago +3

    Probably the best explanation video out there

  • @rishisingh6111 2 years ago

    Simply awesome! Thanks for sharing this!

  • @allanmuganga7075 3 years ago

    Thanks for the video, it's been helpful. Kudos

  • @araujopsy 3 years ago

    Very instructive, thanks

  • @gunning6407 6 years ago +8

    Recipe for QQ-plot (quantile-quantile) in R:
    ## In R, a key observation is that the "pnorm" and "qnorm" functions are inverses of each other.
    ## To construct a QQ-plot of N observations (random samples here):
    ##
    ## Number of observations
    nn

    • @RobvanMechelen 1 year ago +1

      Here is exactly what I was looking for. Thank you very much!
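The R recipe in the comment above is cut off after `nn`, so what follows is not the commenter's code. It is a minimal Python sketch of the same idea, using only the standard library: `NormalDist.cdf` and `NormalDist.inv_cdf` play the roles of R's `pnorm` and `qnorm`. The function name `qq_points`, the plotting positions `(i - 0.5) / n`, and the simulated sample are assumptions made for illustration.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# cdf and inv_cdf are inverses of each other (like pnorm and qnorm in R).
p = std_normal.cdf(1.5)
z_back = std_normal.inv_cdf(p)  # recovers 1.5 up to rounding error


def qq_points(observations):
    """Return (theoretical quantile, ordered observation) pairs."""
    ordered = sorted(observations)  # the observations must be ordered first
    n = len(ordered)
    # Assumed plotting positions: regularly spaced probabilities (i - 0.5) / n.
    probs = [(i - 0.5) / n for i in range(1, n + 1)]
    theoretical = [std_normal.inv_cdf(q) for q in probs]
    return list(zip(theoretical, ordered))


random.seed(1)  # arbitrary seed so the sketch is repeatable
sample = [random.gauss(10, 2) for _ in range(50)]  # simulated observations
points = qq_points(sample)  # x = theoretical quantile, y = observed value
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives the normal probability plot. Other plotting-position conventions exist, and they mainly affect the most extreme points.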

  • @vladimirtorres1181 2 years ago +1

    Very useful!! Thank you

  • @kittyxing 3 years ago

    Thanks for the video. How do you generate the line for non-normally distributed data? I understand that for normally distributed data the line has a slope equal to the standard deviation and an intercept equal to the mean, with the z-score on the x-axis and the actual data value on the y-axis. But what about a non-normal data set? How exactly do you calculate the x-axis value for each data point, and how do you calculate the y-values for the straight line?
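One possible reading of the question above, sketched in Python: the x-values depend only on the sample size, so they are computed the same way whether or not the data are normal, and the line can still be drawn with the sample mean as intercept and the sample standard deviation as slope. The data values, the helper names, and the plotting positions `(i - 0.5) / n` are assumptions for illustration, and this is only one of several ways to draw the reference line.

```python
from statistics import NormalDist, mean, stdev


def theoretical_quantiles(n):
    """Z-scores expected for n ordered observations from a normal distribution."""
    std_normal = NormalDist()
    return [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]


def reference_line(observations):
    """Intercept and slope of the line y = mean + sd * z."""
    return mean(observations), stdev(observations)


# Hypothetical, clearly right-skewed data (not from the video).
data = [1, 1, 2, 2, 3, 3, 4, 6, 9, 15]

z = theoretical_quantiles(len(data))  # x-axis: same recipe for any data set
y = sorted(data)                      # y-axis: the ordered observations
intercept, slope = reference_line(data)
line_y = [intercept + slope * zi for zi in z]

# Vertical gap between each point and the line.
residuals = [yi - li for yi, li in zip(y, line_y)]
```

For this data set the points sit above the line at both ends and below it in the middle, which is the curved pattern a right-skewed sample tends to produce.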

  • @mustafizurrahman5699 1 year ago

    Simply splendid

  • @aCllips 5 years ago +6

    Right vs. left skewness is depicted the opposite way. The picture on the left is skewed to the left, and the picture on the right is skewed to the right.

    • @OpenIntroOrg 5 years ago +2

      Are you talking about the plots at about 8:30? The left plot has fewer observations strung out at higher values, which corresponds to right skewed (skew goes in the direction of the long tail). The reverse is true for the plot on the right.

    • @aCllips 5 years ago +1

      @@OpenIntroOrg Thanks for the response. I am sorry, I was wrong. It seems one cannot determine skewness from the histograms that could be drawn from the first examples in this video, because the value axis goes from high values to low in those histograms. They would need to be "mirrored" first in order to determine skewness.

    • @maxrkmrose 4 years ago +1

      @@OpenIntroOrg Skewness specifically indicates that the MEAN of the data set is not equal to the MEDIAN of the data set. Side note for others: on the histogram, lower values of the data are to the left with higher values of the data to the right.
      So a RIGHT skewed data set means that the MEAN of the data set is higher than the MEDIAN of the data set. There will be a higher density of observations to the left on the histogram. This concept seems the opposite of what the histogram looks like, but the skewness is determined by the calculations from the data set.
      A LEFT skewed data set means that the MEAN is lower than the MEDIAN. There will be a higher density of observations to the right on the histogram.
      In a perfectly normal data set, the MEAN and MEDIAN will be approximately equal.

    • @ilanlivne4472 4 years ago

      @@aCllips Thanks for this explanation
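The mean-versus-median comparison discussed in this thread is a common rule of thumb rather than a formal definition of skewness, but it is easy to check on simulated data. The sketch below is illustrative only: the exponential sample, the seed, and the variable names are assumptions, and the left-skewed sample is simply the mirror image of the right-skewed one.

```python
import random
from statistics import mean, median

random.seed(0)  # arbitrary seed so the sketch is repeatable

# Exponential data have a long tail toward high values (right skewed).
right_skewed = [random.expovariate(1.0) for _ in range(1000)]

# Mirroring the data flips the long tail toward low values (left skewed).
left_skewed = [-x for x in right_skewed]

# Positive gap: mean above median. Negative gap: mean below median.
right_gap = mean(right_skewed) - median(right_skewed)
left_gap = mean(left_skewed) - median(left_skewed)
```

Here `right_gap` comes out positive and `left_gap` negative, which matches the direction of the long tail in each sample.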

  • @tule9213 2 years ago

    so touching for an excellent video

  • @ankmeyester 7 years ago +4

    So, the x-axis here is the z-score values and the y-axis is the actual values? And plotting them against one another as seen here, we should see how they line up? The better the linearity, the more 'normal' the distribution?

    • @OpenIntroOrg 7 years ago +9

      Basically yes :) The x-values are the Z-scores we expect if the population and sample are as perfectly normal as they could be. So the straighter the line, the more encouraging that the data are nearly normal. That said, no population is perfectly normal, and even a sample from a truly normal distribution will not be perfectly straight just due to random sampling. Still, the main goal of this type of plot is often as a basic check to ensure nothing too wonky is going on and the population is roughly normal.

    • @ricardofraser4243 5 years ago

      seems like it
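One rough way to put a number on "how straight" the points are is the correlation between the theoretical Z-scores and the ordered observations. The sketch below is an illustration under assumptions, not something taken from the video: the function name `qq_correlation`, the plotting positions `(i - 0.5) / n`, the seed, and both simulated samples are made up for the example.

```python
import math
import random
from statistics import NormalDist, mean


def qq_correlation(observations):
    """Pearson correlation between theoretical Z-scores and ordered data."""
    ordered = sorted(observations)
    n = len(ordered)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    z_bar, y_bar = mean(z), mean(ordered)
    sxy = sum((a - z_bar) * (b - y_bar) for a, b in zip(z, ordered))
    sxx = sum((a - z_bar) ** 2 for a in z)
    syy = sum((b - y_bar) ** 2 for b in ordered)
    return sxy / math.sqrt(sxx * syy)


random.seed(42)  # arbitrary seed so the sketch is repeatable
normal_sample = [random.gauss(0, 1) for _ in range(500)]
skewed_sample = [random.expovariate(1.0) for _ in range(500)]

r_normal = qq_correlation(normal_sample)  # close to 1, though not exactly 1
r_skewed = qq_correlation(skewed_sample)  # noticeably lower
```

Even the truly normal sample does not give a correlation of exactly 1, which reflects the point in the reply above that random sampling alone keeps the points from being perfectly straight.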

  • @maxtok414 3 years ago

    Thank you!

  • @shokhrukhabduahadov3985 5 years ago +3

    So why is it so? Why don't you explain the reason the points do not fit the line?

  • @gunning6407 6 years ago +2

    In the textbook, I found the QQ-plot explanation to be lacking. Here, too, a number of key attributes are missing. First off, we must order the empirical observations (y-axis), as noted in previous comments. An explicit definition of "quantile" in earlier lectures would set the stage here, motivating "theoretical quantiles": the quantiles of the standard normal associated with the empirical probabilities (e.g. regularly spaced probabilities).

    • @DavidDiez 6 years ago +2

      Hi Gunning, thanks for the feedback. In short, this is a "special topic" that isn't covered in most intro stat courses (though some do cover it), so we breeze through the theory here and get to the practical application of the method. We don't expect anyone to walk away from this video able to reconstruct this type of plot -- only to be able to read one.

    • @gunning6407 6 years ago +1

      I updated my comment to put the "recipe" in a separate comment for curious readers. For context, I'm currently using the text to teach intro stats. This is my first semester with the department, but the department has used this text for several semesters.
      I absolutely understand the concern about special topics and coverage. My *personal* feeling is that the text should either include a discussion of QQ-plot along with 3-4 sentences of discussion of construction, or omit it altogether. That said, I would argue that understanding how the plot is constructed is critical to correctly reading it!
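To make the "ordered observations" and "regularly spaced probabilities" from this thread concrete, here is a tiny worked example with five values. The data and the choice of probabilities `(i - 0.5) / n` are assumptions for illustration.

```python
from statistics import NormalDist

observations = [7, 3, 9, 5, 4]   # hypothetical data
ordered = sorted(observations)   # y-axis values: 3, 4, 5, 7, 9

n = len(ordered)
# Regularly spaced probabilities: 0.1, 0.3, 0.5, 0.7, 0.9
probs = [(i - 0.5) / n for i in range(1, n + 1)]

# Theoretical quantiles of the standard normal at those probabilities,
# roughly -1.28, -0.52, 0, 0.52, 1.28
z = [NormalDist().inv_cdf(p) for p in probs]

# Each ordered observation is paired with the quantile of the same rank.
pairs = list(zip(z, ordered))
```

The smallest observation is paired with the most negative theoretical quantile and the largest with the most positive one, which is why the ordering step cannot be skipped.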

  • @harrygroundwater2590 1 month ago

    Very Helpful

  • @robert8552 4 years ago

    So, my data is skewed and non-normally distributed - What's to be done?
    Do I perform some transformation to force normality, or do I rather just perform non-parametric tests?

    • @OpenIntroOrg 4 years ago +2

      Unfortunately, it's easier to say "something might be risky or broken here" than it is to say "this is how to fix it". What is required will be highly dependent on the circumstances, both the data and what the goals are of the analysis:
      - If the sample is large enough and / or the skew isn't severe enough, then non-normality will not matter for some statistical methods. For example, if all your observations are within ~4 SDs of the mean, there are 30+ observations, and the method being applied is a t-confidence interval for the mean, then the skew isn't much of a concern because the Central Limit Theorem will have kicked in to the point the skew won't matter much.
      - A more robust method might help. However, be aware that "nonparametric" does not automatically mean "robust". For instance, the bootstrap percentile method is less robust than t-distribution methods when the sample size is relatively small (
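The example conditions in the first bullet above (30 or more observations, all within about 4 SDs of the mean) can be written as a quick check. This is only a sketch of that one rule of thumb for a t-confidence interval for the mean, not a formal test; the function name, its default thresholds, and the sample data are assumptions for illustration.

```python
from statistics import mean, stdev


def t_interval_looks_reasonable(observations, min_n=30, max_sd_distance=4.0):
    """Check the rule-of-thumb conditions: n >= min_n, all points within ~4 SDs."""
    n = len(observations)
    if n < min_n:
        return False
    centre = mean(observations)
    spread = stdev(observations)
    if spread == 0:
        return True  # constant data: no observation is far from the mean
    farthest = max(abs(x - centre) for x in observations) / spread
    return farthest <= max_sd_distance


mild = list(range(1, 41))      # 40 evenly spread values, no extreme points
with_outlier = mild + [1000]   # same data plus one very extreme value
too_small = [1, 2, 3]          # far fewer than 30 observations

mild_ok = t_interval_looks_reasonable(mild)
outlier_ok = t_interval_looks_reasonable(with_outlier)
small_ok = t_interval_looks_reasonable(too_small)
```

In this sketch only `mild` passes; the extreme value and the small sample size each cause the check to fail, which is a prompt to look more closely rather than a verdict on the analysis.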

  • @garryarvindelgado4107 7 months ago

    thank you

  • @bensonmathew8679 5 years ago

    Very helpful!

  • @m7mdsaleh523 3 years ago

    Can we use the slope of the probability plot to measure the population variance of a sample?

    • @OpenIntroOrg 3 years ago

      The line doesn't quite represent this, especially when the distribution has longer tails than a normal distribution, so it is good to calculate the sample variance separately.
      Also, sorry to nitpick, but a clarification to avoid confusion for others: we'd describe "population variance of a sample" as "sample variance", and to further remove any ambiguity, we divide by (n-1) when computing the sample variance (while population variance is often computed by dividing by n).
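The n-1 versus n distinction in the reply above can be seen directly with Python's standard library, where `statistics.variance` divides by n-1 and `statistics.pvariance` divides by n. The data values are made up for illustration.

```python
from statistics import mean, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations, mean 5
n = len(data)
squared_deviations = sum((x - mean(data)) ** 2 for x in data)  # equals 32

sample_var = variance(data)       # 32 / (n - 1) = 32 / 7
population_var = pvariance(data)  # 32 / n = 4
```

The gap between the two shrinks as n grows, so the choice matters most for small samples.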

  • @sunilkumarsamji8871 4 years ago

    Well, the name is normal probability plots. a) Why are they called probability plots? b) Why is the plot of the observed data against the z-score supposed to be a straight line? I can understand that if the data fits well it's a measure of the goodness of fit; however, I don't understand why this has to be a straight line.
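One way to see why the points should fall on a straight line, sketched under the assumption that the data really come from a normal distribution with mean \mu and standard deviation \sigma. The symbols X, \Phi (the standard normal CDF), x_p (the data quantile at probability p), and z_p (the matching Z-score) are introduced here for illustration.

```latex
\begin{aligned}
p &= P(X \le x_p)
   = P\!\left(\frac{X-\mu}{\sigma} \le \frac{x_p-\mu}{\sigma}\right)
   = \Phi\!\left(\frac{x_p-\mu}{\sigma}\right) \\
\Rightarrow\quad z_p &= \Phi^{-1}(p) = \frac{x_p-\mu}{\sigma} \\
\Rightarrow\quad x_p &= \mu + \sigma\, z_p
\end{aligned}
```

So each quantile of the data is a linear function of the matching Z-score, with intercept \mu and slope \sigma, and departures from normality show up as departures from that line. On question (a), one plausible reading of the name is that the x-axis positions are obtained from probabilities p through \Phi^{-1}.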

  • @StellaNimas 2 years ago

    im doing my thesis rn, and the data is not normal, what to do with this? 😭😭

    • @OpenIntroOrg 2 years ago

      Data is never perfectly normal, so you're in good company. Check out OpenIntro Statistics Section 7.1, which offers a couple of rules of thumb on the bottom of the first page of that section. The book is free online as a PDF from our website, see:
      www.openintro.org/book/os

  • @gentle2005phir 5 years ago

    Good one