Are nonparametric statistics useless?

Quant Psych

3.1K views

Published: Sep 11, 2024

Comments • 38

@seanmahoney7707 6 days ago
Thanks for your videos, which I've just discovered and really like. I'd learned that non-parametric data exists (e.g., from Linguistics, a perception of a person's accentedness on a scale of 1 to 9, where it's impossible for the layman listener to quantify another person's accent via a scaled metric). Surely non-parametric tests are best for non-parametric data, no? I'll have to look into robust methods, loess lines, and random forests. Thank you for the hints to alternatives, and for your ever-present sense of humour. Keep up the wonderful work!
@StatisticsSupreme 1 year ago ⁺¹⁶
But what if your variable is ordinal? Are ranks not the best way to model ordinal data?
@QuantPsych 1 year ago ⁺⁷
Good point! I hadn't thought of that.
@TheBjjninja 6 months ago ⁺³
Hey! I always loved the background music
@dimitrioskioroglou4316 2 months ago ⁺¹
I totally agree with you... ranks are not that useful. The way I think of it is that ranks result from an underlying latent process. We need to understand and properly model the process, not the ranks, which represent a snapshot. It is not the easiest thing to do. But better to try tricky stuff than to chase ghosts.
@TheBjjninja 6 months ago ⁺³
Dr. Fife 'won't stop, can't stop'
@toad8427 1 year ago ⁺³
The loud music bit, the “oh snap” 😂😂
@mahmoudhamza6765 1 year ago ⁺²
Thanks a lot for the videos in general. I am starting to become addicted to your channel.
Also, thanks a million for the open discussion.
I have a few questions, but I will briefly describe what I think first.
Please correct me if I am wrong.
First, using the normal distribution assumption is very tempting, since we have an arsenal of classical tests that are based on it.
When the assumption is violated, we have multiple options:
1- use the central limit theorem to assume normality of the sampling distribution
This has limitations:
a- it applies only to certain statistics, such as means and proportions
b- it needs a large enough sample size, which is vague. Sometimes a sample as large as 500 observations is not enough with highly skewed data or extreme outliers
2- nonparametric rank tests [problem: not modeling the data anymore]
3- bootstrapping: for a test of the difference between two groups, I mean creating a distribution of the mean/median/sd/variance from each sample and then comparing these distributions
4- other methods: GLM, quantile regression, robust stats
My questions are:
1- when is bootstrapping not enough for comparing differences between group(s)?
2- for the other methods in item 4, which one do you use/prefer? a video/resource/thoughts would be highly appreciated
3- This package in R www.danieldsjoberg.com/gtsummary/ is saving me a lot of time. It can very easily formulate summary stats and models as elegant tables in R.
However, it is using the nonparametric ranked tests as the default for comparing groups in **table1 summary stats**.
Is this acceptable for descriptive statistics - table1 patients' characteristics in statistical analysis?
Sorry for the long comment
@QuantPsych 1 year ago
A couple comments:
re: central limit theorem. Yes, technically, models are quite robust to normality violations (because of CLT). But, they're not robust to nonlinearity. Unfortunately, nonlinearity and non-normality go hand in hand.
re: bootstrapping. Modern robust methods use bootstrapping to estimate probabilities. But, again, that's not a model. I think it's fine for a quick and dirty estimation method, but it's probably better to find the right model. I use generalized linear models. I believe there's a playlist on my channel for those.
re: gtsummary. Looks like a cool package for preparing tables. I'll have to check it out. If I understand your question, I don't think it's a problem to do rank tests for a basic demographics table. That's not really your model. I do gripe about doing tests of these, but not because of the type of model chosen. I'm more concerned about people getting distracted from what the actual paper is about.
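The bootstrap comparison discussed in this exchange (option 3 in the question) can be sketched roughly as follows. This is a minimal illustration, not the method used in the video: it assumes a percentile bootstrap of the difference in means, and the function name `bootstrap_mean_diff` and both data sets are hypothetical, invented only for the example.

```python
import random
import statistics

def bootstrap_mean_diff(a, b, n_boot=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(a) - mean(b) (illustrative sketch)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping group sizes fixed.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    # Empirical alpha/2 and 1 - alpha/2 quantiles of the resampled differences.
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    observed = statistics.fmean(a) - statistics.fmean(b)
    return observed, lo, hi

# Hypothetical right-skewed data, made up purely for illustration.
group_a = [1.2, 0.8, 3.5, 2.1, 9.7, 1.1, 0.6, 4.3, 2.8, 1.9]
group_b = [0.5, 0.9, 1.4, 0.7, 2.2, 0.3, 1.0, 0.8, 5.1, 0.6]

observed, lo, hi = bootstrap_mean_diff(group_a, group_b)
print(f"observed difference: {observed:.2f}, 95% CI: [{lo:.2f}, {hi:.2f}]")
```

Swapping `statistics.fmean` for `statistics.median` would give the same procedure for medians. As the reply notes, this estimates uncertainty for a summary statistic; it does not by itself provide a model of the data.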
@mahmoudhamza6765 1 year ago
@@QuantPsych That was insightful.
Thanks!
@galenseilis5971 1 year ago ⁺¹³
Thanks for elaborating on your perspective, Dustin. I'll be happy to respond in time. Hopefully not an entire year later though! ;-)
I'll post something back here to ping you when I have posted a response.
@QuantPsych 1 year ago
Deal :)
@olgierd245 1 year ago
I wonder what the response would be
@galenseilis5971 1 year ago
@@olgierd245 I am working on a response when time allows.
@galenseilis5971 10 months ago
@@QuantPsych Hmm, well it took almost a year... Somehow the time slips away. I've put a response on my blog. I won't post it here because I think YouTube will automatically delete it anyway, but it should be easy to find. I'll also try to send the link to your Rowan University email.
@galenseilis5971 10 months ago
@@olgierd245 The response can be found on my blog.
@vazquez-borsetti 1 year ago ⁺⁷
Congrats on your paper!!!
@MikkoHaavisto1 1 year ago ⁺³
How had you never heard of ordinal variables? That is stuff for the first statistics course...
@seejendo3290 1 year ago ⁺⁵
I officially need a video from you on what you’re referring to when you say “convergence issues” - the internet is not explaining this well for non-math folk. Pretty please?
@seejendo3290 1 year ago
And maybe just some examples of rank based models and other non-parametric models, how they’re supposed to be used, and how they’re used badly in the wild.
@QuantPsych 1 year ago ⁺¹
It's on my to-do list :)
@AbdullahN8 1 year ago ⁺⁵
Thanks for the insight..
Can you give practical real-life examples in R where nonparametric tests are routinely used in biostatistics, and when to use robust methods, loess, or random forests in those situations?
@QuantPsych 1 year ago
I'm not sure that loess/robust/RF are the alternatives for the situations I'm talking about. I would probably do something like a gamma regression model instead of a Mann-Whitney. But I'll think about doing another video.
@SimónSchulzMontecinos 5 months ago
Can I get some insight? I am kind of desperate, and it seems that the Wilcoxon test is my only way out:
I obtained a dataset of a pre-post intervention, with n=10 and no control group. The measurement was conducted using a scale ranging from 0 to 12 to assess the outcome of a physical test before and after the intervention. The objective is to determine if there is a difference due to the intervention. I conducted a Wilcoxon test for paired data, which yielded a significant result - a good start. However, no linear model met the assumptions (not surprising). I attempted a Huber regression, but it didn't yield any changes in the outcome. I also tried modeling the difference between post and pre-intervention scores, and then dichotomizing them into 1 (improved) and 0 (not improved). However, it appears that I lost information as the result turned out to be non-significant. Thus, it seems to me that the only analysis I can perform that adequately accounts for paired data, a small sample size, and doesn't rely on assumptions of normality is the Wilcoxon test.
@QuantPsych 5 months ago
Have you plotted your data? Don't use statistical tests to determine if you've violated normality. Look at the plot and see if the fitted line passes through the data. If it doesn't, then you can use generalized linear models instead of a Wilcoxon test.
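For reference, the paired Wilcoxon signed-rank test described in the question can be computed exactly for a sample this small by enumerating every sign pattern. The sketch below is only an illustration under stated assumptions: the pre/post scores are hypothetical (the commenter's data are not shown), zero differences are dropped, ties share average ranks, and the helper names `average_ranks` and `signed_rank_exact` are made up for this example.

```python
from itertools import product

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def signed_rank_exact(pre, post):
    """Return (W+, exact two-sided p-value) for paired samples."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2  # expected W+ when signs are random
    observed = abs(w_plus - center)
    # Under the null hypothesis each difference is equally likely to be
    # positive or negative, so enumerate all 2**n sign patterns.
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        total += 1
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / total

# Hypothetical scores on a 0-12 scale for n = 10 participants.
pre = [3, 5, 4, 6, 2, 7, 5, 4, 6, 3]
post = [5, 6, 7, 9, 3, 7, 4, 8, 10, 9]

w_plus, p_value = signed_rank_exact(pre, post)
print(f"W+ = {w_plus}, exact two-sided p = {p_value:.4f}")
```

The test only uses the signs and ranks of the differences, which is the sense in which the reply says it does not model the data on the original 0 to 12 scale.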
@StatisticsSupreme 1 year ago ⁺¹
To me ranks are models - not transformations. With a transformation you can go back to the original data, even if you lost the original data, because you have a transformation formula. With ranks, once you have lost the original data, you cannot go back. Same with other models.
@QuantPsych 1 year ago ⁺¹
I'm not sure what you mean. Ranks are models even though you can untransform them?
@StatisticsSupreme 1 year ago
@@QuantPsych Ranks are models. They are not transformations. Because you cannot untransform them.
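The invertibility point in this exchange can be made concrete with a small sketch. It assumes a log transform as the example of an invertible transformation; the data and the helper `ranks` are hypothetical, chosen only to show that two very different data sets can share the same ranks.

```python
import math

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties, for simplicity."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

x = [2.0, 3.5, 10.0, 4.0]
y = [0.1, 0.2, 500.0, 0.3]  # very different values, same ordering as x

# A log transform is one-to-one: exp() recovers the original values.
recovered = [math.exp(math.log(v)) for v in x]
print(all(math.isclose(r, v) for r, v in zip(recovered, x)))

# Ranking is many-to-one: x and y map to the same ranks, so the ranks
# alone cannot tell you which data set they came from.
print(ranks(x), ranks(y))
```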
@pipertripp 1 year ago ⁺¹
So are we mostly talking about using models for explanation or maybe inference vs prediction here? I'm guessing that you're more interested in trying to explain a phenomenon mathematically and so the nonparametric models like RF aren't super useful b/c they don't yield something that explains the phenomenon with a closed form expression like a GLM would? Sorry, I'm really new to statistics and this is over my head right now, but definitely interesting.
@QuantPsych 1 year ago ⁺²
I suppose that's a fair assessment. Yes, if you're just doing prediction, maybe parametric models don't matter as much.
@pedropequeno7353 1 year ago
Thanks for putting my thoughts into words, maybe I am not going crazy
@ikitoki 5 months ago
You make me feel guilty for using rank-based, non-parametric tests in the past. But I did not know any better. This is what they taught me to use when the sample groups are so small that I can't test for the normality of the distribution. They also told me that in general, non-parametric tests are less powerful than parametric ones, so I thought it would be better to use a less powerful test and only report the most significant results. I actually thought I was being conservative in using rank-based, non-parametric tests.
@QuantPsych 5 months ago ⁺¹
I used to use those a lot too. I don't know that I'd go so far as saying they're bad, they're just not modeling the data. I prefer to model the data.
@dryinpan9860 1 year ago ⁺¹
You know what would really show Galen? Some forecasting methods in FLEXPLOT... I'm so sorry, I just want to see it.
@QuantPsych 1 year ago ⁺¹
Persistent one, aren't you :) You can file a feature request on GitHub. It's been over 15 years since I've done any forecasting, but maybe it won't be too hard to modify flexplot to handle that.
@dryinpan9860 1 year ago
@@QuantPsych Oh, sorry to post this in the wrong area. It doesn't HAVE to be in flexplot. It would be great to see you do a series on forecasting in R and working with time series data! Thank you again for all your teachings.
@JakeCo-pf6ty 1 year ago
I suppose this would be less about the model and more about the inferential procedure, but nonparametric methods like the bootstrap can be quite useful and, in some cases, are not a pit stop but the end goal (or the best general test for a certain quantity). Think of mediation models and the indirect effect: the product of regression coefficients is typically not normal (except in very large samples), so the bootstrap serves as a good-to-great alternative that won't break down when methods like the Sobel test do. That isn't to say there isn't a parametric procedure for this (you could look at the regression coefficients jointly, or use PRODCLIN, developed by MacKinnon for the product of the coefficients), but these methods would break down when assumptions are violated all the same. In other words, are theoretical sampling distributions always (or ideally) better than empirical sampling distributions that don't try to force a form or shape onto a particular problem? I would say no, but I can see your perspective.
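The bootstrap for an indirect effect mentioned above can be sketched as follows. This is a rough illustration under several assumptions, not the commenter's procedure: it uses a simple three-variable mediation model (X to M to Y), case resampling with a percentile interval, ordinary least squares computed by hand, and simulated data with made-up coefficients. The function names `indirect_effect` and `bootstrap_indirect` are hypothetical.

```python
import random
import statistics

def indirect_effect(x, m, y):
    """a*b, where a is the slope of M on X and b is the partial slope
    of Y on M in the regression of Y on both X and M."""
    mean_x, mean_m, mean_y = (statistics.fmean(v) for v in (x, m, y))
    dx = [v - mean_x for v in x]
    dm = [v - mean_m for v in m]
    dy = [v - mean_y for v in y]
    sxx = sum(u * u for u in dx)
    smm = sum(u * u for u in dm)
    sxm = sum(u * v for u, v in zip(dx, dm))
    sxy = sum(u * v for u, v in zip(dx, dy))
    smy = sum(u * v for u, v in zip(dm, dy))
    a = sxm / sxx
    # Closed-form solution of the two-predictor normal equations for M's slope.
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b

def bootstrap_indirect(x, m, y, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the indirect effect, resampling whole cases."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # rows drawn with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lo = estimates[int((alpha / 2) * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Simulated data; the coefficients 0.5, 0.4, and 0.1 are arbitrary choices.
sim = random.Random(42)
x = [sim.gauss(0, 1) for _ in range(60)]
m = [0.5 * xi + sim.gauss(0, 1) for xi in x]
y = [0.4 * mi + 0.1 * xi + sim.gauss(0, 1) for xi, mi in zip(x, m)]

estimate = indirect_effect(x, m, y)
lo, hi = bootstrap_indirect(x, m, y)
print(f"indirect effect: {estimate:.3f}, 95% percentile CI: [{lo:.3f}, {hi:.3f}]")
```

The interval comes from the empirical distribution of the resampled products, so it does not assume that the product of coefficients is normally distributed, which is the property the comment contrasts with the Sobel test.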
