Quantitative Social Science Data Analysis
  • Videos: 103
  • Views: 94,482
Chapter 4 Video 8 - Pivoting Datasets in R
In this video, we briefly examine pivoting datasets/variables using the pivot_longer() and pivot_wider() functions from the tidyr package.
This is the 8th video of Chapter 4 for the book Quantitative Social Science Data with R, 2nd Edition (Sage).
April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
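A minimal sketch of the two functions, using a made-up dataset (`scores_wide`, with hypothetical `country`, `y2011` and `y2021` columns) rather than the data from the video: `pivot_longer()` turns the year columns into rows, and `pivot_wider()` reverses the step.

```r
library(tidyr)

# Hypothetical wide data: one row per country, one column per year
scores_wide <- data.frame(
  country = c("Scotland", "Wales"),
  y2011 = c(52, 48),
  y2021 = c(55, 50)
)

# Wide to long: the year columns become rows of (year, score) pairs
scores_long <- pivot_longer(
  scores_wide,
  cols = starts_with("y"),
  names_to = "year",
  names_prefix = "y",   # drop the "y" so year holds "2011", "2021"
  values_to = "score"
)

# Long to wide: spread the years back out into one column each
scores_back <- pivot_wider(
  scores_long,
  names_from = year,
  values_from = score,
  names_prefix = "y"    # restore the original column names
)

print(scores_long)
print(scores_back)
```

Because `pivot_wider()` is given `names_prefix = "y"`, `scores_back` comes back with the same column names as `scores_wide`.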
Views: 178

Videos

Chapter 9 Video 3 - Wilcoxon Rank Sum Tests in R
202 views · 1 year ago
In this video, we examine running Wilcoxon rank sum tests with independent samples in R. This is done with the wilcox.test() function. This is the 3rd video of Chapter 9 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
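A minimal sketch of the formula interface, using invented scores for two independent groups (`dat`, `score` and `group` are hypothetical names, not the book's data). The values contain no ties, so R can report an exact p-value.

```r
# Hypothetical scores for two independent groups (no tied values)
dat <- data.frame(
  score = c(3.1, 4.5, 2.8, 5.0, 3.9, 6.2, 7.1, 5.8, 6.9, 7.4),
  group = rep(c("A", "B"), each = 5)
)

# Formula interface: outcome ~ grouping variable with two levels
res <- wilcox.test(score ~ group, data = dat)

res$statistic  # W: every A score is below every B score, so W = 0
res$p.value    # exact two-sided p-value, equal to 2 / choose(10, 5)
```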
Chapter 5 Video 2 - Renaming Variables in R
102 views · 1 year ago
In this video, we go through renaming variables in R. This is done using the rename() and rename_with() functions from the dplyr package. This is the 2nd video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
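A short sketch on a hypothetical data frame: `rename()` takes `new_name = old_name` pairs, while `rename_with()` applies a function (here base R's `tolower()`) to the column names.

```r
library(dplyr)

# Hypothetical raw data with awkward column names
df <- data.frame(Resp_ID = 1:3, Age_Years = c(34, 51, 27), Scot_ID = c(7, 5, 6))

# rename(): new_name = old_name
df2 <- rename(df, id = Resp_ID)

# rename_with(): apply a function to every column name
df3 <- rename_with(df2, tolower)

names(df3)  # "id" "age_years" "scot_id"
```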
Chapter 5 Video 6 - Re-Arranging Variable Values in R
116 views · 1 year ago
In this video, we take a look at re-arranging variable values in R. This is done using the mutate() function from dplyr, the factor() function with its levels option, and the fct_rev() function from forcats. This is the 6th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study...
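A sketch of the idea on hypothetical survey responses: `factor()` with `levels` sets the category order explicitly (otherwise text is sorted alphabetically), and `fct_rev()` flips whatever order the factor already has.

```r
library(dplyr)
library(forcats)

# Hypothetical responses stored as text
survey <- data.frame(trust = c("Low", "High", "Medium", "High", "Low"))

survey <- survey %>%
  mutate(
    # Without levels, factor() would sort alphabetically: High, Low, Medium
    trust_f = factor(trust, levels = c("Low", "Medium", "High")),
    # fct_rev() reverses the existing level order
    trust_rev = fct_rev(trust_f)
  )

levels(survey$trust_f)    # "Low" "Medium" "High"
levels(survey$trust_rev)  # "High" "Medium" "Low"
```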
Chapter 5 Video 10 - Dealing with Missing Values in R
85 views · 1 year ago
In this video, we look at how to deal with missing values in R. This is done with the mutate(), replace_na(), and na_if() functions from the dplyr and tidyr packages. This is the 10th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
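A sketch on hypothetical data, assuming `-9` is a "refused" code: `na_if()` (dplyr) converts a specific value to `NA`, and `replace_na()` (tidyr) does the reverse, filling `NA` with a value you choose.

```r
library(dplyr)
library(tidyr)

# Hypothetical survey data: -9 is assumed to code "refused"
dat <- data.frame(
  income = c(20, -9, 35, NA),
  party = c("Lab", NA, "SNP", "Con")
)

dat <- dat %>%
  mutate(
    income = na_if(income, -9),        # turn the -9 code into NA
    party = replace_na(party, "None")  # turn NA into an explicit category
  )
```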
Chapter 5 Video 3 - Changing Variable Classifications in R
148 views · 1 year ago
In this video, we briefly examine changing variable classifications in R. This is done using the as.factor() and as.numeric() functions. This is the 3rd video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
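One pitfall worth a quick sketch (the values are hypothetical): `as.numeric()` applied to a factor returns the underlying level codes, not the printed labels. When the labels are themselves numbers, going through `as.character()` first recovers the original values.

```r
# Hypothetical numbers that were read in as text
x <- c("10", "5", "20", "5")

f <- as.factor(x)
levels(f)                    # "10" "20" "5"  (sorted as text)

as.numeric(f)                # 1 3 2 3  (the level codes)
as.numeric(as.character(f))  # 10 5 20 5  (the original values)
```

When a factor's levels are already in the intended order (as with many Likert-type items), the codes from `as.numeric()` may be exactly what you want; the problem only arises when the labels carry the numbers.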
Chapter 4 Video 1 - Reading in Data from a Working Directory in R
232 views · 1 year ago
In this video, we examine reading in different data formats from a working directory in R. This includes reading in .csv, .xlsx, and .dta (Stata) data formats using the tidyverse, haven, and readxl packages. This is the 1st video of Chapter 4 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: stu...
Chapter 5 Video 1 - Determining Levels of Measurement in R
579 views · 1 year ago
In this video, we go through determining variables' level of measurement in R. This is done using the glimpse() and count() functions. This is the 1st video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
Chapter 8 Video 12 - Multiple Plots - Combining Plots in R (with ggplot2)
76 views · 1 year ago
In this video, we examine combining multiple plots in R using the patchwork package. This is the 12th video of Chapter 8 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
Chapter 5 Video 8 - Collapsing Numeric Variables in R
127 views · 1 year ago
In this video, we quickly go through collapsing numeric variables in R. This is done using the mutate() and cut_interval() functions from the dplyr and ggplot2 packages. This is the 8th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
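A brief sketch with hypothetical ages: `cut_interval()` from ggplot2 splits a numeric variable into groups of equal width, either a set number of groups (`n`, used here) or groups of a fixed `length`.

```r
library(ggplot2)

# Hypothetical ages
age <- c(18, 25, 33, 47, 52, 60, 71, 80)

# n = 3: three groups of equal width spanning the observed range (18 to 80)
age_3 <- cut_interval(age, n = 3)

levels(age_3)
table(age_3)  # 3, 2 and 3 people in the three groups
```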
Chapter 2 Video 2 - RStudio Tour
115 views · 1 year ago
In this video, we briefly examine the features of RStudio. This is the 2nd video of Chapter 2 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
Chapter 8 Video 9 - Scatterplots with Four Variables in R (with ggplot2)
244 views · 1 year ago
In this video, we create a scatterplot with four variables in R using the geom_point() function from the ggplot2 package. This is the 9th video of Chapter 8 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebooks are now available for these R tutorials: study.sagepub.com/Fogarty2e
Chapter 5 Video 4 - Removing Characters in Variables in R
171 views · 1 year ago
In this video, we examine removing characters in variables' names and values in R. This is done using the mutate(), parse_number(), parse_date(), rename_with(), str_to_lower(), and str_replace_all() functions from the readr, dplyr, and stringr packages. This is the 4th video of Chapter 5 for the book Quantitative Social Science Data with R, 2nd Edition (Sage). April 2023 Update: Data & codebook...
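A sketch that combines a few of these functions on a hypothetical data frame whose column names contain spaces and capitals and whose income values contain symbols. `parse_number()` drops the non-numeric characters (including the thousands separator under readr's default locale), while `str_to_lower()` and `str_replace_all()` tidy the names inside `rename_with()`.

```r
library(dplyr)
library(readr)
library(stringr)

# Hypothetical raw data: spaces/capitals in names, symbols in numbers
raw <- data.frame(
  `Household Income` = c("$21,500", "$34,000", "$18,250"),
  `Region Name` = c("Glasgow City", "Edinburgh", "Fife"),
  check.names = FALSE
)

clean <- raw %>%
  # Lower-case the names, then replace spaces with underscores
  rename_with(~ str_replace_all(str_to_lower(.x), " ", "_")) %>%
  # Drop the "$" and thousands separators, keeping the number
  mutate(household_income = parse_number(household_income))
```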
Chapter 5 Video 12 - Creating New Variables from Existing Variables in R
602 views · 1 year ago
Chapter 8 Video 1 - Bar Plots with One Variable in R (with ggplot2)
227 views · 1 year ago
Chapter 5 Video 7 - Collapsing Categorical Variables in R
574 views · 1 year ago
Chapter 8 Video 4 - Histograms with Two Variables in R (with ggplot2)
920 views · 1 year ago
Chapter 8 Video 7 - Scatterplots with Two Variables in R (with ggplot2)
30 views · 1 year ago
Chapter 7 Video 1 - Frequency Distributions in R
112 views · 1 year ago
Chapter 4 Video 2 - Reading in Data using RStudio
65 views · 1 year ago
Chapter 4 Video 4 - Examining Variables in R
77 views · 1 year ago
Chapter 4 Video 5 - Managing Missing Values in R
69 views · 1 year ago
Chapter 2 Video 3 - Working Directories in R
99 views · 1 year ago
Chapter 8 Video 8 - Scatterplots with Three Variables in R (with ggplot2)
154 views · 1 year ago
Chapter 8 Video 10 - Colour Considerations in Scatterplots in R (with ggplot2)
35 views · 1 year ago
Chapter 4 Video 7 - Merging Different Datasets in R
66 views · 1 year ago
Chapter 8 Video 5 - Smoothed Density Plots in R (with ggplot2)
120 views · 1 year ago
Chapter 8 Video 2 - Bar Plots with Two Variables in R (with ggplot2)
245 views · 1 year ago
Chapter 7 Video 4 - Z Scores and SATs in R
46 views · 1 year ago
Chapter 8 Video 13 - Saving Plots in R (with ggplot2)
48 views · 1 year ago

Comments

  • @gingeralonso2562 · 5 days ago

    Thank you for this video! Do you have any videos on plotting predicted probabilities for an interaction effect in an ordered logit model in R?

  • @cocojamba1488 · 12 days ago

    Hi! Will there be more videos???

    • @qssd · 6 days ago

      I would love to do more, but I unfortunately haven't had the bandwidth....

    • @cocojamba1488 · 6 days ago

      @qssd sad :(

  • @itsmydaily3412 · 15 days ago

    I would like to ask what the code would be if I want to make the combined plot a new plot (like a new figure containing the two plots). Pls help TT

    • @qssd · 8 days ago

      It depends on exactly what you want to do. When I use `patchwork` in an R Markdown file, it prints as one plot. You should also be able to save the combined plot as its own object, e.g., `p3 <- p1/p2`
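      A minimal sketch of that reply, using R's built-in `mtcars` data in place of the video's plots: with patchwork loaded, `/` stacks plots and `|` places them side by side, and the result is an ordinary object that can be printed or passed to `ggsave()`. If patchwork is not loaded, `p1 / p2` typically fails with a "non-numeric argument to binary operator" error. The file name in the commented `ggsave()` line is only an example.

```r
library(ggplot2)
library(patchwork)

# Two example plots from R's built-in mtcars data
p1 <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
p2 <- ggplot(mtcars, aes(x = factor(cyl))) + geom_bar()

p3 <- p1 / p2  # p1 stacked above p2
p4 <- p1 | p2  # p1 and p2 side by side

# Typing p3 at the console draws the combined figure.
# Save it as a single file (file name is only an example):
# ggsave("combined_plot.png", p3, width = 6, height = 8)
```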

  • @abdourakibmama4274 · 16 days ago

    Happy to get this tutorial. I try use the check_zeroinflation(model.prm) but it doesn't work. I hope we need those information before

  • @fuzzyrandom · 2 months ago

    hi Brian, at about 2:20 you removed family='poisson' saying we don't need that, what do you mean by that? And what would change if we left that in?

    • @qssd · 2 months ago

      Thanks for the question. We don't need to specify that b/c the function is specifically for the NB model, whereas the `glm()` function runs a bunch of different models so we specify the `family`. Since `family` doesn't appear to be one of the options for `glm.nb`, if you leave in `family="poisson"` I expect R will ignore it or give you an error (I haven't tried to run it recently).
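      A sketch of the difference on simulated, overdispersed counts (the data and names are hypothetical): `glm()` needs `family = poisson`, while `MASS::glm.nb()` has no `family` argument because it only fits negative binomial models and estimates the dispersion parameter `theta` itself.

```r
library(MASS)  # provides glm.nb()

# Hypothetical overdispersed count data
set.seed(42)
dat <- data.frame(x = rnorm(200))
dat$y <- rnbinom(200, mu = exp(0.5 + 0.8 * dat$x), size = 1.5)

# glm() fits many kinds of models, so the family has to be named
m_pois <- glm(y ~ x, family = poisson, data = dat)

# glm.nb() only fits negative binomial models, so there is no family
# argument; theta (the dispersion parameter) is estimated from the data
m_nb <- glm.nb(y ~ x, data = dat)

m_nb$theta
AIC(m_pois, m_nb)
```

      Attaching MASS after dplyr masks `dplyr::select()`; calling `MASS::glm.nb()` without attaching the package avoids that.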

  • @jeandenys7 · 3 months ago

    You make me laugh when you laugh saying most people ignore the Brant test😅😅

  • @hoangminhtran7064 · 4 months ago

    Can you consult my R conduction personally? I will pay for you. Please contact me.

  • @anthonymenor1152 · 5 months ago

    Hello, thank you for the helpful video, and subscribed! I had a brief conceptual question. If my model had an interaction term between 2 predictors (let's call them A (numeric) and B (binary)), and I used this in ggpredict() with the "terms" being x = A and group = B, then would the resulting plot trends represent that interaction right. As in, the slope for level 1 of B would differ from the slope of level 2 of B based on the interaction. Conversely, if I used an additive model of just A + B (no interaction) with the same "terms," the resulting slopes for levels 1 and 2 of B would differ but this would not reflect any interaction -- only the additive model. I guess my question is whether the 2nd plot (additive model) is just stratifying the data by levels of B and then producing separate lines for each level? And if so, which is better at capturing the moderating effect of B on the relationship between A and the outcome- comparing the lines of the 1st plot or comparing the lines of the 2nd plot? Please correct me if I am misunderstanding the concepts. I may be confusing stratified log regressions with the grouping you are doing here.

    • @qssd · 5 months ago

      Hi! This is a great question! Sorry for the delayed response; I needed to look into how the `ggeffects` package implements interactions. You are correct that the plot here is additive; it's not an interaction. So, instead of the predicted probabilities for `scot` being based on `trust` and `age` set at their means, the predicted probabilities are when trust=1 and age=mean(age), trust=2 and age=mean(age), etc. If you include an interaction between `scot` and `trust` in the model, the predicted probability plot will look similar but not identical. If your interest is in capturing the moderating effect of a second variable, then including an interaction in the model and plotting it for interpretation purposes would be better. You might want to check out the ggeffects interactions info: strengejacke.github.io/ggeffects/articles/ggeffects.html

    • @anthonymenor1152 · 4 months ago

      @qssd Thanks a lot for explaining! What I ended up doing to visualize an interaction was using practically the same code you shared, but instead of setting an additive model in ggpredict(), I set it to a model with the interaction term in it.

    • @qssd · 4 months ago

      Makes sense!

  • @lizongzhang · 7 months ago

    I am greatly inspired by your video. Thank you! I found another way to make predicted prob plot: ggpredict(model.fit, terms = "scot" ) %>% plot()

  • @GaurangiBhoot · 7 months ago

    how to add fixed time effects in this model?

    • @qssd · 7 months ago

      The simplest way is to add a factor version of the time variable as a predictor in the model (e.g., `as.factor(year)`). But, if you have proper time series data, you probably want to explore a time series count regression model (e.g., Poisson autoregression).
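      A sketch of the simplest option in that reply, on a simulated panel (all names and values are hypothetical): wrapping the year in `as.factor()` gives one dummy per year, with the first year as the reference. A Poisson `glm()` is used only as an example; the same term can go into other count models.

```r
# Hypothetical panel: 40 units observed in each year from 2015 to 2019
set.seed(1)
panel <- data.frame(
  year = rep(2015:2019, each = 40),
  x = rnorm(200)
)
panel$y <- rpois(200, exp(0.2 + 0.5 * panel$x + 0.1 * (panel$year - 2015)))

# as.factor(year) adds a dummy for each year except the reference (2015)
m_fe <- glm(y ~ x + as.factor(year), family = poisson, data = panel)

coef(m_fe)
```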

  • @bbsyduam2452 · 8 months ago

    Hi, thank you for the great video! Do you think this R code can be used when we are trying to judge whether our data has too many zeros for regression analysis other than negative binomial or Poisson regression?

    • @qssd · 8 months ago

      Hi - thanks! Do you mean for a binary outcome variable, where you would normally use logit or probit? It doesn't appear it will work for that. I took a look at the package (`performance`) and the function's code on GitHub, and it should throw an error if it's not checking a count distribution. I wouldn't be surprised if there are R packages/functions that allow you to test if there are too many zeros in a binary outcome variable, but I don't know them off-hand. For binary and ordered outcomes, if I'm worried there might be too many 0s or 1s (or similar for the ordered case), I'll run a model that assumes an asymmetric error distribution (e.g., complementary log-log, skewed logit), compare AIC/BIC values with the logit/probit versions, and choose the model with the lowest AIC/BIC.
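      A rough sketch of that comparison with base R's `glm()`, on simulated data (the names are hypothetical). Skewed logit is not one of `glm()`'s built-in links, so only logit, probit and complementary log-log are compared here.

```r
# Hypothetical binary outcome
set.seed(7)
dat <- data.frame(x = rnorm(500))
dat$y <- rbinom(500, 1, plogis(-1.5 + 0.9 * dat$x))

# Same predictors, three different link functions
links <- c("logit", "probit", "cloglog")
fits <- lapply(links, function(l) glm(y ~ x, family = binomial(link = l), data = dat))
names(fits) <- links

aic <- sapply(fits, AIC)
bic <- sapply(fits, BIC)

round(rbind(AIC = aic, BIC = bic), 2)
names(which.min(aic))  # link with the lowest AIC
```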

  • @jessicawojcik8731 · 8 months ago

    Thank you for the very understandable video! What model should I use, if my zero inflation test tells me my model is overfitting zeros?

    • @qssd · 8 months ago

      Thanks! If you are overfitting zeros using poisson regression, I would first use a negative binomial regression. If you are still overfitting zeros with NB regression, I would next use a hurdle model (probably the NB version) b/c of how it truncates the count data for only positive counts. That way you should get a better understanding of the relationship for 0 counts and positive counts.

  • @hen3vz · 8 months ago

    Show us how to run and interpret a multivariate probit in R please kind sir!

    • @qssd · 8 months ago

      Hi - sorry, I don't typically work with data used in a multivariate probit (i.e., correlated binary outcome variables) and I don't have an example queued up.

  • @Funnydudejoseph · 9 months ago

    How can I access the book

    • @qssd · 9 months ago

      Hi - at present, there isn't a free version of the book; I don't know if there will be. Thanks

  • @GourabNath-g6p · 9 months ago

    sir, can you share the word file please

    • @qssd · 9 months ago

      Hi - the Word file is just automatically generated by knitting the .Rmd in RStudio. So, there isn't anything special about the Word file; and also I'm not sure whether I still have it. Thanks

  • @guerschommugisho5569 · 9 months ago

    Thank you Sir for this tutorial. I have a problem plotting predicted with ggprecti package. Could you help me please? I have already estimated regression output using nnet package as follow : Ado1 <- multinom(DV ~ Age + `Place of residence`+ `Completed primary school` + Region + Period + `Sexual violence` , data=ADO10km, weights = v005/1000000, family=binomial); summary(Ado1) When I want to plot with ggpredict like this : ggpredict(Ado1,termes="Period") I get the message error I usually get this error when I try to plot predicted probabilities from multinomial logistic regression with ggpredict : ! Can't extract column with `terms[2]`. x Subscript `terms[2]` must be a location, not a character `NA`. Backtrace: 1. ggeffects::ggpredict(Ado1, termes = "Period") 2. base::lapply(...) 3. ggeffects (local) FUN(X[[i]], ...) 4. ggeffects:::ggpredict_helper(...) 5. ggeffects:::.post_processing_predictions(...) ... 9. tibble:::`[[.tbl_df`(original_model_frame, terms[2]) 10. tibble:::tbl_subset2(x, j = i, j_arg = substitute(i)) 11. tibble:::vectbl_as_col_location2(j, length(x), names(x), j_arg = j_arg) 14. vctrs::vec_as_location2(j, n, names) 15. vctrs:::result_get(...)

    • @qssd · 9 months ago

      Hi - I'm not entirely sure, but I have a few ideas. #1, I noticed in your ggpredict() specification that you have 'termes' instead of 'terms'. I assume you are using the French for 'terms' or this is a typo. Does ggpredict work with French? Have you tried "terms"? #2, have you tried adding `as_tibble()` on its own line between `ggpredict()` and `ggplot()` (using the piping operator %>% to connect the lines)? #3, it looks like the error is saying it can't use an NA value. Does the variable "Period" have any values that only have NAs as observations? For example, all the observations for Period=2 are NA. It is also possible that NAs were created in the ggpredict() function. For example, if the ggpredict() function didn't work correctly, it may have produced NAs instead of numeric values. Have you checked the output when you run the ggpredict() function? #4, another thing to try is adding `filter(!is.na(predicted))` before `ggplot()` (`predicted` is the column of estimates in the ggpredict() output). So, combining this with #2: ggpredict(Ado1, terms="Period") %>% as_tibble() %>% filter(!is.na(predicted)) %>% ggplot(...) (the ... is all the ggplot code). You can then see what gets plotted and whether it looks wrong. This might help you figure out the problem.

    • @guerschommugisho5569 · 9 months ago

      @qssd Thank you so much. It finally worked

  • @alexyankson4759 · 10 months ago

    What package are you using for check_zeroinflation

    • @qssd · 10 months ago

      It's from the `performance` package

  • @RichardArchon · 11 months ago

    Hi Brian, good talk, and it helped me a lot, thanks. My data are expressed with numerators and denominators of different counts, respectively. E.g., the fish is of disease occurrence in a pond and the initial population number set up in each pond is different. I want to detect whether the pond or other treatments have significant impacts of disease infection to the fish. How to conduct two outcome variables (counts in numerators and denominators) in Poisson, NB or zero-inflated or hurdle models? Thank you.

    • @qssd · 11 months ago

      Hi - thanks for the question. Unfortunately, I'm not sure I have a great answer. If I understand correctly, if for each pond the numerator is number of fish with disease and the denominator is total number of fish in the pond (i.e., the population in the pond), can you just create a ratio or % and use something like OLS? Otherwise, if you have two related counts as outcomes, you can use the 'bivariate' versions of count models (e.g., Famoye 2010). I'm not very familiar with these models b/c I haven't needed to use them but you can find info online.

  • @ilhamtohari306 · 11 months ago

    how we can find the confidence interval of odds ratio?

    • @qssd · 11 months ago

      Hi - good question. Honestly, I usually don't worry about confidence intervals for odds ratios, as the odds ratio is just a way to interpret the coefficients; it's the confidence intervals of the coefficients that matter. I don't know an easy way per se. Since we use `exp(\beta)` to get the odds ratio value, my best guess is that we can use the `confint()` function to get the 95% confidence intervals of the coefficients and then use the `exp()` function. For example, `exp(confint(model.mlogit, level=.95))` will give you the odds ratio values of the 95% CIs of the coefficients.
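      A sketch of that suggestion, shown with a binary logit from `glm()` on simulated data rather than the video's multinomial model (`dat`, `age` and `voted` are hypothetical). Exponentiating the interval endpoints works because `exp()` is increasing; as a result, the interval for the odds ratio is not symmetric around the estimate.

```r
# Hypothetical binary outcome and one predictor
set.seed(3)
dat <- data.frame(age = rnorm(300, mean = 45, sd = 12))
dat$voted <- rbinom(300, 1, plogis(-2 + 0.05 * dat$age))

m_logit <- glm(voted ~ age, family = binomial, data = dat)

or_hat  <- exp(coef(m_logit))                   # odds ratios
or_ci   <- exp(confint(m_logit, level = 0.95))  # profile-likelihood CIs, exponentiated
or_wald <- exp(confint.default(m_logit))        # Wald CIs, exponentiated

cbind(OR = or_hat, or_ci)
```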

  • @HannahEarly · 1 year ago

    Hi--thanks so much for this video and for 1.6. I am trying to add confidence intervals for group predicted probabilities and add them to a ggplot. I have tried to combine the info from the two videos but ended up getting different predicted probabilities than the group probabilities generated when I used the code from this video. Any suggestions on how to get confidence intervals for group probabilities and how it might be different than the 1.6 video? Thanks!

    • @qssd · 1 year ago

      Hi - I'm not sure why the difference, but I imagine it's something small w/ how they are calculated (e.g., a control variable is held at mode in one and mean in another). But, check out these new videos that I have on this (much easier code): ruclips.net/video/qKchFtTuaBE/видео.htmlsi=7V2QQJC6OT15luG4 ruclips.net/video/0-kSeGPHMFk/видео.htmlsi=Qob26xkMM_7Bh25m

  • @prashantchoudhary138 · 1 year ago

    Hi, While combining the graphs, I get the error "non-numeric argument to binary operator" . please help.

    • @qssd · 1 year ago

      Hmmm. Generically, the error means you are trying to do something that requires numeric vectors (etc) but you are using non-numeric vectors (etc.). I don't think that matters here. My guess is that the patchwork package is not loaded (`library(patchwork)`). I got the same error as you when I didn't first run `library(patchwork)`. So, double check that.

  • @yadetafufa-qd2xi · 1 year ago

    Thank you so much for your contributions

    • @qssd · 1 year ago

      You are very welcome

  • @williamdesousadias1463 · 1 year ago

    And how about ordered probit for panel data in R? Do you know how to do it?

    • @qssd · 1 year ago

      Yeah, that's a different kettle of fish. I don't have direct experience running that in R, apart from a two-wave panel survey where the time aspect is essentially irrelevant. But the ordered probit part is likely the easiest part. The harder part is correctly specifying the errors of the panel over time, i.e., fixed vs. random effects, etc. I'm sure you can find an R package that does this -- just google it. Sorry I can't be more helpful.

  • @isaacbaah6743 · 1 year ago

    Where can someone interested get the datasets

    • @qssd · 1 year ago

      There is a link in the video description --- the line starting 'April 2023 Update:'

  • @dantitoprrito · 1 year ago

    This series is amazing, thank you!

    • @qssd · 1 year ago

      You're very welcome!

  • @elsavarelaredondo6868 · 1 year ago

    The glimpse function is not used in this chapter in the book. It would be good to let watchers know that we have to load the dplyr library to be able to use glimpse

    • @qssd · 1 year ago

      Thanks for the comment. The `glimpse()` function is used in this chapter in the 2nd edition. The function is from the `dplyr` package, but it's simpler to use `library(tidyverse)` to load the core tidyverse packages at once.

  • @julianschmidt260 · 1 year ago

    I don't usually post comments, but I'm doing the analysis for my bachelor thesis and I'm also using polr() models. Your videos have helped me enormously, especially in interpreting the results. Thank you very much.

    • @qssd · 1 year ago

      Awesome! Glad they were useful.

  • @jkarimb · 1 year ago

    I am very thankful for your tutorials! I could not have written my thesis without them!

    • @qssd · 1 year ago

      Terrific, thanks so much!

  • @CanDoSo_org · 1 year ago

    Hi, since the estimates (coefficients) are the log of odds, why the exponential of the coefficients are not the odds, but the odds ratio? Thanks.

    • @qssd · 1 year ago

      There is a bit of tricky nuance. I use this example: think of the odds as an event occurring vs. not occurring, and the odds ratio as the odds an event occurs for one group vs. another group(s). In the regression context, we are comparing different groups (our predictors) for an event occurring. Extending beyond 'groups' (i.e., nominal predictors), Long (1997) shows that for ordered predictors exp(\beta_i) is the factor by which the odds change for a one-unit increase in the predictor, which is an odds ratio and not the odds themselves. I prefer to reference logit coefficients as providing 'logits' and not 'log of odds' to reduce confusion.

    • @CanDoSo_org · 1 year ago

      @qssd Thanks.
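      A small numeric check of the point in the reply above, using made-up coefficients `b0` and `b1` on the logit scale: the model's odds at a given x are `exp(b0 + b1 * x)`, and `exp(b1)` is the ratio of the odds at x + 1 to the odds at x, whatever x is. That ratio is why exp(coefficient) is an odds ratio rather than the odds.

```r
b0 <- -1.2  # hypothetical intercept (logit scale)
b1 <- 0.4   # hypothetical slope (logit scale)

# Odds implied by the model at a given value of x
odds_at <- function(x) exp(b0 + b1 * x)

odds_at(2)                 # the odds themselves when x = 2
odds_at(3) / odds_at(2)    # odds at x = 3 relative to x = 2
exp(b1)                    # the same number: exp(slope) is an odds ratio
odds_at(11) / odds_at(10)  # and it is the same at any starting value of x
```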

  • @CanDoSo_org · 1 year ago

    Hi, could you please compare the pros & cons between logit and probit, thanks?

    • @qssd · 1 year ago

      Mainly, if you want to use odds ratios for interpretation, you have to use logit. Otherwise, it is personal preference: the only difference between logit and probit is the assumption about the error distribution, and you almost always get the same substantive results.

    • @CanDoSo_org · 1 year ago

      @qssd Thanks.

  • @MahadHassan-ql9ti · 1 year ago

    Are any of these datasets available so we can work along with you?

    • @qssd · 1 year ago

      Hi - The datasets, as part of the digital resources, should be available on the book's website sometime this month. When they are available, I'll include the links on the channel. Sorry they're not available yet.

    • @MahadHassan-ql9ti · 1 year ago

      @qssd Thanks! I've just received my Kindle version so looking forward to tackling these exercises! :)

    • @qssd · 1 year ago

      The data and codebooks are now up: study.sagepub.com/Fogarty2e

    • @MahadHassan-ql9ti · 1 year ago

      @qssd Thank you!!!

  • @qssd · 1 year ago

    The RStudio IDE is now downloadable from posit.co (replacing rstudio.com)

  • @silvadidierrr · 1 year ago

    You just saved me 4h! For likert scales I will use as.factor and then as.numeric instead of using search and replace in Excel!

    • @qssd · 1 year ago

      awesome!

  • @CanDoSo_org · 2 years ago

    Hi, thanks for the great tutorial, but I did not find the exact data you used here. There are a lot of dataset on the site you refer to and no clue which one is the one you used.

    • @qssd · 2 years ago

      Thanks. You're correct about the data location. It appears that the site has been overhauled. The data used here is the 'microdata teaching file' from the 2011 Scottish Census: www.scotlandscensus.gov.uk/documents/microdata-teaching-file-and-user-guide/ Some of the variables in the dataset used in the video were recoded from the original variables in the downloadable data.

    • @CanDoSo_org · 2 years ago

      @qssd Thanks for your quick reply.

  • @roypeijen · 2 years ago

    Thanks for sharing this video. I've a question about the "with()" command. In your tutorial, you only have one factor variable (gender) with two levels and that is why you put the '2' after the length.out=100 option, right?. In case I would have a 3-level factor variable, this would have been a '3', right? I'm running a model now with two factor variables. One factor is a 2-level factor variable (gender) and the other is a three-level factor variable (education). My question is, what number do I need to fill at the place you filled in the 2 when knowing my model with two different factor variables? Hope you can help.

    • @qssd · 2 years ago

      That's right - the '2' refers to the 2 values of gender (which are the lines to plot). The answer of what to do when you have two factor variables comes down to what you want to plot. For this type of plot, I believe the x-axis variable must be numeric and you can only have one variable as a factor that will generate the different lines in the plot. So the number to put in will be how many levels the one factor variable has. If your second factor is just in the plot as a 'control', like marital_dummy, you might need to convert it to a numeric or you could try setting it to one of the levels (like the modal value); but I haven't tried it. An alternative way to plot these is using the `ggpredict()` function from the `ggeffects` package. The `ggpredict()` function essentially replaces all of the code prior to the ggplot() code, and then you can add all the ggplot() code and link the two using the pipe operator %>%

    • @roypeijen · 2 years ago

      @qssd Thanks! I converted the gender variable to a numeric one because that is not the one I wanted to plot but the three-level factor variable is. Works perfectly now!

  • @KirtiTewari · 2 years ago

    The best video ever!

    • @qssd · 2 years ago

      Thanks so much!

  • @deniseou2670 · 2 years ago

    First time leaving a comment on YouTube! I watched all the logit regression videos you produced! You are way too good compared to my teacher! Thanks so much! I just wanna say thank you!

    • @qssd · 2 years ago

      Wow, thanks!

  • @xtxpxhx · 2 years ago

    oh this was great thank you much!

    • @qssd · 2 years ago

      You're so welcome!

  • @supradg · 2 years ago

    Why not make Education and Scottish Identity (ordered) categorical variables? I think that we cannot change these by 1 percentage point, for example? Does making them numeric make sense?

    • @qssd · 2 years ago

      You could make them explicitly ordered categorical or numeric. Here, you will get the same regression coefficients with either.

    • @kasberge7164 · 1 year ago

      @qssd Strictly speaking, you would have to recode them into dummies to include them in a regression model as independent variables, or not? You are treating them as numeric, which I don't get. Is this accepted as a quantitative social science practice?

    • @qssd · 1 year ago

      @kasberge7164 You would only dummy them if they are nominal-level variables. Edu and scot id are ordinal-level variables, so the simplest option is treating them as numeric. You would only recode them as dummies if you wanted to create a new version(s) of the variable -- for example, scotID = 7 (so, '1' in a dummy) vs. scotID=1-6 (so, '0' in a dummy).

  • @haintuvn · 2 years ago

    How can we interpret "Women's odds" so the other person who is not familiar with the concept "odds" can understand? Does that mean "possibility"?

    • @qssd · 2 years ago

      Good question. Although odds are related to probability, we often don't think about them the same as probability (or predicted probability). Generally, we can think of odds as the "chance" or "likelihood" of an event occurring. Here, the greater the odds, the more likely an event will occur.

  • @michalispapadopoulos5090 · 2 years ago

    thanks a lot sir! Very helpful!

    • @qssd · 2 years ago

      Glad it helped!

  • @lorenzocapitani9556 · 2 years ago

    This is probably one of the greatest videos I've found on the topic. Very nicely explained. Thank you kind sir!

    • @qssd
      @qssd 2 years ago

      Wow, thanks!

  • @joshuasmith1526
    @joshuasmith1526 2 years ago

    At line 42 I get the error "Factor has new levels 0, 1". Any idea how to resolve this?

    • @qssd
      @qssd 2 years ago

      The error likely refers to a mismatch between the factor's current levels and the levels you are referencing in the code. For example, the levels might be 1 and 2, but you are asking for levels 0 and 1. Try changing the code to '1:2', or recode the factor levels before the plotting code. (Sorry for the late response; you might have figured this out already.)
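
      A minimal base-R sketch of both fixes, using made-up data in which a binary factor `female` is stored with levels "1" and "2" (the variable names are hypothetical, not the video's script). The point is that predict() can only use the factor levels the model was fitted with:

```r
# Hypothetical data: the factor's levels are "1" and "2", not 0 and 1
df <- data.frame(
  y      = c(0, 1, 0, 1, 1, 0, 1, 0),
  female = factor(c(1, 2, 1, 2, 2, 1, 1, 2))
)
levels(df$female)  # "1" "2"

fit <- glm(y ~ female, data = df, family = binomial)

# Asking for levels 0 and 1 fails with a "has new levels" error
bad <- tryCatch(
  predict(fit, newdata = data.frame(female = factor(c(0, 1))), type = "response"),
  error = function(e) conditionMessage(e)
)

# Fix 1: reference the levels that actually exist in the fitted model
nd <- data.frame(female = factor(c(1, 2), levels = levels(df$female)))
p  <- predict(fit, newdata = nd, type = "response")

# Fix 2: recode the factor to 0/1 before fitting, then predict for 0 and 1
df$female01 <- factor(ifelse(df$female == "2", 1, 0), levels = c(0, 1))
fit01 <- glm(y ~ female01, data = df, family = binomial)
nd01  <- data.frame(female01 = factor(c(0, 1), levels = c(0, 1)))
p01   <- predict(fit01, newdata = nd01, type = "response")
```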

  • @ezechielamoussou7409
    @ezechielamoussou7409 2 years ago

    Hello and thank you for your video 🙂 Would it be possible to make the graph if my variables were factors?

    • @qssd
      @qssd 2 years ago

      Sorry for the late reply. For this type of plot, the x-axis needs to be numeric in some way (that is, an ordered variable with at least 3 values). You would have to use a different plotting technique to plot only factor variables (that is, nominal-level variables).

    • @ezechielamoussou7409
      @ezechielamoussou7409 2 years ago

      @qssd Thank you!

  • @joshuawelch33
    @joshuawelch33 2 years ago

    Is it possible to move the dashed line? For example, if I want it at 1 instead of 0.

    • @qssd
      @qssd 2 years ago

      I don't think you can with the coefplot() function. It would also defeat the purpose of this plot, as the dashed line at 0 provides information on statistical significance; most of the other coefficient plotting functions I've seen have some version of the 0 line. The coefplot() manual says you can remove the line with the zeroType=0 option. Otherwise, you could try doing this plot with ggplot2.
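
      If you do want the reference line elsewhere (for example at 1, the no-effect value once logit coefficients are exponentiated into odds ratios), one option is to draw the plot yourself. This sketch uses base R graphics rather than coefplot() or ggplot2, with a hypothetical helper `plot_coefs()` and the built-in `mtcars` data purely for illustration:

```r
# Illustrative logit model on the built-in mtcars data (not the video's data)
fit <- glm(am ~ hp + wt, data = mtcars, family = binomial)

# Odds ratios with Wald 95% confidence intervals, intercept dropped
est <- exp(coef(fit))[-1]
ci  <- exp(confint.default(fit))[-1, , drop = FALSE]

# Hypothetical helper: dot-and-whisker plot with a dashed line at `ref`
plot_coefs <- function(est, lower, upper, ref = 0, log_x = FALSE) {
  k <- length(est)
  y <- seq_len(k)
  plot(est, y,
       xlim = range(lower, upper, ref), ylim = c(0.5, k + 0.5),
       log = if (log_x) "x" else "",
       yaxt = "n", pch = 19, xlab = "Estimate", ylab = "")
  axis(2, at = y, labels = names(est), las = 1)
  segments(lower, y, upper, y)  # confidence intervals
  abline(v = ref, lty = 2)      # dashed reference line at the chosen value
  invisible(NULL)
}

# Odds ratios: reference line at 1, on a log scale
plot_coefs(est, ci[, 1], ci[, 2], ref = 1, log_x = TRUE)
```

      For raw (log-odds) coefficients you would pass `coef(fit)[-1]` and `confint.default(fit)[-1, ]` with the default `ref = 0` and `log_x = FALSE`.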

  • @KN-tx7sd
    @KN-tx7sd 2 years ago

    If your outcome is continuous rather than binary (as in the video), how is the interpretation done? Kindly explain.

    • @qssd
      @qssd 2 years ago

      Hi, if your outcome variable is continuous, then you should use linear regression, not logit regression. If you mean a predictor variable that is continuous, then the odds ratio interpretation is done in the same way as for 'general_health' in the video. The one possible difference is that if the continuous predictor's values are meaningful, then you can use them instead of the generic 'unit'. For example, if the predictor's values are raw dollars, you can still say 'for a one-unit increase', but you can also say 'for a one-dollar increase'. You just need to make sure you know the unit of measurement (e.g., raw dollars, millions of dollars), otherwise you might get the interpretation wrong.
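
      To illustrate why the unit matters, here is a base-R sketch on the built-in `mtcars` data (not the video's data): the odds ratio for a one-unit (1 hp) increase is exp(b), the odds ratio for a 10-unit increase is exp(10 * b), and rescaling the predictor before fitting gives the same answer:

```r
# Illustrative logit model with a continuous predictor measured in horsepower
fit_hp <- glm(am ~ hp, data = mtcars, family = binomial)
b <- coef(fit_hp)["hp"]

or_1hp  <- exp(b)       # odds ratio for a one-unit (1 hp) increase
or_10hp <- exp(10 * b)  # odds ratio for a 10 hp increase

# Same model with the predictor measured in units of 10 hp
mtcars$hp10 <- mtcars$hp / 10
fit_hp10 <- glm(am ~ hp10, data = mtcars, family = binomial)
or_per_unit_hp10 <- exp(coef(fit_hp10)["hp10"])  # should match or_10hp

c(per_1hp          = unname(or_1hp),
  per_10hp         = unname(or_10hp),
  per_unit_of_hp10 = unname(or_per_unit_hp10))
```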

  • @KN-tx7sd
    @KN-tx7sd 2 years ago

    Excellent, excellent!

    • @qssd
      @qssd 2 years ago

      Thank you! Cheers!

  • @gabriellamartinez7985
    @gabriellamartinez7985 2 years ago

    Hello, thank you for this video, it's been super helpful! I have a question regarding the dependent variables. How would you interpret the polr() output for dependent variables that are factors? For example, if RefvotDum were a factor with the same 0, 1 levels or with No, Yes levels, how would you interpret the coefficients in that case?

    • @patric001122
      @patric001122 2 years ago

      I would also like to know how to interpret the coefficients. Can I just use the "exp(model.1$coefficients[-1])" command to get the odds ratios? Is there a way I could get the marginal effects? And is it also valid to use the "scale()" command to standardize the coefficients? Thank you very much for your videos, they helped a lot!

    • @qssd
      @qssd 2 years ago

      Yeah, sorry, I haven't had time to create new videos on the interpretations. To get polr to run with a factor outcome variable, you need to classify it as ordered (e.g., as.ordered(as.factor())). The interpretation, though, should be the same for a numeric or an ordered factor outcome variable.

    • @qssd
      @qssd 2 years ago

      Yes, you can use the same code for the odds ratio. The difference in interpretation from the binary logit is that the odds need to be discussed in terms of 'more', 'increased', 'greater', etc., instead of a specific outcome. This is because ordered logit uses cumulative odds ratios, so the odds value is the cumulative odds of a higher rather than a lower outcome. For example, 'for a one-unit increase in education, the odds of *greater/increased/more* trust increase by a factor of ...'. I don't usually work with marginal effects, but there is a lot you can do with predicted probabilities. You should be able to use the scale() function, but I haven't thought through its impact on the interpretation... Thanks.
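
      A minimal sketch of both replies, using MASS::polr() and the `housing` data that ships with MASS (not the video's data). It converts a numeric outcome with as.ordered(as.factor()), checks that this gives the same estimates as the original ordered factor, and exponentiates the slopes into cumulative odds ratios. MASS documents the model as logit P(Y <= k) = zeta_k - eta, so an odds ratio above 1 means higher outcome categories become more likely. One caveat worth checking: a polr fit stores only the slopes in `coefficients` (the cutpoints live in `zeta`), so there is no intercept to drop, and the `[-1]` used for glm output would remove the first predictor instead:

```r
library(MASS)  # polr() and the housing data

# Numeric copy of the satisfaction outcome (1 = Low, 2 = Medium, 3 = High)
housing$sat_num <- as.integer(housing$Sat)

# polr() needs a factor outcome; convert the numeric version to an ordered factor
fit_num <- polr(as.ordered(as.factor(sat_num)) ~ Infl + Type + Cont,
                weights = Freq, data = housing, Hess = TRUE)

# Same model fitted with the original ordered factor outcome
fit_fac <- polr(Sat ~ Infl + Type + Cont,
                weights = Freq, data = housing, Hess = TRUE)

# Cumulative odds ratios: no [-1] needed, since coefficients has no intercept
exp(coef(fit_fac))

# Cutpoints (Low|Medium, Medium|High), stored separately from the slopes
fit_fac$zeta
```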

    • @patric001122
      @patric001122 2 years ago

      @qssd Thank you so much for the fast answer, it helped a lot!

    • @qssd
      @qssd 2 years ago

      Welcome!

  • @Habalabaloooo
    @Habalabaloooo 2 years ago

    How do you rename the predictors in glm output? Especially for factors with many levels, the naming becomes burdensome.

    • @qssd
      @qssd 2 years ago

      Agreed, it is burdensome. You can use the tidyverse functions mutate() and recode() to label the factor levels and then pipe the result (%>%) into the ggplot2 code, but it still requires you to write out the labels at least once.
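
      The reply suggests dplyr's mutate() and recode(); as a dependency-free sketch of the same idea, base R's factor() with `labels` relabels the levels once, before the model is fitted, so the glm coefficient names (and anything plotted from them) come out readable. The data and labels below are made up for illustration:

```r
# Hypothetical data: education stored as numeric codes 1-3
df <- data.frame(
  voted = c(1, 0, 1, 1, 0, 1, 0, 1, 1, 0),
  edu   = c(1, 2, 3, 1, 2, 3, 1, 2, 3, 3)
)

# Write the labels out once; glm coefficient names inherit them
df$edu_f <- factor(df$edu, levels = 1:3,
                   labels = c("Primary", "Secondary", "Tertiary"))

fit <- glm(voted ~ edu_f, data = df, family = binomial)
names(coef(fit))  # "(Intercept)" "edu_fSecondary" "edu_fTertiary"

# Optionally strip the variable-name prefix for plot labels
plot_labels <- sub("^edu_f", "", names(coef(fit)))
```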

  • @loopyloup
    @loopyloup 2 years ago

    Great vid, thank you. Very helpful for my term project.

    • @qssd
      @qssd 2 years ago

      Glad it was helpful!