Propensity Score Analysis in R with Nearest Neighbor, Optimal Pair, and Optimal Full Matching

  • Published: Jan 28, 2025

Comments • 46

  • @statsguidetree
    @statsguidetree  1 year ago +4

    I needed to update the R code that loads and cleans the dataset so the data are ready for the analyses. Please use the updated R code here to follow along: gist.github.com/musa5237/78a694bd6663a92a82e45e684e616724

  • @sanjanakhondaker887
    @sanjanakhondaker887 1 year ago +1

    What an amazing explanation!!! Hats off. You even provided the R-script. Super helpful! You saved my thesis, thank you so very much.

  • @muhammedhadedy4570
    @muhammedhadedy4570 1 year ago +1

    I've watched many tutorials explaining propensity score matching on YouTube, and I can tell that this video is the best I've ever seen.
    Well done, sir. You helped me a lot.
    ❤❤❤❤

  • @basser1995
    @basser1995 3 years ago +6

    I am pretty desperate because I need to perform a propensity-matched analysis and have never used R (I used SPSS). But 15 minutes into this video I can already tell it's going to be extremely helpful!

  • @lanredaodu945
    @lanredaodu945 6 months ago +1

    Excellent tutorial, I watched it 3x.

  • @brainwt
    @brainwt 5 months ago

    Very good guide! Thanks

  • @analyticspipeline2526
    @analyticspipeline2526 2 years ago +1

    Great video, thank you for that!

  • @amalalkalbani4572
    @amalalkalbani4572 2 years ago +2

    Thank you for the comprehensive explanation. I have an issue with my PSA: the variance ratio doesn't appear when I use the summary function. I get dots only! Could you please tell me why? Thank you. (All my covariates are categorical and binary.)

    • @fleurestethique
      @fleurestethique 2 years ago +1

      I had the same problem when I entered my covariates as factors into the formula, but variance ratios appeared once I converted them with as.numeric. I don't know what that means in terms of interpretation, though.

    • @statsguidetree
      @statsguidetree  2 years ago +2

      @fleurestethique I noticed that love.plot(), the function to visualize the overall imbalance, does not allow for categorical variables. However, you can still inspect the covariate imbalance when you use the summary() function.
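
One possible explanation for the dots, offered as an assumption about the commenter's setup rather than a diagnosis: MatchIt's summary() leaves the variance ratio blank for binary covariates, because the variance of a 0/1 variable is p(1 - p) and is therefore fully determined by its mean. The base R sketch below uses made-up data (`treat` and `smoker` are hypothetical names) to show that the variance ratio of a binary covariate carries no information beyond the two group proportions.

```r
# Hypothetical data: a binary treatment and a binary covariate.
set.seed(1)
treat  <- rep(c(1, 0), times = c(200, 300))
smoker <- rbinom(length(treat), 1, ifelse(treat == 1, 0.6, 0.4))

# Proportion of smokers in each group.
p_t <- mean(smoker[treat == 1])
p_c <- mean(smoker[treat == 0])

# The variance of a 0/1 variable is p * (1 - p), so the ratio follows from the means.
vr_from_means <- (p_t * (1 - p_t)) / (p_c * (1 - p_c))

# Sample variances (n - 1 denominator) give almost the same ratio.
vr_sample <- var(smoker[treat == 1]) / var(smoker[treat == 0])

round(c(p_treated = p_t, p_control = p_c,
        vr_from_means = vr_from_means, vr_sample = vr_sample), 3)
```

Converting a factor with as.numeric() makes a variance ratio appear, but for a two-level covariate it only restates the difference in proportions, so the standardized mean difference is the more useful number to read.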

  • @alexwisniewski7105
    @alexwisniewski7105 1 year ago +1

    Do you include both the quadratic and non-quadratic terms in your propensity match? For example, if my quadratic term had a lower SDM, should I remove the non-quadratic term and just include the quadratic one in my final model?

    • @statsguidetree
      @statsguidetree  1 year ago

      This depends on your data, the type of relationships you want to capture, and what makes sense for the data you are working with. If you have a linear term and a quadratic term for your explanatory variable in the model, you are saying that the relationship between your response and the explanatory variable is both linear and quadratic (i.e., your model captures both); if you keep just the quadratic term, you are saying the relationship is purely quadratic. Generally, if you want to capture a wider scope of relationships you can leave both in, but be mindful that this could lead to overfitting.
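
A minimal sketch of the two specifications discussed above, using simulated data (the variable names `age` and `treat` and the coefficients are assumptions made for illustration). It fits a propensity score model with both the linear and quadratic terms and one with the quadratic term only, so the two can be compared.

```r
# Hypothetical data: treatment probability depends on age in a curved way.
set.seed(42)
n     <- 500
age   <- runif(n, 20, 60)
treat <- rbinom(n, 1, plogis(-8 + 0.4 * age - 0.005 * age^2))
dat   <- data.frame(treat, age)

# Linear and quadratic terms together; I() protects ^ inside a formula.
ps_both <- glm(treat ~ age + I(age^2), family = binomial, data = dat)

# Quadratic term only: forces the turning point of the curve to sit at age = 0.
ps_quad <- glm(treat ~ I(age^2), family = binomial, data = dat)

# Compare fit; the propensity scores are the fitted probabilities.
AIC(ps_both, ps_quad)
dat$ps <- fitted(ps_both)
```

The same formula syntax, `treat ~ age + I(age^2)`, can be passed to matchit(), and the balance tables can then be compared across the two specifications.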

  • @francyy-ug1qr
    @francyy-ug1qr 7 months ago +1

    Thank you so much!!

  • @hasanhash12
    @hasanhash12 1 year ago +1

    Hi, thank you for the video. I loaded the dataset coll from the link that you have pinned and then ran the script from identify field names to adjust units for continuous variables. After running it, all values in coll become NULL and coll2 has 0 obs. of 6 variables. What should I do?

    • @hasanhash12
      @hasanhash12 1 year ago

      and also at line 136 #no psa, just regression if i run mod_test1

    • @hasanhash12
      @hasanhash12 1 year ago

      I suppose problem is here at line 22:
      coll

  • @manonkinaupenne2090
    @manonkinaupenne2090 1 year ago +1

    Thank you very much for this clear explanation!
    I have a small question: would you use PSM to match patients to healthy controls in a cross-sectional case-control study? I want to look at the difference in physical activity, expressed in minutes per day (dependent variable), between these two groups.
    Thank you!

    • @statsguidetree
      @statsguidetree  1 year ago

      Yes, PSM can generally be used when you have a control group.

  • @festusattah8612
    @festusattah8612 1 year ago +1

    Great video!!! What would you advise if I have more treated than control units, and which matching approach should I use if treatment is not randomized (for example, state legislation)?

    • @statsguidetree
      @statsguidetree  1 year ago

      You can try K-to-1 matching with optimal matching, or you can try full matching. You can run both and compare which gives you better balance across your covariates.
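
The "run both and compare" suggestion can be sketched as follows. This is only an illustration of the syntax, under several assumptions: it uses the `lalonde` data shipped with MatchIt (which has more controls than treated units, unlike the commenter's data), an arbitrary set of covariates, and a 2:1 ratio chosen for the example. The optmatch package also needs to be installed for both methods.

```r
library(MatchIt)   # optimal and full matching also need the optmatch package
data("lalonde", package = "MatchIt")

# Covariates chosen only for illustration.
f <- treat ~ age + educ + married + re74

# Optimal 2:1 matching (two controls per treated unit).
m_opt  <- matchit(f, data = lalonde, method = "optimal", ratio = 2)

# Optimal full matching: every unit is placed in a subclass and weighted.
m_full <- matchit(f, data = lalonde, method = "full")

# Standardized mean differences after matching, side by side.
smd <- cbind(
  optimal_2to1 = summary(m_opt)$sum.matched[, "Std. Mean Diff."],
  full         = summary(m_full)$sum.matched[, "Std. Mean Diff."]
)
round(smd, 3)
```

Whichever column sits closer to zero across covariates has the better mean balance. With more treated than control units, full matching is usually the easier of the two to apply, because it does not need a fixed number of controls per treated unit.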

  • @vikasmishra4485
    @vikasmishra4485 2 years ago +2

    This video is pretty informative. I have one question.
    In the covariate balance plot using cobalt, do we need to check balance on both the mean and variance statistics?
    In my case the mean is balanced within the threshold but the variance is not. Can I say that the matching is balanced based on the mean alone?

    • @statsguidetree
      @statsguidetree  2 years ago

      It is good to have both. I presented only one set of criteria, but other criteria have been suggested, and recommendations in the literature are always changing. I would try some other techniques to see if I get better balance. But if I cannot do a better job, I would just report it in the methods and discussion/limitations sections. Balancing the covariates will be a big part of the challenge of PS matching.

  • @priyankaroy7243
    @priyankaroy7243 2 years ago +2

    While I'm installing "MatchIt" it shows "There is no package called MatchIt". How do I solve it?

    • @statsguidetree
      @statsguidetree  2 years ago

      Hello, just saw your post. Did you run library(MatchIt) without first running install.packages("MatchIt")? I did not install it again because I had already installed it before. I kept that line in the code but put the hash sign # in front so it was there as a note. Try running it without the hash sign.

    • @priyankaroy3686
      @priyankaroy3686 2 years ago

      @statsguidetree Yes that's solved. Thanks!
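
One way to avoid this error when rerunning the script, sketched here as a common R idiom rather than the video's own code, is to install the package only when it is missing and then load it:

```r
# Install MatchIt only if it is not already available, then load it.
if (!requireNamespace("MatchIt", quietly = TRUE)) {
  install.packages("MatchIt")
}
library(MatchIt)
```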

  • @praveena6095
    @praveena6095 2 years ago +1

    Great video. If I want to include in the analysis some additional covariates that were not used for matching, how can I get them into my data after using match.data?

    • @statsguidetree
      @statsguidetree  2 years ago

      If you want to use additional variables in the analysis phase, you can enter variables that were not included in the matching process in the final regression model.
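
A minimal sketch of that workflow, assuming the `lalonde` data that ships with MatchIt and an arbitrary choice of covariates (neither is taken from the video). match.data() returns the matched rows with all columns of the original data frame, so a covariate left out of the matching formula is still available for the outcome model.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Match on a subset of covariates only.
m  <- matchit(treat ~ age + educ + re74, data = lalonde, method = "nearest")

# match.data() keeps every original column and adds distance, weights, subclass.
md <- match.data(m)
"married" %in% names(md)   # a column that was not part of the matching formula

# Outcome model: treatment, a matching covariate, and the extra covariate.
fit <- lm(re78 ~ treat + age + married, data = md, weights = weights)
summary(fit)$coefficients
```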

  • @fleurestethique
    @fleurestethique 2 years ago +2

    This was extremely helpful, thank you so much!
    When working with subsets, should I calculate the propensity scores on the whole dataset first and then apply them to the subset, or directly calculate the propensity scores only for observations in my subset?
    Also, the dataset I am working with requires me to incorporate additional weights due to the way the sampling was done. How can I apply both the propensity score and the other weights in my regression? Thank you

    • @statsguidetree
      @statsguidetree  2 years ago

      I may need some more information on the nature of the dataset. But, generally, you could calculate the PS for the whole dataset. For your other question about weights, not all PS matching methods produce weights. For example, if 1:1 matching without replacement is used, all the weights equal 1. But if you are using a PS matching method that does produce weights and you already have a set of weights you need to apply, there are a few things you can do. The issue is that the 'weights' argument in the lm() function only accepts a single vector. You may have a reason, depending on the nature of your dataset, to not use the whole dataset and to consider subsets instead, if that makes sense. Or you may want to consider combining the two sets of weights by multiplying them; however, you would need to look at the resulting weights and see whether they make sense before carrying out your regression analysis. Ultimately, my suggestions are just general statements; you may want to consult other sources (e.g., previous PS analyses using your dataset or a similar dataset, content experts, etc.).

    • @velonty
      @velonty 4 months ago

      @statsguidetree Thank you for the awesome video. I have a similar question. If I am using a dataset that has a survey design requirement, do I carry out propensity score matching with just the sample data or with the weighted dataset? Or can I just carry out the PS matching with the sample and then use the survey design weights in my final regression on the matched data?
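
The "multiply the two sets of weights" option mentioned in this thread can be sketched in base R. Everything in the example is simulated: `ps_w` stands in for matching weights and `svy_w` for survey weights, and both names are hypothetical. Whether multiplication is appropriate for a given survey design is a separate question that this sketch does not settle.

```r
# Hypothetical matched data with two sets of weights (both names are made up).
set.seed(7)
n  <- 300
md <- data.frame(
  treat = rbinom(n, 1, 0.5),
  x     = rnorm(n),
  ps_w  = runif(n, 0.5, 2),   # stand-in for matching weights
  svy_w = runif(n, 1, 5)      # stand-in for survey (sampling) weights
)
md$y <- 1 + 2 * md$treat + 0.5 * md$x + rnorm(n)

# Combine the two sets of weights by multiplying them row by row.
md$w <- md$ps_w * md$svy_w

# Inspect the combined weights first; extreme values are a warning sign.
summary(md$w)

# lm() takes a single numeric vector of weights.
fit <- lm(y ~ treat + x, data = md, weights = w)
coef(fit)
```

Note that lm() treats these as precision weights, so the point estimates use the combined weights but the standard errors are not design-based; a survey-aware tool such as the survey package would be needed for that.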

  • @김수연-h2r5k
    @김수연-h2r5k 2 years ago +1

    Thank you for the informative video. I did full matching based on your video and ran comparisons after propensity matching. But the means, standard deviations, and p score did not change at all compared to the unmatched data. How can I solve this problem?

    • @statsguidetree
      @statsguidetree  1 year ago

      That is a good question. I assume you are talking about p-values in your final model after matching. If that is the case, with PS matching you are ultimately just attempting to balance the data between your treatment and control groups so that you can make more reliable interpretations of your final model. It could be that after balancing your data you find no average treatment effect.

  • @lizhang9898
    @lizhang9898 3 months ago

    Just to clarify, the first covariate is the same as your dv?

  • @user-iq2qr8lb2y
    @user-iq2qr8lb2y 1 year ago

    I did the first step (design phase: selecting covariates), but only 3 out of 14 are significant. I want to know whether this is considered balanced or not, and what to do.

    • @statsguidetree
      @statsguidetree  1 year ago

      Whether covariates are significant is not related to whether the values of those covariates are balanced across treatment conditions. To check balance, you have to look at the standardized mean differences and/or variance ratios to see whether they fall within the thresholds you decide to use.

  • @mahnoorjadoon2674
    @mahnoorjadoon2674 1 month ago

    Can you please tell me how to select outcome means, and on what basis?

    • @statsguidetree
      @statsguidetree  27 days ago

      By "select outcome means", are you referring to checking balance and the ranges I used: standardized mean differences between -.1 and +.1 and variance ratios between .8 and 1.25? Those ranges are general recommendations. Zhang et al. (2019) suggested something similar.
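
As a rough illustration of how those two ranges can be checked by hand, here is a base R sketch on simulated data. The covariate `age`, the group sizes, and the choice of the treated-group standard deviation as the denominator are all assumptions made for the example; MatchIt and cobalt report these statistics directly.

```r
# Hypothetical matched sample with one continuous covariate.
set.seed(3)
treat <- rep(c(1, 0), each = 150)
age   <- rnorm(300, mean = ifelse(treat == 1, 41, 40), sd = 10)

# Standardized mean difference, using the treated-group SD as the denominator
# (one common convention; the pooled SD is another).
smd <- (mean(age[treat == 1]) - mean(age[treat == 0])) / sd(age[treat == 1])

# Variance ratio: treated variance over control variance.
vr <- var(age[treat == 1]) / var(age[treat == 0])

# Apply the ranges quoted in the reply above.
balanced <- c(smd = abs(smd) < 0.1, vr = vr > 0.8 && vr < 1.25)
list(smd = round(smd, 3), vr = round(vr, 3), balanced = balanced)
```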

  • @SCaRaB6288
    @SCaRaB6288 2 years ago +1

    Can we use categorical covariates, e.g., 1 = male, 2 = female, or should they be dummy coded? Thank you

    • @statsguidetree
      @statsguidetree  2 years ago +1

      Yes. Categorical covariates can be included.
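
A short sketch of why manual dummy coding is usually unnecessary in R, using simulated data (the column names and coding are assumptions taken from the question above). Once the numeric codes are declared as a factor, R's formula interface creates the 0/1 indicator columns itself.

```r
# Hypothetical data with sex coded 1 = male, 2 = female.
set.seed(11)
dat <- data.frame(
  treat = rbinom(200, 1, 0.4),
  sex   = sample(1:2, 200, replace = TRUE),
  age   = round(runif(200, 18, 65))
)

# Declare the numeric codes as a factor so they are treated as categories.
dat$sex <- factor(dat$sex, levels = 1:2, labels = c("male", "female"))

# R builds the 0/1 dummy column for a factor automatically.
head(model.matrix(~ sex + age, data = dat))

# The factor can go straight into the propensity score model.
ps_fit <- glm(treat ~ sex + age, family = binomial, data = dat)
names(coef(ps_fit))
```

If the variable is left as the numbers 1 and 2, a two-level covariate gives the same fitted propensity scores either way, but with three or more categories the factor conversion matters because the codes would otherwise be read as a numeric scale.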

  • @sharmilibalarajah1940
    @sharmilibalarajah1940 2 years ago +2

    Thank you, this was really helpful!
    Do you have any ideas about how I can approach this if I want to match three groups, i.e., non-binary?

    • @statsguidetree
      @statsguidetree  2 years ago

      I can say that generally PS analyses can be conducted with non-binary treatment groups (i.e., a treatment variable with more than 2 levels). But I do not think the MatchIt package supports it (I could be wrong because it could have been updated). There is another package, TriMatch, available if your treatment variable has 3 levels instead of 2. I am not too familiar with the package, but here is the general documentation: cran.r-project.org/web/packages/TriMatch/TriMatch.pdf

  • @katieweir4166
    @katieweir4166 1 year ago

    The data doesn't work anymore!

    • @statsguidetree
      @statsguidetree  1 year ago

      My apologies for the delayed response. You can use the following code to load it into R: coll

  • @maddybond007
    @maddybond007 2 years ago +1

    Please confirm whether this link has the same data that you posted initially, since your link is no longer accessible:
    LINK: ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_04262022.zip

    • @statsguidetree
      @statsguidetree  2 years ago +1

      I will try to find a way to upload the dataset to my GitHub. Until then, I can email it to you. Just send me an email at statsguidetree@gmail.com