I needed to update the R code that loads and cleans the dataset to get the data ready for the analyses. Please use the updated R code here to follow along: gist.github.com/musa5237/78a694bd6663a92a82e45e684e616724
What an amazing explanation!!! Hats off. You even provided the R-script. Super helpful! You saved my thesis, thank you so very much.
I've watched many tutorials explaining propensity score matching on YouTube, and I can tell that this video is the best I've ever seen.
Well done, sir. You helped me a lot.
❤❤❤❤
I am pretty desperate because I need to perform a propensity-matched analysis and have never used R (I've used SPSS). But 15 minutes into this video, I can already tell it's going to be extremely helpful!
Thank you so much for the compliment.
Excellent tutorial, I watched it 3x.
Very good guide! Thanks
Great video, thank you for that!
Thank you for the comprehensive explanation. I have an issue with my PSA: the variance ratio doesn't appear when I use the summary function. I only get dots! Could you please tell me why? Thank you. (All my covariates are categorical and binary.)
I had the same problem when I entered my covariates into the formula as factors, but the variance ratios appeared once I converted them with as.numeric(). I don't know what that means in terms of interpretation, though.
@fleurestethique I noticed that love.plot(), the function used to visualize the overall imbalance, does not allow for categorical variables. However, you can still inspect the covariate imbalance when you use the summary() function.
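A likely reason for the dots, offered as an assumption rather than something confirmed in this thread, is that MatchIt's summary() does not report variance ratios for binary covariates. The variance of a 0/1 variable is fixed by its proportion, p(1 - p), so a variance ratio adds nothing beyond the difference in proportions. The minimal base-R sketch below uses two made-up vectors (treated_x and control_x are hypothetical names) to show this.

```r
# Hypothetical 0/1 covariate (e.g., married) in each group
treated_x <- c(1, 1, 1, 0, 0, 1, 0, 1, 1, 0)  # proportion = 0.6
control_x <- c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0)  # proportion = 0.3

# Variance of a binary variable computed from its proportion: p * (1 - p)
binary_var <- function(x) {
  p <- mean(x)
  p * (1 - p)
}

# var() uses the n - 1 denominator, so it equals p * (1 - p) * n / (n - 1)
n <- length(treated_x)
all.equal(var(treated_x), binary_var(treated_x) * n / (n - 1))

# The variance ratio is therefore just a function of the two proportions
vr <- binary_var(treated_x) / binary_var(control_x)
vr
```

For binary covariates, the difference in proportions (shown as the mean difference in the summary) carries the balance information.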
Do you include both the quadratic and non-quadratic terms in your propensity match? For example, if my quadratic term had a lower SMD, should I remove the non-quadratic term and include only the quadratic one in my final model?
This depends on your data, the type of relationships you want to capture, and what makes sense for the specific data you are working with. If you have both a linear and a quadratic term for your explanatory variable in the model, you are saying that the relationship between your response and that explanatory variable has both a linear and a quadratic component (i.e., your model captures both); if you keep only the quadratic term, you are saying the relationship is purely quadratic. Generally, if you want to capture a wider range of relationships you can keep both, but be mindful that this could lead to overfitting.
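To make the two options concrete, the base-R sketch below simulates a hypothetical covariate (age) whose relationship with treatment is assumed to be quadratic, then fits a propensity model with the linear term only and another with both the linear and quadratic terms. Inside a formula, I() makes ^ mean squaring. The same right-hand side can be passed to MatchIt's matchit(). Comparing AIC is just one illustrative way to compare the fits; in PS matching, the balance achieved after matching is usually the deciding check.

```r
set.seed(1)
n <- 500

# Simulated data: the quadratic treatment model is an assumption for illustration
age <- runif(n, 18, 70)
lin_pred <- -4 + 0.15 * age - 0.002 * age^2
treat <- rbinom(n, 1, plogis(lin_pred))
dat <- data.frame(treat, age)

# Propensity model with the linear term only
ps_lin <- glm(treat ~ age, family = binomial, data = dat)

# Propensity model with both the linear and the quadratic term
ps_quad <- glm(treat ~ age + I(age^2), family = binomial, data = dat)

coef(ps_quad)

# Lower AIC indicates a better fit after penalizing extra terms
AIC(ps_lin, ps_quad)
```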
thank you sm!!
Hi, thank you for the video. I loaded the dataset coll from the link that you pinned and then ran the script from "identify field names" to "adjust units for continuous variables". After running it, all values in coll become NULL and coll2 has 0 obs. of 6 variables. What should I do?
And also at line 136 (#no psa, just regression), if I run mod_test1.
I suppose the problem is here, at line 22:
coll
Thank you very much for this clear explanation!
I have a small question: would you use PSM to match patients to healthy controls in a cross-sectional case-control study? I want to look at the difference in physical activity, expressed in minutes per day (dependent variable), between these two groups.
thank you!
Yes, PSM can be used whenever you have a control group.
Great video!!! What would you advise if I have more treated than control units, and which matching approach should I use if treatment is not randomized, for example a state legislation?
You can try k:1 matching with optimization, or you can try full matching. You can run both and compare which gives you better balance across your covariates.
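A minimal sketch of running both approaches with MatchIt and comparing balance. It assumes the lalonde example data that ships with MatchIt, so your own treatment and covariate names will differ. When there are more treated than control units, matching without replacement cannot give every treated unit k controls, so this sketch uses replace = TRUE for the k:1 match. Full matching keeps all units and handles unequal group sizes through weights. method = "full" also needs the optmatch package installed.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Propensity score model (logistic regression is MatchIt's default)
f <- treat ~ age + educ + race + married + nodegree + re74 + re75

# k:1 nearest-neighbor matching (k = 2), with replacement so that
# control units can be reused when they are scarce
m_k1 <- matchit(f, data = lalonde, method = "nearest",
                ratio = 2, replace = TRUE)

# Full matching: every unit is placed in a subclass and given a weight
# (requires the optmatch package)
m_full <- matchit(f, data = lalonde, method = "full")

# Compare balance after matching (standardized mean differences,
# variance ratios, eCDF statistics) and keep the better-balanced result
summary(m_k1, un = FALSE)
summary(m_full, un = FALSE)
```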
This video is pretty informative. I have one question.
In the covariate balance plot from cobalt, do we need to balance both the mean and the variance statistics?
In my case, the means are balanced within the threshold but the variances are not. Can I say that the matching is balanced based on mean balance only?
It is good to have both. I presented only one set of criteria, but other criteria have been suggested, and recommendations in the literature are always changing. I would try some techniques to see if I can get better balance. But if I cannot do better, I would just report it in the methods and discussion/limitations. Balancing the covariates will be a big part of the challenge in PS matching.
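A sketch of reporting both statistics with cobalt, assuming MatchIt's lalonde example data and a 1:1 nearest-neighbor match. The thresholds (0.1 for standardized mean differences, 1.25 for variance ratios) mirror the ranges mentioned later in this thread and are a choice, not a rule. It also assumes cobalt's convention of checking a variance-ratio threshold as the band from 1/1.25 = 0.8 to 1.25.

```r
library(MatchIt)
library(cobalt)
data("lalonde", package = "MatchIt")

m_out <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
                 data = lalonde, method = "nearest")

# Standardized mean differences and variance ratios, each flagged
# against its threshold in the balance table
bal <- bal.tab(m_out, stats = c("mean.diffs", "variance.ratios"),
               thresholds = c(m = 0.1, v = 1.25))
bal

# Love plot of both statistics before and after matching
love.plot(m_out, stats = c("mean.diffs", "variance.ratios"),
          thresholds = c(m = 0.1, v = 1.25), abs = TRUE)
```

If the mean differences pass but some variance ratios do not, the table shows exactly which covariates to mention when reporting the limitation.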
While I'm installing "MatchIt", it shows "There is no package called MatchIt". How do I solve it?
Hello, just saw your post. Did you run library(MatchIt) first without running install.packages("MatchIt")? I did not install it again because I had already installed it before. I kept that line in the code but put a hash sign (#) in front of it so it was there as a note. Try running it without the hash sign.
@statsguidetree Yes, that's solved. Thanks!
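A small sketch of the usual pattern: install the package once (only if it is missing), then load it with library() in every new R session.

```r
# install.packages() only needs to run once per machine;
# this check skips it when MatchIt is already installed
if (!requireNamespace("MatchIt", quietly = TRUE)) {
  install.packages("MatchIt")
}

# library() must run in every new R session before using matchit()
library(MatchIt)
```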
Great video. If I want to include in my analysis some additional covariates that were not used for matching, how can I get them into my data after using match.data()?
If you want to use additional variables in the analysis phase, you can enter the variables that were not included in the matching process into the final regression model.
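match.data() returns the matched rows of the original data frame, including columns that were not in the matching formula, plus the distance, weights, and subclass columns it adds. A sketch with MatchIt's lalonde example data, where re74 is left out of the matching model (for illustration only) and then added to the outcome model:

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

# Match on a subset of covariates; re74 is deliberately left out here
m_out <- matchit(treat ~ age + educ + race + married + nodegree + re75,
                 data = lalonde, method = "nearest")

# Matched rows of the original data: re74 and re78 are still present,
# along with the added distance, weights, and subclass columns
md <- match.data(m_out)
names(md)

# Outcome model that adds re74 as an extra covariate
fit <- lm(re78 ~ treat + re74, data = md, weights = weights)
summary(fit)
```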
This was extremely helpful thank you so much!
When working with subsets, should I calculate the propensity scores on the whole dataset first and then apply them on the subset or directly calculate the propensity scores only for observations in my subset?
Also, the dataset I am working with requires me to incorporate additional weights because of the way the sampling was done. How can I apply both the propensity score and the other weights in my regression? Thank you
I may need some more information on the nature of the dataset. But, generally, you could calculate the PS for the whole dataset. As for your other question about weights, not all PS matching methods produce weights. For example, if 1:1 matching without replacement is used, all the weights = 1. But if you are using a PS matching method that does produce weights, and you already have a set of weights you need to apply, there are a few things you can do. The issue is that the weights argument of lm() only accepts a single vector. Depending on the nature of your dataset, you may have a reason not to use the whole dataset and to consider subsets instead, if that makes sense. Or you may want to consider combining the two sets of weights by multiplying them; however, you would need to look at the combined weights and check whether they make sense before carrying out your regression analysis. Ultimately, my suggestions are just general statements; you may want to consult other sources (e.g., previous PS analyses using your dataset or a similar dataset, content experts, etc.).
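A minimal sketch of both ideas from the reply, assuming MatchIt's lalonde example data. The survey weights (svy_wt) are simulated placeholders and the unmarried-only subset is hypothetical. The sketch estimates the PS on the whole dataset, reuses those scores when matching within the subset (matchit() accepts a numeric vector as distance), and multiplies the matching weights by the survey weights to get the single vector lm() needs. Matching with replacement is used here only so that the matching weights are not all 1.

```r
library(MatchIt)
data("lalonde", package = "MatchIt")

set.seed(123)
# Hypothetical survey/sampling weights; real data would supply these
lalonde$svy_wt <- runif(nrow(lalonde), 0.5, 2)

# 1) Estimate the propensity score on the whole dataset
ps_fit <- glm(treat ~ age + educ + race + married + nodegree + re74 + re75,
              family = binomial, data = lalonde)
lalonde$ps <- fitted(ps_fit)

# 2) Match within a subset (hypothetical: unmarried people only),
#    reusing the full-data propensity scores as the distance
sub <- subset(lalonde, married == 0)
m_sub <- matchit(treat ~ age + educ + race + nodegree + re74 + re75,
                 data = sub, method = "nearest", replace = TRUE,
                 distance = sub$ps)
md <- match.data(m_sub)

# 3) Combine matching and survey weights into one vector for lm()
md$w_comb <- md$weights * md$svy_wt
summary(md$w_comb)  # inspect for extreme values before modeling

fit <- lm(re78 ~ treat, data = md, weights = w_comb)
coef(fit)
```

As the reply cautions, check whether the combined weights make sense for your design before relying on the regression.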
@statsguidetree Thank you for the awesome video. I have a similar question. I am using a dataset with a survey design requirement. Do I carry out propensity score matching on just the sample data or on the weighted dataset? Or can I carry out the PS matching on the sample and then use the survey design weights in my final regression on the matched data?
Thank you for the informative video. I did full matching based on your video and ran comparisons after propensity matching. But the means, standard deviations, and p score did not change at all compared to the unmatched data. How can I solve this problem?
That is a good question. I assume you are talking about p-values in your final model after matching. If that is the case, ultimately with PS matching you are just attempting to balance the data between your treatment and control groups so that your final model can be interpreted more reliably. It could be that, after balancing your data, you find no average treatment effect.
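One other possible reason, offered as an assumption about this case rather than something confirmed in the reply: full matching usually keeps all units and does its adjustment through the matching weights. Means and SDs computed without those weights will therefore be identical to the unmatched data. The toy control-group data and weights below are made up for illustration.

```r
# Hypothetical matched data for the control group only; the weights
# stand in for what full matching would assign (values are made up)
ctrl <- data.frame(
  age     = c(25, 35, 45, 60),
  weights = c(0.5, 0.5, 2, 1)
)

# Unweighted mean: unchanged by matching if all units are retained
mean(ctrl$age)

# Weighted mean: this is where the matching adjustment shows up
weighted.mean(ctrl$age, ctrl$weights)
```

The same applies to model estimates: supply the matching weights, e.g. lm(outcome ~ treat, data = md, weights = weights), where outcome and md are placeholders for your own outcome variable and match.data() output.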
Just to clarify: is the first covariate the same as your DV?
I did the first step (design phase: selecting covariates), but only 3 out of 14 are significant. I want to know whether this is considered balanced or not, and what to do.
Whether covariates are significant is not related to whether the values of those covariates are balanced across treatment conditions. To check balance, you have to look at the standardized mean difference and/or variance ratio values to see whether they fall within the thresholds you decide to use.
Can you please tell me on what basis to select outcome means?
By "select outcome means", are you referring to checking balance and the ranges I used: standardized mean differences between -0.1 and +0.1, and variance ratios between 0.8 and 1.25? Those ranges are general recommendations. Zhang et al. (2019) suggested something similar.
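A base-R sketch of applying those ranges to a single covariate. It uses one common SMD definition (difference in means over the pooled SD of the two groups); MatchIt and cobalt may standardize differently depending on the estimand. The function names and covariate values are hypothetical.

```r
# Standardized mean difference: difference in means / pooled SD
smd <- function(x_t, x_c) {
  (mean(x_t) - mean(x_c)) / sqrt((var(x_t) + var(x_c)) / 2)
}

# Variance ratio: treated-group variance over control-group variance
var_ratio <- function(x_t, x_c) var(x_t) / var(x_c)

# Flag balance using the ranges from the video:
# SMD between -0.1 and 0.1, variance ratio between 0.8 and 1.25
check_balance <- function(x_t, x_c) {
  d <- smd(x_t, x_c)
  v <- var_ratio(x_t, x_c)
  list(smd = d, var_ratio = v,
       balanced = abs(d) <= 0.1 && v >= 0.8 && v <= 1.25)
}

# Hypothetical covariate values: equal means, unequal spread
x_t <- c(10, 12, 14, 16, 18)
x_c <- c(11, 12, 14, 16, 17)
res <- check_balance(x_t, x_c)
res
```

Here the means are identical (SMD = 0), but the variance ratio is about 1.54, so the covariate fails the variance check. This is the mean-balanced, variance-imbalanced pattern asked about earlier in the thread.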
Can we use categorical covariates (e.g., 1 = male, 2 = female), or should they be dummy coded? Thank you
Yes. Categorical covariates can be included.
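A small sketch with hypothetical data: converting the numeric codes to a factor lets R create the dummy (indicator) columns automatically when the formula is used, in glm() or in matchit(). For a two-level variable, leaving it numeric (1/2) gives an equivalent fit, but a factor is clearer and is needed for categories with more than two levels.

```r
# Hypothetical data with sex coded 1 = male, 2 = female
dat <- data.frame(
  treat = c(1, 0, 1, 0, 1, 0, 0, 1),
  sex   = c(1, 2, 2, 1, 1, 2, 1, 2),
  age   = c(34, 45, 50, 29, 38, 42, 60, 33)
)

# Convert the numeric codes to a labeled factor
dat$sex <- factor(dat$sex, levels = c(1, 2), labels = c("male", "female"))

# The design matrix shows the automatic dummy column "sexfemale"
mm <- model.matrix(treat ~ sex + age, data = dat)
colnames(mm)

# The same formula works directly in a propensity model
ps_fit <- glm(treat ~ sex + age, family = binomial, data = dat)
```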
Thank you, this was really helpful!
Do you have any ideas about how I can approach this if I want to match three groups, i.e., a non-binary treatment?
I can say that, generally, PS analyses can be conducted with non-binary treatment groups (i.e., a treatment variable with more than 2 levels). But I do not think the MatchIt package supports this (I could be wrong, because it may have been updated). If your treatment variable has 3 levels instead of 2, there is another package available called TriMatch. I am not too familiar with the package, but here is the general documentation: cran.r-project.org/web/packages/TriMatch/TriMatch.pdf
The data doesn't work anymore!
My apologies for the delayed response. You can use the following code to load it into R: coll
Please confirm whether this link has the same data you posted initially, since your link is no longer accessible:
LINK: ed-public-download.app.cloud.gov/downloads/CollegeScorecard_Raw_Data_04262022.zip
I will try to find a way to upload the dataset to my GitHub. But until then, I can email it to you. Just send me an email at statsguidetree@gmail.com