More on Propensity Scores 👇
📰Read More: towardsdatascience.com/propensity-score-5c29c480130c?sk=45f0ec6803eba962c0d2d0162185741d
💻Example Code: github.com/ShawhinT/RUclips-Blog/tree/main/causality/propensity_score
More in this series 👇
Intro to Causal Effects: ruclips.net/video/BOPOX_mTS0g/видео.html
Do-operator: ruclips.net/video/dejZzJIZdow/видео.html
DAGs: ruclips.net/video/ASU5HG5EqTM/видео.html
Regression techniques: ruclips.net/video/O72uByJlnMw/видео.html
Intro to Causality: ruclips.net/video/WqASiuM4a-A/видео.html
Causal Inference: ruclips.net/video/PFBI-ZfV5rs/видео.html
Causal Discovery: ruclips.net/video/tufdEUSjmNI/видео.html
Thanks for taking the time to put this video together; I appreciate the working example in addition to the theory.
Great video as always! Looking forward to more. 😃
Thank you!
I have some problems using pickle to read the 'df_propensity_score.p' file on your GitHub; it seems my newer version is not compatible with the pickle you used to save the files. Can I get a new .p or .csv file? Thanks a lot!
Thanks for raising this! Would you mind submitting an issue on the GitHub repo?
github.com/ShawhinT/RUclips-Blog/issues
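In the meantime, one workaround is to convert the pickle to a CSV on a setup that can still read it, since CSV is plain text and has no pickle-protocol compatibility issues. A minimal sketch with a toy dataframe (the data and file names here are illustrative; swap in the repo's 'df_propensity_score.p'):

```python
import os
import tempfile

import pandas as pd

# Toy dataframe standing in for the repo's df_propensity_score.p
df = pd.DataFrame({"age": [39, 50], "greater_than_50k": [True, False]})

with tempfile.TemporaryDirectory() as tmp:
    p_path = os.path.join(tmp, "df_example.p")
    csv_path = os.path.join(tmp, "df_example.csv")

    df.to_pickle(p_path)                 # how the .p file was (presumably) saved
    df_loaded = pd.read_pickle(p_path)   # works where the pickle versions match

    # Re-save as CSV: plain text, loads regardless of pickle protocol
    df_loaded.to_csv(csv_path, index=False)
    df_csv = pd.read_csv(csv_path)
```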
Thank you for this. Can you please do a similar video or series on uplift modeling? There are lots of videos and literature explaining the concept, but not enough examples of practical application.
Great suggestion! I'll add that to the list :)
Hi sir, what if I don't know the variables at all and want to measure the causal effect of a sub-KPI on the main KPI? In that case, I don't know which sub-KPIs should be classified as confounders. Can you give me a suggestion?
That sounds interesting! I might need more context before giving a suggestion. Feel free to email me here: shawhint.github.io/connect.html
Causal inference is design-based, not model-based. If you don't know your data-generating process and don't know which variables could confound your relationship, you cannot estimate causal effects.
Thanks for explaining Propensity Score Matching. How do I get the complete Python code (Jupyter notebook) to run the nearest neighbour matching method? I already have my data.
A notebook version of the example code is available here: github.com/ShawhinT/RUclips-Blog/blob/main/causality/propensity_score/propensity_score_example.ipynb
@@ShawhinTalebi Thanks a lot. But I am only familiar with the Jupyter Notebook IDE.
While working with observational data, how do we decide the sample size needed for results to be statistically significant?
Good question. I haven't come across any special considerations for observational study sample sizes. Traditional sample size determination methods should be a good start.
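For reference, the traditional calculation mentioned above looks like this for a two-sample comparison. This is a sketch using the standard normal approximation (nothing PSM-specific), implemented with only the Python standard library:

```python
import math
from statistics import NormalDist

def n_per_group(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Required sample size per group for a two-sample comparison.

    effect_size is Cohen's d. Uses the normal approximation:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 / d^2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, two-sided
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

# e.g. detecting a medium effect (d = 0.5) at 80% power, alpha = 0.05:
print(n_per_group(0.5))  # 63 per group
```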
Hey, is there any dataset that you could recommend, where I can work on this method?
Here's the original dataset I used for the example code: archive.ics.uci.edu/dataset/20/census+income
A modified version is available at the GitHub repo: github.com/ShawhinT/RUclips-Blog/tree/main/propensity_score
Do you need to test whether the result is statistically significant?
Good question. This depends on your particular use case. However, broadly speaking, if computing the statistical significance is helpful then I'd say do it.
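If you do want a p-value for a matched comparison, one simple, assumption-light option is a permutation test on the difference in mean outcomes between the treated and matched control groups. A sketch with toy outcome data (the numbers are illustrative, not from the video's dataset):

```python
import random
from statistics import mean

def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in mean outcomes."""
    rng = random.Random(seed)
    observed = abs(mean(treated) - mean(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of treated/control
        perm_diff = abs(mean(pooled[:n_t]) - mean(pooled[n_t:]))
        if perm_diff >= observed:
            hits += 1
    return hits / n_perm

# Toy matched outcomes: treated vs. matched controls
p = permutation_p_value([5, 6, 7, 8], [1, 2, 2, 3])
```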
Thank you for the great video! I have a question: the original data you mention is different from what you use in this video. Why did you apply feature engineering to the data? Please answer my question; thank you in advance.
Thanks for the good question. The main reason is some variables were originally string types which cannot be used directly with the given library. Additionally, the use of boolean variables makes the example a bit easier to follow IMO.
Thanks for your amazing video, but I have a question: can I use this line
df = pd.read_csv("dataseta-casual90.csv")
to load the dataset instead of
df = pickle.load(open('df_propensity_score.p', 'rb'))
You may need to do some additional data prep after reading in the .csv file, but that should work!
The library I use here works with pandas dataframes.
@@ShawhinTalebi Additional data prep like what? I used the pickle and it gives me an error.
@@shadiaelgazzar9195 If you are reading in a raw CSV with pandas, you may need to do data prep like checking dtypes, handling missing values, looking out for outliers, etc.
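To illustrate, here's a sketch of those checks on toy data (the column names are made up for the example, not taken from the repo's dataset):

```python
import pandas as pd

# Toy data with a missing value, standing in for a freshly read CSV
df = pd.DataFrame({
    "age": [39.0, 50.0, None],
    "hours_per_week": [40, 13, 38],
})

print(df.dtypes)          # check column types match what the library expects
print(df.isna().sum())    # count missing values per column
df = df.dropna()          # one simple way to handle missing rows
print(df.describe())      # summary stats help spot outliers
```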
@@shadiaelgazzar9195 what error are you getting?
@@ShawhinTalebi I used this line to load the dataset:
df = pd.read_csv("dataseta-casual90.csv")
When I run the file, it gives me:
Exception: Propensity score methods are applicable only for binary treatments
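That exception usually means the treatment column in the raw CSV isn't binary, e.g. it's stored as strings. A sketch of the kind of conversion that resolves it (the column name and values here are assumptions for illustration, not taken from the actual file):

```python
import pandas as pd

# Toy stand-in for a raw CSV where the treatment is stored as strings
df = pd.DataFrame({"treatment": ["yes", "no", "yes"], "outcome": [1, 0, 1]})

# Map the treatment to booleans so propensity score methods accept it
df["treatment"] = df["treatment"].map({"yes": True, "no": False})
```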
Is there an email where I can contact you? I am working on PSM!
You can email me here: shawhintalebi.com/contact/
Thank you for such digestible and concise video!
Do you mind if I shoot you an email with a couple q's?
Happy to help! Feel free to email me here: www.shawhintalebi.com/contact
Thank you! Will do later next week!
This is really wonderful, but I'd like to offer a suggestion. The suggestion has to do with cadence (i.e., the musical quality of speech and where you place emphasis in the sentence).
When you start out the video, you are speaking in a fairly natural way and you use emphasis appropriately. In other words, you emphasize what needs to be emphasized. However, by the halfway point in the video (especially in the section about programming), you slip into an ossified, mechanical cadence that is repetitive, static, and divorced from what needs to be emphasized. The emphasis no longer falls on the term or idea that needs emphasizing; you're just emphasizing because you happen to be near the end of the sentence. It begins to feel like you're trying to hypnotize us - like a tour guide who has given the tour too many times to be awed by what is being shown. The programming part is really important - please don't drone through it. Use cadence and emphasis to signal to us that you're engaged, that this is important, and that we should be paying attention.
To me, cadence and emphasis are important signalers in a presentation - that is why parents (when reading to children) use cadence to amplify what is happening in the story. If you place emphasis on words that don't need it, you are effectively confusing or misdirecting the listener. It is like reading a story to a child but using a spooky voice when telling a story about a beach trip and a happy voice when talking about monsters under the bed. If the programming is important, don't use a cadence reminiscent of someone snoring.
In your case, your cadence during these hypnotic periods is characterized by low, flat, fairly rapid rolling speech terminating in a loud WORD trailed by a deflation. It is almost as if you are bored of this part or want to get through it quickly, as if it is not very important. blah-blah-blah-BLAHhh. blah-blah-blah-BLAHhh.
Please embrace that your cadence is a powerful tool for you to wield wisely. Don't let your cadence confess to us that you're bored. As silly as it sounds, your cadence helps students retain information.
Thanks for the feedback. Developing my communication skills is a big focus for me, and cadence is a key part of that skill set. I’ll definitely be more critical of that aspect for future videos.
@@ShawhinTalebi Thank you. I worried my suggestion would come across as harsh (which was not my intention). You seem to be more mature than most and may have seen something of value in the suggestion. Thank you.
Thanks. Growth is important to me, so I value (and often prefer) critical feedback.
I'll offer some constructive criticism regarding the cadence of your delivery. Obviously, you can take it or leave it. When starting a sentence you speak fairly rapidly with low volume and low variability, but near the end of each sentence, you select one of the penultimate words on which you vastly increase the volume and slow the pace. Then you start the next sentence very low and rapid and again end slow and loud. The challenge is that it is not as though you placed emphasis on a term that is particularly worthy of emphasis - it just happens to be one of the words at the end of a sentence. It is just a rote pattern played over and over and over. It reminds me of a tour-bus operator who has been giving the same tour for the last decade and is bored to tears. For me, personally, I find this pattern very irritating, a little like Chinese water torture, and since the emphasis is placed on terms not requiring emphasis, my attention is diverted to low-information terms.
This is probably the natural way you speak and not something you can change. But just compare your delivery to the delivery of newscasters, actors, comedians, or anyone else who presents information verbally - I would argue that this terminal-spiking pattern is not optimal for the listener.
That's good feedback. Were there specific points from this video where this pattern stood out most?