A very concise, quick and useful tutorial. Thank you!
I was asked offline how to create two kernel density plots on one graphic. I provide two solutions below: one for exactly two kernel density plots and one that generalizes to more than two.
Here is part of an email exchange I had showing how to do this:
There are at least two ways to create the visualization you want. First, "kdensity" has an option called "addplot" that provides a way to add an additional graph to your kernel density plot. This option shows up in some other graphics as well. So, if you really only have two groups, the following example shows how to use "addplot":
/* Addplot example */
sysuse auto, clear
kdensity mpg if foreign==0, addplot(kdensity mpg if foreign==1) name(g1, replace)
Alternatively, and if you have more than two groups, you can use "kdensity" to produce the underlying data and plot them using "graph twoway" commands. This looks like this:
/* How to generate kernel density data for plotting */
kdensity mpg if foreign==0, generate(x0 y0) nograph
kdensity mpg if foreign==1, generate(x1 y1) nograph
graph twoway (line y0 x0) (line y1 x1), name(g2, replace)
Does that make sense? Will it work for you?
Sincerely,
Alan Neustadtl
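The generate() approach in the email extends to any number of groups with a loop. The following is a minimal sketch and not part of the original exchange: the dataset (citytemp4), the variables (heatdd, region), the new variable names x# and d#, and the graph name g3 are all illustrative choices.

```stata
* Sketch: overlay one kernel density curve per region (illustrative names)
sysuse citytemp4, clear
levelsof region, local(regions)
local plots
foreach r of local regions {
    * store the grid (x) and density (d) for this region without drawing
    kdensity heatdd if region==`r', generate(x`r' d`r') nograph
    * reuse the region's value label as the legend text for this curve
    local lbl : label (region) `r'
    label variable d`r' "`lbl'"
    * append this curve to the list of plots
    local plots `plots' (line d`r' x`r')
}
graph twoway `plots', ytitle("Density") xtitle("Heating degree days") name(g3, replace)
```

Building the plot list in a local macro keeps the final graph twoway call the same length no matter how many groups the by-variable has.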
Hi, thank you so much for the video. I have a question: what is the null hypothesis when you compare the error graph with the normal distribution graph (what is the p-value for)?
Thank you so much, Alan. Seven years later and you saved my life.
Hi Alan,
Thanks for posting this video, it is easy to understand. I am wondering whether you would be willing to share a do-file that puts multiple graphs on a single output. I was trying to do this and had some issues. Your template would be super useful. Thanks!
Hi Xu, you didn't specify the kind of graphs you wanted to have on one plot so I have two examples below. Because you watched the kernel density plot video I first show how to add a histogram to a kernel density plot. Because these are older, legacy commands for plots, Stata provided an "addplot()" option for kdensity.
In later versions of Stata the graphics engine was updated and now most of the plots are produced using the "graph twoway" command followed by a graph type. This is very flexible. So, second, I provide some code that produces four graphs in one visualization. Hopefully, this code will get you started and point to areas of the manual you can explore for more details. Here is the code:
sysuse auto, clear
kdensity mpg
hist mpg
kdensity mpg, addplot(hist mpg)
#d ;
graph twoway (scatter mpg weight if foreign==0)
(scatter mpg weight if foreign==1)
(qfit mpg weight if foreign==0)
(qfit mpg weight if foreign==1);
#d cr
@smilex3 I use this approach to overlay kdensity plots for a categorical variable (female=1 and female=0). However, the legend does not indicate which curve is which. I would like to distinguish the two curves, one for male and one for female. How do I do that?
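One way to label the curves is to draw both densities with graph twoway and name each legend entry explicitly. This is a sketch under assumptions: it uses the auto data with foreign standing in for the commenter's female variable, and the graph name g_legend is made up for illustration.

```stata
* Sketch: two overlaid densities with an explicit legend (illustrative names)
sysuse auto, clear
twoway (kdensity mpg if foreign==0) (kdensity mpg if foreign==1), ///
    legend(order(1 "Domestic" 2 "Foreign")) ///
    ytitle("Density") xtitle("Mileage (mpg)") ///
    name(g_legend, replace)
```

The numbers in legend(order()) refer to the plots in the order they appear in the command, so the first kdensity is entry 1 and the second is entry 2.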
thanks! very helpful! loved the jingle.
Thank you very much Alan, you really helped me! My dissertation is gonna be way better illustrated now, thanks to you! Can I ask you to share the examples you used?
Andre,
I'm glad the video was useful. Good luck on your dissertation!
Great work. Are you able to share that do-file?
Thank you very much for your tutorial. This is so concise yet so comprehensive. I am wondering if you could help me draw a bivariate kernel density plot? Specifically, I have data on a variable for different countries for two years. I want to put the values of the variable in year 1 on the x-axis and the values in year 2 on the y-axis. Please help!
Hi Faisal, I am not certain I understand exactly what you want to do, but I show you some Stata code below that may point you in a good direction. One way of reading your question is that you want to produce two kernel density plots to compare distributions across time. I show two solutions below. The first uses the addplot() option and the second uses graph twoway kdensity with the by() option. But, if you really want to plot the density data from one year against the other, I show a third solution using kdensity to generate new variables with the plotted data values that you can then use with any Stata graphing command. Here, I use graph twoway scatter. Hopefully something below helps you:
use https://www.stata-press.com/data/r17/nlswork.dta, clear
kdensity hours if year==87, addplot(kdensity hours if year==68) scheme(s2mono)
two kdensity hours if year==68 | year==87, by(year)
kdensity hours if year==68, generate(x68 d68)
kdensity hours if year==87, generate(x87 d87)
graph twoway scatter d68 d87
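A caveat on the third solution: each kdensity call with generate() builds its own grid, so the rows of d68 and d87 need not refer to the same values of hours. The sketch below, which is an addition and not from the original reply, evaluates both years on one shared grid using the at() option. The variable names hgrid, d_pooled, dens68, and dens87 and the graph name g_common are hypothetical.

```stata
* Sketch: evaluate both years' densities on one shared grid (illustrative names)
webuse nlswork, clear
* build a common grid from the pooled 1968 and 1987 observations
kdensity hours if inlist(year, 68, 87), generate(hgrid d_pooled) nograph
* estimate each year's density at the same grid points
kdensity hours if year==68, at(hgrid) generate(dens68) nograph
kdensity hours if year==87, at(hgrid) generate(dens87) nograph
* each point now pairs the two densities at the same value of hours
twoway scatter dens87 dens68, ///
    xtitle("Density of hours, 1968") ytitle("Density of hours, 1987") ///
    name(g_common, replace)
```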
@smilex3 Thank you very much for your reply. Unfortunately, I did not explain clearly what I wanted to do. My problem is this: assume I have income data for 80 individuals in 1996 and 2002, and I want to analyze the evolution of the distribution between 1996 and 2002. I want to do this to see whether their earnings differential has decreased, increased, or stayed the same over this period. I wrongly told you earlier that I want to do bivariate kernel density analysis. I have done some reading since and learned that I have to perform stochastic kernel analysis instead of bivariate kernel density estimation. Stochastic kernel estimation will give me a three-dimensional graph with 1996 on the x-axis, 2002 on the y-axis, and density in the third dimension. Normally, this is accompanied by a contour plot as well.
Please guide me how this can be implemented in Stata. I will be grateful to you.
Hello Alan. Thanks for your video. I was wondering if you know the command I can use to plot differences between distributions.
Hi, Angel. If you want to do this in the context of kdensity you might be able to use the "addplot" option. Here is an example:
sysuse auto, clear
separate mpg, by(foreign)
kdensity mpg0, addplot(kdensity mpg1)
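If the goal is to plot the difference between two densities directly, one possible approach is to estimate both on a common grid with the at() option and subtract. This is a sketch, not the author's method; the variable names grid, d_all, d0, d1, and ddiff and the graph name g_diff are invented for illustration.

```stata
* Sketch: difference between two group densities on a shared grid
sysuse auto, clear
* build the evaluation grid from the full sample
kdensity mpg, generate(grid d_all) nograph
* estimate each group's density at those same grid points
kdensity mpg if foreign==0, at(grid) generate(d0) nograph
kdensity mpg if foreign==1, at(grid) generate(d1) nograph
* pointwise difference, plotted against the grid
generate ddiff = d1 - d0
line ddiff grid, sort yline(0) ///
    ytitle("Density difference (foreign - domestic)") xtitle("Mileage (mpg)") ///
    name(g_diff, replace)
```

Each group gets its own default bandwidth here; passing the same bwidth() to both calls is an option if the comparison should hold the smoothing fixed.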
thank you so much alan!
thank you very much this was great!! I have emailed you (on the address at the end of this video) regarding a few questions, which would really help with my dissertation. I look forward to hearing from you :)
Thank you so much Sir!
But why do I get a bumpier density when I use "kdensity" alone, yet a smoother one when I add "twoway" (other options the same)? Is it the case that "kdensity" is an estimate, while adding "twoway" before it changes it to a descriptive plot? Can anybody help me with this?
Hmmm...I never noticed that before. I looked through the help file but a deeper dive in the pdf manuals might explain the difference. Also, a call to Stata support would probably get you an answer to your interesting question.
At first I thought that the default settings might be different across the two plots, but that does not appear to be the case. I created two kernel density plots with the same settings and while quite similar, there are some differences. Here is the Stata code I used to test this idea:
sysuse auto, clear
kdensity mpg, bwidth(1.9746) kernel(epanechnikov) n(300) ///
    title("Default kdensity plot") ///
    scheme(s2mono) ///
    name(kdens, replace) ///
    nodraw
tw kdensity mpg, bwidth(1.9746) kernel(epanechnikov) n(300) ///
    title("Default twoway kdensity plot") ///
    xtitle("Mileage (mpg)") ///
    note("kernel = epanechnikov, bandwidth = 1.9746") ///
    scheme(s2mono) ///
    name(kdenstw, replace) ///
    nodraw
graph combine kdens kdenstw, nocopies ycommon xcommon scheme(s2mono)
Looking at the pdf help files I found this: "graph twoway kdensity varname uses the kdensity command to obtain an estimate of the density of varname and uses graph twoway line to plot the result."
So, it is possible that how the line is produced is different and leads to this issue. I thought this could be tracked down by generating the density data points but this is not an option in twoway kdensity.
So, it might just be about how the plot is produced using tw line versus some other method.
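One factor that could be checked with kdensity alone is how much the number of evaluation points, n(), affects the look of the curve. The sketch below is only an exploration under assumptions, not a confirmed explanation of the difference: the values 20 and 74 are arbitrary (74 is the number of observations in the auto data), and the variable and graph names are made up.

```stata
* Sketch: same data and default bandwidth, different numbers of grid points
sysuse auto, clear
kdensity mpg, n(20) generate(x20 d20) nograph
kdensity mpg, n(74) generate(x74 d74) nograph
* overlay the stored estimates; fewer points give a more angular line
twoway (line d20 x20) (line d74 x74), ///
    legend(order(1 "n(20)" 2 "n(74)")) ///
    ytitle("Density") xtitle("Mileage (mpg)") ///
    name(g_npoints, replace)
```

If the two commands end up evaluating the density at different numbers of points, a comparison like this would show the same kind of bumpy versus smooth contrast.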
@smilex3 Thank you so much for an answer with so much care and effort! I just like to spot details like that, which slows me down in Stata sometimes ^^!
But I guess "twoway kdensity" is still the best way to have the 1st look at a variable.
Thank you for posting this very interesting video. I would like to ask how I can conduct pstest.
Regards
Are you looking to do full Mahalanobis matching and propensity score matching? If so, type "findit psmatch2" and "findit pstest" in the command window to read the help files and optionally install these two user-written Stata programs.
helpful video but how to plot relative density between groups
bhabesh,
I can recommend three possibilities that come to mind. In order they are:
1. Use -kdensity- with the -addplot- option
2. Produce a side-by-side plot using -graph combine-.
3. Use -kdensity- to generate the density data and plot those using -graph twoway- commands.
I have three examples below showing each of these methods.
Best wishes,
Alan
/* Stata code begins here */
sysuse citytemp4.dta, clear
kdensity heatdd if region==1, addplot(kdensity heatdd if region==3) name(comb1, replace)
kdensity heatdd if region==1, name(kdens1, replace) nodraw
kdensity heatdd if region==3, name(kdens3, replace) nodraw
graph combine kdens1 kdens3, name(comb2, replace)
kdensity heatdd if region==1, generate(x1 d1) nodraw
kdensity heatdd if region==3, generate(x2 d2) nodraw
graph twoway (line d1 x1) (line d2 x2)
/* Stata code ends here */
Thanks!!