Using Stata to Create Bar Graphs

  • Published: Aug 21, 2024
  • Bar graphs provide a way to examine a continuous or quantitative measure across one or more discrete measures. In this video see how to create simple bar graphs and learn about the over suboption in Stata.

Comments • 46

  • @paulonieto9780 7 years ago

    Hi man! Thanks a lot, it was very useful!! Greetings from Colombia!

  • @SkiBumK27 2 years ago +1

    Hello, thanks for the useful video. 2 questions: 1) How did you get the bars to be different colors in the last graph? 2) How can I change the color of a single bar? TIA

    • @smilex3 2 years ago

      There are two options that can help control the look of the bars. Which one you use depends on what you want your graph to look like and how your variables were created. The options are:
      ascategory, which treats yvars as the first over() group, and asyvars, which treats the first over() group as yvars.

      In the example for the last graph in the video, I have one variable to be graphed over two categories. The default option is ascategory. Copy and run this code snippet:
      sysuse auto, clear
      set scheme s2color
      graph bar mpg if rep78>=3, over(foreign) over(rep78) ascategory name(bar1, replace)
      graph bar mpg if rep78>=3, over(foreign) over(rep78) asyvars name(bar2, replace)
      But what if I had a different set of variables for mpg? One of the variables contains the mpg values for domestic cars and is missing for the foreign ones. The other variable contains mpg values for the foreign cars and is missing for the domestic ones. We can create those variables and graph them like this:
      separate mpg, by(foreign) generate(mpgfor)
      graph bar mpgfor0 mpgfor1 if rep78>=3, over(rep78) name(bar3, replace)
      See how the graphs bar2 and bar3 are identical. But, using the asyvars option I did not need to create and manage any new variables.
      Regarding your second question, changing the color of a single bar is easy if you only have one over grouping variable. The following code shows the easy solution in the first block of code. The second block shows how there might be an issue with more than one over grouping variable:
      graph bar mpg if rep78>=3, asyvars bargap(80) over(rep78) ///
      bar(1, bcolor(navy)) ///
      bar(2, bcolor(navy)) ///
      bar(3, bcolor(lime)) ///
      name(bar4, replace)
      graph bar mpg if rep78>=3, asyvars bargap(80) over(rep78) over(foreign) ///
      bar(1, bcolor(navy)) ///
      bar(2, bcolor(navy)) ///
      bar(3, bcolor(lime)) ///
      name(bar5, replace)
      Here is one more possible solution if you have more than one over group. It involves taking the two over variables and combining them into a single over variable, using the asyvars option, and controlling the colors of the bars individually. Here is an example:
      egen forrep78=group(foreign rep78), label
      graph bar mpg if rep78>=3, asyvar over(forrep78) bargap(80) ///
      bar(1, bcolor(navy)) ///
      bar(2, bcolor(navy)) ///
      bar(3, bcolor(lime)) ///
      bar(4, bcolor(navy)) ///
      bar(5, bcolor(navy)) ///
      bar(6, bcolor(navy)) ///
      name(bar6, replace)
      If none of the above gets you to where you want to be, you can move to graph twoway bar, where you can layer one graph over another. So, you could create a base graph with the default colors, then add a second graph with just a single bar of a different color.
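      As a rough sketch of that layering idea (not code from the video): the data are first collapsed to one mean per repair category, then the same bar is drawn twice, with the second layer restricted to a single category. The graph name bar7, the bar width, and the choice of rep78==5 as the highlighted bar are illustrative assumptions.

      ```stata
      sysuse auto, clear
      preserve
      * One row per repair category; inrange() also excludes missing rep78
      collapse (mean) mpg if inrange(rep78, 3, 5), by(rep78)
      * Layer 1 draws every bar in navy; layer 2 redraws only rep78==5 in lime
      twoway (bar mpg rep78, barwidth(.6) color(navy)) ///
             (bar mpg rep78 if rep78==5, barwidth(.6) color(lime)), ///
             xlabel(3 4 5) ytitle("Mean mpg") legend(off) ///
             name(bar7, replace)
      restore
      ```

      Because twoway treats the x axis as numeric, the xlabel() option is needed to keep the ticks on the category values.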

  • @librarystudyntour 4 years ago +1

    /*
    Using Stata to create bar graphs
    */
    /* Basic bar graph examples - Summarize the mean */
    summarize realrinc
    graph bar realrinc, name(bar1, replace) // Start with a default graph and add options.
    tabstat realrinc, by(marital)
    graph bar realrinc, over(marital) name(bar2, replace) //
    /* Using multiple -over()- options */
    graph bar agewed, over(divorce) over(sex) name(box1, replace)
    graph bar agewed, over(sex) over(divorce) name(box2, replace)
    graph bar agewed, over(sex) over(divorce) name(box2, replace) asy // as y variable
    graph hbar agewed, over(sex) over(divorce) name(box2, replace)
    /* Relabeling */
    #delimit ;
    graph bar agewed, over(sex, relabel(1 "Men" 2 "Women")) // Note relabel within over.
    over(divorce, relabel(1 "Divorced" 2 "Not divorced"))
    name(bar1, replace);
    #delimit cr
    /* Displaying different statistics */
    #delimit ;
    graph bar (p75) agewed, over(sex, relabel(1 "Men" 2 "Women"))
    over(divorce, relabel(1 "Divorced" 2 "Not divorced"))
    name(bar1, replace);
    #delimit cr
    /* Example of a polished graphic */
    #delimit ;
    graph bar agewed, over(sex, relabel(1 "Men" 2 "Women"))
    over(divorce, relabel(1 "Divorced" 2 "Not divorced"))
    bargap(10)
    title("Age When First Married by Gender and Ever Divorced", span)
    ytitle("Age")
    note("Source: General Social Survey, 2006")
    ymtick(10(5)40)
    legend(col(1) ring(0) position(1))
    graphregion(color(white))
    name(box4, replace);
    #delimit cr

  • @suyeonlee-tauler5302 7 years ago

    Very useful video. Thanks!!

  • @Xavicardenas 4 years ago

    Thank youuu !!!!

  • @GillTokyo 4 years ago +1

    Hello, great video, thank you!
    I have a really simple question: I would like to graph means and SD bars for several different variables (over country, where I compare the country mean for one country with other countries in the study) on one chart for each variable.
    If you have time to help me with the code, I would be very grateful.

    • @smilex3 4 years ago

      Hi Gill, I am not certain that I understand your problem fully. But I fear that you are trying to pack too much information onto one graph. I think that this is a difficult visualization problem, not a difficult programming problem. One of the issues I expect is a possible difference in scale across the different variables. That said, maybe the following code will give you some ideas:
      /* Idea #1 */
      sysuse auto, clear
      graph bar (mean) mpg trunk (sd) mpg trunk, over(rep78) name(g1, replace)
      /* Idea #2 */
      preserve
      collapse (mean) mpgmean=mpg (sd) mpgsd=mpg (mean) trunkmean=trunk (sd) trunksd=trunk, by(rep78)
      drop if rep78==.
      rename rep78 rep78a
      gen rep78b=rep78a+.25
      gen mpglb=mpgmean-mpgsd
      gen mpgub=mpgmean+mpgsd
      gen trunklb=trunkmean-trunksd
      gen trunkub=trunkmean+trunksd
      list
      twoway (rcap mpglb mpgub rep78a) ///
      (rcap trunklb trunkub rep78b) ///
      (scatter mpgmean rep78a, c(l)) ///
      (scatter trunkmean rep78b, c(l))
      twoway (bar mpgmean rep78a, barwidth(.25)) (bar trunkmean rep78b, barwidth(.25))
      restore
      This article may guide your thinking as well: stats.idre.ucla.edu/stata/faq/how-can-i-make-a-bar-graph-with-error-bars/

    • @GillTokyo 4 years ago

      @smilex3 Thank you very much for getting back to me. I really appreciate it.
      I didn't explain well. I've already looked at the UCLA website and cannot find what I am looking for.
      What I want is embarrassingly simple: the variables are all scaled in the same way, and all measure the same underlying construct (e.g. one chart that shows 'antagonism toward political elites'). This is measured by average agreement with 'most politicians are corrupt' [Japan vs other countries in the study] and average agreement with 'elected officials don't care what ordinary people think' [Japan vs other countries in the study].

    • @smilex3 4 years ago

      @GillTokyo Hi Gill, did the code I posted give you some direction? I am still not seeing exactly what you are trying to accomplish. Maybe you could use the dataex program in Stata to produce a small, representative set of data that we can use to experiment with.
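      For reference, a minimal sketch of what a dataex call can look like, using the auto dataset as a stand-in for your own data. The variable list and the 1/10 range are illustrative assumptions; dataex ships with recent Stata releases, and in older ones it may need to be installed first with ssc install dataex.

      ```stata
      sysuse auto, clear
      * Print the first 10 observations of three variables as a pasteable -input- block
      dataex mpg rep78 foreign in 1/10
      ```

      The output can be pasted straight into a reply, so others can recreate the example data exactly.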

  • @kennah7648 9 years ago

    How would you go about producing bar charts with percentages as opposed to the mean or percentiles?

  • @beatriceruocco3715 1 year ago

    Hi! Is there a way to make a bar graph in Stata without computing the mean (or median)? I only want to plot my data. hist doesn't work because I have the mean and the median of two variables to plot over a string variable. With graph bar, Stata plots the mean of the mean and the mean of the median. How can I solve this? Thanks

    • @smilex3 1 year ago

      Hi Beatrice, I am uncertain exactly what you want to accomplish but I have suggestions for two situations that may help you. The first situation has a typical dataset of recorded values for each observation for several variables. In this situation, you might want to graph the results of some simple descriptive statistics (e.g. means, medians, standard deviations, etc.)
      In the second situation, you could have a dataset of descriptive statistics (e.g. means, etc.) for variables calculated across some other (string) variable. People are sometimes confused by "graph bar" in Stata, expecting something like a histogram. But the help file clearly states, "In a vertical bar chart, the y axis is numerical, and the x axis is categorical."
      My example uses the auto.dta dataset to demonstrate graphing the means and medians of three variables calculated for two groups. Here is example 1:
      sysuse auto, clear
      summarize mpg trunk turn
      graph bar (mean) mpg turn trunk ///
      (median) mpg turn trunk, ///
      over(foreign) blabel(bar, format(%4.0f)) ///
      name(bar1, replace)
      I believe this accomplishes graphing means and medians for variables across or over a categorical variable. But, if your dataset contains summary statistics, you can create the same graph as shown in example 2:
      clear
      input str10 foreign byte (mpgmean mpgmed turnmean turnmed trunkmean trunkmed)
      "Domestic" 20 19 41 42 15 16
      "Foreign" 25 25 35 36 11 11
      end
      graph bar mpgmean turnmean trunkmean mpgmed turnmed trunkmed, ///
      over(foreign) blabel(bar) name(bar2, replace)
      Note here that I have simply created a new dataset of summary statistics, an unnecessary step if I have access to the original raw data (e.g. example 1).
      Hopefully, this helps you a bit. If I have misunderstood your question, feel free to post again. It would be helpful if you could use a built-in dataset like the auto dataset to show me an analogous problem that we can work on together.

  • @gonzalyu 10 years ago

    Very useful video.
    I do have a question. How can you graph a bar with percentage instead of means?

    • @smilex3 10 years ago

      Yuritzy,
      Do you mean something like a simple histogram? If so, try the following Stata code and see if it produces what you want. Note that I selected a discrete orderable measure, rep78, so I used the -discrete- option. To display percentages instead of densities I used the -percent- option. Finally, I displayed the exact percentages, which I don't find attractive, using the -addlabels- option. Entering -help histogram- in the command window will give you many more details.
      Best,
      Alan
      sysuse auto
      histogram rep78, discrete percent addlabels
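      An alternative that stays within graph bar, offered as a sketch and assuming Stata 13 or later, where graph bar accepts (percent) as a statistic with no yvar: each bar then shows the percentage of observations in that category. The graph name pct1 and the label format are illustrative choices.

      ```stata
      sysuse auto, clear
      * With no yvar, (percent) plots the share of observations in each rep78 category
      graph bar (percent), over(rep78) blabel(bar, format(%4.1f)) ///
          ytitle("Percent of cars") name(pct1, replace)
      ```

      Unlike histogram, this treats rep78 as purely categorical, so it also works for variables whose values have no meaningful spacing.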

    • @gonzalyu 10 years ago

      Hello Alan
      A histogram is helpful; I am trying to do a histogram for two variables. I tried the following code and it is not working for me.
      twoway (histrogram kotelchuck if CDE_PERC_n==1, start (30) width(5) color(darkgreen))
      (histrogram kotelchuck if CDE_PERC_n==2, start(30) width(5),
      legend (order(1 "Cawem" 2 "Cawem Plus"

    • @smilex3 10 years ago

      Yuritzy Glez
      I just saw your reply and apologize for not getting back to you sooner. You were not clear when you said your code was "not working". It is difficult to help without that critical information.
      But, looking at your code I can make a guess about the problem. Using graph twoway, Stata attempts to superimpose one graph over the other. Stata does not provide transparency in their graphics and one graphic can block or occlude the other. Sometimes you can correct for this by carefully selecting the order of your graphics. The graphs are displayed in order. In the code below the first example shows how one graphic can overlay, and hide, part of the graphic underneath.
      The next two examples provide solutions by graphing the two subsets of data side-by-side. The histogram on the left shows the distribution of mpg for cars less than or equal to the median weight. The histogram on the right shows mpg for cars greater than the median weight.
      Why two solutions? The first, using the by option, is short and simple. But I couldn't figure out how to color the two histograms differently! There may be a way, but it is not obvious to me. If you want to control the look of the bars independently, the last example should work. Here I create two histograms (suppressing their display) and then put them into one graphic using the graph combine command.
      Hopefully this helps if you haven't already found a solution. Let me know how it goes.
      Best wishes,
      Alan
      /* Stata Code Follows */
      sysuse auto, clear
      /* Problem with occlusion */
      #delimit ;
      gr tw (histogram mpg if weight<=3190, width(5) color(green) fintensity(60))
      (histogram mpg if weight>3190, width(5) color(yellow) fintensity(60)),
      name(hist1, replace);
      #delimit cr
      /* Solution #1 */
      cap drop split
      gen byte split=weight>3190 if !missing(weight)
      tab split
      gr tw (histogram mpg, by(split) color(green) fintensity(60) name(hist2, replace))
      /* Solution #2 */
      gr tw (histogram mpg if split==0, color(green) fintensity(60) name(hist3, replace) nodraw)
      gr tw (histogram mpg if split==1, color(yellow) fintensity(60) name(hist4, replace) nodraw)
      gr combine hist3 hist4
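      A note for readers on newer versions: the occlusion problem described above can also be eased with semi-transparent fills. This is a sketch that assumes Stata 15 or later, where a color can carry an opacity suffix such as green%50; the graph name hist5, the start(10) value, and the legend text are illustrative choices.

      ```stata
      sysuse auto, clear
      * color%opacity: 50 makes each fill half transparent, so overlapping bins stay visible
      * A common start() and width() keep the bins of the two layers aligned
      twoway (histogram mpg if weight<=3190, start(10) width(5) color(green%50)) ///
             (histogram mpg if weight>3190, start(10) width(5) color(yellow%50)), ///
             legend(order(1 "Weight <= 3190" 2 "Weight > 3190")) ///
             name(hist5, replace)
      ```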

  • @juliorivas106 8 years ago

    I don't get it. What if I have just a single categorical variable? How do I make a bar graph?

    • @smilex3 8 years ago

      +Julio Rivas Julio, Do you mean something like this:
      sysuse auto, clear
      hist rep78, discrete percent barwidth(.8)

  • @lauradagostini2872 9 years ago

    It's useful, thank you! I have a question: how can I add significance stars to my graph? Among the symbols I found there aren't any stars...

    • @smilex3 9 years ago

      +Laura D'Agostini Laura, what kind of graph are you making? Where do you want to display the stars? Usually people just use superscripted asterisks "*" to indicate significance. Does the following example give you some ideas?
      /* Example */
      sysuse auto, clear
      #delimit ;
      graph twoway (scatter mpg weight),
      title("Statistically Significant {sup:***}")
      text(11 2100 "{&beta}{sub: 1}=-.0060087{sup:***}");
      #delimit cr

    • @lauradagostini2872 9 years ago

      +Alan Neustadtl I'm making a bar graph and I'd like to add the stars between the bars that are statistically different (results of a Student's t-test). I've tried with the graph editor but there are only little squares, triangles, or circles. How do I put superscripted asterisks where I want on the graph? Sorry for my ignorance, but I'm completely new to Stata and I'm getting mad with it...

    • @smilex3 9 years ago

      +Laura D'Agostini Laura, Okay, I get a little better what you are trying to do. You could do it with graphic markers, but it might be easier with text. In the following program, I read in some data, produce a bar graph, and using the "text()" option add labels indicating significance, or at least asterisks. I am partial to the solution for Bar 2. The y-dimension is easy to get, the x-dimension was determined by trial and error since bar graphs do not treat the X-axis as numeric.
      Does this make sense?
      clear *
      input byte x y
      1 1
      2 2
      3 3
      4 4
      5 5
      end
      #delimit ;
      graph bar y, over(x)
      text(1.25 9 "Bar 1 {sup:***}")
      text(4.25 71 "Bar 4 {sup:**}")
      text(2.25 29 "Bar 2")
      text(2.30 35 "***");
      #delimit cr

    • @lauradagostini2872 9 years ago

      +Alan Neustadtl Yes, it makes a great sense, perfect!! I could do it on my graph and it's exactly what I was looking for, thank you so much!!!

    • @smilex3 9 years ago

      Laura D'Agostini I'm glad we figured out a way to get this done.

  • @bruhankm7583 6 years ago

    Hi ... I would like to know how to superimpose the bar graphs. Any help in this regard is highly appreciated

    • @smilex3 6 years ago

      Can you please provide more details? Superimpose what on what? Overlapping bars? Transparent bars? A theoretical distribution? Something else?

    • @bruhankm7583 6 years ago

      I would like to superimpose one bar graph over the other. I have more than two categories to compare, therefore on each bar if I can be able to put one more, it reduces the number of graphs I need to produce.

  • @mantshonyanelent1652 4 years ago

    Does anybody know how to add the IQR to a bar graph of medians? Thanks a lot

    • @smilex3 4 years ago

      Mantshonyane, I don't think this is baked into Stata. But, you can create the constituent pieces and build the graph. Here is some Stata code that might point you in the direction you want:
      sysuse auto, clear
      recode rep78 (1/2=1) (3=2) (4=3) (5=4), gen(rep78new)
      collapse (median) median=mpg (p25) p25mpg=mpg (p75) p75mpg=mpg, by(rep78new)
      graph twoway (bar median rep78new, barw(.8)) (rcap p* rep78new)

  • @claire2247 4 years ago

    Help! I need to be able to use error bars with bar graphs. Thanks :)

    • @smilex3 4 years ago +1

      Hi Claire, Stata does not have this baked into its graphics as a graph or an option for a graph. But you can build your own in a couple of ways depending on your application. For example, you can use margins and marginsplot for discrete variables. But you probably want something outside of regression. UCLA has an excellent tutorial that explains how to build the graph you want from its constituent parts. You can find the explanation at stats.idre.ucla.edu/stata/faq/how-can-i-make-a-bar-graph-with-error-bars/
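      To make the margins route concrete, here is a hedged sketch using the auto dataset. It assumes that a one-way regression on a factor variable is an acceptable way to get the group means, and it relies on marginsplot's recast() option to draw the point estimates as bars; the graph name err1 and the titles are illustrative.

      ```stata
      sysuse auto, clear
      * Group means of mpg by repair record, estimated with a one-way regression
      regress mpg i.rep78
      margins rep78
      * Draw the estimates as bars; the confidence intervals keep their default capped-spike style
      marginsplot, recast(bar) plotopts(barwidth(.8)) ///
          ytitle("Mean mpg") title("Mean mpg by repair record, with 95% CIs") ///
          name(err1, replace)
      ```

      The intervals here come from the regression's pooled variance estimate, so they can differ from intervals computed separately within each group.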

    • @claire2247 4 years ago

      @smilex3 thank you!!! Have a beautiful day.

  • @sonoman81 7 years ago

    How can I add the SD to a mean bar graph?

    • @smilex3 7 years ago

      Hello, I am not certain what you mean by "add the SD", but if you intend to show the numeric value for each group, here is one example. I use the levelsof command to store the unique values of rep78 in a macro. Then I loop over the values of rep78, calculating the SDs and storing them in macros that will be used in the bar graph. Finally, I use the text option in the bar graph to display the SDs. I had to play around with the Y, X coordinates to get the SDs displayed nicely.
      Here is the Stata code to play with:
      sysuse auto, clear
      levelsof rep78, local(repstatus)
      foreach num of numlist `repstatus' {
      summarize mpg if rep78==`num'
      local sd`num': di %3.1f r(sd)
      }
      #delimit ;
      graph bar mpg, over(rep78)
      text( -2.5 9 "sd=`sd1'")
      text( -2.5 30 "sd=`sd2'")
      text( -2.5 51 "sd=`sd3'")
      text( -2.5 71 "sd=`sd4'")
      text( -2.5 92 "sd=`sd5'");
      #d cr

  • @hossamelfeki8581 6 years ago

    How to add error margin?

    • @smilex3 6 years ago

      This is conceptually easy to do, but there is no simple option and it takes a fair amount of programming. Here is an extended example that you can copy-and-paste into your Stata do-file editor. If it makes sense great. If not, go through it one line at a time and break it down. The basic idea is to create a smaller dataset with the bar and error data and then use graph twoway bar to create the bars and overlay the error bars using graph twoway rcap.
      /* Get a dataset */
      use https://stats.idre.ucla.edu/stat/stata/notes/hsb2, clear
      /*
      Use the collapse command to make the mean and standard
      deviation by race and ses.
      */
      collapse (mean) meanwrite= write (sd) sdwrite=write (count) n=write, by(race ses)
      /* Make the upper and lower values of the confidence interval. */
      generate hiwrite = meanwrite + invttail(n-1,0.025)*(sdwrite / sqrt(n))
      generate lowrite = meanwrite - invttail(n-1,0.025)*(sdwrite / sqrt(n))
      /* Create variables used to control the color of the error bars */
      generate sesrace = race if ses == 1
      replace sesrace = race+ 5 if ses == 2
      replace sesrace = race+10 if ses == 3
      sort sesrace
      list sesrace ses race, sepby(ses)
      /* Make the graph */
      #d ;
      twoway (bar meanwrite sesrace if race==1)
      (bar meanwrite sesrace if race==2)
      (bar meanwrite sesrace if race==3)
      (bar meanwrite sesrace if race==4)
      (rcap hiwrite lowrite sesrace, lwidth(medthick) lcolor(lime)),
      xtitle("Socio Economic Status")
      ytitle("Mean Writing Score")
      xlabel(2.5 "Low" 7.5 "Middle" 12.5 "High", noticks)
      ylabel(, angle(0))
      legend(row(1) order(1 "Hispanic" 2 "Asian" 3 "Black" 4 "White")
      region(lcolor(white)))
      graphregion(color(white));
      #d cr

  • @danieldelahormaza4336 10 years ago

    Does anybody out there know what "asis" is for?

    • @smilex3 10 years ago

      The - (asis) - specifier tells - graph bar|hbar|dot - to show the data as is, not as means or some other statistic. So, if your data are already collapsed or summarized, you can specify - (asis) - as part of your command. In this case it can affect how the axis labels and legend appear.
      You cannot use this command with a dataset that requires summarizing the data before graphing them.
      The example below uses collapsed data. So, for each of the four regions, I have an average temperature for the month of January. So, I specify - (asis) -.
      However, if I had 31 days of temperature data for each region, so 4 regions times 31 days or 124 temperature/days, I could not use - (asis) - and would need another command, probably - (mean) - to instruct Stata to calculate the average of the temperatures.
      The example below shows how the Y-axis changes due to using - (asis) -.
      Best,
      Alan
      This typically affects the axis labels and legends.
      clear
      input str10 region float tempjan
      N.E. 27.9
      "N. Central" 21.7
      South 46.1
      West 46.2
      end
      label var tempjan "Jan. Temp."
      label var region "Country region"
      graph bar (asis) tempjan, over(region) name(g1, replace)
      graph bar tempjan, over(region) name(g2, replace)

    • @danieldelahormaza4336 10 years ago

      Thank you very much! Just another question:
      I am doing a graph bar with a data base which includes 30 countries, but I need in my graph to include only five countries. I am using the command "if" to do so; however, the graph only appears with one country or gives me an error.
      Do you know the right way to do this?
      Thanks again.

    • @mohammedibrahim-sr5tb 7 years ago

      Very clear and helpful. Thank you Alan

    • @mohammedibrahim-sr5tb 7 years ago

      I would like to graph my lab results and draw the cut-off. Could you help me with how to do it?

  • @PeniHausia 8 years ago

    #delimit ;
    graph bar Agewed, over(Sex, relabel(1 "Men" 2 "Women"))
    over(Divorced, relabel(1 "Divorced" 2 "Never divorced")
    bargap(10)
    asy
    title("Average Age First Married by Gender and Ever Divorced", span)
    ytitle("")
    note("Source: General Social Survey, 2006")
    ymtick(10(5)25)
    legend(col(1) ring(0) position(11))
    graphregion(color(white))
    name(bar4, replace);
    #delimit cr
    What's wrong with my syntax - I tried so many times but it keeps saying - parentheses do not balance

    • @smilex3 8 years ago +1

      +Peni Hausia I took your code and ran it one line at a time, a tried and true method to debug programs. Your error is in your second line, which should read: "over(Divorced, relabel(1 "Divorced" 2 "Never divorced"))". Notice how there are now two open parentheses and two closing ones.

    • @PeniHausia 8 years ago

      +Alan Neustadtl - Thank you so much.

    • @smilex3 8 years ago

      I'm glad you got your Stata code to work! Your graphic looks very nice.