Creating New Variables Using Stata

  • Published: Aug 21, 2024
  • This video introduces the programming concepts and syntax for creating new variables. The Stata commands covered include generate, replace, recode, label define, label values, label variable, and label data.

Comments • 46

  • @alluneedislove4ever
    @alluneedislove4ever 6 years ago

    Thank you very much! I was actually working on confidence in institutions around the world using the WVS 6. I'll be back to your videos when I have a little more time :D

  • @auddjurhuus3704
    @auddjurhuus3704 8 years ago

    Thank you, you have been a big help to me in doing my master's thesis.

    • @smilex3
      @smilex3 8 years ago +1

      +Aud Djurhuus I am glad these videos have been useful for you. Please let me know if you have suggestions for new videos on Stata.

  • @DkAlexus
    @DkAlexus 9 years ago +5

    Like the intro song!! :D

  • @ianyohane8182
    @ianyohane8182 4 years ago

    this is awesome guys

  • @GeorgyPorgy76
    @GeorgyPorgy76 7 years ago

    Thank you! Very helpful.

  • @smilex3
    @smilex3 8 years ago

    Ricardo, there is no reply link on your posting, but a quick Internet search shows three different ways to gain access to the General Social Survey (GSS) data. Each link goes to a page that works a little differently, but you should be able to get to the data. The GSS is a great social science dataset that is interesting to explore.
    www3.norc.org/GSS+Website/Download/
    www.icpsr.umich.edu/icpsrweb/landing.jsp
    sda.berkeley.edu/sdaweb/analysis/?dataset=gss14
    Best wishes,
    Alan

  • @smilex3
    @smilex3 10 years ago

    Santiago,
    The reply option is not showing up on your comment on YouTube. Hopefully, you will see this reply.
    It is hard to give you good advice without more information, but the Stata code below may be helpful. It uses one of the example datasets that come with Stata and shows one way to collapse a dataset by year, producing means and standard deviations.
    Best,
    Alan
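
The code from this reply is not preserved in the thread. A minimal sketch of the kind of collapse it describes, assuming the `uslifeexp` example dataset shipped with Stata (variables `year` and `le`); the `period` variable and the output names are hypothetical:

```stata
/* Assumption: uslifeexp has one row per year, 1900-1999 */
sysuse uslifeexp, clear

/* Hypothetical period variable: the first year of each 5-year block */
generate period = 5*floor(year/5)

/* One row per period, holding the mean and SD of life expectancy */
collapse (mean) le_mean=le (sd) le_sd=le, by(period)
list
```

For panel data, the panel identifier would presumably go into `by()` alongside `period`, so that each unit gets its own 5-year summaries.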

  • @santiagoc85
    @santiagoc85 10 years ago +1

    Hi Alan. I found this video by chance. Perhaps, if you don't mind, you could help me with my issue. I am working with a GMM model (dynamic panel data) and I need to collapse my data into 5-year averages and standard deviations. I have created a new variable named period for each group of years (e.g. period=80 if year>=1980 & year

  • @goswaminilanjana07
    @goswaminilanjana07 9 years ago +2

    Hi! How can I recode a string variable to a numeric variable? I want to recode principal occupation into 4 levels. The principal occupation is a string variable.

    • @smilex3
      @smilex3 9 years ago

      Nilanjana, without more details about your data I can't be too specific. But, I can point you to a couple of possible solutions. These include the Stata commands "destring" and "encode". The following Stata code builds a sample dataset and shows their simplest use.
      Best wishes,
      Alan
      /* Create a small dataset with two string variables */
      clear *
      input str1 var1str str4 var2str
      1 occ1
      2 occ2
      3 occ3
      4 occ4
      end
      list
      describe
      /* Use destring to convert numbers stored as strings to numbers */
      destring var1str, gen(var1num)
      /* Use encode to convert strings with non-numeric characters to numbers */
      encode var2str, gen(var2num)
      list
      list, nolabel
      describe

  • @Aurelaso
    @Aurelaso 7 years ago

    Alan, thanks for your videos. I have a quick question. Is it possible to randomly assign values in variable "x" to variable "y" (which only has missing values)?

    • @smilex3
      @smilex3 7 years ago

      Aurelio, I am not certain why you need to do this, but if I understand your question, the following should point you to a solution:
      sysuse auto, clear
      generate rndnum=runiform()
      sort rndnum
      rename mpg mpg1
      keep mpg1
      tempfile rndorder
      save `rndorder'
      sysuse auto, clear
      merge 1:1 _n using `rndorder'
      list mpg mpg1
      pwcorr mpg mpg1
      Best,
      Alan

  • @johndupont8596
    @johndupont8596 8 years ago

    Hi Alan! Thanks a lot for your videos
    I just have a small question as I am having a small issue:
    I am looking at trade flows between countries and I have the following 5 variables in my dataset:
    COUNTRY PARTNER TRADE_FLOWS TIME GDP_COUNTRY
    Now my problem is that I would like to generate a new variable that indicates the gdp of the PARTNER country as well, thus I will have :
    COUNTRY PARTNER TRADE_FLOWS TIME GDP_COUNTRY GDP_PARTNER
    All countries are included in COUNTRY and PARTNER, so I am looking for a command that says: "if PARTNER==usa then GDP_PARTNER=GDP_COUNTRY when COUNTRY==usa"
    Any help with this will be greatly appreciated!!
    Many Thanks!!
    Best,
    John

    • @nnnwitharya
      @nnnwitharya 4 years ago

      ruclips.net/video/kr2v3LuBw2I/видео.html
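
One way to approach John's question is to build a lookup of each country's GDP by year and merge it back on the partner. This is only a sketch under assumptions: the toy data are made up, the variable names follow the question, and `partnergdp` is a hypothetical temporary file name.

```stata
/* Made-up toy data using the variable names from the question */
clear
input str3 COUNTRY str3 PARTNER TIME TRADE_FLOWS GDP_COUNTRY
"usa" "fra" 2000 10 100
"usa" "deu" 2000 12 100
"fra" "usa" 2000  9  40
"fra" "deu" 2000  7  40
"deu" "usa" 2000 11  60
"deu" "fra" 2000  8  60
end

/* Build a lookup with one row per country and year */
preserve
keep COUNTRY TIME GDP_COUNTRY
duplicates drop
rename COUNTRY PARTNER
rename GDP_COUNTRY GDP_PARTNER
tempfile partnergdp
save `partnergdp'
restore

/* Attach the partner's GDP to every trade-flow row */
merge m:1 PARTNER TIME using `partnergdp', keep(master match) nogenerate
list
```

Because local macros such as `partnergdp` only live within one run, the block should be executed as a whole from a do-file.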

  • @kamieog
    @kamieog 8 years ago

    thank you very much :D

    • @smilex3
      @smilex3 8 years ago

      +SLKRJD I'm glad you found this video helpful.

  • @Stine2207
    @Stine2207 10 years ago

    Hi Alan. Maybe you can help me. I have a dataset which consists of two questionnaires. I want to create an age variable from the two variables that hold information on age, that is, a new variable that combines the two. How do I do that?
    /Stine
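
A minimal sketch of one possible approach to Stine's question, assuming each questionnaire stores age in its own numeric variable (the names `age_q1` and `age_q2` and the toy values are hypothetical) and that the first questionnaire should win when both are filled in:

```stata
/* Made-up toy data: age recorded in either questionnaire, both, or neither */
clear
input age_q1 age_q2
34  .
 . 51
27 27
 .  .
end

/* Start from the first questionnaire, then fill gaps from the second */
generate age = age_q1
replace age = age_q2 if missing(age)
list
```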

  • @earningsmanagementestimati6028
    @earningsmanagementestimati6028 6 years ago

    Hi sir
    I need your help on how to generate instrumental variables with ivreg2h in STATA. In other words, how do I generate instrumental variables from my own data, since I don't have external instruments? The method was developed by Lewbel (2012). Your help would be highly appreciated.

    • @smilex3
      @smilex3 6 years ago

      You did not give me much information to work with. Maybe this video from StataCorp will be helpful.

  • @ibidunnioloniniyi8806
    @ibidunnioloniniyi8806 5 years ago

    Hi,
    I need some help.
    I have a variable which is monthly income but the data is from three different countries.
    What command can I use to convert these incomes to US dollars, since each country's currency is different?
    Awaiting your response.
    Thanks

    • @smilex3
      @smilex3 5 years ago

      Hi Ibidunni, it may be possible, depending on the data you have. Do you have a second variable which denotes which currency is recorded for each observation? Do you have a variable indicating the location of the respondent that can be used for this purpose? Is there a pattern to the income variable? In other words, are the first 300 cases in pounds, the next 250 cases in lira, and the remaining cases in Swiss francs? Or do the currencies alternate like this: 1, 2, 3, 1, 2, 3, etc.? With a little more information or a look at an example set of data I might be able to provide a bit more help.
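
If a variable identifying the country does exist, the conversion could be sketched as follows. Everything here is an assumption for illustration: the variable names, the toy incomes, and especially the exchange rates, which are placeholders and not real rates.

```stata
/* Made-up toy data: one income per respondent, in local currency */
clear
input str2 country income
"uk" 2000
"tr" 9000
"ch" 5000
end

/* Placeholder US-dollar rates per unit of local currency (not real values) */
generate usd_rate = .
replace usd_rate = 1.25 if country == "uk"
replace usd_rate = 0.05 if country == "tr"
replace usd_rate = 1.10 if country == "ch"

/* Convert; rows with an unknown country keep a missing result */
generate income_usd = income * usd_rate
list
```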

  • @3foss191
    @3foss191 7 years ago

    Sorry, I'm working on a dataset where there are missing values (represented with "dots", though they do not look quite like the dots I saw in the preview data I have been working on). The problem is that when I try to delete the missing values, Stata (14) does not allow me to do it.
    . drop if missing (hc)
    missing not found
    r(111);
    end of do-file
    r(111);
    Could you please give me a hand? Thanks.

    • @smilex3
      @smilex3 7 years ago

      It is hard to tell from your description what is going on. My first suspicion is that the variable you are using to determine what to drop is a string variable. Missing values for string variables are null (empty) entries, not periods.
      If you install the user-written program dataex, you could send me a small amount of your data showing me exactly what you have. From the command window type: findit dataex. The help file will tell you how to use the program.
      Finally, here is some Stata code demonstrating how missing values and the drop command work for numeric data:
      clear
      input byte y x
      1 5
      2 4
      . 3
      4 .
      5 1
      end
      list
      drop if missing(y)
      list
      clear
      input byte y x
      1 5
      2 4
      . 3
      4 .
      5 1
      end
      list
      drop if missing(y)
      drop if missing(x)
      list
      clear
      input byte y x
      1 5
      2 4
      . 3
      4 .
      5 1
      end
      list
      drop if missing(y, x)
      list

    • @smilex3
      @smilex3 7 years ago

      Here is some more follow-up that is perhaps clearer. In this example y is numeric and x is a string. x also contains the value ".". You can see in the first small program that the record where x=="." is not treated as missing. I am still able to drop that record, as shown in the second example.
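
The two small programs mentioned in this follow-up are not preserved in the thread. A minimal sketch of the behavior described, using made-up toy data in which x is a string variable:

```stata
/* Made-up toy data: y is numeric, x is a one-character string */
clear
input byte y str1 x
1 "5"
2 "."
3 ""
4 "1"
end

/* For strings, missing() is true only for the empty string "" */
count if missing(x)
drop if missing(x)
list

/* The literal period is ordinary text, so it must be matched explicitly */
drop if x == "."
list
```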

    • @3foss191
      @3foss191 7 years ago

      I will check again and let you know. Thanks a lot.

    • @3foss191
      @3foss191 7 years ago

      Sorry, but i don't know how it functions(dataex).

    • @3foss191
      @3foss191 7 years ago

      The variable is numeric, not a string.
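
Another possible reading of the error log in the question: the command was typed as `missing (hc)`, with a space before the parenthesis. Stata then appears to look for a variable named `missing` instead of calling the function, which would match the "missing not found" message. A small sketch with made-up data:

```stata
/* Made-up toy data with numeric missing values */
clear
input byte y x
1 5
. 3
4 .
end

/* With the space: capture traps the error so the run can continue */
capture noisily drop if missing (y)
display _rc

/* Without the space the function call is parsed as intended */
drop if missing(y)
list
```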

  • @afraakbar9162
    @afraakbar9162 5 years ago

    Please help me with this. There are three questions used to classify the informal or formal sector.
    The 1st one is EPF. It has 3 answers:
    1-Yes
    2-No
    3-Don't know
    The 2nd question is whether your institution keeps accounts.
    It has the same three answers.
    The 3rd one is how many regular employees your institution has.
    More than 10 is considered formal and fewer than 10 is considered informal.
    In my data analysis I need to classify as formal everyone with EPF yes, accounts yes, and more than 10 employees, and then calculate the total number of employees in the formal sector, that is, people who have EPF, accounts, and more than 10 employees. Please help me with this.

    • @smilex3
      @smilex3 5 years ago

      This is very difficult to answer without knowing a lot more about your data. But, based on what you wrote the following might get you going on a solution:
      generate byte sector=0
      /* Numeric missing values compare as larger than any number, so exclude them */
      replace sector=1 if epf=="yes" & q2=="yes" & q3>10 & !missing(q3)
      label define sectorlab 1 "formal" 0 "informal"
      label values sector sectorlab
      tab sector
      count if sector==1

  • @loanpham9365
    @loanpham9365 10 years ago

    Hi Alan. Would you mind helping me with this situation? I imported my data from Excel into Stata. I want to use this data for tssmooth ma, but I cannot do it because Stata requires tsset varname. I then created a new variable, but when I use tssmooth ma, the error says that only integers are accepted (my data is decimal). Please help me! How can I use tssmooth ma in this situation?

    • @smilex3
      @smilex3 10 years ago

      It is hard to respond without more information. I assume that you have some kind of time-series, longitudinal, or cross-sectional data. But knowing how to use -tsset- depends on your particular data. For example, using the General Social Survey longitudinal data requires reshaping from -wide- to -long- and in the process creating an idnum and a panel wave identifier so the data can be -tsset-.
      So, either you have a straight time-series data set in which case you can do the following:
      tsset timevar [, options]
      Or, you have panel data and you can do the following:
      tsset panelvar timevar [, options]
      Best,
      Alan

    • @loanpham9365
      @loanpham9365 10 years ago

      Alan Neustadtl I appreciate your support! My data is time-series. The Stata error says that it only accepts integers, while my data is in decimal numbers, even though the storage type is float. How can I solve this? I know that it is difficult for you without more information, so would you mind checking my data? How can I send it to you? Please support me! Thank you so much!
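
One possible cause is that the decimal data series itself was passed to -tsset-, which expects an integer-valued time variable; the series being smoothed can still be decimal. A sketch under that assumption, using the `sp500` example dataset shipped with Stata; the index `t` and the output name `close_ma` are hypothetical:

```stata
/* sp500 ships with Stata; close is a decimal-valued price series */
sysuse sp500, clear

/* Hypothetical integer time index: the observation number */
generate t = _n
tsset t

/* Moving average over 2 lags, the current value, and 2 leads */
tssmooth ma close_ma = close, window(2 1 2)
list t close close_ma in 1/5
```

Using `_n` as the time index assumes the rows are already sorted in time order and evenly spaced.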

  • @TheKaduzin
    @TheKaduzin 8 years ago

    If the main goal is to teach how to create new variables, I just don't understand why the data isn't available... =/

  • @afraakbar9162
    @afraakbar9162 5 years ago

    Please, help needed. I want to create "1" as ever married and "2" as never married. But the variable has 1 as married, 2 never married, 3 widowed, 4 separated, 5 divorced. What code do I have to use for this? Please help me.

    • @smilex3
      @smilex3 5 years ago +1

      Hi Afra, there are several ways to do this but let me give you one of them that creates a new variable that only contains the values 1, 2, and missing. I will call the original variable "marital" and it has the values that you described above and create a new variable called "marital2cat". Here is one solution:
      generate byte marital2cat=.
      replace marital2cat=1 if marital==1
      replace marital2cat=2 if marital==2
      label define marcat 1 "ever married" 2 "never married"
      label values marital2cat marcat
      Some people prefer using "recode":
      recode marital (1=1 "ever married") (2=2 "never married") (else=.), generate(marital2cat)
      Best,
      Alan
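
The code above leaves widowed, separated, and divorced respondents as missing. If those categories should instead count as "ever married", a variant could look like the sketch below. It assumes the coding given in the question (1 married, 2 never married, 3 widowed, 4 separated, 5 divorced); the toy data and the name `evermarried` are hypothetical.

```stata
/* Made-up toy data covering each category plus a missing value */
clear
input byte marital
1
2
3
4
5
.
end

/* Fold married, widowed, separated, and divorced into one category */
recode marital (1 3 4 5 = 1 "ever married") (2 = 2 "never married") (else = .), generate(evermarried)
tabulate marital evermarried, missing
```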

    • @afraakbar9162
      @afraakbar9162 5 years ago

      @smilex3 Thank you so much. I did it and I got the results. I am doing my dissertation and am stuck with analyzing data. Is there any way that you can help me? I also have another question. I want to create a log wage variable. The questionnaire has a wage question, and my supervisor told me to also take wages from 3 other questions, as my dissertation is about the informal economy. How can I do that? Please help me.

    • @smilex3
      @smilex3 5 years ago

      @afraakbar9162 So, you can use either the -log10()- (base 10) or -log()- (natural log) function. It sounds like you want something like the following:
      generate lwage=log10(wage1+wage2+wage3)
      Be careful in case you have a value of 0 in the sum of your wages measure since log10(0) is not defined.
      Best,
      Alan
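
Two details may matter here: a plain sum such as wage1+wage2+wage3 is missing whenever any component is missing, and the log of 0 is undefined. A sketch that handles both, assuming three hypothetical wage variables and made-up values:

```stata
/* Made-up toy data with missing components and an all-zero row */
clear
input wage1 wage2 wage3
100 50 .
  0  0 0
200  . .
end

/* rowtotal() treats missing components as 0 instead of propagating them */
egen wage_total = rowtotal(wage1 wage2 wage3)

/* Take the log only where the total is positive; other rows stay missing */
generate lwage = log10(wage_total) if wage_total > 0
list
```

Note that `rowtotal()` also returns 0 for a row where every component is missing, so such rows end up with a missing `lwage` here as well.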

  • @kevinsegers8083
    @kevinsegers8083 7 years ago

    I have a few questions for my master's thesis about creating lag variables in Stata. Is there a possibility you could help me with that?
    KS

    • @smilex3
      @smilex3 7 years ago

      Kevin, Check out the Stata faq on this topic and see if that is enough to get you going. The URL is www.stata.com/support/faqs/data-management/creating-lagged-variables/

    • @kevinsegers8083
      @kevinsegers8083 7 years ago

      So by using the command 'gen lag1' I can create lag Variables of existing ones? Where do I put the Variable names in the command?
      Thx!

    • @smilex3
      @smilex3 7 years ago

      Here is an example:
      sysuse auto, clear
      gen lag1 = mpg[_n-1]
      gen lag2 = mpg[_n-2]
      gen lead1 = mpg[_n+1]
      list mpg lag? lead1

    • @kevinsegers8083
      @kevinsegers8083 7 years ago

      Alright, Thanks! Can I do this for the dependent variable as well? I'm using a time series of the volatility of stock returns.

    • @smilex3
      @smilex3 7 years ago

      Sure. Stata neither knows nor cares what variables you consider to be dependent or independent. For Stata (and all statistical applications) they are just variables.
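
For time-series data that has been declared with -tsset-, Stata's lag and lead operators are an alternative to the `[_n-1]` subscripts used earlier in this thread. A minimal sketch, assuming the `sp500` example dataset shipped with Stata and a hypothetical integer time index `t`:

```stata
/* sp500 ships with Stata; close is the daily closing price */
sysuse sp500, clear

/* Hypothetical integer time index: the observation number */
generate t = _n
tsset t

/* L. and F. are the lag and lead operators; L2. lags by two periods */
generate close_lag1 = L.close
generate close_lag2 = L2.close
generate close_lead1 = F.close
list t close close_lag1 close_lag2 close_lead1 in 1/5
```

Unlike subscripts, the operators follow the declared time variable rather than the current sort order, so they stay correct if the data are re-sorted.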