Data management: How to convert categorical string variables to labeled numeric variables
- Published: Nov 30, 2016
- This video demonstrates how to convert categorical string variables to labeled numeric variables in Stata using the encode command.
www.stata.com
Copyright 2011-2019 StataCorp LLC. All rights reserved.
How do you control which category gets assigned a particular numeric value?
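One way to answer the question above, as a minimal sketch: by default encode numbers the categories in alphabetical order, but if a value label already exists you can pass it through the label() option and encode will use your mapping instead. The data, the label name racelbl, and the codes 1/2/3 below are made up for illustration.

```stata
* Toy data (hypothetical values, only for illustration)
clear
input str5 race
"White"
"Black"
"Other"
"Black"
end

* Define the mapping you want before encoding
label define racelbl 1 "White" 2 "Black" 3 "Other"

* encode uses racelbl instead of assigning codes alphabetically
encode race, generate(nrace) label(racelbl)

* Check the result: labels, then the underlying numeric codes
tabulate nrace
tabulate nrace, nolabel
label list racelbl
```

Without the label() option, the same data would be coded Black = 1, Other = 2, White = 3.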
Big thank you!!! You and your video saved my life!!! Very helpful!!
Hey Chuck, this is a very useful tool. I've lost many hours encoding string variables. Thanks a lot!
If the destring command doesn't work, check whether there are non-numeric characters in the column(s) you want to convert to numeric. You can use the tab command to check. Suppose you find the value "-", which prevents the variable from being converted. Then use the replace command by typing [ replace varname = "" if varname == "-" ], where "" is the missing value for a string variable. Sometimes you can't use tab because the column has too many distinct values. In that case, use the duplicates drop command, which removes duplicate values so you can see which string values are in the column (it deletes observations, so run it on a copy of the data). After using replace to set the offending values to missing, run destring.
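The steps in this comment can be sketched roughly as follows. The variable name income and its values are invented for the example. The real() function returns missing for strings that cannot be read as numbers, which gives a way to list only the problem values without dropping anything.

```stata
* Toy data: a numeric column stored as strings, with a "-" placeholder
clear
input str4 income
"100"
"250"
"-"
"75"
end

* List only the values that cannot be read as numbers
tabulate income if missing(real(income))

* Turn the placeholder into a string missing value, then convert
replace income = "" if income == "-"
destring income, replace

describe income
```

destring also has an ignore() option for stripping specific characters, but that removes the characters from every value, so replacing a known placeholder first is the more conservative route.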
This saved my day, thank you.
Fantastic! Thank you so much!
Thank you for your videos. I am learning a lot from them
This is so helpful! I wanted to know, though, how Stata chooses the base value. In this case it used "Black" as the base, and we got the coefficients for Other and White relative to Black. Is there a way to change which value gets selected as the base? Also, does this work exactly like creating multiple dummies, except that you don't need to create multiple variables and just indicate that it's a categorical variable with the "i." prefix, and Stata calculates everything the same way as if it had created the dummies in the background?
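On the base-value question, a short sketch using the nlsw88 example dataset that ships with Stata (not the data from the video): with i., the base is the lowest numeric code by default, and the ib operator or fvset can change it. In nlsw88, race is coded 1 = white, 2 = black, 3 = other.

```stata
sysuse nlsw88, clear

* Default: the lowest code (1, white) is the base category
regress wage i.race

* Make code 2 (black) the base for this one command
regress wage ib2.race

* Use the most frequent category as the base
regress wage ib(freq).race

* Or set the base permanently for this variable in the dataset
fvset base 3 race
regress wage i.race
```

In the video, "Black" most likely ended up as the base because encode numbers categories alphabetically, so it received the lowest code.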
This was very helpful!
thank you, this is amazingggg!!!!
Thanks, this video solved my problem!
Thank you!
It's odd. On Stata, I hate using the drop down menus and prefer to find the commands and write them out only. In SPSS though, it's the opposite. Love the drop down menus in SPSS
Save time: use [ encode varname, generate(newname) ]
thanks a lot!
Thanks a lot, man
Thanks.
I used this method to encode gender as numeric, but when I run regress it does not work. It says variable sbp not found. Help, please.
regress sbp i.Gender1
variable sbp not found
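That error means the dataset in memory has no variable named exactly sbp, so it is unrelated to the encode step. One possible cause, assumed here purely for illustration, is a difference in capitalization, since Stata variable names are case-sensitive. The toy data below stores the variable as SBP to reproduce the situation.

```stata
* Toy data: the outcome is stored as SBP, not sbp (assumption for this sketch)
clear
set obs 3
generate SBP = 120 + _n
generate Gender1 = mod(_n, 2) + 1

* List the variable names exactly as they are stored
describe

* A nonzero return code here means there is no variable called sbp
capture confirm variable sbp
display _rc

* Rename to the spelling used in the regression, then run it
rename SBP sbp
regress sbp i.Gender1
```

If describe shows no similar name at all, the variable is simply not in the dataset that is currently loaded.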
Can't we just replace the original string variable rather than generating a new variable?
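encode requires generate() and has no replace option, but the same effect can be had in three lines, sketched here with made-up data and a hypothetical temporary name gender_n:

```stata
* Toy data (hypothetical values)
clear
input str6 gender
"Female"
"Male"
"Female"
end

* Encode into a temporary name, drop the string, reuse the old name
encode gender, generate(gender_n)
drop gender
rename gender_n gender

describe gender
```

Keeping the string variable until the encoded version has been checked is the safer habit, which is presumably why the video generates a new variable.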
I wonder what the n stands for in the variable name.
I guess it is just an abbreviation for 'numeric' to differentiate it from the original string variable. You can name it in any way you like.
Why do we put i.race instead of just race, which is the variable name?
Because race is a categorical variable, and unlike a dichotomous variable such as sex it has more than two categories. The numeric codes (e.g. white = 1, black = 2, asian = 3, arab = 4, etc.) are only labels for categories, not quantities. The "i." prefix tells Stata to treat race as a factor variable, so it automatically includes an indicator (dummy) for each category except the base instead of a single linear term. If you prefer, you can create the dummy variables yourself, setting a specific race to 1 and every other race to 0.
"i." is used for binary/categorical variables, and shows you results for each value of the variable other than the base.
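To see that i. and hand-made dummies give the same answer, here is a small sketch on the nlsw88 example dataset that ships with Stata (not the data from the video). The prefix race_ and the scalar name b_black are arbitrary choices for the example.

```stata
sysuse nlsw88, clear

* Factor-variable notation: Stata builds the indicators internally
regress wage i.race
scalar b_black = _b[2.race]

* The manual route: one 0/1 variable per category (race_1, race_2, race_3)
tabulate race, generate(race_)

* Leave out race_1 (white) so it serves as the base category
regress wage race_2 race_3

* The coefficient on race_2 matches the one on 2.race above
display b_black
display _b[race_2]
```

Entering race without the prefix would instead fit a single slope on the codes 1, 2, 3, which treats the categories as if they were evenly spaced quantities.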
it is not visible