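Creating new variables: generate
generate [type] newvar = exp [if] [in]
type is optional and sets the storage type of the new variable: byte, int, long, float, or double.
newvar is the name of the new variable, and exp is the expression whose value is assigned to it.
Examples: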
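gen educ2 = educ^2
gen educ_exper = educ*exper if educ >= 9   //interaction term, computed only where educ >= 9; other observations get missing (.)
gen logexper = log(exper)   //natural log of exper; missing where exper <= 0
gen educ3 = round(sqrt(educ))   //sqrt() takes the square root (missing for a negative argument); round() rounds to the nearest integer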
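The four generate examples above can be tried without a real dataset. This is a minimal sketch that assumes two variables, educ and exper, filled with a few made-up values (not real data); note that `clear` replaces whatever data are in memory. It shows that the `if` condition and log(0) both leave missing values behind.

```stata
* Made-up values for illustration only (not real data)
clear
input educ exper
 4  1
 9 10
12  0
16 25
 8  4
end

gen educ2 = educ^2
gen educ_exper = educ*exper if educ >= 9   // missing (.) where educ < 9
gen logexper = log(exper)                  // log(0) is missing
gen educ3 = round(sqrt(educ))              // e.g. sqrt(12) = 3.46 rounds to 3
list
```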
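Changing existing variables: replace
replace oldvar = exp [if] [in] [, nopromote]   changes the values of an existing variable
[, nopromote]: prevents Stata from promoting the variable's storage type to accommodate the new values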
gen educat = 0
replace educat = 1 if educ==4 | educ==5 | educ==6
replace educat = 2 if educ >= 7 & educ <= 9
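A small sketch of the gen/replace pattern for educat, using made-up educ values and replacing the data in memory. The `inlist()` call is an alternative way to write the chain `educ==4 | educ==5 | educ==6`; observations that match no replace condition keep the starting value 0.

```stata
* Made-up educ values for illustration only
clear
input educ
 3
 5
 8
12
end

gen educat = 0
replace educat = 1 if inlist(educ, 4, 5, 6)   // same as educ==4 | educ==5 | educ==6
replace educat = 2 if educ >= 7 & educ <= 9
list educ educat
```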
Creating variables: egen
egen [type] newvar = fcn(arguments) [if] [in] [, options]
Here fcn(arguments) is one of the functions specific to egen.
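egen functions (except for std(), each stores one summary value that is repeated in every observation, or in every observation of its by-group):
sd(exp)          standard deviation of exp
skew(varname)    skewness of varname
std(exp)         standardized values of exp (mean 0 and standard deviation 1 by default)
total(exp)       sum of exp
iqr(exp)         interquartile range of exp
kurt(varname)    kurtosis of varname
mean(exp)        mean of exp
median(exp)      median of exp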
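To see what std() does, the sketch below compares it with the hand computation (x - mean)/sd on five made-up educ values. It assumes std() and sd() both use the sample standard deviation, which is why the two columns should agree up to float rounding; running it replaces the data in memory.

```stata
* Made-up data: check that egen std() equals (x - mean)/sd
clear
input educ
 8
10
12
14
16
end

egen educavg = mean(educ)     // 12 in every observation
egen educsd  = sd(educ)       // sqrt(10), about 3.16, in every observation
egen stdeduc = std(educ)
gen  stdcheck = (educ - educavg)/educsd
list
```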
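egen educavg = mean(educ)
gen educed = educ - educavg   //plain arithmetic needs gen; egen accepts only egen functions
sum educ educavg educed   //key summary statistics for the three variables
bysort female: egen educmed = median(educ)   //median of educ computed separately for each sex
egen stdeduc = std(educ)
egen higheduc = anyvalue(educ), values(13/18)   //higheduc equals educ when educ is an integer from 13 to 18, and missing (.) otherwise
list higheduc educ in 1/20   //list the first 20 observations to compare the two variables
egen sexmar = group(female married)   //group() numbers each combination of female and married as 1, 2, 3, ...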
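A self-contained sketch of the by-group, anyvalue() and group() examples. The variables educ, female and married are assumed to be numeric, and the five observations are invented; `clear` discards any data in memory. With this data the male median is 14, the female median is 11, and the four female/married combinations are numbered 1 to 4 in ascending order.

```stata
* Made-up data for illustration only
clear
input educ female married
12 0 0
16 0 1
10 1 0
11 1 1
18 1 1
end

bysort female: egen educmed = median(educ)       // one median per sex
egen higheduc = anyvalue(educ), values(13/18)    // educ if 13-18, else missing
egen sexmar = group(female married)              // (0,0)=1 (0,1)=2 (1,0)=3 (1,1)=4
list
```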
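Converting between numeric and string: encode, decode
String to numeric: creates a new numeric variable, with value labels, from an existing string variable
encode varname [if] [in], generate(newvar) [label(name)]
Numeric to string: creates a string variable from an existing numeric variable and its value labels
decode varname [if] [in], generate(newvar) [maxlength(#)]
real() converts a string that contains a number into that numeric value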
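encode sex, gen(gender)   //create the numeric variable gender
list sex gender in 1/4, nolabel   //list the first four observations without value labels
label list gender   //show the value label gender (by default encode names it after the new variable)
decode gender, gen(gender2)   //create the string variable gender2 from gender
display real("5.2") + 1   //displays 6.2; real() returns missing if the string is not a number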
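A round trip through encode and decode, sketched on a made-up string variable sex (running it replaces the data in memory). encode assigns codes in alphabetical order of the strings, so "female" should become 1 and "male" 2, and decode should reproduce the original strings.

```stata
* Made-up string data for illustration only
clear
input str6 sex
"male"
"female"
"female"
"male"
end

encode sex, gen(gender)        // numeric codes plus a value label named gender
list sex gender, nolabel       // shows the underlying codes 1 and 2
label list gender
decode gender, gen(gender2)    // back to a string variable
display real("5.2") + 1        // 6.2
display real("abc")            // . (not a number)
```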
Generating categorical and dummy variables
1. Dummy variables
gen college = 0
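replace college=1 if educ>=12 & educ<.   //Stata treats missing (.) as larger than any number, so without educ<. a missing educ would be coded 1
list educ college in 1/10   //list the first ten observations
drop college
gen college = (educ>=12) if educ<.   //one-line equivalent: a true condition evaluates to 1, a false one to 0
gen college2 = (educ>12 & educ<17)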
gen master = (educ >16 & educ<.)
tab1 college college2 master   //frequency table of each new variable
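The missing-value trap behind `educ<.` can be seen directly. In this sketch (made-up educ values including one missing value; `clear` replaces the data in memory), the variable naive is a hypothetical name for the version without the guard: it codes the missing educ as 1, while college leaves it missing.

```stata
* Made-up educ values, including one missing value (.)
clear
input educ
10
12
14
18
.
end

gen naive    = (educ >= 12)              // missing educ becomes 1: . counts as larger than any number
gen college  = (educ >= 12) if educ < .  // missing educ stays missing
gen college2 = (educ > 12 & educ < 17)
gen master   = (educ > 16 & educ < .)
tab1 college college2 master, missing
```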
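2. Categorical variables
The recode command
recode varname (rule) [(rule) ...] [, generate(newvar)]
Rule            Example          Meaning
# = #           3 = 1            3 is recoded to 1
# # = #         2 . = 9          2 and missing (.) are recoded to 9
#/# = #         1/5 = 9          1 through 5 are recoded to 9
nonmissing = #  nonmissing = 8   all remaining nonmissing values are recoded to 8
missing = #     miss = 9         all remaining missing values are recoded to 9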
Examples:
recode x (1 = 2), gen(nx)
recode x (1 = 2) (2 = 1), gen(nx1)   //rules are not chained, so 1 and 2 swap
recode x (1 2 = 3) (4/7 = 3), gen(nx2)
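The recode examples can be checked on a made-up variable x that includes a missing value (running this replaces the data in memory). Because each example uses generate(), x itself is left unchanged and values not covered by any rule are copied into the new variable as they are.

```stata
* Made-up values of x for illustration only
clear
input x
1
2
3
5
.
end

recode x (1 = 2), gen(nx)
recode x (1 = 2) (2 = 1), gen(nx1)               // swaps 1 and 2
recode x (1 2 = 3) (4/7 = 3), gen(nx2)
recode x (nonmissing = 8) (missing = 9), gen(nx3)
list
```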
(1) With gen and replace:
gen edu6 = 0 if educ<.   //a missing educ stays missing instead of being coded 0
replace edu6=1 if educ>0 & educ<7
replace edu6=2 if educ>6 & educ<10
replace edu6=3 if educ>9 & educ<13
replace edu6=4 if educ>12 & educ<17
replace edu6=5 if educ>16 & educ<.
tabulate edu6
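The same six categories can also be built with one recode command. This sketch assumes educ holds whole numbers of years (the ranges such as 7/9 would skip a value like 6.5), uses made-up data, and replaces the data in memory; edu6b is a hypothetical name for the recode result. `max` stands for the largest nonmissing value, and 0 and missing are copied unchanged.

```stata
* Made-up whole-number educ values, including one missing value
clear
input educ
 0
 5
 8
12
14
20
 .
end

gen edu6 = 0 if educ < .
replace edu6 = 1 if educ > 0  & educ < 7
replace edu6 = 2 if educ > 6  & educ < 10
replace edu6 = 3 if educ > 9  & educ < 13
replace edu6 = 4 if educ > 12 & educ < 17
replace edu6 = 5 if educ > 16 & educ < .

* Same grouping in one command (assumes integer educ)
recode educ (1/6 = 1) (7/9 = 2) (10/12 = 3) (13/16 = 4) (17/max = 5), gen(edu6b)
tabulate edu6 edu6b, missing
```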
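(2) With generate functions:
autocode(x,n,x0,x1)   //divides the interval from x0 to x1 into n intervals of equal length and returns the upper bound of the interval that contains x
group(x)   //splits the observations, in their current sort order, into x groups of nearly equal size
recode(x,x1,x2,...,xn)   //returns x1 if x <= x1, x2 if x <= x2, ..., otherwise xn; returns missing if x is missing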
gen exper1 = autocode(exper,5,1,51)   //exper1 takes the values 11, 21, 31, 41, 51
tabulate exper1
sort exper   //group() depends on the sort order, so sort first
gen exper2 = group(5)
tabulate exper2
gen exper3 = recode(exper,5,15,25,40,51)
tabulate exper3
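The two cut-point functions return interval bounds rather than group numbers 1, 2, 3, which is easy to miss. This sketch evaluates autocode() and recode() on a few made-up exper values away from the interval boundaries (running it replaces the data in memory): with autocode(exper,5,1,51) each interval is 10 wide, so 12 falls in (11,21] and maps to 21, while recode() returns the first cut point that is at least exper.

```stata
* Made-up exper values for illustration only
clear
input exper
 3
12
30
45
end

gen exper1 = autocode(exper, 5, 1, 51)         // upper bound of the 10-wide interval
gen exper3 = recode(exper, 5, 15, 25, 40, 51)  // first cut point >= exper
list
```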