Working with Variables in Stata

Creating a new variable: generate

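generate [type] newvar = exp [if] [in]

type is optional and sets the storage type of the new variable. The numeric types are byte, int, long, float, and double.

newvar is the name of the new variable, and exp is the expression that assigns its values.

Examples: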

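gen educ2 = educ^2  // square of educ

gen educ_exper = educ*exper if educ >= 9  // interaction term, created only where educ >= 9 (missing elsewhere)

gen logexper = log(exper)  // natural log of exper

gen educ3 = round(sqrt(educ))  // round() rounds to the nearest integer; sqrt() takes the square root and returns missing for a negative argument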
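The commands above need a dataset with educ and exper. As a minimal sketch, the block below builds a tiny made-up dataset (the four rows are illustrative assumptions, not data from this post) so the behavior of generate, including where missing values come from, can be checked by hand.

```stata
clear
input educ exper
12 5
9 0
. 3
16 20
end

* set the storage type explicitly; int is used because 16^2 = 256 does not fit in a byte
generate int educ2 = educ^2

* log(0) is undefined, so the second observation gets a missing value
generate double logexper = log(exper)

* restrict with if; observations that fail the condition are left missing
generate educ_exper = educ*exper if educ >= 9 & !missing(educ)

list
```

Listing the data after each step is a quick way to confirm that the if condition and the missing values behave as intended.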

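Changing an existing variable: replace

replace oldvar = exp [if] [in] [, nopromote]

replace changes the values of a variable that already exists.

nopromote: prevents Stata from changing the variable's storage type to accommodate the new values.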

gen educat = 0

replace educat = 1 if educ==4 | educ==5 | educ==6

replace educat = 2 if educ >= 7 & educ <= 9
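To see what nopromote guards against, the sketch below shows the default behavior: when a new value does not fit the current storage type, replace promotes the variable to a larger type. The variable x and its values are made up for illustration.

```stata
clear
set obs 3

generate byte x = 1          // byte stores small integers only (up to 100)
replace x = 1000 in 1        // 1000 does not fit in a byte, so Stata promotes x to int

display "`: type x'"         // show the storage type of x after the replace
```

With the nopromote option the type would be left as byte, so it is worth checking the result carefully whenever that option is used with values outside the type's range.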

Creating variables with functions: egen

egen [type] newvar = fcn(arguments) [if] [in] [, options]

Here fcn(arguments) is one of the functions written specifically for egen.

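Some egen functions:

sd(exp) standard deviation of exp

skew(varname) skewness of varname

std(exp) standardized values of exp; by default the result has mean 0 and standard deviation 1

total(exp) sum of exp

iqr(exp) interquartile range of exp

kurt(varname) kurtosis of varname

mean(exp) mean of exp

median(exp) median of exp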

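egen educavg = mean(educ)
gen educed = educ - educavg  // deviation from the mean; plain arithmetic needs gen, because egen requires a function
sum educ educavg educed  // summary statistics for the three variables
bysort female: egen educmed = median(educ)  // median of educ, computed separately for each value of female
egen stdeduc = std(educ)  // standardized educ
egen higheduc = anyvalue(educ), values(13/18)  // higheduc equals educ where educ is 13 to 18 and is missing (.) otherwise
list higheduc educ in 1/20  // list the first 20 observations to compare the two variables
egen sexmar = group(female married)  // group() numbers each combination of female and married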
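The difference between an overall statistic and a by-group statistic is easiest to see on a small example. The sketch below uses five made-up observations (an assumption for illustration) and also uses count(), another egen function, to count nonmissing values per group.

```stata
clear
input female educ
0 10
0 14
1 12
1 16
1 17
end

egen educavg = mean(educ)                     // overall mean, the same in every observation
bysort female: egen educavg_g = mean(educ)    // mean within each value of female
generate educdev = educ - educavg_g           // deviation from the group mean
bysort female: egen n_g = count(educ)         // number of nonmissing educ values per group

list, sepby(female)
```

Here the group means are 12 for female == 0 and 15 for female == 1, so educdev sums to zero within each group.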

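Converting between numeric and string variables: encode, decode

String to numeric: encode creates a new numeric variable from an existing string variable and attaches the original strings to it as value labels.

encode varname [if] [in], generate(newvar) [label(name)]

Numeric to string: decode creates a new string variable from an existing numeric variable and its value labels.

decode varname [if] [in], generate(newvar) [maxlength(#)]

real() returns the numeric value of a string expression that contains a number.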

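encode sex, gen(gender)   // create the numeric variable gender

list sex gender in 1/4, nolabel   // list the first four observations, showing numeric codes instead of labels

label list gender  // show the contents of the value label gender

decode gender, gen(gender2)  // rebuild a string variable gender2 from gender

display real("5.2") + 1  // displays 6.2; real() returns missing if the string cannot be read as a number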
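encode and real() solve different problems: encode is for category names stored as text, while real() is for numbers stored as text. The sketch below illustrates both on a made-up three-row dataset (the variable names sex and agestr and their values are assumptions for illustration).

```stata
clear
input str6 sex str4 agestr
"male"   "23"
"female" "31"
"female" "n/a"
end

* category names: encode assigns codes in alphabetical order (female = 1, male = 2)
encode sex, generate(gender)

* numbers stored as text: real() converts them; "n/a" cannot be read and becomes missing
generate age = real(agestr)

* decode recovers the original strings from the value label
decode gender, generate(gender2)

list, nolabel
```

Using encode on agestr would instead assign arbitrary codes 1, 2, 3 to the strings, which is rarely what is wanted for numeric data.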

Generating categorical and dummy variables

1. Dummy variables

gen college = 0

replace college=1 if educ>=12

list educ college in 1/10  // list the first ten observations


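The same dummy can be created in one line with a logical expression, which evaluates to 1 when true and 0 when false (drop college first if it already exists):

gen college = (educ>=12)

gen college2 = (educ>12 & educ<17)

gen master = (educ >16 & educ<.)  // educ<. excludes missing values

tab1 college college2 master  // frequency tables for the new variables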
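One detail behind the educ<. condition: Stata treats a missing value as larger than any number, so a comparison such as educ>=12 is true when educ is missing. A minimal sketch on made-up data (the variable names college_bad and college_ok are hypothetical):

```stata
clear
input educ
8
12
18
.
end

generate college_bad = (educ >= 12)                     // missing educ is coded 1
generate college_ok  = (educ >= 12) if !missing(educ)   // missing educ stays missing

list
```

Adding if !missing(educ), or an upper bound such as educ<., keeps observations with unknown education out of the "1" category.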

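2. Categorical variables

The recode command

recode oldvar (rule) [(rule) ...] [, gen(newvar)]

Each rule takes one of these forms:

# = #            3 = 1          3 becomes 1

# # = #          2 . = 9        2 and missing become 9

#/# = #          1/5 = 9        1 through 5 become 9

nonmissing = #   nonmiss = 8    all nonmissing values become 8

missing = #      miss = 9       all missing values become 9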

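Examples:

recode x (1 = 2), gen(nx)  // 1 becomes 2; other values are unchanged

recode x (1 = 2) (2 = 1), gen(nx1)  // swap 1 and 2

recode x (1 2 = 3) (4/7 = 3), gen(nx2)  // 1, 2, and 4 through 7 all become 3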
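Several rule forms can be combined in one recode call. The sketch below applies a value list, two ranges, and the missing keyword to a made-up variable x (the data are an assumption for illustration); the results can be checked against the rule table by hand.

```stata
clear
input x
1
2
5
9
.
end

* 1 and 2 -> 1; 3 through 7 -> 2; 8 and above -> 3; missing -> 9
recode x (1 2 = 1) (3/7 = 2) (8/max = 3) (missing = 9), generate(nx)

list x nx
```

Because generate() is specified, x itself is left unchanged and the recoded values go into nx.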

(1) Building the categories with generate and replace:

gen edu6 = 0

replace edu6=1 if educ>0 & educ<7

replace edu6=2 if educ>6 & educ<10

replace edu6=3 if educ>9 & educ<13

replace edu6=4 if educ>12 & educ<17

replace edu6=5 if educ>16 & educ<.

tabulate edu6
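The same categories can also be written as a single recode command. The sketch below assumes educ holds whole numbers of years and uses made-up data; edu6r is a hypothetical name for the recoded variable. It rebuilds edu6 with the replace steps and compares the two results.

```stata
clear
input educ
0
5
9
12
16
20
.
end

* step-by-step version
gen edu6 = 0
replace edu6 = 1 if educ>0 & educ<7
replace edu6 = 2 if educ>6 & educ<10
replace edu6 = 3 if educ>9 & educ<13
replace edu6 = 4 if educ>12 & educ<17
replace edu6 = 5 if educ>16 & educ<.

* one-command version
recode educ (0 = 0) (1/6 = 1) (7/9 = 2) (10/12 = 3) (13/16 = 4) (17/max = 5), generate(edu6r)

list
```

The two versions agree wherever educ is observed. They differ for missing educ: the replace version leaves the initial 0 in place, while recode keeps the value missing.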



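(2) Using the grouping functions:

autocode(x,n,xmin,xmax)  // splits the interval from xmin to xmax into n equal-width intervals and returns the upper bound of the interval that contains x

group(x)  // splits the data, in its current sort order, into x groups of as nearly equal size as possible

recode(x,x1,x2,...,xn)  // returns missing if x is missing, x1 if x <= x1, x2 if x <= x2, and so on, and xn otherwise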


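gen exper1 = autocode(exper,5,1,51)  // five equal-width intervals between 1 and 51

tabulate exper1

sort exper  // group() depends on the current sort order

gen exper2 = group(5)

tabulate exper2

gen exper3 = recode(exper,5,15,25,40,51)  // cutpoints chosen by hand

tabulate exper3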
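The values these functions return are upper bounds, not group numbers, which is easy to overlook. A minimal sketch with four made-up exper values (an assumption for illustration):

```stata
clear
input exper
1
5
30
51
end

* autocode: intervals of width 10 between 1 and 51, coded by their upper bound
generate exper1 = autocode(exper, 5, 1, 51)

* recode function: coded by the first cutpoint that is >= exper
generate exper3 = recode(exper, 5, 15, 25, 40, 51)

list
```

For exper = 30, for example, autocode() gives 31 (the interval from 21 to 31) and recode() gives 40 (the first cutpoint at or above 30).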
