Factor Variables in Stata

Definition

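A factor variable expands an existing variable into a set of indicator variables. Its most common use is creating dummy variables from a categorical variable. Stata builds these as "virtual" variables, so you do not have to generate them in the dataset yourself. Note that a categorical variable used with a factor-variable operator must contain nonnegative integers. The value 0 is allowed, but negative values and non-integer values are not.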
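To make the expansion concrete, the sketch below mimics in plain Python what i.group does: it checks that every value is a nonnegative integer, then builds one 0/1 indicator per distinct level. The helper name expand_indicators is hypothetical, and this is not how Stata implements factor variables; it only illustrates the idea. The `1.group`, `2.group` labels follow Stata's naming of virtual indicator variables.

```python
# Hypothetical helper: mimics what i.group produces (one 0/1 indicator per level).
# An illustration only, not Stata's implementation.

def expand_indicators(values):
    """Return {level: [0/1 indicator per observation]} for a categorical variable."""
    for v in values:
        # Mirror the rule that factor variables must hold nonnegative integers.
        # Simplification: only Python ints are accepted here.
        if not isinstance(v, int) or isinstance(v, bool) or v < 0:
            raise ValueError(f"factor variable values must be nonnegative integers, got {v!r}")
    levels = sorted(set(values))
    return {lev: [1 if v == lev else 0 for v in values] for lev in levels}


group = [1, 2, 3, 2, 1]
for lev, col in expand_indicators(group).items():
    print(f"{lev}.group", col)
# 1.group [1, 0, 0, 0, 1]
# 2.group [0, 1, 0, 1, 0]
# 3.group [0, 0, 1, 0, 0]
```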

Factor-variable operators

Operator   Description
i.         unary operator to specify indicators (one per level of a categorical variable)
c.         unary operator to treat a variable as continuous
o.         unary operator to omit a variable or indicator
#          binary operator to specify interactions
##         binary operator to specify full-factorial interactions
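The ## operator is shorthand: it expands into the main effects of every variable plus every interaction among them. The Python sketch below performs that term-level expansion with a hypothetical helper, full_factorial. It assumes every variable is categorical, so a continuous c. variable such as c.age would need different handling (its main effect is the variable itself, not an i. term).

```python
from itertools import combinations


def full_factorial(*names):
    """Expand a##b##c... into the equivalent list of terms (hypothetical helper).

    Assumes all variables are categorical: main effects come first as i.name,
    then all two-way interactions, then all three-way interactions, and so on.
    """
    terms = [f"i.{n}" for n in names]
    for r in range(2, len(names) + 1):
        terms += ["#".join(combo) for combo in combinations(names, r)]
    return terms


print(" ".join(full_factorial("group", "sex")))
# i.group i.sex group#sex
print(" ".join(full_factorial("group", "sex", "arm")))
# i.group i.sex i.arm group#sex group#arm sex#arm group#sex#arm
```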

Examples

Factor specification   Result
i.group                indicators for levels of group
i.group#i.sex          indicators for each combination of levels of group and sex, a two-way interaction
group#sex              same as i.group#i.sex
group#sex#arm          indicators for each combination of levels of group, sex, and arm, a three-way interaction
group##sex             same as i.group i.sex group#sex
group##sex##arm        same as i.group i.sex i.arm group#sex group#arm sex#arm group#sex#arm
sex#c.age              two variables: age for males and 0 elsewhere, and age for females and 0 elsewhere; if age is also in the model, one of the two virtual variables will be treated as a base
sex##c.age             same as i.sex age sex#c.age
c.age                  same as age
c.age#c.age            age squared
c.age#c.age#c.age      age cubed
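At the data level, the interaction rows above reduce to simple column arithmetic. The hypothetical Python helpers below mimic i.group#i.sex, sex#c.age, and c.age#c.age on a few made-up observations, with sex coded 0 for males and 1 for females (an assumed coding). For simplicity, interact_cat_cat creates a column for every combination of levels, so an unobserved combination would simply yield an all-zero column.

```python
# Hypothetical helpers mimicking what Stata's interaction operators produce.
# Each column is a plain Python list with one entry per observation.

def interact_cat_cat(a, b):
    """i.a#i.b: one 0/1 indicator for each combination of levels of a and b."""
    return {(u, v): [1 if (x, y) == (u, v) else 0 for x, y in zip(a, b)]
            for u in sorted(set(a)) for v in sorted(set(b))}


def interact_cat_cont(cat, cont):
    """cat#c.cont: one column per level, equal to cont for that level and 0 elsewhere."""
    return {lev: [x if c == lev else 0 for c, x in zip(cat, cont)]
            for lev in sorted(set(cat))}


def interact_cont_cont(a, b):
    """c.a#c.b: the elementwise product (so c.age#c.age is age squared)."""
    return [x * y for x, y in zip(a, b)]


group = [1, 2, 1, 2]
sex = [0, 1, 1, 0]        # assumed coding: 0 = male, 1 = female
age = [30, 25, 40, 50]

for (g, s), col in interact_cat_cat(group, sex).items():
    print(f"{g}.group#{s}.sex", col)
print(interact_cat_cont(sex, age))     # {0: [30, 0, 0, 50], 1: [0, 25, 40, 0]}
print(interact_cont_cont(age, age))    # [900, 625, 1600, 2500]
```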

Base levels

By default, the level with the smallest value (the first level) is the base category. To specify a different base, use the ib. operator:

Base operator [1]   Description
ib#.                use # as base, where # is a value of the variable
ib(##).             use the #th ordered value as base [2]
ib(first).          use the smallest value as base (the default)
ib(last).           use the largest value as base
ib(freq).           use the most frequent value as base
ibn.                no base level
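The selection rules in the table can be restated as a small function. The sketch below is a hypothetical Python encoding (choose_base and its spec strings and tuples are invented for illustration, not Stata syntax) that returns which level would serve as the base, or None for ibn. The tie-breaking rule for ib(freq). is an assumption, since the table does not say how ties are resolved.

```python
from collections import Counter


def choose_base(values, spec="first"):
    """Return the base level implied by an ib-style spec, or None for ibn.

    Hypothetical encodings of the Stata operators:
      ("value", k)  ~ ib#.        the level whose value is k
      ("order", n)  ~ ib(#n).     the n-th smallest level (1-based)
      "first"       ~ ib(first).  the smallest level (the default)
      "last"        ~ ib(last).   the largest level
      "freq"        ~ ib(freq).   the most frequent level
      "none"        ~ ibn.        no base level
    """
    levels = sorted(set(values))
    if isinstance(spec, tuple):
        kind, k = spec
        if kind == "value" and k in levels:
            return k
        if kind == "order" and 1 <= k <= len(levels):
            return levels[k - 1]
        raise ValueError(f"invalid base spec {spec!r}")
    if spec == "first":
        return levels[0]
    if spec == "last":
        return levels[-1]
    if spec == "freq":
        counts = Counter(values)
        # Assumption: ties go to the smaller value.
        return max(levels, key=lambda lev: (counts[lev], -lev))
    if spec == "none":
        return None
    raise ValueError(f"unknown base spec {spec!r}")


group = [10, 20, 20, 20, 30, 30]
for spec in ["first", "last", "freq", ("value", 30), ("order", 2), "none"]:
    base = choose_base(group, spec)
    # Indicators that actually enter the model: every level except the base.
    kept = [lev for lev in sorted(set(group)) if lev != base]
    print(spec, "base:", base, "indicators:", kept)
```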

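A special use of the ibn. operator

With i.varlist, each coefficient is the difference between that category and the base category, whose own level is absorbed into the constant.
When ibn.varlist is combined with the noconstant option, no category is omitted and there is no constant term, so each coefficient is that category's own intercept rather than a difference from the base. Both models fit the data identically: each ibn. coefficient equals the constant plus the corresponding i. coefficient from the first model (the base category's i. coefficient being 0), and the coefficient on age is unchanged.
Compare the results of the following commands: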

```stata
reg y i.group age
reg y ibn.group age, noconstant
```
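Without Stata at hand, the relationship between the two commands can be checked numerically. The sketch below fits both parameterizations by ordinary least squares in plain Python (normal equations solved by Gauss-Jordan elimination, chosen only to avoid dependencies) on made-up data with three groups. The data, the noise values, and the helper ols are all hypothetical; the point is to compare the two coefficient sets, not to reproduce Stata's output format.

```python
def ols(X, y):
    """Least-squares coefficients b solving (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y], one row per coefficient.
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry into the pivot row.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


group = [1, 1, 1, 2, 2, 2, 3, 3, 3]
age = [20, 35, 50, 25, 40, 55, 30, 45, 60]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.1, 0.0, -0.3, 0.2]   # made-up disturbances
level = {1: 10.0, 2: 12.0, 3: 15.0}                        # made-up group intercepts
y = [level[g] + 0.5 * a + e for g, a, e in zip(group, age, noise)]

# reg y i.group age: columns _cons, 2.group, 3.group, age (level 1 is the base)
X_i = [[1, int(g == 2), int(g == 3), a] for g, a in zip(group, age)]
# reg y ibn.group age, noconstant: columns 1.group, 2.group, 3.group, age
X_ibn = [[int(g == 1), int(g == 2), int(g == 3), a] for g, a in zip(group, age)]

cons, d2, d3, age_i = ols(X_i, y)
b1, b2, b3, age_ibn = ols(X_ibn, y)

print(f"i.group:   _cons={cons:.4f} 2.group={d2:.4f} 3.group={d3:.4f} age={age_i:.4f}")
print(f"ibn.group: 1.group={b1:.4f} 2.group={b2:.4f} 3.group={b3:.4f} age={age_ibn:.4f}")
print(f"_cons + 2.group = {cons + d2:.4f}, _cons + 3.group = {cons + d3:.4f}")
```

Because both design matrices span the same column space, the fitted values are identical and only the labeling of the coefficients changes.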

References

Stata manual [U] User's Guide

  • 11 Language syntax
    • 11.4 varname and varlists
      • 11.4.3 Factor variables

  1. The i may be omitted. For instance, you can type ib2.group or b2.group.

  2. For example, ib(#2). means to use the second value as the base.
