Tianying (天鹰), PhD student, Zhongnan University of Economics and Law
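While running regressions in Stata recently, I noticed something odd: for a one-way fixed-effects model, reg and xtreg gave identical results. My understanding had been that reg with dummy variables can implement individual fixed effects, time fixed effects, or two-way individual-and-time fixed effects, whereas xtreg, fe implements two-way individual-and-time fixed effects. Under that mistaken belief, reg with i.id and xtreg, fe should not have produced exactly the same output, yet they did. That pushed me to go back to first principles and track down the misunderstanding step by step.
- Below I use data from my own paper to demonstrate the issue, and then summarize the commands for one-way, two-way, and multi-way fixed effects to give a fuller picture.
- Papers often report OLS, random-effects, individual fixed-effects, time fixed-effects, and two-way fixed-effects results side by side. For panel data, the commonly used commands are reg, xtreg, and a few others.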
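1. xtreg (official command)
xtreg, fe is Stata's official command for fixed-effects models; the coefficients it reports are the textbook fixed-effects (within) estimator. xtreg requires panel data: before calling it, declare the panel with xtset, which sets the cross-sectional (individual) dimension and the time dimension.
Adding the fe option to xtreg requests the within estimator. By default this gives individual fixed effects only, defined on the cross-sectional dimension set by xtset. To add time fixed effects, include time dummies in the model with i.year.
Results:
xtreg rca_gvc l.ai lncd lnpi lnsize lnimr, fe // individual fixed effects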
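For reference, the within estimator can be written out as follows. The notation ($y_{it}$, $x_{it}$, $\alpha_i$, $\varepsilon_{it}$, and bars for individual means over time) is introduced here for illustration and does not appear in the original post.

$$
y_{it} = x_{it}'\beta + \alpha_i + \varepsilon_{it}
$$

$$
y_{it} - \bar{y}_i = (x_{it} - \bar{x}_i)'\beta + (\varepsilon_{it} - \bar{\varepsilon}_i)
$$

Subtracting each individual's time average removes $\alpha_i$, so $\beta$ can be estimated by OLS on the demeaned data. Including one dummy per individual in an ordinary regression projects out the same individual means, which is why the dummy-variable approach yields the same $\hat{\beta}$.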
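As a minimal sketch of the workflow just described, the lines below declare a panel and run both specifications. The dataset is an assumption: nlswork is a public Stata example panel used here as a stand-in for the author's data, so idcode, ln_wage, age, and tenure are its variable names, not the author's.

```stata
* Sketch only: nlswork stands in for the author's data (assumption)
webuse nlswork, clear
xtset idcode year                       // panel unit = idcode, time = year

xtreg ln_wage age tenure, fe            // individual fixed effects only
xtreg ln_wage age tenure i.year, fe     // individual + year fixed effects
```

Note that the fe option by itself never adds time effects; they enter only through the i.year dummies.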
Fixed-effects (within) regression Number of obs = 238
Group variable: id Number of groups = 17
R-sq: Obs per group:
within = 0.1593 min = 14
between = 0.0173 avg = 14.0
overall = 0.0205 max = 14
F(5,216) = 8.18
corr(u_i, Xb) = -0.3699 Prob > F = 0.0000
------------------------------------------------------------------------------
rca_gvc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ai |
L1. | .3066382 .0959536 3.20 0.002 .1175129 .4957634
|
lncd | -.1003965 .0677319 -1.48 0.140 -.2338966 .0331035
lnpi | -.1923152 .0942642 -2.04 0.043 -.3781107 -.0065197
lnsize | .1256957 .0444703 2.83 0.005 .0380445 .213347
lnimr | .1070733 .0641571 1.67 0.097 -.0193809 .2335275
_cons | 1.741834 .4940647 3.53 0.001 .7680291 2.71564
-------------+----------------------------------------------------------------
sigma_u | .67217532
sigma_e | .14649365
rho | .95465604 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(16, 216) = 174.47 Prob > F = 0.0000
- For two-way fixed effects, the command is:
xtreg rca_gvc l.ai lncd lnpi lnsize lnimr i.year, fe // individual and time fixed effects
The results are:
Fixed-effects (within) regression Number of obs = 238
Group variable: id Number of groups = 17
R-sq: Obs per group:
within = 0.2555 min = 14
between = 0.0006 avg = 14.0
overall = 0.0000 max = 14
F(18,203) = 3.87
corr(u_i, Xb) = -0.6814 Prob > F = 0.0000
------------------------------------------------------------------------------
rca_gvc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ai |
L1. | .479147 .106974 4.48 0.000 .2682243 .6900697
|
lncd | .229894 .1027937 2.24 0.026 .0272137 .4325743
lnpi | -.1892991 .1000526 -1.89 0.060 -.3865747 .0079765
lnsize | .3399658 .0713975 4.76 0.000 .19919 .4807415
lnimr | .0390783 .0704173 0.55 0.580 -.0997648 .1779214
|
year |
2002 | -.0395395 .050029 -0.79 0.430 -.1381826 .0591036
2003 | -.059694 .0540411 -1.10 0.271 -.166248 .0468599
2004 | -.1355244 .0630919 -2.15 0.033 -.2599239 -.011125
2005 | -.1629442 .0714925 -2.28 0.024 -.3039073 -.0219811
2006 | -.2361056 .0885851 -2.67 0.008 -.4107705 -.0614407
2007 | -.3275978 .1047054 -3.13 0.002 -.5340475 -.1211481
2008 | -.3937222 .123663 -3.18 0.002 -.6375509 -.1498935
2009 | -.4627217 .1311296 -3.53 0.001 -.7212724 -.2041711
2010 | -.5822361 .1501323 -3.88 0.000 -.8782549 -.2862174
2011 | -.6646753 .1765024 -3.77 0.000 -1.012688 -.3166623
2012 | -.7010857 .1884788 -3.72 0.000 -1.072713 -.3294585
2013 | -.7910881 .2010942 -3.93 0.000 -1.187589 -.3945869
2014 | -.894121 .2109027 -4.24 0.000 -1.309962 -.4782801
|
_cons | -1.021565 .9437412 -1.08 0.280 -2.882358 .8392272
-------------+----------------------------------------------------------------
sigma_u | .8650854
sigma_e | .14220283
rho | .9736901 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(16, 203) = 182.56 Prob > F = 0.0000
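- In fact, the results above can be reproduced exactly with reg plus dummy variables.
- Individual fixed effects with reg, command and results:
reg rca_gvc l.ai lncd lnpi lnsize lnimr i.id // individual fixed effects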
Source | SS df MS Number of obs = 238
-------------+---------------------------------- F(21, 216) = 198.11
Model | 89.2834348 21 4.25159213 Prob > F = 0.0000
Residual | 4.63544423 216 .02146039 R-squared = 0.9506
-------------+---------------------------------- Adj R-squared = 0.9458
Total | 93.918879 237 .39628219 Root MSE = .14649
------------------------------------------------------------------------------
rca_gvc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ai |
L1. | .3066382 .0959536 3.20 0.002 .1175129 .4957634
|
lncd | -.1003965 .0677319 -1.48 0.140 -.2338966 .0331035
lnpi | -.1923152 .0942642 -2.04 0.043 -.3781107 -.0065197
lnsize | .1256957 .0444703 2.83 0.005 .0380445 .213347
lnimr | .1070733 .0641571 1.67 0.097 -.0193809 .2335275
|
id |
2 | 1.882314 .1139589 16.52 0.000 1.6577 2.106928
3 | .9406015 .1224498 7.68 0.000 .6992519 1.181951
4 | .8582898 .127059 6.76 0.000 .6078555 1.108724
5 | -.0231274 .0606444 -0.38 0.703 -.1426579 .0964031
6 | .3334653 .1081376 3.08 0.002 .1203253 .5466053
7 | -.1342764 .1270614 -1.06 0.292 -.3847154 .1161625
8 | .0188374 .0861588 0.22 0.827 -.1509821 .188657
9 | -.7181154 .0702175 -10.23 0.000 -.8565147 -.5797161
10 | .3462213 .0788628 4.39 0.000 .1907822 .5016604
11 | .3725729 .0869125 4.29 0.000 .2012678 .543878
12 | .5364166 .0660762 8.12 0.000 .4061799 .6666532
13 | -.1958921 .0961684 -2.04 0.043 -.3854407 -.0063436
14 | -.8969968 .1289829 -6.95 0.000 -1.151223 -.6427706
15 | .020054 .1922435 0.10 0.917 -.3588594 .3989674
16 | -.5532008 .266913 -2.07 0.039 -1.079288 -.0271133
17 | -.2964652 .0828788 -3.58 0.000 -.45982 -.1331104
|
_cons | 1.595323 .5000321 3.19 0.002 .6097556 2.58089
------------------------------------------------------------------------------
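The match between xtreg, fe and reg with i.id can also be checked numerically. The sketch below does so on a subsample of the public nlswork example data (an assumption; it is not the author's dataset), restricted to 100 individuals so the dummy set stays small. The scalar names are hypothetical.

```stata
* Sketch only: compare the within estimator with LSDV on example data
webuse nlswork, clear
keep if idcode <= 100                   // small subsample, about 100 dummies
xtset idcode year

xtreg ln_wage age tenure, fe            // within estimator
scalar b_within  = _b[tenure]
scalar se_within = _se[tenure]

reg ln_wage age tenure i.idcode         // LSDV: one dummy per individual
display "within b = " b_within  "   LSDV b = " _b[tenure]
display "within se = " se_within "   LSDV se = " _se[tenure]
```

With default (non-robust) standard errors the two commands use the same residual degrees of freedom, so both the coefficient and its standard error should agree up to rounding.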
2. reg
- Two-way individual and time fixed effects with reg, command and results:
. reg rca_gvc l.ai lncd lnpi lnsize lnimr i.id i.year // individual and time fixed effects
Source | SS df MS Number of obs = 238
-------------+---------------------------------- F(34, 203) = 130.63
Model | 89.8138851 34 2.64158486 Prob > F = 0.0000
Residual | 4.10499394 203 .020221645 R-squared = 0.9563
-------------+---------------------------------- Adj R-squared = 0.9490
Total | 93.918879 237 .39628219 Root MSE = .1422
------------------------------------------------------------------------------
rca_gvc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ai |
L1. | .479147 .106974 4.48 0.000 .2682243 .6900697
|
lncd | .229894 .1027937 2.24 0.026 .0272137 .4325743
lnpi | -.1892991 .1000526 -1.89 0.060 -.3865747 .0079765
lnsize | .3399658 .0713975 4.76 0.000 .19919 .4807415
lnimr | .0390783 .0704173 0.55 0.580 -.0997648 .1779214
|
id |
2 | 2.306169 .1521725 15.15 0.000 2.006128 2.606211
3 | 1.299284 .1397215 9.30 0.000 1.023792 1.574775
4 | 1.412836 .1691219 8.35 0.000 1.079375 1.746297
5 | .0287146 .0607942 0.47 0.637 -.0911544 .1485837
6 | .8257237 .1478991 5.58 0.000 .5341082 1.117339
7 | -.4835427 .1427926 -3.39 0.001 -.7650895 -.2019959
8 | -.1414247 .0902781 -1.57 0.119 -.3194277 .0365783
9 | -.6196797 .0711185 -8.71 0.000 -.7599054 -.4794541
10 | .4309111 .081576 5.28 0.000 .2700661 .5917561
11 | .3707501 .0894323 4.15 0.000 .1944148 .5470854
12 | .344849 .0791605 4.36 0.000 .1887668 .5009312
13 | -.080348 .1029481 -0.78 0.436 -.2833327 .1226366
14 | -1.006378 .1339116 -7.52 0.000 -1.270414 -.7423421
15 | .1569042 .2019668 0.78 0.438 -.2413176 .555126
16 | -1.03071 .2875253 -3.58 0.000 -1.597629 -.4637909
17 | .35005 .1753643 2.00 0.047 .0042808 .6958191
|
year |
2002 | -.0395395 .050029 -0.79 0.430 -.1381826 .0591036
2003 | -.059694 .0540411 -1.10 0.271 -.166248 .0468599
2004 | -.1355244 .0630919 -2.15 0.033 -.2599239 -.011125
2005 | -.1629442 .0714925 -2.28 0.024 -.3039073 -.0219811
2006 | -.2361056 .0885851 -2.67 0.008 -.4107705 -.0614407
2007 | -.3275978 .1047054 -3.13 0.002 -.5340475 -.1211481
2008 | -.3937222 .123663 -3.18 0.002 -.6375509 -.1498935
2009 | -.4627217 .1311296 -3.53 0.001 -.7212724 -.2041711
2010 | -.5822361 .1501323 -3.88 0.000 -.8782549 -.2862174
2011 | -.6646753 .1765024 -3.77 0.000 -1.012688 -.3166623
2012 | -.7010857 .1884788 -3.72 0.000 -1.072713 -.3294585
2013 | -.7910881 .2010942 -3.93 0.000 -1.187589 -.3945869
2014 | -.894121 .2109027 -4.24 0.000 -1.309962 -.4782801
|
_cons | -1.266513 .9626042 -1.32 0.190 -3.164498 .6314721
------------------------------------------------------------------------------
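- As the output above shows, however, reg also prints the coefficient on every individual or time dummy, which clutters the table. The areg command with absorb() solves this: put the categorical variable whose dummies you do not want displayed inside absorb().
3. areg
The command and results are: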
. areg rca_gvc l.ai lncd lnpi lnsize lnimr i.id , absorb(year)
Linear regression, absorbing indicators Number of obs = 238
F( 21, 203) = 210.60
Prob > F = 0.0000
R-squared = 0.9563
Adj R-squared = 0.9490
Root MSE = 0.1422
------------------------------------------------------------------------------
rca_gvc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ai |
L1. | .479147 .106974 4.48 0.000 .2682243 .6900697
|
lncd | .229894 .1027937 2.24 0.026 .0272137 .4325743
lnpi | -.1892991 .1000526 -1.89 0.060 -.3865747 .0079765
lnsize | .3399658 .0713975 4.76 0.000 .19919 .4807415
lnimr | .0390783 .0704173 0.55 0.580 -.0997648 .1779214
|
id |
2 | 2.306169 .1521725 15.15 0.000 2.006128 2.606211
3 | 1.299284 .1397215 9.30 0.000 1.023792 1.574775
4 | 1.412836 .1691219 8.35 0.000 1.079375 1.746297
5 | .0287146 .0607942 0.47 0.637 -.0911544 .1485837
6 | .8257237 .1478991 5.58 0.000 .5341082 1.117339
7 | -.4835427 .1427926 -3.39 0.001 -.7650895 -.2019959
8 | -.1414247 .0902781 -1.57 0.119 -.3194277 .0365783
9 | -.6196797 .0711185 -8.71 0.000 -.7599054 -.4794541
10 | .4309111 .081576 5.28 0.000 .2700661 .5917561
11 | .3707501 .0894323 4.15 0.000 .1944148 .5470854
12 | .344849 .0791605 4.36 0.000 .1887668 .5009312
13 | -.080348 .1029481 -0.78 0.436 -.2833327 .1226366
14 | -1.006378 .1339116 -7.52 0.000 -1.270414 -.7423421
15 | .1569042 .2019668 0.78 0.438 -.2413176 .555126
16 | -1.03071 .2875253 -3.58 0.000 -1.597629 -.4637909
17 | .35005 .1753643 2.00 0.047 .0042808 .6958191
|
_cons | -1.655874 1.052143 -1.57 0.117 -3.730405 .4186569
-------------+----------------------------------------------------------------
year | F(13, 203) = 2.018 0.021 (14 categories)
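The run above absorbs year and keeps i.id, so the 16 individual dummies are still printed. A sketch of the reverse arrangement, absorbing the individual dimension and keeping i.year, is given here on the public nlswork example data (an assumption, not the author's data); the scalar name is hypothetical.

```stata
* Sketch only: absorb the individual dimension, show only year dummies
webuse nlswork, clear
keep if idcode <= 100
xtset idcode year

areg ln_wage age tenure i.year, absorb(idcode)   // id dummies absorbed
scalar b_areg = _b[tenure]

xtreg ln_wage age tenure i.year, fe              // same model via xtreg
display "areg b = " b_areg "   xtreg b = " _b[tenure]
```

Which dimension goes into absorb() does not change the slope estimates; absorbing the one with more categories simply gives shorter output.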
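This works well enough with two sets of fixed effects, but areg, absorb() becomes inconvenient once more dimensions have to be controlled for. The user-written command reghdfe, absorb() was developed to address this.
4. reghdfe
reghdfe estimates linear regressions with multi-way (high-dimensional) fixed effects. Sometimes we need to control for fixed effects along several dimensions (for example city, industry, and year). xtreg and similar commands can do this, but they run slowly; reghdfe targets exactly this pain point and is much faster. reghdfe is a user-written command by Sergio Correia and must be installed before use (ssc install reghdfe).
reghdfe accepts several sets of fixed effects at once through absorb(var1 var2 ...), with the variables separated by spaces, so there is no need to add dummies with i.var. This is far more convenient than xtreg and similar commands, and the output does not list a long run of dummy coefficients. It is the command I personally recommend most.
- Two-way individual and time fixed effects with reghdfe, command and results:
. reghdfe rca_gvc l.ai lncd lnpi lnsize lnimr, absorb(year id) // individual and time fixed effects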
(converged in 3 iterations)
HDFE Linear regression Number of obs = 238
Absorbing 2 HDFE groups F( 5, 203) = 10.16
Prob > F = 0.0000
R-squared = 0.9563
Adj R-squared = 0.9490
Within R-sq. = 0.2002
Root MSE = 0.1422
------------------------------------------------------------------------------
rca_gvc | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
ai |
L1. | .479147 .106974 4.48 0.000 .2682243 .6900697
|
lncd | .229894 .1027937 2.24 0.026 .0272137 .4325743
lnpi | -.1892991 .1000526 -1.89 0.060 -.3865747 .0079765
lnsize | .3399658 .0713975 4.76 0.000 .19919 .4807415
lnimr | .0390783 .0704173 0.55 0.580 -.0997648 .1779214
-------------+----------------------------------------------------------------
Absorbed | F(29, 203) = 103.061 0.000 (Joint test)
------------------------------------------------------------------------------
Absorbed degrees of freedom:
---------------------------------------------------------------+
Absorbed FE | Num. Coefs. = Categories - Redundant |
-------------+-------------------------------------------------|
year | 14 14 0 |
id | 16 17 1 |
---------------------------------------------------------------+
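A minimal reghdfe sketch, again on the public nlswork example data (an assumption, not the author's data). It installs the command from SSC together with ftools, which the reghdfe versions I am aware of depend on, and then runs a two-way and a three-way specification; ind_code is an industry variable in that example dataset.

```stata
* Sketch only: install once, then absorb two or three sets of fixed effects
ssc install ftools, replace             // dependency of reghdfe (run once)
ssc install reghdfe, replace            // run once

webuse nlswork, clear
reghdfe ln_wage age tenure, absorb(idcode year)            // two-way
reghdfe ln_wage age tenure, absorb(idcode year ind_code)   // three-way
```

By default reghdfe drops singleton groups, so its reported number of observations can be smaller than that of xtreg on the same data even though the slope coefficients agree.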
The table below summarizes how xtreg, reg, areg, and reghdfe specify individual, time, and two-way fixed effects.
Command | Individual effects | Time effects | Two-way individual and time effects |
---|---|---|---|
xtreg | fe | i.year | i.year, fe |
reg | i.id | i.year | i.id i.year |
areg | absorb(id) | i.year | i.year, absorb(id) |
reghdfe | absorb(id) | absorb(year) | absorb(id year) |
- Now let us compare the estimates from the four commands.
esttab FE_xtreg FE_reg FE_areg FE_reghdfe ,b(%6.3f) se scalars(N r2) star(* 0.1 ** 0.05 *** 0.01) ///
> keep( L.ai lncd lnpi lnsize lnimr) nogaps mtitles("FE_xtreg" "FE_reg" "FE_areg" "FE_reghdfe")
----------------------------------------------------------------------------
(1) (2) (3) (4)
FE_xtreg FE_reg FE_areg FE_reghdfe
----------------------------------------------------------------------------
L.ai 0.479*** 0.479*** 0.479*** 0.479***
(0.107) (0.107) (0.107) (0.107)
lncd 0.230** 0.230** 0.230** 0.230**
(0.103) (0.103) (0.103) (0.103)
lnpi -0.189* -0.189* -0.189* -0.189*
(0.100) (0.100) (0.100) (0.100)
lnsize 0.340*** 0.340*** 0.340*** 0.340***
(0.071) (0.071) (0.071) (0.071)
lnimr 0.039 0.039 0.039 0.039
(0.070) (0.070) (0.070) (0.070)
----------------------------------------------------------------------------
N 238 238 238 238
r2 0.255 0.956 0.956 0.956
----------------------------------------------------------------------------
Standard errors in parentheses
* p<0.1, ** p<0.05, *** p<0.01
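The esttab call above relies on four stored estimation results. One way they could be created is sketched here on the public nlswork example data (an assumption; the author's own storing commands are not shown in the post). esttab comes from the user-written estout package, and reghdfe must already be installed.

```stata
* Sketch only: store four two-way FE fits under the names used by esttab
ssc install estout, replace             // provides esttab (run once)
webuse nlswork, clear
keep if idcode <= 100
xtset idcode year

xtreg ln_wage age tenure i.year, fe
estimates store FE_xtreg
reg ln_wage age tenure i.idcode i.year
estimates store FE_reg
areg ln_wage age tenure i.year, absorb(idcode)
estimates store FE_areg
reghdfe ln_wage age tenure, absorb(idcode year)
estimates store FE_reghdfe

esttab FE_xtreg FE_reg FE_areg FE_reghdfe, b(%6.3f) se scalars(N r2) ///
    star(* 0.1 ** 0.05 *** 0.01) keep(age tenure) nogaps ///
    mtitles("FE_xtreg" "FE_reg" "FE_areg" "FE_reghdfe")
```

The r2 row is not comparable across columns: for xtreg, fe it is the within R-squared, while for the other three commands it is the overall R-squared of a model that includes the dummies, which is why 0.255 sits next to 0.956 in the author's table.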
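The summary table shows that xtreg, reg, areg, and reghdfe produce identical coefficients (the standard errors can sometimes differ slightly, but with these data they do not).
- When the standard errors do differ, xtreg and reghdfe agree with each other, because both are based on the fixed-effects (within) estimator.
- reg and areg likewise agree with each other, because both rely on a special case of pooled OLS, the least-squares dummy variable (LSDV) method.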
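A case where the standard errors may separate is cluster-robust inference. The sketch below, on the public nlswork example data (an assumption, not the author's data), prints the clustered standard error from three commands so they can be compared. My understanding is that any gap comes from how each command counts the absorbed dummies in its degrees-of-freedom correction, which may vary across Stata and reghdfe versions, so no specific values are claimed here. The scalar names are hypothetical, and reghdfe must already be installed.

```stata
* Sketch only: same model, cluster-robust SEs from three commands
webuse nlswork, clear
keep if idcode <= 100
xtset idcode year

xtreg ln_wage age tenure i.year, fe vce(cluster idcode)
scalar b_xtreg = _b[tenure]
display "xtreg   se(tenure) = " _se[tenure]

areg ln_wage age tenure i.year, absorb(idcode) vce(cluster idcode)
scalar b_areg = _b[tenure]
display "areg    se(tenure) = " _se[tenure]

reghdfe ln_wage age tenure, absorb(idcode year) vce(cluster idcode)
display "reghdfe se(tenure) = " _se[tenure]
```

The coefficients should still coincide; only the reported standard errors are expected to differ, if at all.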