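Endogeneity between the explanatory variables and the error term
Endogeneity arises mainly from three sources: omitted variables, simultaneity, and measurement error.
Example: consider the relationship between the crime rate and the number of police. In general, more police means a lower crime rate; but in the other direction, when the crime rate falls, the number of police is also reduced. Causality runs both ways, so the number of police is correlated with the error term (simultaneity).
Two approaches are commonly used to deal with endogeneity: replacing the endogenous variable with its one-period lag, and the instrumental variables (IV) method.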
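The first approach, using the one-period lag, can be sketched in Stata as follows. This is only a minimal illustration: the panel identifiers `id` and `year` and the variables `y`, `x2`, `x3` are hypothetical names, and the lag helps only if the lagged regressor is itself uncorrelated with the current error term.

```stata
// Hypothetical panel identifiers: id (unit) and year (time)
xtset id year
// Replace the endogenous regressor x3 with its one-period lag L.x3
xtreg y x2 L.x3, fe
```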
Procedure:
Stage 1: regress each endogenous variable on the instruments and the exogenous regressors
$x_3 = \pi_1 + \pi_2 x_2 + \pi_3 z_1 + \pi_4 z_2 + u_1$
$x_4 = \gamma_1 + \gamma_2 x_2 + \gamma_3 z_1 + \gamma_4 z_2 + u_2$
Stage 2: run the main regression with the endogenous variables replaced by their fitted values from stage 1
$y = \beta_1 + \beta_2 x_2 + \beta_3 \hat{x}_3 + \beta_4 \hat{x}_4$
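Stata implementation
regress x3 x2 z1 z2 //stage 1 for x3: regress on x2, z1, z2, as in the stage-1 equation
predict v //fitted values of x3
regress x4 x2 z1 z2 //stage 1 for x4
predict w //fitted values of x4
regress y x2 v w //stage 2: x3 and x4 replaced by their fitted values v and w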
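The manual two-step procedure above gives the 2SLS point estimates, but the standard errors reported by the second `regress` are not valid, because they ignore that `v` and `w` are themselves estimated. A sketch of the one-step alternative using Stata's built-in `ivregress`, with the same generic variable names as above:

```stata
// One-step 2SLS: endogenous x3 and x4 instrumented by z1 and z2
ivregress 2sls y x2 (x3 x4 = z1 z2)
// Same model with heteroskedasticity-robust standard errors
ivregress 2sls y x2 (x3 x4 = z1 z2), vce(robust)
```

The coefficients should match the manual second stage; only the standard errors and test statistics differ.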
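The difficulty of TSLS (two-stage least squares) lies not in the estimation method but in choosing appropriate instruments. If there are N potentially endogenous explanatory variables, at least N instruments are needed.
Principle:
When estimating by instrumental variables, three tests should be carried out: an endogeneity test (are the suspect regressors actually endogenous?), a relevance test (are the instruments correlated with the endogenous regressors?), and an exogeneity test (are the instruments uncorrelated with the error term?).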
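For a cross-sectional model estimated with `ivregress`, the three tests map onto Stata's postestimation commands roughly as sketched below. The variable names are the generic ones used earlier, and `z3` is a hypothetical third instrument added here only because the exogeneity (overidentification) test requires more instruments than endogenous regressors.

```stata
// z3 is a hypothetical extra instrument so that the model is overidentified
ivregress 2sls y x2 (x3 x4 = z1 z2 z3)
estat endogenous   // endogeneity: H0 is that x3 and x4 are exogenous
estat firststage   // relevance: first-stage statistics for the instruments
estat overid       // exogeneity: H0 is that the overidentifying restrictions are valid
```

With exactly as many instruments as endogenous regressors, as in `(x3 x4 = z1 z2)`, the model is exactly identified and the exogeneity of the instruments cannot be tested this way.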
use crime.dta //open the dataset
des //describe the variables
## Result
obs: 630
vars: 59 5 Jun 2007 14:32
---------------------------------------------------------------------------------------------------------------------
storage display value
variable name type format label variable label
---------------------------------------------------------------------------------------------------------------------
county int %9.0g county identifier
year byte %9.0g 81 to 87
crmrte float %9.0g crimes committed per person
prbarr float %9.0g 'probability' of arrest
prbconv float %9.0g 'probability' of conviction
prbpris float %9.0g 'probability' of prison sentenc
avgsen float %9.0g avg. sentence, days
polpc float %9.0g police per capita
density float %9.0g people per sq. mile
taxpc float %9.0g tax revenue per capita
west byte %9.0g =1 if in western N.C.
central byte %9.0g =1 if in central N.C.
urban byte %9.0g =1 if in SMSA
pctmin80 float %9.0g perc. minority, 1980
wcon float %9.0g weekly wage, construction
wtuc float %9.0g wkly wge, trns, util, commun
wtrd float %9.0g wkly wge, whlesle, retail trade
wfir float %9.0g wkly wge, fin, ins, real est
wser float %9.0g wkly wge, service industry
wmfg float %9.0g wkly wge, manufacturing
wfed float %9.0g wkly wge, fed employees
wsta float %9.0g wkly wge, state employees
wloc float %9.0g wkly wge, local gov emps
mix float %9.0g offense mix: face-to-face/other
pctymle float %9.0g percent young male
d82 byte %9.0g =1 if year == 82
d83 byte %9.0g =1 if year == 83
d84 byte %9.0g =1 if year == 84
d85 byte %9.0g =1 if year == 85
d86 byte %9.0g =1 if year == 86
d87 byte %9.0g =1 if year == 87
lcrmrte float %9.0g log(crmrte)
lprbarr float %9.0g log(prbarr)
lprbconv float %9.0g log(prbconv)
lprbpris float %9.0g log(prbpris)
lavgsen float %9.0g log(avgsen)
lpolpc float %9.0g log(polpc)
...
xtset county year //declare the panel structure
## Result
panel variable: county (strongly balanced)
time variable: year, 81 to 87
delta: 1 unit
xtdes //describe the panel pattern
## Result
county: 1, 3, ..., 197 n = 90
year: 81, 82, ..., 87 T = 7
Delta(year) = 1 unit
Span(year) = 7 periods
(county*year uniquely identifies each observation)
Distribution of T_i: min 5% 25% 50% 75% 95% max
7 7 7 7 7 7 7
Freq. Percent Cum. | Pattern
---------------------------+---------
90 100.00 100.00 | 1111111
---------------------------+---------
90 100.00 | XXXXXXX
sum lcrmrte lprbarr lprbconv lprbpris lavgsen lpolpc ldensity lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc lpctymle lpctmin west central urban
## Result
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
lcrmrte | 630 -3.609225 .5728077 -6.31355 -1.808895
lprbarr | 630 -1.274264 .415897 -2.833214 1.011601
lprbconv | 630 -.6929193 .6095949 -2.682732 3.610918
lprbpris | 630 -.8786315 .2305144 -1.904239 -.3877662
lavgsen | 630 2.153344 .2737295 1.439835 3.251537
-------------+---------------------------------------------------------
lpolpc | 630 -6.490637 .5266539 -7.687507 -3.336024
ldensity | 630 -.0159271 .7747352 -1.62091 2.177889
lwcon | 630 5.462869 .2481783 4.183905 7.751303
lwtuc | 630 5.915883 .3702186 3.362377 8.020257
lwtrd | 630 5.232423 .2143915 2.82576 7.715457
-------------+---------------------------------------------------------
lwfir | 630 5.579433 .2772037 1.257233 6.233362
lwser | 630 5.364625 .3600984 .6118253 7.685734
lwmfg | 630 5.615181 .2727473 4.623305 6.472115
lwfed | 630 5.988757 .1587609 5.542831 6.393507
lwsta | 630 5.677787 .1761313 5.153407 6.306275
-------------+---------------------------------------------------------
lwloc | 630 5.540139 .1596908 5.097363 5.961237
lpctymle | 630 -2.443015 .1967842 -2.77808 -1.29332
lpctmin | 630 2.913361 .9546147 .2497076 4.164309
west | 630 .2333333 .4232887 0 1
central | 630 .3777778 .4852169 0 1
-------------+---------------------------------------------------------
urban | 630 .0888889 .2848094 0 1
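twoway (scatter lcrmrte lprbarr) (lfit lcrmrte lprbarr) //scatter plot of the dependent variable against the key regressor, with the fitted regression line
xtline lcrmrte //time-series plot of the log crime rate for each county
xtivreg lcrmrte lprbconv lprbpris lavgsen ldensity lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc lpctymle lpctmin west central urban d82 d83 d84 d85 d86 d87 (lprbarr lpolpc= ltaxpc lmix), fe //2SLS with two-way fixed effects (county effects via fe, year effects via d82-d87)
## Result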
Fixed-effects (within) IV regression Number of obs = 630
Group variable: county Number of groups = 90
R-sq: Obs per group:
within = 0.3587 min = 7
between = 0.4442 avg = 7.0
overall = 0.4431 max = 7
Wald chi2(22) = 368612.24
corr(u_i, Xb) = -0.1867 Prob > chi2 = 0.0000
------------------------------------------------------------------------------
lcrmrte | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lprbarr | -0.576 0.802 -0.72 0.473 -2.148 0.997
lpolpc | 0.658 0.847 0.78 0.438 -1.002 2.317
lprbconv | -0.423 0.502 -0.84 0.399 -1.407 0.561
lprbpris | -0.250 0.279 -0.90 0.371 -0.798 0.297
lavgsen | 0.009 0.049 0.19 0.853 -0.087 0.105
ldensity | 0.139 1.021 0.14 0.891 -1.862 2.141
lwcon | -0.029 0.054 -0.54 0.591 -0.134 0.076
lwtuc | 0.039 0.031 1.27 0.205 -0.021 0.100
lwtrd | -0.018 0.045 -0.39 0.695 -0.107 0.071
lwfir | -0.009 0.037 -0.26 0.798 -0.081 0.062
lwser | 0.019 0.039 0.48 0.632 -0.057 0.095
lwmfg | -0.243 0.420 -0.58 0.562 -1.065 0.579
lwfed | -0.451 0.527 -0.86 0.392 -1.484 0.582
lwsta | -0.019 0.281 -0.07 0.947 -0.569 0.532
lwloc | 0.263 0.312 0.84 0.399 -0.349 0.876
lpctymle | 0.351 1.011 0.35 0.728 -1.631 2.333
lpctmin | 0.000 (omitted)
west | 0.000 (omitted)
central | 0.000 (omitted)
urban | 0.000 (omitted)
d82 | 0.038 0.062 0.61 0.540 -0.083 0.159
d83 | -0.044 0.042 -1.05 0.295 -0.127 0.039
d84 | -0.045 0.055 -0.82 0.410 -0.153 0.062
d85 | -0.021 0.074 -0.28 0.777 -0.166 0.124
d86 | 0.006 0.128 0.05 0.961 -0.245 0.257
d87 | 0.044 0.216 0.20 0.840 -0.380 0.467
_cons | 2.943 2.694 1.09 0.275 -2.337 8.223
-------------+----------------------------------------------------------------
sigma_u | .41829289
sigma_e | .14923885
rho | .88708121 (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(89,518) = 13.93 Prob > F = 0.0000
------------------------------------------------------------------------------
Instrumented: lprbarr lpolpc
Instruments: lprbconv lprbpris lavgsen ldensity lwcon lwtuc lwtrd lwfir
lwser lwmfg lwfed lwsta lwloc lpctymle lpctmin west central
urban d82 d83 d84 d85 d86 d87 ltaxpc lmix
------------------------------------------------------------------------------
est store FE2SLS
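In the FE2SLS output above, none of the coefficients is individually significant and the standard errors on `lprbarr` and `lpolpc` are large. One way to look into this is to display the first-stage regressions and check how strongly `ltaxpc` and `lmix` predict the two endogenous variables. A sketch using the `first` option of `xtivreg`, with the same specification as above:

```stata
// Same FE2SLS specification, additionally reporting the first-stage regressions
xtivreg lcrmrte lprbconv lprbpris lavgsen ldensity lwcon lwtuc lwtrd ///
    lwfir lwser lwmfg lwfed lwsta lwloc lpctymle lpctmin west central ///
    urban d82 d83 d84 d85 d86 d87 (lprbarr lpolpc = ltaxpc lmix), fe first
```

The `///` line continuations work in a do-file; in the Command window the command has to be typed on one line.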
xtivreg lcrmrte lprbconv lprbpris lavgsen ldensity lwcon lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc lpctymle lpctmin west central urban d82 d83 d84 d85 d86 d87 (lprbarr lpolpc= ltaxpc lmix), ec2sls //random-effects 2SLS (EC2SLS) estimation
## Result
EC2SLS random-effects IV regression Number of obs = 630
Group variable: county Number of groups = 90
R-sq: Obs per group:
within = 0.4521 min = 7
between = 0.8158 avg = 7.0
overall = 0.7840 max = 7
Wald chi2(26) = 575.73
corr(u_i, X) = 0 (assumed) Prob > chi2 = 0.0000
------------------------------------------------------------------------------
lcrmrte | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lprbarr | -0.413 0.097 -4.24 0.000 -0.604 -0.222
lpolpc | 0.435 0.090 4.85 0.000 0.259 0.611
lprbconv | -0.323 0.054 -6.03 0.000 -0.428 -0.218
lprbpris | -0.186 0.042 -4.44 0.000 -0.269 -0.104
lavgsen | -0.010 0.027 -0.38 0.706 -0.063 0.043
ldensity | 0.429 0.055 7.82 0.000 0.322 0.537
lwcon | -0.007 0.040 -0.19 0.850 -0.085 0.070
lwtuc | 0.045 0.020 2.30 0.022 0.007 0.084
lwtrd | -0.008 0.041 -0.20 0.844 -0.089 0.073
lwfir | -0.004 0.029 -0.13 0.900 -0.060 0.053
lwser | 0.006 0.020 0.28 0.780 -0.034 0.045
lwmfg | -0.204 0.080 -2.54 0.011 -0.362 -0.046
lwfed | -0.164 0.159 -1.03 0.305 -0.476 0.149
lwsta | -0.054 0.106 -0.51 0.609 -0.261 0.153
lwloc | 0.163 0.120 1.36 0.173 -0.071 0.398
lpctymle | -0.108 0.140 -0.77 0.439 -0.382 0.166
lpctmin | 0.189 0.041 4.56 0.000 0.108 0.270
west | -0.227 0.100 -2.28 0.023 -0.422 -0.032
central | -0.194 0.060 -3.24 0.001 -0.311 -0.077
urban | -0.225 0.116 -1.95 0.052 -0.452 0.001
d82 | 0.011 0.026 0.42 0.677 -0.040 0.061
d83 | -0.084 0.031 -2.73 0.006 -0.144 -0.024
d84 | -0.103 0.037 -2.79 0.005 -0.176 -0.031
d85 | -0.096 0.049 -1.94 0.053 -0.193 0.001
d86 | -0.069 0.060 -1.16 0.248 -0.186 0.048
d87 | -0.031 0.071 -0.45 0.656 -0.170 0.107
_cons | -0.954 1.284 -0.74 0.458 -3.470 1.563
-------------+----------------------------------------------------------------
sigma_u | .2145596
sigma_e | .14923885
rho | .67394424 (fraction of variance due to u_i)
------------------------------------------------------------------------------
Instrumented: lprbarr lpolpc
Instruments: lprbconv lprbpris lavgsen ldensity lwcon lwtuc lwtrd lwfir
lwser lwmfg lwfed lwsta lwloc lpctymle lpctmin west central
urban d82 d83 d84 d85 d86 d87 ltaxpc lmix
------------------------------------------------------------------------------
est store EC2SLS
hausman FE2SLS EC2SLS //Hausman test: FE2SLS against EC2SLS
## Result
---- Coefficients ----
| (b) (B) (b-B) sqrt(diag(V_b-V_B))
| FE2SLS EC2SLS Difference S.E.
-------------+----------------------------------------------------------------
lprbarr | -.5755052 -.4129264 -.1625788 .7962526
lpolpc | .657526 .4347488 .2227773 .8421081
lprbconv | -.423144 -.3228871 -.1002569 .4990749
lprbpris | -.2502547 -.1863195 -.0639352 .2762967
lavgsen | .0090987 -.0101765 .0192752 .0408606
ldensity | .139412 .4290282 -.2896162 1.019765
lwcon | -.0287308 -.007475 -.0212558 .0360199
lwtuc | .0391292 .0454451 -.0063158 .0236726
lwtrd | -.0177536 -.0081411 -.0096124 .0184617
lwfir | -.0093443 -.0036395 -.0057048 .0223483
lwser | .0185855 .0056098 .0129756 .0331904
lwmfg | -.2431675 -.2041395 -.039028 .411768
lwfed | -.4513386 -.1635112 -.2878273 .5024337
lwsta | -.0187447 -.0540496 .0353049 .2601761
lwloc | .2632589 .1630526 .1002062 .2885798
lpctymle | .3511095 -.1081064 .4592159 1.001351
d82 | .037856 .0107451 .0271109 .0560526
d83 | -.0443806 -.0837946 .039414 .0292202
d84 | -.0451873 -.1034999 .0583125 .040481
d85 | -.020942 -.095702 .07476 .0548511
d86 | .0063223 -.0688986 .0752209 .1133461
d87 | .0435043 -.0314075 .0749118 .2039854
------------------------------------------------------------------------------
b = consistent under Ho and Ha; obtained from xtivreg
B = inconsistent under Ha, efficient under Ho; obtained from xtivreg
Test: Ho: difference in coefficients not systematic
chi2(22) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 19.50
Prob>chi2 = 0.6140
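The test statistic printed above can be written out as follows, with b the FE2SLS coefficients and B the EC2SLS coefficients, as labelled in the output:

```latex
H = (b - B)'\,(V_b - V_B)^{-1}\,(b - B) \;\sim\; \chi^2(22) \quad \text{under } H_0
% Reported values: H = 19.50, Prob > chi2 = 0.6140
```

Since the p-value of 0.6140 is far above conventional significance levels, H0 (the difference in coefficients is not systematic) is not rejected. Under the usual reading of the Hausman test, this means the data do not contradict the random-effects assumptions, so the more efficient EC2SLS estimates can be preferred over FE2SLS.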
esttab FE FE2SLS EC2SLS ,b(%9.3f) se mtitle( FE FE2SLS EC2SLS) obslast star (* 0.1 ** 0.05 *** 0.01) compress nogap a
esttab FE FE2SLS EC2SLS using tabl.rtf ,b(%9.3f) se mtitle( FE FE2SLS EC2SLS) obslast star (* 0.1 ** 0.05 *** 0.01) compress nogap a
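The `esttab` commands above refer to stored estimates named `FE`, which the listing does not show being created. Assuming `FE` is meant to be an ordinary fixed-effects regression without instruments (this is a guess, not something the original states), it could be produced before calling `esttab` along these lines:

```stata
// Assumption: FE is the fixed-effects model with lprbarr and lpolpc treated as exogenous
xtreg lcrmrte lprbarr lpolpc lprbconv lprbpris lavgsen ldensity lwcon ///
    lwtuc lwtrd lwfir lwser lwmfg lwfed lwsta lwloc lpctymle lpctmin ///
    west central urban d82 d83 d84 d85 d86 d87, fe
est store FE
```

Note that `esttab` is part of the user-written `estout` package, which can be installed with `ssc install estout`.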