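This article covers the use of instrumental variables in R, the data handling involved, and what the models mean.
There are two worked exercises. Exercise 1 has 6 questions and uses the fertil2 dataset, which ships with the wooldridge R package. Exercise 2 has 3 questions and uses eitc.dta, a dataset in Stata format. This article covers Exercise 1; Exercise 2 is in the previous post.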
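Exercise 1
1.1 Problem
Use the fertil2 dataset. The dependent variable is children, and educ is an explanatory variable (whether it is endogenous is discussed in questions 1-6), along with other explanatory variables such as age.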
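To make the setup concrete, the two equations behind the exercise can be sketched as follows. This is only an illustration based on the regressions run in the R code of this article: the symbols (the coefficients \beta and \pi, and the error terms u and v) are notation assumed here, not taken from the exercise text, and the instrument frsthalf is the one used in that code.

```latex
\begin{align}
\text{children} &= \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{age} + \beta_3\,\text{age}^2 + u \\
\text{educ}     &= \pi_0 + \pi_1\,\text{frsthalf} + \pi_2\,\text{age} + \pi_3\,\text{age}^2 + v
\end{align}
```

If educ is correlated with u, OLS on the first equation is biased. For frsthalf to work as an instrument it should be uncorrelated with u (exogeneity) and should satisfy \pi_1 \neq 0 (relevance).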
1.2 My answers
1.3 R code
> library(wooldridge) # Provides the fertil2 dataset
> library(tidyverse) # Tidyverse packages
> library(stargazer) # For formatted output tables
> library(lmtest) # Tests for linear models
> library(AER) # Applied Econometrics with R; provides ivreg()
> data("fertil2")
> view(fertil2)
> head(fertil2,6) # Show the first 6 rows
  mnthborn yearborn age electric radio tv bicycle educ ceb agefbrth children knowmeth usemeth monthfm yearfm agefm
1        5       64  24        1     1  1       1   12   0       NA        0        1       0      NA     NA    NA
2        1       56  32        1     1  1       1   13   3       25        3        1       1      11     80    24
3        7       58  30        1     0  0       0    5   1       27        1        1       0       6     83    24
4       11       45  42        1     0  1       0    4   3       17        2        1       0       1     61    15
5        5       45  43        1     1  1       1   11   2       24        2        1       1       3     66    20
6        8       52  36        1     0  0       0    7   1       26        1        1       1      11     76    24
  idlnchld heduc agesq urban urb_educ spirit protest catholic frsthalf educ0 evermarr
1        2    NA   576     1       12      0       0        0        1     0        0
2        3    12  1024     1       13      0       0        0        1     0        1
3        5     7   900     1        5      1       0        0        0     0        1
4        3    11  1764     1        4      0       0        0        0     0        1
5        2    14  1849     1       11      0       1        0        1     0        1
6        4     9  1296     1        7      0       0        0        0     0        1
# Question 1
> model_1 = lm(children~ educ + age + agesq,data = fertil2)
> summary(model_1)
Call:
lm(formula = children ~ educ + age + agesq, data = fertil2)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.1383066  0.2405942 -17.200   <2e-16 ***
educ        -0.0905755  0.0059207 -15.298   <2e-16 ***
age          0.3324486  0.0165495  20.088   <2e-16 ***
agesq       -0.0026308  0.0002726  -9.651   <2e-16 ***
# Question 4
> model_2 = lm(educ~ frsthalf,data = fertil2)
> summary(model_2)
Call:
lm(formula = educ ~ frsthalf, data = fertil2)
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  6.36277    0.08711  73.042  < 2e-16 ***
frsthalf    -0.93766    0.11849  -7.913 3.15e-15 ***
# Question 5
> model_3 = lm(children~ frsthalf + age + agesq,data = fertil2)
> summary(model_3)
Call:
lm(formula = children ~ frsthalf + age + agesq, data = fertil2)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.0501211  0.2412071 -20.937  < 2e-16 ***
frsthalf     0.1461660  0.0455053   3.212  0.00133 **
age          0.3421186  0.0169552  20.178  < 2e-16 ***
agesq       -0.0025856  0.0002795  -9.252  < 2e-16 ***
# Question 6
> model_4 = lm(children~ educ + age + agesq + electric + tv + bicycle,
+ data = fertil2)
> summary(model_4)
Call:
lm(formula = children ~ educ + age + agesq + electric + tv +
bicycle, data = fertil2)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -4.3897837  0.2403173 -18.267  < 2e-16 ***
educ        -0.0767093  0.0063526 -12.075  < 2e-16 ***
age          0.3402038  0.0164417  20.692  < 2e-16 ***
agesq       -0.0027081  0.0002706 -10.010  < 2e-16 ***
electric    -0.3027293  0.0761869  -3.974 7.20e-05 ***
tv          -0.2531443  0.0914374  -2.768  0.00566 **
bicycle      0.3178950  0.0493661   6.440 1.33e-10 ***
> model_5 = ivreg(children~educ + age + agesq + electric + tv + bicycle|
+ frsthalf + age + agesq + electric + tv + bicycle,
+ data = fertil2)
> summary(model_5)
Call:
ivreg(formula = children ~ educ + age + agesq + electric + tv +
bicycle | frsthalf + age + agesq + electric + tv + bicycle,
data = fertil2)
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.5913324  0.6450889  -5.567 2.74e-08 ***
educ        -0.1639814  0.0655269  -2.503   0.0124 *
age          0.3281451  0.0190587  17.218  < 2e-16 ***
agesq       -0.0027222  0.0002766  -9.843  < 2e-16 ***
electric    -0.1065314  0.1659650  -0.642   0.5210
tv          -0.0025550  0.2092301  -0.012   0.9903
bicycle      0.3320724  0.0515264   6.445 1.28e-10 ***
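One way to sanity-check the ivreg() result is to run two-stage least squares by hand and compare the educ coefficient. The sketch below is not part of the original exercise. The names vars, d, first_stage, second_stage, iv_fit and educ_hat are introduced here purely for illustration, and it assumes the wooldridge and AER packages are installed. Rows with missing values are dropped first so that both stages use the same sample.

```r
library(wooldridge)  # provides the fertil2 dataset
library(AER)         # provides ivreg()

data("fertil2")

# Keep only the variables used and drop incomplete rows, so that every
# regression below is estimated on exactly the same observations.
vars <- c("children", "educ", "frsthalf", "age", "agesq",
          "electric", "tv", "bicycle")
d <- na.omit(fertil2[, vars])

# Stage 1: regress the endogenous regressor on the instrument
# and all exogenous controls, then keep the fitted values.
first_stage <- lm(educ ~ frsthalf + age + agesq + electric + tv + bicycle,
                  data = d)
d$educ_hat <- fitted(first_stage)

# Stage 2: replace educ with its first-stage fitted values.
# The point estimates match 2SLS, but the standard errors printed by
# summary(second_stage) are NOT valid; use ivreg() for inference.
second_stage <- lm(children ~ educ_hat + age + agesq + electric + tv + bicycle,
                   data = d)

# Same model estimated directly with ivreg() on the same sample.
iv_fit <- ivreg(children ~ educ + age + agesq + electric + tv + bicycle |
                  frsthalf + age + agesq + electric + tv + bicycle,
                data = d)

# The two educ coefficients should agree up to numerical precision.
print(coef(second_stage)["educ_hat"])
print(coef(iv_fit)["educ"])

# AER can also report weak-instrument and Wu-Hausman diagnostics.
print(summary(iv_fit, diagnostics = TRUE))
```

If the two printed coefficients agree, the manual procedure and ivreg() are estimating the same thing. The diagnostics table is a quick way to see whether frsthalf is a strong enough instrument and whether educ looks endogenous.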
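These are my own answers, and I do not know what the official solutions are. If anyone who knows the material can comment and help, I would be very grateful. Let's keep learning together.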