Lesson 9: Logit Regression

I. Generalized Linear Models

glm(formula, family=binomial(link="logit"), data=df)

Distribution family    Default link function
binomial               link = "logit"
gaussian               link = "identity"
Gamma                  link = "inverse"
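
The default links listed above can be checked directly, since each family object in base R stores its link name in a `link` component. A minimal sketch (the variable names `families` and `default_links` are illustrative, not from the original notes):

```r
# Build the three family objects with their default arguments.
# Note that the gamma family constructor is spelled with a capital G.
families <- list(binomial = binomial(), gaussian = gaussian(), Gamma = Gamma())

# Extract the name of the default link function from each family object.
default_links <- sapply(families, function(f) f$link)
print(default_links)
```

Because "logit" is already the default for binomial, `family=binomial()` and `family=binomial(link="logit")` specify the same model.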

II. Logit Regression

1. Steps

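Step 1 Define the dependent variable
Convert the number of affairs into a binary indicator (0 = none, 1 = at least one), then convert it to a nominal factor so that it can serve as the dependent variable of the logit regression.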

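data(Affairs, package="AER")
# Recode the affair count as a 0/1 indicator of whether any affair occurred
Affairs$ynaffair[Affairs$affairs > 0] <- 1
Affairs$ynaffair[Affairs$affairs == 0] <- 0
# Convert to a nominal factor with labels No/Yes
Affairs$ynaffair <- factor(Affairs$ynaffair, levels=c(0,1), labels=c("No","Yes"))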
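
The same recoding can be sketched on a small toy vector, which makes the result easy to inspect. The vector `counts` below is hypothetical data standing in for `Affairs$affairs`, and the one-step `as.integer(counts > 0)` form is an alternative to the two assignments above, not the original author's code:

```r
# Hypothetical event counts standing in for Affairs$affairs.
counts <- c(0, 3, 0, 12, 1, 0)

# A logical comparison coerced to integer gives the 0/1 indicator in one step.
indicator <- as.integer(counts > 0)

# Convert the indicator to a two-level factor; "No" is the first (reference) level.
yn <- factor(indicator, levels = c(0, 1), labels = c("No", "Yes"))
print(table(yn))
```

With a factor response, a binomial `glm()` treats the first level as failure and the other level as success, so the fitted model describes the probability of "Yes".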

Step 2 Choose an appropriate regression model

fit <- glm(ynaffair ~ gender + age + yearsmarried + children +
             religiousness + education + occupation + rating,
           family=binomial(link="logit"), data=Affairs)
fit.reduced <- glm(ynaffair ~ age + yearsmarried + religiousness + rating,
                   family=binomial(link="logit"), data=Affairs)
# Chi-square test of the nested models; based on the result, fit.reduced should be used
anova(fit.reduced, fit, test="Chisq")
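
To see how the nested-model comparison behaves without needing the AER package, here is a sketch on simulated data. Everything in it (the data frame `sim`, the predictors `x1`, `x2`, `noise`, and the coefficients used to generate `y`) is made up for illustration:

```r
set.seed(1)
n <- 500

# Simulated predictors; `noise` has no true effect on the outcome.
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n), noise = rnorm(n))

# Binary outcome generated from a logit model that uses only x1 and x2.
sim$y <- rbinom(n, 1, plogis(-0.5 + 1.0 * sim$x1 - 0.8 * sim$x2))

full    <- glm(y ~ x1 + x2 + noise, family = binomial(), data = sim)
reduced <- glm(y ~ x1 + x2,         family = binomial(), data = sim)

# Likelihood-ratio (chi-square) test; list the smaller model first.
cmp <- anova(reduced, full, test = "Chisq")
print(cmp)

# p-value for the extra term(s) in the full model.
p_value <- cmp[2, "Pr(>Chi)"]
```

A large p-value means the extra predictors do not significantly improve the fit, which is the usual argument for keeping the simpler model.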

Step 3 Prediction

# Build a custom prediction data set: vary rating, hold the other predictors at their means
testdata <- data.frame(rating=c(1, 2, 3, 4, 5), age=mean(Affairs$age),
                       yearsmarried=mean(Affairs$yearsmarried),
                       religiousness=mean(Affairs$religiousness))
# Predict the probability for each row
testdata$prob <- predict(fit.reduced, newdata=testdata, type="response")
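
The `type="response"` argument is what returns probabilities; the default `type="link"` returns log-odds instead. The sketch below illustrates the relationship on simulated data (the data frame `sim`, the model `m`, and the generating coefficients are assumptions made for this example):

```r
set.seed(2)
n <- 300

# Simulated 1-5 rating and a binary outcome whose probability falls as rating rises.
sim <- data.frame(rating = sample(1:5, n, replace = TRUE))
sim$y <- rbinom(n, 1, plogis(1 - 0.5 * sim$rating))

m <- glm(y ~ rating, family = binomial(), data = sim)

new <- data.frame(rating = 1:5)
log_odds <- predict(m, newdata = new, type = "link")      # linear predictor (log-odds)
prob     <- predict(m, newdata = new, type = "response")  # predicted probability

# Applying the inverse logit to the log-odds reproduces the probabilities.
print(all.equal(plogis(log_odds), prob))
```

Forgetting `type="response"` is a common source of "probabilities" outside the 0 to 1 range, since the log-odds scale is unbounded.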

2. Overdispersion

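fit <- glm(ynaffair ~ age + yearsmarried + religiousness + rating, 
           family = binomial(), data = Affairs)
deviance(fit)/df.residual(fit) # a ratio close to 1 indicates no overdispersion
# df.residual() is a built-in function that returns the residual degrees of freedom; it is not a data frame named df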

If overdispersion is present, use the quasibinomial distribution instead.

fit.od <- glm(ynaffair ~ age + yearsmarried + religiousness + rating, 
              family = quasibinomial(), data = Affairs)

Compare the two fits: if the p-value is significant, overdispersion is present and fit.od should be used.

pchisq(summary(fit.od)$dispersion * fit$df.residual, fit$df.residual, lower.tail = FALSE)
# fit$df.residual is the residual degrees of freedom stored in the model object, not a data frame
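
The whole overdispersion check can be tried end to end on simulated data. The sketch below departs from the notes in one respect, which is an assumption made for illustration: it uses grouped binomial counts (successes out of 20 trials per group) with extra group-level noise, so that overdispersion is deliberately built in. All names (`succ`, `fit_b`, `fit_q`, and so on) are hypothetical:

```r
set.seed(3)
groups <- 200   # number of groups
size   <- 20    # trials per group

x <- rnorm(groups)
# Extra random noise on the logit scale makes the counts more variable
# than a plain binomial model allows, i.e. overdispersed.
p <- plogis(-0.3 + 0.7 * x + rnorm(groups, sd = 1))
succ <- rbinom(groups, size, p)

fit_b <- glm(cbind(succ, size - succ) ~ x, family = binomial())
fit_q <- glm(cbind(succ, size - succ) ~ x, family = quasibinomial())

# Rule of thumb: residual deviance / residual df well above 1 suggests overdispersion.
ratio <- deviance(fit_b) / df.residual(fit_b)

# Dispersion parameter estimated by the quasibinomial fit.
phi <- summary(fit_q)$dispersion

# Same test as in the notes: small p-value means overdispersion is present.
p_od <- pchisq(phi * df.residual(fit_b), df.residual(fit_b), lower.tail = FALSE)

print(c(ratio = ratio, phi = phi, p_value = p_od))
```

The binomial and quasibinomial fits return identical coefficient estimates; what changes is that the quasibinomial standard errors are inflated by the square root of the estimated dispersion, which makes the significance tests more conservative.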
