R Machine Learning and Clinical Prediction Models 53 -- Bayesian Linear Regression

This post is part of the 【科研私家菜】 course series on machine learning and clinical prediction models in R.

All the R learning materials you need are here; bookmark and follow 【科研私家菜】.


01 Bayesian Linear Regression

Bayesian linear regression is motivated mainly by the difficulty, under maximum likelihood estimation, of deciding how complex the model should be. The penalty term added in ridge regression addresses the same problem, as does normalizing the data; adopting a Bayesian approach is another way to solve it.
Linear regression is fit by least squares, which can be recast as maximum likelihood estimation of the parameters w; this easily leads to overfitting, which motivates regularized least squares (provably equivalent to maximum a posteriori estimation), as the sketch below shows.
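As a quick sketch of that equivalence (assuming Gaussian noise with variance $\sigma^2$ and a zero-mean Gaussian prior with variance $\tau^2$ on $w$; this notation is illustrative, not from the original derivation):

$$
\hat{w}_{\mathrm{MAP}} = \arg\max_{w}\left[\log p(y \mid X, w) + \log p(w)\right]
= \arg\min_{w}\left[\lVert y - Xw \rVert^2 + \frac{\sigma^2}{\tau^2}\lVert w \rVert^2\right],
$$

which is exactly ridge regression with penalty $\lambda = \sigma^2/\tau^2$.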


Bayesian linear regression not only resolves the overfitting problem found in maximum likelihood estimation;

it also makes full (100%) use of the data: the training samples alone are enough to determine the model complexity effectively and accurately.
In maximum-likelihood linear regression we treat the parameter w as an unknown fixed value, whereas the Bayesian school treats w as a random variable.
Bayesian regression mainly studies two problems: inference (computing the posterior) and prediction.


1) The inference problem of Bayesian regression
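Stated minimally, and assuming the standard setup with Gaussian likelihood $y \mid X, w \sim \mathcal{N}(Xw, \sigma^2 I)$ and prior $w \sim \mathcal{N}(0, \tau^2 I)$ (a common choice, used here for illustration), the posterior is again Gaussian:

$$
p(w \mid X, y) \propto p(y \mid X, w)\,p(w) = \mathcal{N}(w \mid \mu, \Sigma),
\qquad
\Sigma = \left(\sigma^{-2}X^{\top}X + \tau^{-2}I\right)^{-1},
\quad
\mu = \sigma^{-2}\,\Sigma X^{\top} y.
$$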

2) The prediction problem of Bayesian regression

Use the posterior obtained from the inference step to make predictions.
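Under the same assumed Gaussian setup, integrating $w$ out against the posterior gives the predictive distribution for a new input $x_*$:

$$
p(y_* \mid x_*, X, y) = \int p(y_* \mid x_*, w)\,p(w \mid X, y)\,dw
= \mathcal{N}\!\left(y_* \mid x_*^{\top}\mu,\; \sigma^2 + x_*^{\top}\Sigma\,x_*\right),
$$

so the predictive variance contains both the noise term $\sigma^2$ and a term reflecting the remaining uncertainty about $w$.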


3) Summary

Inference yields a full posterior distribution over w rather than a point estimate, and prediction averages over that posterior, so the uncertainty about the parameters propagates into the predictive distribution.

02 Pros and Cons of Bayesian Regression

Pros:

Bayesian regression adapts to the data: it can reuse experimental data repeatedly (updating the posterior as new data arrive) and guards against overfitting.
Bayesian regression can introduce a regularization term as part of the estimation process.
Prior distribution: if we have domain knowledge, or a guess about the model parameters, we can include it in the model, rather than assuming, as the frequentist approach to linear regression does, that all the needed information about the parameters comes from the data. If we have no prior estimate at all, we can use an uninformative prior for the parameters, such as a flat normal distribution. (For example, a Gaussian prior induces L2 regularization, as in the MAP sketch above.)
Posterior distribution: the result of Bayesian linear regression is a distribution over the model parameters, based on the training data and the prior. This lets us quantify our uncertainty about the model: with fewer data points, the posterior will be more spread out.
Cons:

The learning procedure of Bayesian regression is computationally expensive.

03 Implementing Bayesian Linear Regression in R

(excerpt)

data <- read.csv("Kmeans_scaled_data100.csv")
# Dropping the raw proportions since we now have the logit-transformed data,
# which maps the response from (0, 1) onto (-inf, +inf)
# 1 = defaulting on loan
# 0 = not defaulting on loan
data <- data[,-c(dim(data)[2]- 1)]

index = data$Cluster_ID
# Taking out the cluster numbers from the analysis
data <- data[,-c(1)]


# Let us normalize the data...
# Storing the Response
response = data$logit_loan_default

# Dropping the response column so that only the covariates get normalized
data <- data[,-c(dim(data)[2])]

normalize <- function(x) {
  return ((x - min(x)) / (max(x) - min(x)))
}

dfNorm <- as.data.frame(lapply(data, normalize))


y = response
x = dfNorm

# Combining the data...
# Separating training and testing sets...

dat <- cbind(x,y)
names(dat)[9] <- "LOGIT_LOAN_DEFAULT"
# Holding out the first 20 rows as the test set; the remaining rows train the model
test <- dat[1:20,]
train <- dat[-c(1:20),]

y = train$LOGIT_LOAN_DEFAULT
x = train[,c(1:8)]

model_lm <- lm(LOGIT_LOAN_DEFAULT ~ ., data = train) # frequentist baseline fit on the training set
pred_lm <- predict(model_lm, test[,-9])
library(Metrics)
rmse(actual = test$LOGIT_LOAN_DEFAULT, predicted = pred_lm) # test-set RMSE of the baseline
# construct your various confidence intervals etc...
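# A minimal sketch of those intervals (illustrative, not part of the original
# script): coefficient confidence intervals and frequentist prediction
# intervals from the lm fit
confint(model_lm, level = 0.95)
predict(model_lm, test[,-9], interval = "prediction")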


# Designing the Gibbs sampler (a Markov chain Monte Carlo method)
# MCMC methods comprise a class of algorithms for sampling from a probability
# distribution. "By constructing a Markov chain that has the desired
# distribution as its equilibrium distribution, one can obtain a sample of the
# desired distribution by recording states from the chain." - Wikipedia
# Gibbs sampling draws each parameter in turn from its full conditional given
# the current values of all the others; conjugacy makes those conditionals
# available in closed form here.

# Prior specifications
# Prior means for Beta_0 through Beta_8
m0 <- m1 <- m2 <- m3 <- m4 <- m5 <- m6 <- m7 <- m8 <- 0

g <- 10    # prior variance scale for the betas (prior variance is g/phi)
a <- 0.01  # shape hyperparameter of the Gamma prior on phi
b <- 0.01  # rate hyperparameter of the Gamma prior on phi
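
# For clarity, the full conditionals used below correspond to this conjugate
# model (inferred from the code; stated here as a reading aid):
#   y_i | beta, phi ~ N(beta0 + x_i' beta, 1/phi)
#   beta_j | phi    ~ N(m_j, g/phi)   (Gaussian prior => ridge-like shrinkage)
#   phi             ~ Gamma(a, b)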

# Constants for MCMC
n <- dim(train)[1] # number of training observations

# Number of variables + Intercept
p <- 8 + 1

# Iterations
I <- 1000 # iterations; use 100000 for a full run
betaStore <- matrix(0,I,9)  # Stores Beta draws
phiStore <- matrix(0,I,1)   # Stores Phi draws
yPredStore <- matrix(0,I,1) # Stores posterior-predictive logit draws
pPredStore <- matrix(0,I,1) # Stores predicted probabilities (inverse logit)

testPred <- matrix(0,I,20)    # Predictive draws for the 20 test observations
predAccuracy <- matrix(0,I,1) # Per-iteration coverage of the 95% predictive interval

# Initial values for the chain
betaCur0 <- betaCur1 <- betaCur2 <- betaCur3 <- betaCur4 <- 0
betaCur5 <- betaCur6 <- betaCur7 <- betaCur8 <- 0
phiCur <- 1

for (c in 1:I) {
  # The draws for Beta_0 through Beta_7 are elided in this excerpt; each
  # follows the same conjugate-normal pattern as the Beta_8 draw below.

  # Draw Beta_8 from its normal full conditional
  yStar <- y-betaCur0-betaCur1*x[,1]-betaCur2*x[,2]-betaCur3*x[,3]-betaCur4*x[,4]-betaCur5*x[,5]-betaCur6*x[,6]-betaCur7*x[,7]
  mnStar8 <- (sum(x[,8]^2)+1/g)^(-1)*(sum(x[,8]*yStar)+m8/g)
  sdStar8 <- sqrt((sum(x[,8]^2)+1/g)^(-1)*phiCur^(-1))
  betaCur8 <- rnorm(1,mnStar8,sdStar8)
  
  #---------------------------------------------------------------------------
  # Draw phi (the error precision) from its Gamma full conditional
  aStar <- n/2+p/2+a
  bTemp1 <- sum( (y-betaCur0-betaCur1*x[,1]-betaCur2*x[,2]-betaCur3*x[,3]-betaCur4*x[,4] -betaCur5*x[,5]-betaCur6*x[,6]-betaCur7*x[,7]-betaCur8*x[,8])^2)
  bTemp2 <- ( (betaCur0-m0)^2+(betaCur1-m1)^2+(betaCur2-m2)^2+(betaCur3-m3)^2 + (betaCur4-m4)^2+(betaCur5-m5)^2+(betaCur6-m6)^2+(betaCur7-m7)^2+(betaCur8-m8)^2) /g
  bStar <- .5*(bTemp1+bTemp2)+b
  phiCur <- rgamma(1,aStar,rate=bStar)
  #Store the draws
  betaStore[c,1] <- betaCur0
  betaStore[c,2] <- betaCur1
  betaStore[c,3] <- betaCur2
  betaStore[c,4] <- betaCur3
  betaStore[c,5] <- betaCur4
  betaStore[c,6] <- betaCur5
  betaStore[c,7] <- betaCur6
  betaStore[c,8] <- betaCur7
  betaStore[c,9] <- betaCur8
  phiStore[c,1] <- phiCur
  # Posterior-predictive draw of the logit for the first test observation
  muPred <- betaCur0+betaCur1*test[1,1]+betaCur2*test[1,2]+betaCur3*test[1,3]+betaCur4*test[1,4]+betaCur5*test[1,5]+betaCur6*test[1,6]+betaCur7*test[1,7]+betaCur8*test[1,8]
  sdPred <- sqrt(1/phiCur)
  yPredStore[c,1] <- rnorm(1, muPred, sdPred)
  # Invert the logit to recover a probability on the original scale
  pPredStore[c,1] <- exp(yPredStore[c,1]) / (1+exp(yPredStore[c,1]))
  
  # Posterior-predictive draws for every test observation
  for (k in 1:dim(test)[1]) {
    muPred <- betaCur0+betaCur1*test[k,1]+betaCur2*test[k,2]+betaCur3*test[k,3]+betaCur4*test[k,4]+betaCur5*test[k,5]+betaCur6*test[k,6]+betaCur7*test[k,7]+betaCur8*test[k,8]
    testPred[c,k] <- rnorm(1, muPred, sqrt(1/phiCur))
  }
  # 95% interval of this iteration's predictive draws, and the fraction of
  # observed test logits that fall inside it
  CITestPred <- quantile(testPred[c,], probs = c(0.025,0.975))
  PA <- 0
  for (m in 1:dim(test)[1]) {
    if (test$LOGIT_LOAN_DEFAULT[m] > as.numeric(CITestPred[1]) &
        test$LOGIT_LOAN_DEFAULT[m] < as.numeric(CITestPred[2])) {
      PA <- PA + 1
    }
  }
  predAccuracy[c,1] <- PA / dim(test)[1]
  print(c)
  
}

# Trace plots of the MCMC draws
par(mfrow=c(5,2))
plot(betaStore[,1], main="Beta0", type="l")
plot(betaStore[,2], main="Beta1", type="l")
plot(betaStore[,3], main="Beta2", type="l")
plot(betaStore[,4], main="Beta3", type="l")
plot(betaStore[,5], main="Beta4", type="l")
plot(betaStore[,6], main="Beta5", type="l")
plot(betaStore[,7], main="Beta6", type="l")
plot(betaStore[,8], main="Beta7", type="l")
plot(betaStore[,9], main="Beta8", type="l")
plot(phiStore[,1], main="Phi", type="l")
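
As a minimal post-processing sketch (the 20% burn-in fraction is an assumed, commonly used choice, not from the original script), one can discard the early draws and summarize the posterior:

burn <- floor(I * 0.2) # assumed burn-in: discard the first 20% of draws
keep <- (burn + 1):I

# Posterior means and 95% credible intervals for the coefficients
postMean <- colMeans(betaStore[keep, ])
postCI <- apply(betaStore[keep, ], 2, quantile, probs = c(0.025, 0.975))
round(rbind(mean = postMean, postCI), 3)

# Average coverage of the 95% predictive interval across the kept iterations
mean(predAccuracy[keep, 1])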

04 References:

https://www.shuzhiduo.com/A/8Bz8jo4y5x/
https://www.shuzhiduo.com/A/Gkz1ZwY2JR/
https://blog.csdn.net/qq_32742009/article/details/81485887
https://zhuanlan.zhihu.com/p/269420459



Follow R小盐 and 科研私家菜 (VX_GZH: SciPrivate); if you have any questions, contact R小盐. Let's learn machine learning and clinical prediction models in R together.
