This chapter discusses nonlinear regression models.

Neural Networks

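A neural network introduces a nonlinear function so that a nonlinear relationship between the predictors and the outcome can be handled with linear combinations:

$$h_k(x) = g\left(\beta_{0k} + \sum_{j=1}^{P} x_j \beta_{jk}\right)$$

where g(u) is a nonlinear function applied in the hidden layer. The hidden layer performs the transformation, after which the outcome can be modeled as a linear function of the transformed values. A common choice for g is the logistic (sigmoid) function:

$$g(u) = \frac{1}{1 + e^{-u}}$$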

[Figure: network diagram relating the predictors, the hidden units, and the output]

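After the hidden-layer transformation, the output can be written as:

$$f(x) = \gamma_0 + \sum_{k=1}^{H} \gamma_k h_k$$

where h_k is the result of the k-th hidden unit and the γ_k are the coefficients.
The error of this model is then computed as:

$$\sum_{i=1}^{n} \left(y_i - f_i(x)\right)^2 + \lambda \sum_{k=1}^{H} \sum_{j=0}^{P} \beta_{jk}^2 + \lambda \sum_{k=0}^{H} \gamma_k^2$$

where λ is the penalty (weight decay) coefficient,
j indexes the inputs, and k indexes the hidden units. Each hidden unit has its own set of β coefficients, so substituting them into the hidden-layer transformation gives one term h_k per hidden unit.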
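To make the pieces concrete, here is a minimal base R sketch of the hidden-layer transformation, the output as a linear combination of the h_k, and the λ-penalized error. The function names `g`, `nn_predict`, and `nn_loss` are hypothetical, the logistic form of g is an assumption, and the numbers are made up for illustration.

```r
# Logistic (sigmoid) activation; an assumed choice for g(u)
g <- function(u) 1 / (1 + exp(-u))

# Forward pass for a single-hidden-layer network.
# X:     n x P matrix of predictors
# beta:  (P + 1) x H matrix; row 1 holds the intercepts beta_0k
# gamma: vector of length H + 1; element 1 is the intercept gamma_0
nn_predict <- function(X, beta, gamma) {
  H_mat <- g(cbind(1, X) %*% beta)        # n x H matrix of hidden units h_k
  as.vector(cbind(1, H_mat) %*% gamma)    # linear combination of the h_k
}

# Sum of squared errors plus the lambda penalty on all coefficients
nn_loss <- function(X, y, beta, gamma, lambda) {
  resid <- y - nn_predict(X, beta, gamma)
  sum(resid^2) + lambda * sum(beta^2) + lambda * sum(gamma^2)
}

# Toy usage with made-up numbers
set.seed(1)
X <- matrix(rnorm(20), nrow = 10, ncol = 2)  # 10 samples, P = 2
y <- rnorm(10)
beta  <- matrix(0, nrow = 3, ncol = 4)       # H = 4 hidden units, all weights 0
gamma <- c(0.5, rep(0, 4))                   # only the intercept is nonzero
pred <- nn_predict(X, beta, gamma)           # every prediction equals gamma_0
loss <- nn_loss(X, y, beta, gamma, lambda = 0.01)
```

A larger `lambda` pushes the coefficients toward zero, which is the role the `decay` argument plays when fitting such a model.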

In R:

library(nnet)
library(caret)  ## provides avNNet

nnetFit <- nnet(predictors, outcome,
                size = 5, decay = 0.01, linout = TRUE,
                ## Reduce the amount of printed output
                trace = FALSE,
                ## Expand the number of iterations to find
                ## parameter estimates..
                maxit = 500,
                ## and the number of parameters used by the model
                MaxNWts = 5 * (ncol(predictors) + 1) + 5 + 1)

nnetAvg <- avNNet(predictors, outcome,
                  size = 5, decay = 0.01,
                  ## Specify how many models to average
                  repeats = 5,
                  linout = TRUE,
                  ## Reduce the amount of printed output
                  trace = FALSE,
                  ## Expand the number of iterations to find
                  ## parameter estimates..
                  maxit = 500,
                  ## and the number of parameters used by the model
                  MaxNWts = 5 * (ncol(predictors) + 1) + 5 + 1)

## Predict on new samples
predict(nnetFit, newData)
predict(nnetAvg, newData)
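The calls above rely on `predictors`, `outcome`, and `newData` already existing. As a self-contained sketch, the same `nnet` call can be run on simulated data; the data-generating formula and the name `simFit` are made up for illustration. With P predictors and H hidden units the network has H(P + 1) + H + 1 weights, which is where the `MaxNWts` expression comes from.

```r
library(nnet)

set.seed(100)
# Simulated data (hypothetical): a nonlinear signal in two predictors
predictors <- data.frame(x1 = runif(200, -2, 2), x2 = runif(200, -2, 2))
outcome <- sin(predictors$x1) + 0.5 * predictors$x2^2 + rnorm(200, sd = 0.1)

# Same settings as in the text: 5 hidden units, weight decay 0.01,
# linear output unit for regression
simFit <- nnet(predictors, outcome,
               size = 5, decay = 0.01, linout = TRUE,
               trace = FALSE, maxit = 500,
               MaxNWts = 5 * (ncol(predictors) + 1) + 5 + 1)

# Predict three new samples; the result is a 3 x 1 matrix
newData <- data.frame(x1 = c(-1, 0, 1), x2 = c(0, 1, -1))
simPred <- predict(simFit, newData)

# Number of estimated weights: 5 * (2 + 1) + 5 + 1
nWeights <- length(simFit$wts)
```

Because the starting weights are random, different seeds give somewhat different fits, which is the motivation for averaging several networks.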

MARS

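MARS (multivariate adaptive regression splines) works by using surrogate features in place of the original predictors.
Its defining trait is that, for a given predictor, a value is chosen as a cut point and the predictor is split there by a hinge function, for example:

$$h(x) = \begin{cases} x & x > 0 \\ 0 & x \le 0 \end{cases}$$

This example uses x = 0 as the cut point: h(x) equals x when x ≥ 0 and 0 otherwise.
For a cut point a, the features are therefore usually written as the pair h(x − a) and h(a − x), for example h(MolWeight − 5.94516) and h(5.94516 − MolWeight).

That is, when MolWeight − 5.94516 ≥ 0 the first term is kept and the second is zero; otherwise the second term is kept. This splits the data in two at the cut point, and the model is linear within each part, so the nonlinear relationship is represented by a piecewise linear model.
The regression equation then depends on the predictor only through its difference from 5.94516.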
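The hinge pair can be sketched in a few lines of base R. This is only an illustration on simulated data with a cut point fixed by hand (`cut_point`, the coefficients, and the noise level are all assumptions); MARS itself searches over predictors and cut points.

```r
# Hinge function: h(x) = x when x > 0, otherwise 0
h <- function(x) pmax(0, x)

set.seed(42)
# Simulated predictor and an outcome whose slope changes at the cut point
x <- runif(200, 0, 10)
cut_point <- 5
y <- 2 + 1.5 * h(x - cut_point) - 0.8 * h(cut_point - x) + rnorm(200, sd = 0.2)

# The pair of hinge features replaces the original predictor;
# for any x at most one of the two is nonzero
features <- data.frame(right = h(x - cut_point),
                       left  = h(cut_point - x),
                       y     = y)

# With the cut point fixed, the fit is ordinary linear regression
hingeFit <- lm(y ~ right + left, data = features)
coef(hingeFit)
```

The estimated coefficients for `right` and `left` should land close to the simulated slopes of 1.5 and -0.8.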
In R:

library(earth)
library(caret)

marsFit <- earth(solTrainXtrans, solTrainY)

summary(marsFit)

# Define the candidate models to test
marsGrid <- expand.grid(.degree = 1:2, .nprune = 2:38)
# Fix the seed so that the results can be reproduced
set.seed(100)
marsTuned <- train(solTrainXtrans, solTrainY,
                   method = "earth",
                   # Explicitly declare the candidate models to test
                   tuneGrid = marsGrid,
                   trControl = trainControl(method = "cv"))

SVM

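SVMs are most often used for classification; applied to regression, the method is known as support vector regression (SVR). We start with the linear SVM.
Recall the multiple linear regression model, which predicts a new sample u as:

$$\hat{y} = \beta_0 + \sum_{j=1}^{P} \beta_j u_j$$

SVR

Next, β is replaced by an expression in α and the training data x:

$$f(u) = \beta_0 + \sum_{j=1}^{P} \sum_{i=1}^{n} \alpha_i x_{ij} u_j = \beta_0 + \sum_{i=1}^{n} \alpha_i \left(\sum_{j=1}^{P} x_{ij} u_j\right)$$

The term in the final parentheses is the dot product of x_i and u. Replacing it with a more general function K(x_i, u) gives what is called the kernel function:

$$f(u) = \beta_0 + \sum_{i=1}^{n} \alpha_i K(x_i, u)$$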


This constructs a model for u that is linear in the kernel terms, with coefficients α.
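As a rough sketch of that kernel form, the prediction for a new point u is an intercept plus an α-weighted sum of kernel values between u and each training point. The helper names `rbf_kernel` and `svr_predict` are hypothetical, the radial basis kernel is one assumed choice of K, and the α values below are made up rather than fitted.

```r
# Radial basis function kernel between two vectors; sigma is a tuning parameter
rbf_kernel <- function(x, u, sigma = 1) exp(-sigma * sum((x - u)^2))

# Kernel-form prediction: f(u) = beta0 + sum_i alpha_i * K(x_i, u)
# X_train: n x P matrix of training points; alpha: length-n coefficients
svr_predict <- function(u, X_train, alpha, beta0, sigma = 1) {
  k <- apply(X_train, 1, rbf_kernel, u = u, sigma = sigma)  # K(x_i, u) for each row
  beta0 + sum(alpha * k)
}

# Toy values (made up for illustration, not a fitted model)
X_train <- rbind(c(0, 0), c(1, 1), c(2, 0))
alpha <- c(0.5, -0.2, 0)   # a zero alpha means that point has no influence
toyPred <- svr_predict(c(0, 0), X_train, alpha, beta0 = 1)
```

Only training points with nonzero α contribute to the prediction; these are the support vectors.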
See also: https://www.cnblogs.com/kexinxin/p/9858496.html
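In short, for classification an SVM looks for the hyperplane that separates the classes with the largest margin, that is, the one farthest from the nearest data points on either side.
In SVR the goal is instead a function that keeps as many points as possible within a distance ε of it: residuals smaller than ε are ignored, and only larger residuals are penalized, which yields the regression fit.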
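The loss that SVR applies to each residual is ε-insensitive, which can be sketched in base R as follows. The function name `eps_insensitive` and the example values are made up for illustration; the default of 0.1 mirrors the `epsilon = 0.1` setting used with `ksvm`.

```r
# Epsilon-insensitive loss: residuals within epsilon cost nothing,
# larger residuals cost their excess over epsilon
eps_insensitive <- function(y, y_hat, epsilon = 0.1) {
  pmax(0, abs(y - y_hat) - epsilon)
}

# Three observations predicted as 1: residuals of 0, 0.05, and 0.5
residLoss <- eps_insensitive(y = c(1.00, 1.05, 1.50), y_hat = c(1, 1, 1))
```

Here the first two points lie inside the ε band and contribute no loss, while the third contributes 0.4. The cost parameter C then controls how heavily such excess residuals are penalized.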

In R:

library(kernlab)
library(caret)

## ksvm expects a matrix, so convert the predictors explicitly
svmFit <- ksvm(x = as.matrix(solTrainXtrans), y = solTrainY,
               kernel = "rbfdot", kpar = "automatic",
               C = 1, epsilon = 0.1)

svmRTuned <- train(solTrainXtrans, solTrainY,
                   method = "svmRadial",
                   preProc = c("center", "scale"),
                   tuneLength = 14,
                   trControl = trainControl(method = "cv"))

K-Nearest Neighbors

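The core of this method is to find, for a new sample, the K training samples closest to it (for example by Euclidean distance) and to predict its outcome from theirs, typically as the mean.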
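A minimal base R sketch of KNN regression, assuming Euclidean distance and a plain mean of the neighbors' outcomes; the function name `knn_regress` and the toy data are made up for illustration.

```r
# Predict one new sample as the mean outcome of its k nearest training samples
knn_regress <- function(X_train, y_train, x_new, k = 3) {
  # Euclidean distance from x_new to every training row
  d <- sqrt(rowSums(sweep(X_train, 2, x_new)^2))
  nearest <- order(d)[seq_len(k)]   # indices of the k closest rows
  mean(y_train[nearest])
}

# Toy data: one predictor, five training samples
X_train <- matrix(c(0, 1, 2, 10, 11), ncol = 1)
y_train <- c(1, 2, 3, 20, 22)

predNearLow  <- knn_regress(X_train, y_train, x_new = 1.2,  k = 3)
predNearHigh <- knn_regress(X_train, y_train, x_new = 10.4, k = 2)
```

Because the prediction depends entirely on distances, predictors on larger scales dominate unless the data are centered and scaled first.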

In R:

library(caret)

# Remove a few sparse and unbalanced fingerprints first
nzv <- nearZeroVar(solTrainXtrans)
# Guard: indexing with -integer(0) would drop every column
knnDescr <- if (length(nzv) > 0) solTrainXtrans[, -nzv] else solTrainXtrans
set.seed(100)
knnTune <- train(knnDescr, solTrainY,
                 method = "knn",
                 # Center and scaling will occur for new predictions too
                 preProc = c("center", "scale"),
                 tuneGrid = data.frame(.k = 1:20),
                 trControl = trainControl(method = "cv"))

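My personal view is that when there are many data points and the relationship is linear, KNN could be used to thin out the data (representing each group of K points by the point nearest to them), which would greatly reduce the amount of computation.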
