Predicting Numeric Data in R: Regression Methods (Linear Regression, Regression Trees, Model Trees)

  • Example: Predicting Medical Expenses
    • Part 1: Linear Regression
      • Step 1: Exploring and preparing the data
      • Step 2: Training a model on the data
      • Step 3: Evaluating model performance
      • Step 4: Improving model performance
  • Example: Estimating Wine Quality
    • Part 2: Regression Trees and Model Trees
      • Step 1: Exploring and preparing the data
      • Step 2: Training a model on the data
      • Step 3: Evaluate model performance
      • Step 4: Improving model performance

Example: Predicting Medical Expenses

Part 1: Linear Regression

Step 1: Exploring and preparing the data

insurance <- read.csv("F:\\rwork\\Machine Learning with R (2nd Ed.)\\Chapter 06\\insurance.csv", stringsAsFactors = TRUE)
str(insurance)

summarize the expenses variable

summary(insurance$expenses)

histogram of insurance expenses

hist(insurance$expenses)

table of region

table(insurance$region)

exploring relationships among features: correlation matrix

cor(insurance[c("age", "bmi", "children", "expenses")])

visualizing relationships among features: scatterplot matrix

pairs(insurance[c("age", "bmi", "children", "expenses")])

more informative scatterplot matrix

#install.packages('psych')
library(psych)
pairs.panels(insurance[c("age", "bmi", "children", "expenses")])

Step 2: Training a model on the data

ins_model <- lm(expenses ~ ., data = insurance)

see the estimated beta coefficients

ins_model
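
Because `sex`, `smoker`, and `region` are read in as factors, `lm()` dummy-codes them automatically, so each non-reference level gets its own coefficient (for example `regionnorthwest`). A minimal sketch of that expansion using `model.matrix()`; the `toy` data frame and its values are made up for illustration and are not taken from insurance.csv:

```r
# Hypothetical toy data; values are made up for illustration only
toy <- data.frame(
  age    = c(19, 33, 52),
  region = factor(c("northeast", "southwest", "northwest"))
)

# model.matrix() shows the design matrix lm() would build from this formula.
# The first factor level (alphabetically "northeast") becomes the reference
# level and gets no column of its own.
X <- model.matrix(~ age + region, data = toy)
print(colnames(X))
print(X)
```

Each remaining region column is a 0/1 indicator, so its coefficient is read as the difference from the reference level.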

Step 3: Evaluating model performance

see more detail about the estimated beta coefficients

summary(ins_model)
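
The numbers printed by `summary()` can also be pulled out of the summary object, which is handy when comparing models. A minimal sketch on simulated data (the names `sim`, `fit`, and `s` are hypothetical and the data is not the insurance dataset):

```r
set.seed(1)
# Simulated data: y depends linearly on x plus random noise
sim <- data.frame(x = 1:50)
sim$y <- 3 + 2 * sim$x + rnorm(50, sd = 5)

fit <- lm(y ~ x, data = sim)
s <- summary(fit)

s$r.squared       # multiple R-squared
s$adj.r.squared   # adjusted R-squared
coef(s)           # matrix with Estimate, Std. Error, t value, Pr(>|t|)
```

The same accessors should work on `summary(ins_model)`, since it is also the summary of an `lm` fit.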

Step 4: Improving model performance

add a higher-order “age” term

insurance$age2 <- insurance$age^2

add an indicator for BMI >= 30

insurance$bmi30 <- ifelse(insurance$bmi >= 30, 1, 0)

create final model

ins_model2 <- lm(expenses ~ age + age2 + children + bmi + sex +
                   bmi30*smoker + region, data = insurance)
                   
summary(ins_model2)
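
In R's formula syntax, `bmi30*smoker` is shorthand for `bmi30 + smoker + bmi30:smoker`, so the model gets both main effects plus their interaction. A minimal sketch on simulated data that checks the two spellings fit the same model; the variable names mirror the ones above, but the data frame `d` and all of its values are made up:

```r
set.seed(42)
n <- 200
# Simulated data; names mirror the article but the values are invented
d <- data.frame(
  bmi30  = rbinom(n, 1, 0.5),
  smoker = factor(sample(c("no", "yes"), n, replace = TRUE))
)
d$y <- 1000 + 500 * d$bmi30 + 2000 * (d$smoker == "yes") +
  3000 * d$bmi30 * (d$smoker == "yes") + rnorm(n, sd = 100)

short <- lm(y ~ bmi30 * smoker, data = d)
long  <- lm(y ~ bmi30 + smoker + bmi30:smoker, data = d)

names(coef(short))                      # includes the bmi30:smokeryes term
all.equal(coef(short), coef(long))      # both formulas give the same fit
```

The `bmi30:smokeryes` coefficient is the extra effect of being both obese and a smoker, beyond the sum of the two separate effects.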

Example: Estimating Wine Quality

Part 2: Regression Trees and Model Trees

Step 1: Exploring and preparing the data

wine <- read.csv("F:\\rwork\\Machine Learning with R (2nd Ed.)\\Chapter 06\\whitewines.csv")

examine the wine data

str(wine)

the distribution of quality ratings

hist(wine$quality)

summary statistics of the wine data

summary(wine)

wine_train <- wine[1:3750, ]
wine_test <- wine[3751:4898, ]
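
Splitting by row position only gives a fair train/test split when the rows are already in random order. If that cannot be assumed, a random split is a safer choice. A minimal sketch, using a stand-in data frame with the same number of rows as the split above; `df`, `train_idx`, and the generated values are hypothetical:

```r
set.seed(123)  # makes the random split reproducible

# Stand-in data frame with 4898 rows; quality values are randomly generated
df <- data.frame(id = 1:4898, quality = sample(3:9, 4898, replace = TRUE))

# Draw 3750 row indices at random for training; the rest form the test set
train_idx <- sample(nrow(df), 3750)
df_train <- df[train_idx, ]
df_test  <- df[-train_idx, ]

nrow(df_train)
nrow(df_test)
```

Negative indexing with `-train_idx` guarantees that no row ends up in both sets.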

Step 2: Training a model on the data

regression tree using rpart

library(rpart)
m.rpart <- rpart(quality ~ ., data = wine_train)

get basic information about the tree

m.rpart

get more detailed information about the tree

summary(m.rpart)

use the rpart.plot package to create a visualization

#install.packages('rpart.plot')
library(rpart.plot)

a basic decision tree diagram

rpart.plot(m.rpart, digits = 3)

a few adjustments to the diagram

rpart.plot(m.rpart, digits = 4, fallen.leaves = TRUE, type = 3, extra = 101)

Step 3: Evaluate model performance

generate predictions for the testing dataset

p.rpart <- predict(m.rpart, wine_test)

compare the distribution of predicted values vs. actual values

summary(p.rpart)
summary(wine_test$quality)
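
When reading the two summaries, it helps to remember that a regression tree predicts the mean of the training examples in the leaf a case falls into. The predictions can therefore take at most as many distinct values as the tree has leaves, and they typically span a narrower range than the actual values. A minimal sketch on simulated data (not the wine data; `sim` and `tree` are hypothetical names):

```r
library(rpart)

set.seed(7)
# Simulated data: y is a noisy nonlinear function of x
sim <- data.frame(x = runif(500, 0, 10))
sim$y <- sin(sim$x) * 3 + sim$x + rnorm(500)

tree <- rpart(y ~ x, data = sim)
pred <- predict(tree, sim)

# Number of leaves vs. number of distinct predicted values
n_leaves   <- sum(tree$frame$var == "<leaf>")
n_distinct <- length(unique(pred))
c(leaves = n_leaves, distinct_predictions = n_distinct)

# Predictions are leaf means, so they stay inside the observed training range
range(pred)
range(sim$y)
```

This is why a tree can do reasonably well in the middle of the distribution while missing the extreme cases.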

compare the correlation

cor(p.rpart, wine_test$quality)
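
Correlation measures how closely the predictions track the actual values, not how far off they are, which is why an error measure is also worth computing. A minimal sketch with made-up ratings, where every prediction is wrong by the same amount:

```r
# Made-up ratings for illustration only
actual    <- c(5, 6, 7, 5, 8, 6)
predicted <- actual + 2   # every prediction is 2 points too high

cor(predicted, actual)          # perfect linear association
mean(abs(actual - predicted))   # yet the average absolute error is 2
```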

function to calculate the mean absolute error

MAE <- function(actual, predicted) {
  mean(abs(actual - predicted))
}
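
A small worked example makes the function easy to check by hand. The values below are made up; the function is redefined so the snippet runs on its own:

```r
MAE <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Made-up values: the absolute errors are 1, 0, and 2
actual    <- c(3, 5, 7)
predicted <- c(2, 5, 9)

MAE(actual, predicted)   # (1 + 0 + 2) / 3 = 1
MAE(predicted, actual)   # same value: the absolute difference is symmetric

# A single number is recycled against the whole vector, which is how a
# constant "always predict the mean" baseline can be scored
MAE(actual, mean(actual))
```

Because the result does not depend on argument order, calls that pass the predictions first give the same number as calls that pass the actual values first.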

mean absolute error between predicted and actual values

MAE(wine_test$quality, p.rpart)

mean absolute error between actual values and mean value

mean(wine_train$quality) # result = 5.87
MAE(wine_test$quality, 5.87)

Step 4: Improving model performance

train an M5' model tree

#install.packages('RWeka')  # RWeka needs a working Java installation
library(RWeka)
m.m5p <- M5P(quality ~ ., data = wine_train)

display the tree

m.m5p

(The full output is too long to reproduce here; it covers 163 cases.)

get a summary of the model’s performance

summary(m.m5p)

generate predictions for the model

p.m5p <- predict(m.m5p, wine_test)

summary statistics about the predictions

summary(p.m5p)

correlation between the predicted and true values

cor(p.m5p, wine_test$quality)

mean absolute error of predicted and true values
(uses a custom function defined above)

MAE(wine_test$quality, p.m5p)

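This leads to a surprising result. Switching to the more sophisticated M5' algorithm would normally be expected to improve accuracy, that is, to lower the mean absolute error (MAE), yet here the MAE went up rather than down. Try it on data of your own choosing, and leave a comment if you have questions.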