Investment Prediction with a Logistic Regression Model in R: 94% Accuracy

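A logistic regression model can be used to try to predict the market's future direction.

The example code reaches about 94% accuracy in-sample (93.88%) and 84.92% out of sample.

Everything else is explained in the code comments.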

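# Clear the console (sends a form feed, which RStudio treats as "clear")
cat("\014")

# Load the sample data: the Dow Jones Industrial Average (^DJI)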

library(quantmod)
getSymbols("^DJI", src = "yahoo")
dji <- DJI[, "DJI.Close"]

# Build technical indicators: moving averages, rolling standard deviations, RSI, MACD, Bollinger Bands

avg10 <- rollapply(dji, 10, mean)
avg20 <- rollapply(dji, 20, mean)
std10 <- rollapply(dji, 10, sd)
std20 <- rollapply(dji, 20, sd)
rsi5 <- RSI(dji, 5, "SMA")
rsi14 <- RSI(dji, 14, "SMA")
macd12269 <- MACD(dji, 12, 26, 9, "SMA")
macd7205 <- MACD(dji, 7, 20, 5, "SMA")
bbands <- BBands(dji, 20, "SMA", 2)

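# Build the market direction label: 1 if the close is above the close 20 trading days earlier, 0 if below
# (Lag(dji, 20) shifts the series back, so the comparison is with the past 20 days, not the next 20)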

direction <- NULL
direction[dji > Lag(dji, 20)] <- 1
direction[dji < Lag(dji, 20)] <- 0

# Merge the results
dji <-
  cbind(dji,
        avg10,
        avg20,
        std10,
        std20,
        rsi5,
        rsi14,
        macd12269,
        macd7205,
        bbands,
        direction)

# Rename the last column (the direction label) to "Direction"
dm <- dim(dji)
dm
colnames(dji)[dm[2]] <- "Direction"
colnames(dji)[dm[2]]

# In-sample (is) and out-of-sample (os) periods

issd <- "2010-01-01"
ised <- "2014-12-31"
ossd <- "2015-01-01"
osed <- "2015-12-31"

isrow <- which(index(dji) >= issd & index(dji) <= ised)
osrow <- which(index(dji) >= ossd & index(dji) <= osed)

isdji <- dji[isrow,]
osdji <- dji[osrow,]

# Standardize the data using the in-sample mean and standard deviation of each column

isme <- apply(isdji, 2, mean, na.rm = TRUE)
isstd <- apply(isdji, 2, sd, na.rm = TRUE)

isidn <- matrix(1, dim(isdji)[1], dim(isdji)[2])

norm_isdji <- (isdji - t(isme * t(isidn))) / t(isstd * t(isidn))

# Put the raw 0/1 label back in the last column (the label must not be standardized)
dm <- dim(isdji)
norm_isdji[, dm[2]] <- direction[isrow]

# Fit the model

formula <- as.formula("Direction ~ .")
model <- glm(formula, family = "binomial", data = norm_isdji)

summary(model)

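# predict() returns the linear predictor (log-odds) by default
pred <- predict(model, norm_isdji)

# Convert log-odds to probabilities with the logistic function
prob <- 1 / (1 + exp(-pred))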

# Plot the fitted values and the probabilities

# par(mfrow = c(2, 1))
# Same problem as before: Error in plot.new() : figure margins too large
plot(pred, type = "l")
plot(prob, type = "l")

pred_direction <- NULL
pred_direction[prob > 0.5] <- 1
pred_direction[prob <= 0.5] <- 0

# In-sample prediction accuracy

library(caret)
ismatrix <- confusionMatrix(as.factor(pred_direction),
                          as.factor(norm_isdji$Direction))

ismatrix

# Test generalization on the out-of-sample data (standardized with the in-sample statistics)

osidn <- matrix(1, dim(osdji)[1], dim(osdji)[2])
norm_osdji <- (osdji - t(isme * t(osidn))) / t(isstd * t(osidn))
norm_osdji[, dm[2]] <- direction[osrow]

ospred <- predict(model, norm_osdji)
osprob <- 1 / (1 + exp(-ospred))

ospred_direction <- NULL
ospred_direction[osprob > 0.5] <- 1
ospred_direction[osprob <= 0.5] <- 0

osmatrix <- confusionMatrix(as.factor(ospred_direction),
                            as.factor(norm_osdji$Direction))
osmatrix
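
The standardization and probability steps in the script can also be written with base R helpers. The sketch below uses made-up toy data (the `train`, `test`, and `eta` values are illustrative assumptions, not DJI data). It standardizes with `scale()` using the training-set means and standard deviations, and uses `plogis()` as the logistic function.

```r
# Toy matrices standing in for the in-sample and out-of-sample features
# (illustrative values only, not DJI data).
train <- matrix(c(1, 2, 3, 4,
                  10, 20, 30, 40), ncol = 2,
                dimnames = list(NULL, c("a", "b")))
test <- matrix(c(5, 6,
                 50, 60), ncol = 2,
               dimnames = list(NULL, c("a", "b")))

# Column means and standard deviations come from the training data only.
train_mean <- apply(train, 2, mean, na.rm = TRUE)
train_sd   <- apply(train, 2, sd, na.rm = TRUE)

# scale() subtracts `center` and divides by `scale`, column by column.
norm_train <- scale(train, center = train_mean, scale = train_sd)
norm_test  <- scale(test,  center = train_mean, scale = train_sd)

# plogis() is the logistic function 1 / (1 + exp(-x)).
eta <- c(-2, 0, 2)                  # example values on the log-odds scale
prob_manual <- 1 / (1 + exp(-eta))
prob_plogis <- plogis(eta)
```

For a fitted `glm`, `predict(model, newdata, type = "response")` also returns probabilities directly, which avoids the manual transform.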

Results

Model summary

> summary(model)

Call:
glm(formula = formula, family = "binomial", data = norm_isdji)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.0080  -0.0107   0.0366   0.1533   3.1790  

Coefficients: (3 not defined because of singularities)
              Estimate Std. Error z value Pr(>|z|)    
(Intercept)   1.658760   0.222691   7.449 9.43e-14 ***
DJI.Close    44.051359   8.409499   5.238 1.62e-07 ***
DJI.Close.1 -44.561952  17.549358  -2.539   0.0111 *  
DJI.Close.2   0.577137  17.620013   0.033   0.9739    
DJI.Close.3  -0.003556   0.291865  -0.012   0.9903    
DJI.Close.4  -0.264309   0.312768  -0.845   0.3981    
rsi           0.046117   0.339620   0.136   0.8920    
rsi.1        -2.306590   0.565594  -4.078 4.54e-05 ***
macd          2.562233   1.300929   1.970   0.0489 *  
signal        1.476838   0.610356   2.420   0.0155 *  
macd.1       -1.032963   0.798086  -1.294   0.1956    
signal.1      3.871052   1.635221   2.367   0.0179 *  
dn                  NA         NA      NA       NA    
mavg                NA         NA      NA       NA    
up                  NA         NA      NA       NA    
pctB          1.269642   0.521006   2.437   0.0148 *  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 1579.37  on 1257  degrees of freedom
Residual deviance:  348.17  on 1245  degrees of freedom
AIC: 374.17

Number of Fisher Scoring iterations: 8
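
The three `NA` rows (`dn`, `mavg`, `up`) mean that those columns are exact linear combinations of other predictors, so `glm` drops them. A likely cause is that the Bollinger middle band `mavg` is the same 20-day simple moving average as `avg20`, and the upper and lower bands are built from that average and a 20-day standard deviation. This is an inference from the script, not something the output states. The toy example below, on simulated data with hypothetical names `x1`, `x2`, `x3`, shows the same behavior.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- 2 * x1 - x2 + 3          # exact linear combination of x1 and x2
y  <- rbinom(n, 1, plogis(0.5 * x1 - 0.5 * x2))

# x3 adds no new information, so its coefficient cannot be estimated.
fit <- glm(y ~ x1 + x2 + x3, family = "binomial")
coef(fit)                      # the x3 entry is NA
```

Dropping the redundant columns before fitting gives the same fitted values and a cleaner summary.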

Model fit
[Figure 1: line plot of pred, the in-sample linear predictor]
Probability
[Figure 2: line plot of prob, the in-sample predicted probability]

In-sample accuracy: 93.88%

> ismatrix
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 362  35
         1  42 819
                                          
               Accuracy : 0.9388          
                 95% CI : (0.9241, 0.9514)
    No Information Rate : 0.6789          
    P-Value [Acc > NIR] : <2e-16          
                                          
                  Kappa : 0.859           
                                          
 Mcnemar's Test P-Value : 0.4941          
                                          
            Sensitivity : 0.8960          
            Specificity : 0.9590          
         Pos Pred Value : 0.9118          
         Neg Pred Value : 0.9512          
             Prevalence : 0.3211          
         Detection Rate : 0.2878          
   Detection Prevalence : 0.3156          
      Balanced Accuracy : 0.9275          
                                          
       'Positive' Class : 0   

Out-of-sample accuracy: 84.92%

> osmatrix
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0 115  26
         1  12  99
                                         
               Accuracy : 0.8492         
                 95% CI : (0.7989, 0.891)
    No Information Rate : 0.504          
    P-Value [Acc > NIR] : < 2e-16        
                                         
                  Kappa : 0.6981         
                                         
 Mcnemar's Test P-Value : 0.03496        
                                         
            Sensitivity : 0.9055         
            Specificity : 0.7920         
         Pos Pred Value : 0.8156         
         Neg Pred Value : 0.8919         
             Prevalence : 0.5040         
         Detection Rate : 0.4563         
   Detection Prevalence : 0.5595         
      Balanced Accuracy : 0.8488         
                                         
       'Positive' Class : 0  
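
The headline figures in both confusion matrices follow directly from the four cell counts. Here is a minimal sketch in base R using the counts printed above. The helper name `cm_metrics` is hypothetical, and class `0` is treated as the positive class, as in the `caret` output.

```r
# Rows are predictions, columns are reference labels, as in the caret output.
is_cm <- matrix(c(362, 42, 35, 819), nrow = 2,
                dimnames = list(Prediction = c("0", "1"),
                                Reference  = c("0", "1")))
os_cm <- matrix(c(115, 12, 26, 99), nrow = 2,
                dimnames = list(Prediction = c("0", "1"),
                                Reference  = c("0", "1")))

# Hypothetical helper: basic metrics from a 2x2 confusion matrix.
cm_metrics <- function(cm, positive = "0") {
  negative <- setdiff(colnames(cm), positive)
  c(accuracy     = sum(diag(cm)) / sum(cm),
    # share of the most common reference class
    no_info_rate = max(colSums(cm)) / sum(cm),
    # correctly predicted positives / all reference positives
    sensitivity  = cm[positive, positive] / sum(cm[, positive]),
    # correctly predicted negatives / all reference negatives
    specificity  = cm[negative, negative] / sum(cm[, negative]))
}

round(cm_metrics(is_cm), 4)
round(cm_metrics(os_cm), 4)
```

Comparing accuracy with the no-information rate is the useful check: in-sample, always guessing the majority class would already be right about 68% of the time, while out of sample the two classes are almost balanced.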

