Logistic regression: non-numeric argument to binary operator

I ran into the following problem when fitting a logistic regression with the glm.fit function from the stats package.

1. Load the data

library(C50)
# load data
data(churn)
str(churnTrain)
 Inspect the data structure:

   'data.frame': 3333 obs. of  20 variables:
 $ state                        : Factor w/ 51 levels "AK","AL","AR",..: 17 36 32 36 37 2 20 25 19 50 ...
 $ account_length               : int  128 107 137 84 75 118 121 147 117 141 ...
 $ area_code                    : Factor w/ 3 levels "area_code_408",..: 2 2 2 1 2 3 3 2 1 2 ...
 $ international_plan           : Factor w/ 2 levels "no","yes": 1 1 1 2 2 2 1 2 1 2 ...
 $ voice_mail_plan              : Factor w/ 2 levels "no","yes": 2 2 1 1 1 1 2 1 1 2 ...
 $ number_vmail_messages        : int  25 26 0 0 0 0 24 0 0 37 ...
 $ total_day_minutes            : num  265 162 243 299 167 ...
 $ total_day_calls              : int  110 123 114 71 113 98 88 79 97 84 ...
 $ total_day_charge             : num  45.1 27.5 41.4 50.9 28.3 ...
 $ total_eve_minutes            : num  197.4 195.5 121.2 61.9 148.3 ...
 $ total_eve_calls              : int  99 103 110 88 122 101 108 94 80 111 ...
 $ total_eve_charge             : num  16.78 16.62 10.3 5.26 12.61 ...
 $ total_night_minutes          : num  245 254 163 197 187 ...
 $ total_night_calls            : int  91 103 104 89 121 118 118 96 90 97 ...
 $ total_night_charge           : num  11.01 11.45 7.32 8.86 8.41 ...
 $ total_intl_minutes           : num  10 13.7 12.2 6.6 10.1 6.3 7.5 7.1 8.7 11.2 ...
 $ total_intl_calls             : int  3 3 5 7 3 6 7 6 4 5 ...
 $ total_intl_charge            : num  2.7 3.7 3.29 1.78 2.73 1.7 2.03 1.92 2.35 3.02 ...
 $ number_customer_service_calls: int  1 1 0 2 3 0 3 0 1 0 ...
 $ churn                        : Factor w/ 2 levels "yes","no": 2 2 2 2 2 2 2 2 2 2 ...

  Split the data into a training set and a test set

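# Drop predictors that will not be used
churnTrain = churnTrain[ ,! names(churnTrain) %in% c("state","area_code", "account_length") ]
set.seed(2)
# Assign each row to group 1 (about 70%) or group 2 (about 30%)
ind = sample(2, nrow(churnTrain), replace = TRUE, prob = c(0.7,0.3))
trainset = churnTrain[ind == 1,]
testset = churnTrain[ind == 2,]
# Drop column 17 (churn, the response); the factor predictors in columns 1 and 2 remain
trainset1 <- trainset[,-17]
yc <- trainset$churn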
   Run the logistic regression to fit the classifier

library(stats)

logicon <- glm.control(#epsilon = 1e-8, # positive convergence tolerance ε;
                       maxit = 500,     # integer giving the maximal number of IWLS iterations
                      # trace = FALSE   # logical indicating if output should be produced for each iteration.
                      )

logireg <- glm.fit(x=trainset1,y=yc,
                   #  weights = rep(1,length(yc)),
                   start = NULL,etastart = NULL,
                   mustart = NULL, 
                   #    offset = rep(0,length(yc)),
                   family = binomial(link = "logit"),
                   control = logicon
                   #intercept = TRUE
                    )

 This produces the following error:

Error in x[good, , drop = FALSE] * w : 
  non-numeric argument to binary operator

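Analysis:

       The error occurs because the predictors passed to glm.fit include factor columns. glm.fit expects a numeric design matrix and converts x with as.matrix(); a data frame that contains factors becomes a character matrix, so multiplying it by the weights w fails.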
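
The cause can be reproduced without the churn data. The sketch below uses a small made-up data frame (the column names and values are placeholders, not taken from the data set above) and repeats the two steps that matter: the as.matrix() conversion and the multiplication by a weight vector.

```r
# Toy data frame (hypothetical values): one factor column, one numeric column
df <- data.frame(plan    = factor(c("no", "yes", "no")),
                 minutes = c(265.1, 161.6, 243.4))

# glm.fit converts x with as.matrix(); one factor column turns the whole matrix into text
m <- as.matrix(df)
print(typeof(m))   # "character"

# Multiplying the character matrix by numeric weights fails
w   <- rep(1, nrow(df))
res <- tryCatch(m * w, error = function(e) conditionMessage(e))
print(res)         # in an English locale: "non-numeric argument to binary operator"
```

Checking sapply(trainset1, is.factor) before calling glm.fit shows which columns would trigger the same conversion.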

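Solution:

       Drop the factor predictors (in this data set, the first and second predictors are factors), so that only numeric columns are passed to glm.fit. Note that this discards the information in those two predictors.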

trainset1 <- trainset[,-c(1,2,17)] 
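
Dropping columns by position is easy to get wrong if the column order changes. As a possible alternative, the numeric columns can be selected by type. This is only a sketch on a made-up data frame whose column names mimic the churn data; the values are placeholders.

```r
# Hypothetical stand-in for trainset: two factor predictors, two numeric ones, factor response
toy <- data.frame(
  international_plan = factor(c("no", "yes", "no", "yes")),
  voice_mail_plan    = factor(c("yes", "no", "no", "yes")),
  total_day_minutes  = c(265.1, 161.6, 243.4, 299.4),
  total_day_calls    = c(110L, 123L, 114L, 71L),
  churn              = factor(c("no", "no", "yes", "no"))
)

# TRUE for int and num columns, FALSE for factors (including the response)
is_num <- vapply(toy, is.numeric, logical(1))
x_num  <- as.matrix(toy[, is_num, drop = FALSE])

print(colnames(x_num))    # only the numeric predictors remain
print(is.numeric(x_num))  # TRUE, so multiplying by weights is safe
```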

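Alternatively, use the glm function. It builds the design matrix from the formula and dummy-codes factor predictors, so a data set containing factors does not trigger this error.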

fit <- glm(churn~.,data = trainset,family = binomial(link = "logit"))
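
If glm.fit is still preferred, the factor predictors can be kept by building the numeric design matrix explicitly with model.matrix(), which is what glm does internally. The sketch below uses simulated data (the variable names and coefficients are arbitrary assumptions, not the churn data) and checks that both routes give the same coefficients. Note that glm.fit does not add an intercept column to x by itself; model.matrix() supplies it.

```r
set.seed(1)
n <- 200

# Simulated data: one factor predictor, one numeric predictor (hypothetical)
toy <- data.frame(
  plan    = factor(sample(c("no", "yes"), n, replace = TRUE)),
  minutes = rnorm(n, mean = 180, sd = 50)
)
# Arbitrary true model used only to generate a response
p <- plogis(-1 + 1.2 * (toy$plan == "yes") + 0.005 * (toy$minutes - 180))
toy$churn <- factor(ifelse(runif(n) < p, "yes", "no"), levels = c("no", "yes"))

# Numeric design matrix: intercept column plus a 0/1 dummy for plan
X <- model.matrix(churn ~ ., data = toy)
y <- as.numeric(toy$churn == "yes")

fit_matrix  <- glm.fit(x = X, y = y, family = binomial(link = "logit"))
fit_formula <- glm(churn ~ ., data = toy, family = binomial(link = "logit"))

# Both fits should agree on the coefficients
print(all.equal(fit_matrix$coefficients, coef(fit_formula)))
```

For a factor response, the binomial family treats the first level as failure and the remaining levels as success. In the churn data the levels are "yes", "no", so the glm call above models the probability of "no"; relevel the response if the probability of churning is wanted.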




