telco-customer-churn-prediction

Churn Prediction - Logistic Regression, Decision Tree and Random Forest

Data Overview

The data was downloaded from the IBM Sample Data Sets for customer retention programs. The goal of this project is to predict whether customers will churn, in order to help retain them.
Each row represents a customer; each column contains a customer attribute.

Customers who left within the last month – the column is called Churn
Services that each customer has signed up for – phone, multiple lines, internet, online security, online backup, device protection, tech support, and streaming TV and movies
Customer account information – how long they’ve been a customer, contract, payment method, paperless billing, monthly charges, and total charges
Demographic info about customers – gender, age range, and if they have partners and dependents

Library

library(readr)
library(ggplot2)
library(dplyr)
library(tidyr)
library(corrplot)
library(caret)
library(rms)
library(MASS)
library(e1071)
library(ROCR)
library(gplots)
library(pROC)
library(rpart)
library(randomForest)
library(ggpubr)

Explore Data

WA_Fn_UseC_Telco_Customer_Churn <- read_csv("../input/WA_Fn-UseC_-Telco-Customer-Churn.csv")
telco <- WA_Fn_UseC_Telco_Customer_Churn
telco <- data.frame(telco)
str(telco)
summary(telco)

Observations with Missing Values

Based on the summary, there are 11 missing values in the TotalCharges column, which account for only 0.16% of the total
number of observations. So I remove those 11 rows with missing values.
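Before removing them, a quick check like the following can confirm the count and that no other column has missing values (a minimal sketch in base R):

sum(is.na(telco$TotalCharges))   # should return 11
colSums(is.na(telco))            # NA count per column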

telco <- telco[complete.cases(telco),] 

Continuous Variables

For continuous variables, let’s check for distributions.

ggplot(data = telco, aes(MonthlyCharges, color = Churn))+
  geom_freqpoly(binwidth = 5, size = 1)

The number of current customers with MonthlyCharges below $25 is extremely high. For customers with MonthlyCharges greater than $30,
the distributions are similar between those who churned and those who did not.

ggplot(data = telco, aes(TotalCharges, color = Churn))+
  geom_freqpoly(binwidth = 200, size = 1)

The distribution of TotalCharges is highly positively skewed for all customers, whether they churned or not.

ggplot(data = telco, aes(tenure, colour = Churn))+
  geom_freqpoly(binwidth = 5, size = 1)

The tenure distributions differ sharply between customers who churned and those who didn't. For customers
who churned, the distribution is positively skewed, meaning churners tend to cancel the service
within the first few months. For current customers who didn't churn, there are two spikes; the second is much
larger than the first, meaning a large group of current customers have been using the service for more than five years.

There are no obvious outliers in the three numeric variables. Next, let's check their correlations.

telco %>%
  dplyr::select (TotalCharges, MonthlyCharges, tenure) %>%
  cor() %>%
  corrplot.mixed(upper = "circle", tl.col = "black", number.cex = 0.7)

The plot shows high correlations between TotalCharges & tenure and between TotalCharges & MonthlyCharges.
These variables deserve attention when training models later. Multicollinearity does not
reduce the predictive power or reliability of the model as a whole, at least within the sample data set,
but it does distort the estimates for individual predictors.

The tenure variable is measured in months. To find patterns over time more easily, I bin it into six
levels, each covering one year of tenure.

telco %>%
  mutate(tenure_year = case_when(tenure <= 12 ~ "0-1 year",
                                 tenure > 12 & tenure <= 24 ~ "1-2 years",
                                 tenure > 24 & tenure <= 36 ~ "2-3 years",
                                 tenure > 36 & tenure <= 48 ~ "3-4 years",
                                 tenure > 48 & tenure <= 60 ~ "4-5 years",
                                 tenure > 60 & tenure <= 72 ~ "5-6 years")) -> telco
telco$tenure <- NULL
table(telco$tenure_year)
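As a side note, base R's cut() would produce the same six bins more compactly and return a factor directly; a sketch of the alternative, which would replace the case_when() step above (run before tenure is dropped):

# Equivalent binning with cut(); right-closed intervals match the
# case_when() rules, and include.lowest keeps a tenure of 0 in the first bin
telco$tenure_year <- cut(telco$tenure, breaks = seq(0, 72, by = 12),
                         labels = c("0-1 year", "1-2 years", "2-3 years",
                                    "3-4 years", "4-5 years", "5-6 years"),
                         include.lowest = TRUE)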

Categorical Variables

I found that there is a column called PhoneService, and in MultipleLines some rows have the value “No phone service”.
Are they related?

table(telco[, c("PhoneService","MultipleLines")])

When PhoneService is “No”, MultipleLines shows “No phone service”. The “No phone service”
value in the MultipleLines column therefore carries no predictive power of its own.

The same pattern appears between InternetService and OnlineSecurity, OnlineBackup, DeviceProtection, TechSupport,
StreamingTV and StreamingMovies: when InternetService is “No”, those six columns all show “No internet service”.

table(telco[, c("InternetService", "OnlineSecurity")])
table(telco[, c("InternetService", "OnlineBackup")])
table(telco[, c("InternetService", "DeviceProtection")])
table(telco[, c("InternetService", "TechSupport")])
table(telco[, c("InternetService", "StreamingTV")])
table(telco[, c("InternetService", "StreamingMovies")])

I will address this problem later in the data preparation. For now, I check the distribution of churn across the Yes/No levels
of the seven variables above, removing the rows with “No phone service” and “No internet service” from the plot.

telco %>%
  mutate(SeniorCitizen = ifelse(SeniorCitizen == 0, "No", "Yes")) -> categorical

categorical %>%
  dplyr::select(gender:Dependents, PhoneService:PaymentMethod, Churn) -> categorical 

categorical %>%
  dplyr::select(MultipleLines, OnlineSecurity:StreamingMovies, Churn) %>%
  filter(MultipleLines != "No phone service" &
           OnlineSecurity != "No internet service") -> c2
           
gather(c2, columns, value, -Churn) -> c3

ggplot(c3)+
  geom_bar(aes(x = value, fill = Churn), position = "fill", stat = "count")+
  facet_wrap(~columns)+ 
  xlab("Attributes")

Customers who subscribe to DeviceProtection, OnlineBackup, OnlineSecurity or TechSupport have a lower
churn rate than those who don't. However, churn rates differ little between customers with and without
MultipleLines, StreamingMovies or StreamingTV.

categorical %>%
dplyr::select(Contract:Churn) -> c4

ggplot(c4) +
 geom_bar(aes(x = Contract, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) -> p7

ggplot(c4) +
 geom_bar(aes(x = PaperlessBilling, fill = Churn), position = "fill", stat = "count", 
          show.legend = T) -> p8

ggplot(c4) +
 geom_bar(aes(x = PaymentMethod, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) +
 scale_x_discrete(labels = c("Bank transfer", "Credit card", "Electronic check", "Mail check"))+
 theme(axis.text= element_text(size=7)) -> p9

ggarrange(p7,p8,p9, ncol = 2, nrow = 2)

Customers who sign longer contracts have a lower churn rate (Two year < One year < Month-to-month).
Customers who choose paperless billing have a higher churn rate.
Customers who pay by electronic check have a higher churn rate than customers using other payment methods.

Lastly, I will check whether churn rates differ across customers' demographic attributes.

categorical %>%
 dplyr::select(gender:Dependents, PhoneService, InternetService, Churn) %>%
 mutate(Gender_male = ifelse(gender =="Male", "Yes", "No")) -> c1 

c1$gender <- NULL

ggplot(c1) +
 geom_bar(aes(x = Gender_male, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) -> p1
ggplot(c1) +
 geom_bar(aes(x = SeniorCitizen, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) -> p2
ggplot(c1) +
 geom_bar(aes(x = Partner, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) -> p3    
ggplot(c1) +
 geom_bar(aes(x = Dependents, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) -> p4  
ggplot(c1) +
 geom_bar(aes(x = PhoneService, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) -> p5
ggplot(c1) +
 geom_bar(aes(x = InternetService, fill = Churn), position = "fill", stat = "count", 
          show.legend = F) -> p6
          
ggarrange(p1,p2,p3,p4,p5,p6, ncol = 3, nrow = 2)

Churn rates do not differ by gender or phone service.
Senior customers have a higher churn rate.
Customers with partners or dependents have a lower churn rate.

Check Churn Rate for the full dataset

telco %>%
  summarise(Total = n(), n_Churn = sum(Churn == "Yes"), p_Churn = n_Churn/Total)

About 26.6% of the customers churned.

Logistic Regression Model

Data Preparation

To prepare the data for logistic regression, I recode the binary character variables to (0, 1); SeniorCitizen already uses this 0/1 coding.

telco_lr <- telco
telco_lr %>%
  mutate(Churn = ifelse(Churn == "Yes", 1, 0)) -> telco_lr
telco_lr %>%
  mutate(gender = ifelse(gender == "Female", 1, 0)) -> telco_lr
telco_lr %>%
  mutate(Partner = ifelse(Partner == "Yes", 1, 0)) -> telco_lr
telco_lr %>%
  mutate(PhoneService = ifelse(PhoneService == "Yes", 1, 0)) -> telco_lr
telco_lr %>%
  mutate(Dependents = ifelse(Dependents == "Yes", 1, 0)) -> telco_lr
telco_lr %>%
  mutate(PaperlessBilling = ifelse(PaperlessBilling == "Yes", 1, 0)) -> telco_lr
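The six mutate() calls above can be collapsed into a single step with dplyr::across(); a sketch, assuming dplyr 1.0 or later:

# Same recoding in one mutate(); gender is handled separately because
# its positive level is "Female" rather than "Yes"
telco_lr <- telco %>%
  mutate(across(c(Churn, Partner, PhoneService, Dependents, PaperlessBilling),
                ~ ifelse(.x == "Yes", 1, 0)),
         gender = ifelse(gender == "Female", 1, 0))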

I delete customerID and apply one-hot encoding to create dummy variables for all character variables.

telco_lr$customerID <- NULL
dmy <- dummyVars(" ~ .", data = telco_lr)
dmy <- data.frame(predict(dmy, newdata = telco_lr))
str(dmy)

Then, I remove the “No phone service” and “No internet service” dummy variables because they carry no predictive power.

dmy$MultipleLinesNo.phone.service <- NULL
dmy$OnlineSecurityNo.internet.service <- NULL
dmy$OnlineBackupNo.internet.service <- NULL
dmy$DeviceProtectionNo.internet.service <- NULL
dmy$TechSupportNo.internet.service <- NULL
dmy$StreamingTVNo.internet.service <- NULL
dmy$StreamingMoviesNo.internet.service <- NULL

Finally, I remove the last level of each factor to avoid singularities.

dmy$ContractTwo.year <- NULL
dmy$InternetServiceNo <- NULL
dmy$PaymentMethodMailed.check <- NULL
dmy$tenure_year5.6.years <- NULL
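As an aside, caret can drop one level per factor automatically via the fullRank argument of dummyVars(), which would make these manual removals unnecessary (though the dropped reference level may differ from the manual choice above, and the “No phone service”/“No internet service” columns would still need to be removed by hand); a sketch:

# Alternative: full-rank encoding drops one level per factor
dmy_fr <- dummyVars(" ~ .", data = telco_lr, fullRank = TRUE)
dmy_fr <- data.frame(predict(dmy_fr, newdata = telco_lr))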

Check the final data set.

str(dmy)

Split the data into training and test sets (75% vs 25%)

set.seed(818)
assignment <- sample(0:1, size= nrow(dmy), prob = c(0.75,0.25), replace = TRUE)
train <- dmy[assignment == 0, ]
test <- dmy[assignment == 1, ]
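Alternatively, caret's createDataPartition() performs a stratified split, which keeps the churn rates of the two sets nearly identical by construction; a sketch:

# Stratified 75/25 split on the Churn label
set.seed(818)
idx <- createDataPartition(factor(dmy$Churn), p = 0.75, list = FALSE)
train_s <- dmy[idx, ]
test_s  <- dmy[-idx, ]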

Double check that the churn rates of the two sets are close.

For the Training Set:

train %>%
  summarise(Total = n(), n_Churn = sum(Churn == 1), p_Churn = n_Churn/Total)

For the Test Set:

test %>%
  summarise(Total = n(), n_Churn = sum(Churn == 1), p_Churn = n_Churn/Total)

Now, the data is ready for training logistic regression models!

Train Models

I will first use all columns to build the model1.

model1 <- glm(Churn ~., family = "binomial", data = train)
summary(model1)

Notice there are seven NAs in the model's summary, for MultipleLinesYes, OnlineSecurityYes, OnlineBackupYes,
DeviceProtectionYes, TechSupportYes, StreamingTVYes and StreamingMoviesYes. That's because I removed their “No phone service”
or “No internet service” dummies when creating the dummy variables, leaving “Yes” and “No” columns that are perfectly
collinear with the PhoneService and InternetService indicators. This problem is addressed during the variable selection below.
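To list exactly which coefficients were dropped, the NAs can be pulled from the fitted model; a one-line check in base R:

# Names of the aliased (NA) coefficients in model1
names(coef(model1))[is.na(coef(model1))]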

I use stepwise selection based on AIC to exclude insignificant variables and create model2.

model2 <- stepAIC(model1, trace = 0)
summary(model2)

Use the vif() function to check for multicollinearity.

vif(model2)

The VIFs for MonthlyCharges, InternetServiceDSL and InternetServiceFiber.optic are very high due to multicollinearity.
Since TotalCharges is highly correlated with MonthlyCharges and tenure (see the correlation plot above), I remove
the TotalCharges variable. InternetServiceFiber.optic is also removed from model3.

model3 <- glm(formula = Churn ~  SeniorCitizen + Dependents + PhoneService + MultipleLinesNo + InternetServiceDSL + OnlineBackupNo +
DeviceProtectionNo + StreamingTVNo + StreamingMoviesNo + ContractMonth.to.month + ContractOne.year + 
PaperlessBilling + PaymentMethodElectronic.check + MonthlyCharges + tenure_year0.1.year + tenure_year1.2.years,
family = "binomial", data = train)

Then, check model3 and its VIFs.

summary(model3)
vif(model3)

Now all VIFs are below 5, but the p-values for StreamingTVNo and StreamingMoviesNo are still very high.
So I remove these two variables and create model4.

model4 <- glm(formula = Churn ~  SeniorCitizen + Dependents + PhoneService + MultipleLinesNo + InternetServiceDSL + OnlineBackupNo +
DeviceProtectionNo + ContractMonth.to.month + ContractOne.year + 
PaperlessBilling + PaymentMethodElectronic.check + MonthlyCharges + tenure_year0.1.year + tenure_year1.2.years,
family = "binomial", data = train)

Check model4 and its VIFs

summary(model4)
vif(model4)

Model4 looks good! I use it as the final model to predict churn on the training and test sets.

Cross Validation (Confusion Matrix & ROC)

model_logit <- model4
predict(model_logit, type = "response") -> train_prob  # no newdata: probabilities for the training set (predict.glm has no 'data' argument)
predict(model_logit, newdata = test, type = "response") -> test_prob

Set the threshold to 0.5 by default.

train_pred <- factor(ifelse(train_prob >= 0.5, "Yes", "No"))
train_actual <- factor(ifelse(train$Churn == 1, "Yes", "No"))
test_pred <- factor(ifelse(test_prob >= 0.5, "Yes", "No"))
test_actual <- factor(ifelse(test$Churn == 1, "Yes", "No"))

For the Training Set:

confusionMatrix(data = train_pred, reference = train_actual)
roc <- roc(train$Churn, train_prob, plot= TRUE, print.auc=TRUE)

For the Test Set:

confusionMatrix(data = test_pred, reference = test_actual)
roc <- roc(test$Churn, test_prob, plot= TRUE, print.auc=TRUE)

For the training set, the accuracy is 0.80 and the AUC is 0.85. For the test set, the accuracy is 0.79 and the AUC is 0.82.
The model is stable, since accuracy and AUC differ little between the training and test sets.
But the specificities for the two sets are as low as 0.46.

In a real case, we could adjust the threshold based on the costs of TN, FN, FP and TP to minimize the expected cost or loss.
Here, I instead look for the optimal threshold (or cutoff) point that balances the specificity (TN rate) and the sensitivity (TP rate).
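For illustration, here is a minimal sketch of the cost-based variant, with made-up unit costs (the cost values below are hypothetical, not derived from the data):

# Hypothetical costs: a missed churner (FN) costs 5, a wasted retention
# offer (FP) costs 1; sweep thresholds and keep the cheapest one
cost_fn <- 5
cost_fp <- 1
thresholds <- seq(0.05, 0.95, by = 0.01)
actual <- train$Churn  # 0/1
total_cost <- sapply(thresholds, function(t) {
  pred <- as.numeric(train_prob >= t)
  sum(pred == 0 & actual == 1) * cost_fn +   # false negatives
    sum(pred == 1 & actual == 0) * cost_fp   # false positives
})
thresholds[which.min(total_cost)]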

Find the optimal cutoff and adjust the class of prediction

pred <- prediction(train_prob, train_actual)
perf <- performance(pred, "spec", "sens")

cutoffs <- data.frame(cut = perf@alpha.values[[1]],
                      specificity = perf@y.values[[1]],
                      sensitivity = perf@x.values[[1]])
opt_cutoff <- cutoffs[which.min(abs(cutoffs$specificity-cutoffs$sensitivity)),]
opt_cutoff
ggplot(data = cutoffs) +
  geom_line(aes(x = cut, y = specificity, color ="red"), size = 1.5)+
  geom_line(aes(x = cut, y = sensitivity, color = "blue"), size = 1.5) +
  labs(x = "cutoff", y ="value") +
  scale_color_discrete(name = "", labels = c("Specificity", "Sensitivity"))+
  geom_vline(aes(xintercept = opt_cutoff$cut))+
  geom_text(aes(x= 0.55, y= 0.75),label="opt_cutoff = 0.3",hjust=1, size=4)

The optimal cutoff is 0.3. So I use it as the threshold to predict churn on training and test sets.

Prediction on training set with threshold = 0.3:

train_pred_c <- factor(ifelse(train_prob >= 0.3, "Yes", "No"))
confusionMatrix(data = train_pred_c, reference = train_actual)

Prediction on test set with threshold = 0.3:

predict(model_logit, newdata = test, type = "response") -> test_prob
test_pred_c <- factor(ifelse(test_prob >= 0.3, "Yes", "No"))
confusionMatrix(data = test_pred_c, reference = test_actual)

For the training set, the Accuracy is 0.76, and the Sensitivity and Specificity are both about 0.76.
For the test set, the Accuracy is 0.74, and the Sensitivity and Specificity are 0.74 and 0.73 respectively.
Overall, this model with adjusted cutoff works well.

Summary for Logistic Regression Model

The final Logistic Regression Model (with threshold = 0.5) has an Accuracy of 0.79 and an AUC of 0.82. Based on the p-values
of the variables, PhoneService, InternetServiceDSL, OnlineBackup, Contract, PaperlessBilling, PaymentMethodElectronic.check,
MonthlyCharges, and tenure of 0-1 year and 1-2 years have the most significant influence on predicting churn.

Decision Tree

Data Preparation

Decision tree models can handle categorical variables without one-hot encoding, and one-hot encoding can actually degrade
tree-model performance. Thus, I re-prepare the data for the decision tree and random forest models: I take the “telco” data
as it was before the logistic regression preparation and convert the character variables to factors. Here is the final dataset
used for training the classification tree models.

telcotree <- telco
telcotree$customerID <- NULL
telcotree %>%
  mutate_if(is.character, as.factor) -> telcotree
str(telcotree)

Split the data into training and test sets.

set.seed(818)
tree <- sample(0:1, size= nrow(telcotree), prob = c(0.75,0.25), replace = TRUE)
traintree <- telcotree[tree == 0, ]
testtree <- telcotree[tree == 1, ]

Train Model1

First of all, I use all variables to build model_tree1.

model_tree1 <- rpart(formula = Churn ~., data = traintree, 
                     method = "class", parms = list(split = "gini"))

Cross Validation (Confusion Matrix and AUC) for model_tree1

predict(model_tree1, type = "class") -> traintree_pred1  # no newdata: predictions on the training set
predict(model_tree1, type = "prob") -> traintree_prob1
predict(model_tree1, newdata= testtree, type = "class") -> testtree_pred1
predict(model_tree1, newdata = testtree, type = "prob") -> testtree_prob1

For the Training Set:

confusionMatrix(data = traintree_pred1, reference = traintree$Churn)
traintree_actual <- ifelse(traintree$Churn == "Yes", 1,0)
roc <- roc(traintree_actual, traintree_prob1[,2], plot= TRUE, print.auc=TRUE)

For the Test Set:

confusionMatrix(data = testtree_pred1, reference = testtree$Churn)
testtree_actual <- ifelse(testtree$Churn == "Yes", 1,0)
roc <- roc(testtree_actual, testtree_prob1[,2], plot = TRUE, print.auc = TRUE)

For the training set, the Accuracy is 0.79 and the AUC is 0.800. For the test set, the Accuracy is 0.78 and the AUC is 0.78.

Train Model2

Remember that TotalCharges, MonthlyCharges and tenure are highly correlated, which may affect the performance of the
decision tree models. So I remove the TotalCharges column to train the second model.

model_tree2 <- rpart(formula = Churn ~ gender + SeniorCitizen + Partner + Dependents + PhoneService + 
                       MultipleLines + InternetService + OnlineSecurity + TechSupport +
                       OnlineBackup + DeviceProtection + StreamingTV + StreamingMovies + 
                       Contract + PaperlessBilling + tenure_year +
                       PaymentMethod + MonthlyCharges, data = traintree, 
                       method = "class", parms = list(split = "gini"))

Cross Validation for model_tree2

predict(model_tree2, type = "class") -> traintree_pred2  # no newdata: predictions on the training set
predict(model_tree2, type = "prob") -> traintree_prob2
predict(model_tree2, newdata= testtree, type = "class") -> testtree_pred2
predict(model_tree2, newdata = testtree, type = "prob") -> testtree_prob2

For the Training Set:

confusionMatrix(data = traintree_pred2, reference = traintree$Churn)
traintree_actual <- ifelse(traintree$Churn == "Yes", 1,0)
roc <- roc(traintree_actual, traintree_prob2[,2], plot= TRUE, print.auc=TRUE)

For the Test Set:

testtree_actual <- ifelse(testtree$Churn == "Yes", 1,0)
confusionMatrix(data = testtree_pred2, reference = testtree$Churn)
roc <- roc(testtree_actual, testtree_prob2[,2], plot = TRUE, print.auc = TRUE)

For the training set, the Accuracy is 0.80 and the AUC is 0.80. For the test set, the Accuracy is 0.78 and the AUC is 0.78.
The second model performs only marginally better than the first, so I use model 2 as the final classification tree model.
Its Specificity is still too low, but since I don't have the real cost structure for this case, I don't optimize the cutoff
for the tree models.
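If a different cutoff were wanted anyway, it could be applied to the probability column just as for logistic regression; a sketch with an illustrative (untuned) threshold of 0.3:

# Derive classes from P(Churn = "Yes") with a custom cutoff
testtree_pred_c <- factor(ifelse(testtree_prob2[, "Yes"] >= 0.3, "Yes", "No"))
confusionMatrix(data = testtree_pred_c, reference = testtree$Churn)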

Summary for Decision Tree Model

The final decision tree model has an Accuracy of 0.78 and an AUC of 0.78 on the test set. It does not perform as well as the
logistic regression model.

Random Forest

Data Preparation

I use the same data prepared for Classification Tree models.

Train Model

set.seed(802)
modelrf1 <- randomForest(formula = Churn ~., data = traintree)
print(modelrf1)

Cross Validation for modelrf1

predict(modelrf1, traintree, type = "class") -> trainrf_pred
predict(modelrf1, traintree, type = "prob") -> trainrf_prob
predict(modelrf1, newdata = testtree, type = "class") -> testrf_pred
predict(modelrf1, newdata = testtree, type = "prob") -> testrf_prob

For the Training Set:

confusionMatrix(data = trainrf_pred, reference = traintree$Churn)
trainrf_actual <- ifelse(traintree$Churn == "Yes", 1,0)
roc <- roc(trainrf_actual, trainrf_prob[,2], plot= TRUE, print.auc=TRUE)

For the Test Set:

confusionMatrix(data = testrf_pred, reference = testtree$Churn)
testrf_actual <- ifelse(testtree$Churn == "Yes", 1,0)
roc <- roc(testrf_actual, testrf_prob[,2], plot = TRUE, print.auc = TRUE)

For the training set, the Accuracy is 0.97 and the AUC is almost 1. For the test set, the Accuracy is 0.79 and the AUC is 0.82.

Tuning

Tuning mtry with tuneRF

set.seed(818)
modelrf2 <- tuneRF(x = subset(traintree, select = -Churn), y = traintree$Churn, ntreeTry = 500, doBest = TRUE)
print(modelrf2)

With mtry = 2, the OOB error decreases from 20.11% to 19.67%.

Grid Search based on OOB error

I first set up a grid of candidate values for mtry, nodesize and sampsize.

mtry <- seq(2, ncol(traintree) * 0.8, 2)
nodesize <- seq(3, 8, 2)
sampsize <- nrow(traintree) * c(0.7, 0.8)
hyper_grid <- expand.grid(mtry = mtry, nodesize = nodesize, sampsize = sampsize)

Then, I loop over the grid to find the combination with the lowest OOB error.

oob_err <- c()
for (i in 1:nrow(hyper_grid)) {
  model <- randomForest(formula = Churn ~ ., 
                        data = traintree,
                        mtry = hyper_grid$mtry[i],
                        nodesize = hyper_grid$nodesize[i],
                        sampsize = hyper_grid$sampsize[i])
  oob_err[i] <- model$err.rate[nrow(model$err.rate), "OOB"]
  }

opt_i <- which.min(oob_err)
print(hyper_grid[opt_i,])

The optimal hyperparameters are mtry = 2, nodesize = 7 and sampsize = 3658.2 (70% of the training rows).

Train modelrf3 with the optimal hyperparameters.

set.seed(802)
modelrf3 <- randomForest(formula = Churn ~., data = traintree, mtry = 2, nodesize = 7, sampsize = 3658.2)
print(modelrf3)

With the optimal combination, the OOB error of modelrf3 drops slightly to 19.79% (from modelrf1's 20.11%), but it is still
above modelrf2's 19.67%. So I use modelrf2 as the final random forest model.

Cross Validation for modelrf2

predict(modelrf2, traintree, type = "class") -> trainrf_pred2
predict(modelrf2, traintree, type = "prob") -> trainrf_prob2
predict(modelrf2, newdata = testtree, type = "class") -> testrf_pred2
predict(modelrf2, newdata = testtree, type = "prob") -> testrf_prob2

For the Training Set:

confusionMatrix(data = trainrf_pred2, reference = traintree$Churn)
trainrf_actual <- ifelse(traintree$Churn == "Yes", 1,0)
roc <- roc(trainrf_actual, trainrf_prob2[,2], plot= TRUE, print.auc=TRUE)

For the Test Set:

confusionMatrix(data = testrf_pred2, reference = testtree$Churn)
testrf_actual <- ifelse(testtree$Churn == "Yes", 1,0)
roc <- roc(testrf_actual, testrf_prob2[,2], plot = TRUE, print.auc = TRUE)

For the training set, the Accuracy is 0.88 and the AUC is 0.95. For the test set, the Accuracy is 0.79 and the AUC is 0.82.
Compared with the first model (Accuracy = 0.97 and AUC = 0.995 on the training set; Accuracy = 0.79 and AUC = 0.82 on the
test set), the second model overfits the training data less while matching the test performance, so it works a little better.

Variable Importance

varImpPlot(modelrf2,type=2)
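The numeric scores behind the plot can be extracted with importance(); a quick sketch:

# Mean decrease in Gini per variable, largest first (same numbers as the plot)
imp <- importance(modelrf2, type = 2)
imp[order(imp[, 1], decreasing = TRUE), , drop = FALSE]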

Summary for Random Forest Model

The final random forest model has an Accuracy of 0.79 and an AUC of 0.82 on the test set.
According to the variable importance plot, TotalCharges, MonthlyCharges, tenure_year and Contract are the top four most important
variables for predicting churn. PhoneService, gender, SeniorCitizen, Dependents, Partner, MultipleLines, PaperlessBilling,
StreamingTV, StreamingMovies, DeviceProtection and OnlineBackup have very small effects on Churn.

Comparison of ROC and AUC for Logistic Regression, Decision Tree and Random Forest models

preds_list <- list(test_prob, testtree_prob2[,2], testrf_prob2[,2])
m <- length(preds_list)
actuals_list <- rep(list(testtree$Churn), m)

pred <- prediction(preds_list, actuals_list)
rocs <- performance(pred, "tpr", "fpr")
plot(rocs, col = as.list(1:m), main = "Test Set ROC Curves for 3 Models")
legend(x = "bottomright",
       legend = c("Logistic Regression", "Decision Tree", "Random Froest"),
       fill = 1:m)
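The test-set AUCs can be pulled from the same ROCR objects; a sketch:

# AUC for each model, in the order of preds_list
# (logistic regression, decision tree, random forest)
aucs <- performance(pred, measure = "auc")
round(unlist(aucs@y.values), 3)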

The logistic regression model and random forest model work better than the decision tree model. With 0.5 as the threshold,
the test-set Accuracies are 0.79 for Logistic Regression, 0.78 for Decision Tree and 0.79 for Random Forest.

Regarding variable importance, the logistic regression model and the random forest model differ little. Both identify
MonthlyCharges, tenure, Contract and PaymentMethod as important predictors, and gender, StreamingTV, StreamingMovies and
Partner as unimportant ones. However, PaperlessBilling, PhoneService and OnlineBackup show significant influence on churn
in the logistic regression model, while they have very little predictive power in the random forest model.
