Data analysis: tuning model parameters with K-fold CV + repeats

Preface

To obtain robust parameters, model parameters are usually tuned. Common tuning strategies are K-fold CV and building the model over N repeats: the former splits the samples into folds and trains the model within a single run, while the latter repeats that procedure N times. The lambda.min from each repeat is then collected into a set of candidate lambdas, a best lambda is chosen from this set (e.g. by median or min), and finally a new model is fitted with the best lambda. For more posts, see https://zouhua.top/

Codes

library(glmnet)

# Example data from the glmnet vignette; recent glmnet versions store it
# as a list with components x and y
data(QuickStartExample)
x <- QuickStartExample$x
y <- QuickStartExample$y

df_lambdas_min <- c()
# 10-fold CV repeated 10 times; keep lambda.min from each repeat
for(i in 1:10){
  cvfit <- cv.glmnet(x = x,
                     y = y,
                     nfolds = 10,
                     alpha = 1,
                     nlambda = 100,
                     type.measure = "mse")  # "auc" is only valid for binomial outcomes

  # Alternative: ipflasso::cvr.glmnet runs the repeated CV internally
  # (ncv = number of repeats), e.g. for a binomial outcome:
  # require(ipflasso)
  # cvfit <- cvr.glmnet(X = dat_table,
  #                     Y = dat_target,
  #                     family = "binomial",
  #                     nfolds = 10,
  #                     alpha = 1,
  #                     ncv = 10,
  #                     nlambda = 100,
  #                     type.measure = "auc")

  df_lambdas_min <- rbind(df_lambdas_min, cvfit$lambda.min)
}
print(df_lambdas_min)
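
As described in the preface, the collected lambda.min values can be summarized into a single best lambda (e.g. by median or min), and a final model is then fitted at that value. A minimal sketch continuing from the objects above; the names best_lambda and final_fit are illustrative:

best_lambda <- median(as.numeric(df_lambdas_min))  # median is more robust than min to an extreme repeat
final_fit   <- glmnet(x = x, y = y, alpha = 1, lambda = best_lambda)
coef(final_fit)  # coefficients of the final lasso model at the chosen lambda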

Notes: the choice of K should take the sample size into account. Each fold should contain at least 8 samples, so the minimum sample size for 10-fold CV is 80. Avoid situations where (a quick check is sketched after the list below):

  1. the sample size per condition (class) is less than 8

  2. the sample size per fold is less than 10
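
A quick arithmetic check of the fold sizes can be done before calling cv.glmnet; a minimal sketch using the example data above (dat_target refers to the hypothetical binary outcome from the commented-out ipflasso example):

nfolds <- 10
n      <- nrow(x)   # total sample size
n / nfolds          # samples per fold; should be at least 8, i.e. n >= 80 for 10 folds
# table(dat_target) # for binomial outcomes, also check the per-class counts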


References

  1. Circulating Protein Biomarkers for Use in Pancreatic Ductal Adenocarcinoma Identification

  2. An Introduction to glmnet

  3. Repeating cv.glmnet

If any of the referenced articles raises copyright concerns, please contact me. Thank you.
