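Simple K-fold cross-validation source code:
(Splits the dataset into k consecutive folds, without shuffling by default.
Each fold is then used once as the validation set, while the remaining k-1 folds form the training set.)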
import numpy as np
from sklearn.model_selection import KFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([1, 2, 3, 4])

kf = KFold(n_splits=2)
kf.get_n_splits(X)
print(kf)

for train_index, test_index in kf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
# Output (split lines only; print(kf) also prints the KFold settings first)
TRAIN: [2 3] TEST: [0 1]
TRAIN: [0 1] TEST: [2 3]
Reference tutorials: https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.KFold.html and https://machinelearningmastery.com/k-fold-cross-validation/
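To make the splitting rule concrete, here is a minimal hand-written sketch of consecutive, unshuffled K-fold index generation using only NumPy. The function name manual_kfold_indices is hypothetical and is not part of scikit-learn; the sketch assumes that any remainder samples are spread over the first folds, so fold sizes differ by at most one.

```python
import numpy as np

def manual_kfold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for consecutive, unshuffled folds.

    Hypothetical helper for illustration only; not scikit-learn's code.
    """
    indices = np.arange(n_samples)
    # Assumption: spread the remainder over the first folds so that
    # fold sizes differ by at most one sample.
    fold_sizes = np.full(n_splits, n_samples // n_splits, dtype=int)
    fold_sizes[: n_samples % n_splits] += 1

    start = 0
    for size in fold_sizes:
        stop = start + size
        test_idx = indices[start:stop]  # the current fold is held out
        train_idx = np.concatenate([indices[:start], indices[stop:]])  # the other k-1 folds
        yield train_idx, test_idx
        start = stop

splits = list(manual_kfold_indices(4, 2))
for train_idx, test_idx in splits:
    print("TRAIN:", train_idx, "TEST:", test_idx)
```

For 4 samples and 2 folds this yields the same two index pairs as the KFold example above: every sample lands in the held-out fold exactly once.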
Repeated (N-times) K-fold cross-validation source code: (repeats K-fold n times, with a different randomization in each repetition)
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4]])
y = np.array([0, 0, 1, 1])

# 2 repetitions of 2-fold cross-validation
rkf = RepeatedKFold(n_splits=2, n_repeats=2, random_state=2652124)

for train_index, test_index in rkf.split(X):
    print("TRAIN:", train_index, "TEST:", test_index)
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
# Output:
TRAIN: [0 1] TEST: [2 3]
TRAIN: [2 3] TEST: [0 1]
TRAIN: [1 2] TEST: [0 3]
TRAIN: [0 3] TEST: [1 2]
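As a rough check of what "repeating K-fold n times" means, the sketch below counts the splits that RepeatedKFold produces and confirms that, within each repetition, the test folds together cover every sample once. The 6-sample array, the seed 0, and the variable names are assumptions made for illustration, not values from the example above.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.arange(12).reshape(6, 2)  # 6 illustrative samples, 2 features each
n_splits, n_repeats = 3, 2
rkf = RepeatedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=0)

splits = list(rkf.split(X))
total_splits = rkf.get_n_splits()  # equals n_splits * n_repeats

# Splits are yielded one repetition at a time, so slice them per repetition
# and collect the test indices seen in each one.
coverage = []
for r in range(n_repeats):
    test_folds = [test for _, test in splits[r * n_splits:(r + 1) * n_splits]]
    coverage.append(sorted(np.concatenate(test_folds).tolist()))

print("total splits:", total_splits)
for r, covered in enumerate(coverage):
    print("repetition", r, "test indices:", covered)
```

Each repetition is a complete K-fold partition of the data; only the random assignment of samples to folds changes between repetitions.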