Splitting a machine learning dataset into training and test sets in Python

#!/usr/bin/env python3
# -*- coding: UTF-8 -*-

import numpy as np
from sklearn.model_selection import train_test_split
import pandas as pd

dataSetName = 'ionosphere'
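# pd.read_csv treats the first row as column names by default; pass header=None if the CSV has no header row
dataSet = pd.read_csv(dataSetName + ".csv").values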

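# The loaded dataset has shape N*d (number of samples * number of columns); the last column is the class label
# First split the dataset into input features and class labels
X = dataSet[:, :-1]  # input features (every column except the last)
labels = dataSet[:, -1]  # class labels (last column)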

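X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.3, random_state=42)
# training set : test set = 7:3 (test_size=0.3)
# The rows are shuffled randomly before splitting (random_state=42 makes the split reproducible);
# after this step the data is ready for training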
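The two steps in the script (taking the last column as the label, then a shuffled 7:3 split) can be sketched with NumPy alone to show what happens under the hood. This is an illustrative re-implementation, not scikit-learn's code: the helper names `split_features_labels` and `random_split` are hypothetical, the small toy array stands in for `ionosphere.csv`, and because it uses `np.random.default_rng` rather than scikit-learn's `random_state` handling, the same seed will not pick the same rows as `train_test_split(..., random_state=42)`.

```python
import numpy as np


def split_features_labels(data):
    """Split an N x d array whose last column is the class label.

    Returns X with shape (N, d - 1) and y with shape (N,).
    """
    return data[:, :-1], data[:, -1]


def random_split(X, y, test_size=0.3, seed=42):
    """Hypothetical helper: shuffle row indices and hold out a test_size fraction."""
    n_samples = X.shape[0]
    # Round the test-set size up so a non-zero fraction never yields an empty test set.
    n_test = int(np.ceil(test_size * n_samples))
    rng = np.random.default_rng(seed)   # seeded generator -> reproducible split
    perm = rng.permutation(n_samples)   # random ordering of row indices 0..N-1
    test_idx, train_idx = perm[:n_test], perm[n_test:]
    # Index X and y with the same index arrays so each label stays with its row.
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


# Toy stand-in for the CSV: 10 samples, 3 feature columns, 1 label column (0/1).
data = np.hstack([np.arange(30, dtype=float).reshape(10, 3),
                  (np.arange(10) % 2).reshape(10, 1)])
X, labels = split_features_labels(data)
X_train, X_test, y_train, y_test = random_split(X, labels, test_size=0.3)
print(X_train.shape, X_test.shape)  # (7, 3) (3, 3)
```

Rounding the test-set size up mirrors how scikit-learn handles a float `test_size`. For class-imbalanced data, `train_test_split` also accepts `stratify=labels`, which keeps the class proportions roughly equal in the training and test sets.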

Reference:
https://blog.csdn.net/u010801439/article/details/79555857
