A collection of python-pandas features

Full reference manual

http://pan.baidu.com/s/1nvNmzkH

Random sampling by a given fraction

Split df into two parts: df_sample (a random 70% of the rows) and df_reset (the remaining rows)

df_sample = df.sample(frac = 0.7)
df_reset = df.loc[~df.index.isin(df_sample.index)]
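The split above gives a different result on every run. A minimal sketch of a repeatable version follows, assuming the index labels of df are unique; the toy DataFrame and the random_state value are illustrative assumptions, not part of the original snippet.

```python
import pandas as pd

# Hypothetical toy frame standing in for df.
df = pd.DataFrame({"x": range(10), "y": list("abcdefghij")})

# random_state fixes the random draw so the same rows are picked each run.
df_sample = df.sample(frac=0.7, random_state=0)

# drop() removes the sampled index labels, leaving the other rows.
# This relies on the index labels being unique.
df_reset = df.drop(df_sample.index)

print(len(df_sample), len(df_reset))
```

If the index may contain duplicate labels, calling df.reset_index(drop=True) before sampling is one way to make the two parts truly disjoint.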

Counting rows

dia_num = len(df[df['DiagGDM'] == 1])
total_num = len(df)
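The same counts can also be obtained without building a filtered copy of the DataFrame. The sketch below is one possible variant; the sample values are made up for illustration, and only the column name 'DiagGDM' comes from the snippet above.

```python
import pandas as pd

# Hypothetical data; only the column name is taken from the original snippet.
df = pd.DataFrame({"DiagGDM": [1, 0, 0, 1, 1, 0, 0, 0]})

# Summing a boolean mask counts the True entries.
dia_num = int((df["DiagGDM"] == 1).sum())
total_num = len(df)

# Share of rows where DiagGDM equals 1.
ratio = dia_num / total_num

# value_counts() reports the count of every distinct value at once.
counts = df["DiagGDM"].value_counts()
print(dia_num, total_num, ratio)
print(counts)
```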

Changing column types

import pandas as pd

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
# Convert the string columns 'two' and 'three' to float
df[['two', 'three']] = df[['two', 'three']].astype(float)
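astype(float) raises an error if any cell cannot be parsed as a number. When the data may contain such cells, pd.to_numeric with errors="coerce" is a possible alternative that turns them into NaN instead. The 'n/a' value below is an invented example of dirty input.

```python
import pandas as pd

# Same layout as the example above, with one unparseable cell ('n/a') added.
a = [["a", "1.2", "4.2"], ["b", "70", "n/a"], ["x", "5", "0"]]
df = pd.DataFrame(a, columns=["one", "two", "three"])

# errors="coerce" replaces values that cannot be parsed with NaN.
for col in ["two", "three"]:
    df[col] = pd.to_numeric(df[col], errors="coerce")

print(df.dtypes)
print(df["three"].isna().sum())
```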

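Shuffling the rows of a NumPy array

import numpy as np

# train_data and test_data are existing NumPy arrays.
# shuffle() works in place and reorders along the first axis (rows) only.
np.random.shuffle(train_data)
np.random.shuffle(test_data)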
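When features and labels are stored in separate arrays, shuffling each one independently breaks the row-to-label pairing. One way around this is to draw a single permutation and index both arrays with it. The arrays X and y and the seed below are illustrative assumptions.

```python
import numpy as np

# Seeded generator so the shuffle is repeatable.
rng = np.random.default_rng(0)

# Hypothetical data: row i of X is [2*i, 2*i + 1] and its label is i.
X = np.arange(12).reshape(6, 2)
y = np.arange(6)

# One permutation of the row indices, applied to both arrays.
perm = rng.permutation(len(X))
X_shuffled = X[perm]
y_shuffled = y[perm]

print(X_shuffled)
print(y_shuffled)
```

Unlike np.random.shuffle, this leaves X and y untouched and returns shuffled copies.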

Official documentation: 10 Minutes to pandas

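Shuffling and splitting into training and test samples

from sklearn.model_selection import train_test_split

# Shuffles by default, then holds out half of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.5, random_state=0)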
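For classification data it can help to keep the class proportions equal in both halves. The sketch below passes stratify=y to train_test_split for that purpose; the arrays X and y are invented for illustration and scikit-learn is assumed to be installed.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 20 rows, two balanced classes.
X = np.arange(40).reshape(20, 2)
y = np.array([0] * 10 + [1] * 10)

# stratify=y keeps the class ratio the same in the train and test parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))
```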
