《python机器学习及实践-从零开始通往kaggle竞赛之路(代码Python 3.6 版)》chapter1.1

 本博客代码是对书《python机器学习及实践-从零开始通往kaggle竞赛之路》,基于Python3.6的实现,并且使用的所需的库是最新的(2017/12/8)。

chapter1_1

import pandas as pd  #导入pandas 库
df_train = pd.read_csv('../Datasets/Breast-Cancer/breast-cancer-train.csv') #读取目录下的数据,如果代码与文件路径不在一起,则需要另行设置

df_test = pd.read_csv('../Datasets/Breast-Cancer/breast-cancer-test.csv')

print(df_train.head(5)) #显示df_train 前列5行数据,了解数据大概样式
print(df_test.head(5))

df_test_negative = df_test.loc[df_test['Type'] == 0][['Clump Thickness', 'Cell Size']] #先对test 的“Type”行进行判断,然后切分其他两列数据
df_test_positive = df_test.loc[df_test['Type'] == 1][['Clump Thickness', 'Cell Size']]
print(df_test_negative.head())
print(df_test_positive.head())

import matplotlib.pyplot as plt

plt.scatter(df_test_negative['Clump Thickness'],df_test_negative['Cell Size'],marker = 'o', s=20, c='green')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'], marker = 'x', s=10, c='red')

plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')

plt.show()

import numpy as np

intercept = np.random.random([1])
coef = np.random.random([2])

lx = np.arange(0, 12)
ly = (-intercept - lx * coef[0]) / coef[1]

plt.plot(lx, ly, c='yellow')


plt.scatter(df_test_negative['Clump Thickness'],df_test_negative['Cell Size'], marker = 'o', s=200, c='red')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'], marker = 'x', s=150, c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

from sklearn.linear_model import LogisticRegression
lr = LogisticRegression()

lr.fit(df_train[['Clump Thickness', 'Cell Size']][:10], df_train['Type'][:10])
print ('Testing accuracy (10 training samples):', lr.score(df_test[['Clump Thickness', 'Cell Size']], df_test['Type']))

intercept = lr.intercept_
coef = lr.coef_[0, :]

ly = (-intercept - lx * coef[0]) / coef[1]

plt.plot(lx, ly, c='green')
plt.scatter(df_test_negative['Clump Thickness'],df_test_negative['Cell Size'], marker = 'o', s=200, c='red')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'], marker = 'x', s=150, c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

lr = LogisticRegression()

lr.fit(df_train[['Clump Thickness', 'Cell Size']], df_train['Type'])
print ('Testing accuracy (all training samples):', lr.score(df_test[['Clump Thickness', 'Cell Size']], df_test['Type']))

intercept = lr.intercept_
coef = lr.coef_[0, :]
ly = (-intercept - lx * coef[0]) / coef[1]

plt.plot(lx, ly, c='blue')
plt.scatter(df_test_negative['Clump Thickness'],df_test_negative['Cell Size'], marker = 'o', s=200, c='red')
plt.scatter(df_test_positive['Clump Thickness'],df_test_positive['Cell Size'], marker = 'x', s=150, c='black')
plt.xlabel('Clump Thickness')
plt.ylabel('Cell Size')
plt.show()

发布修改代码已经过作者同意,如果有疑问,可以留言给我。




你可能感兴趣的:(python机器学习及实践)