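Before we begin:
This post follows directly on from the previous one,
Deep Learning Notes: Code Exercises Based on TensorFlow 2.2.0 (Lesson 1).
It mainly covers TensorFlow 2.0 code exercises, following KGP Talkie's [TensorFlow 2.0] hands-on advanced tutorial and correcting the code that no longer works.
This post follows Lesson 2 of the popular YouTube course [TensorFlow 2.0] Hands-On Advanced Tutorial (Chinese and English subtitles + code walkthrough).
Data for the lesson: https://pan.baidu.com/s/1Lpo3l3UaPANOGE_HGJf2TQ
Extraction code: dqo4
Note: place the data file in your Jupyter working directory.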
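How to build your first ANN
1 Preprocess the data
2 Build the input layer
3 Randomly initialize the input weights W
4 Build the hidden layers
5 Choose the optimizer, loss, and accuracy metric
6 Compile the model
7 Train the model with model.fit
8 Evaluate the model
9 Tune the model if needed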
import tensorflow as tf
from tensorflow import keras
# Import from the public tensorflow.keras API rather than the private tensorflow.python package
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
dataset = pd.read_csv('customer_Churn_Modelling.csv')
dataset.head()
|   | RowNumber | CustomerId | Surname | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Exited |
| 0 | 1 | 15634602 | Hargrave | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 1 |
| 1 | 2 | 15647311 | Hill | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 |
| 2 | 3 | 15619304 | Onio | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 1 |
| 3 | 4 | 15701354 | Boni | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 |
| 4 | 5 | 15737888 | Mitchell | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 |
X = dataset.drop(labels=['CustomerId','Surname','RowNumber','Exited'],axis =1)
y = dataset['Exited']
X.head()
|   | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
| 0 | 619 | France | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 |
| 1 | 608 | Spain | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 |
| 2 | 502 | France | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 |
| 3 | 699 | France | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 |
| 4 | 850 | Spain | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 |
y.head()
0 1
1 0
2 1
3 0
4 0
Name: Exited, dtype: int64
from sklearn.preprocessing import LabelEncoder
label1 = LabelEncoder()
X['Geography'] = label1.fit_transform(X['Geography'])
X.head()
|   | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
| 0 | 619 | 0 | Female | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 |
| 1 | 608 | 2 | Female | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 |
| 2 | 502 | 0 | Female | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 |
| 3 | 699 | 0 | Female | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 |
| 4 | 850 | 2 | Female | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 |
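LabelEncoder assigns integer codes in sorted order of the class names, which is why France becomes 0 and Spain becomes 2 in the table above. Here is a minimal sketch on a toy list. The third country name, Germany, is an assumption, since the rows shown above only contain France and Spain.

```python
from sklearn.preprocessing import LabelEncoder

# Toy data; 'Germany' is an assumed third category, not shown in the rows above
countries = ['France', 'Spain', 'France', 'Germany']

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ lists the categories in sorted order; the position is the integer code
mapping = {name: code for code, name in enumerate(encoder.classes_.tolist())}
print(mapping)
print(codes.tolist())
```

Inspecting classes_ on label1 in the same way should show the real mapping used for the Geography column.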
label2 = LabelEncoder()
# Fit the second encoder on Gender so that label1 keeps its Geography mapping
X['Gender'] = label2.fit_transform(X['Gender'])
X.head()
|   | CreditScore | Geography | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary |
| 0 | 619 | 0 | 0 | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 |
| 1 | 608 | 2 | 0 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 |
| 2 | 502 | 0 | 0 | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 |
| 3 | 699 | 0 | 0 | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 |
| 4 | 850 | 2 | 0 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 |
X = pd.get_dummies(X, drop_first=True, columns=['Geography'])
X.head(30)
|   | CreditScore | Gender | Age | Tenure | Balance | NumOfProducts | HasCrCard | IsActiveMember | EstimatedSalary | Geography_1 | Geography_2 |
| 0 | 619 | 0 | 42 | 2 | 0.00 | 1 | 1 | 1 | 101348.88 | 0 | 0 |
| 1 | 608 | 0 | 41 | 1 | 83807.86 | 1 | 0 | 1 | 112542.58 | 0 | 1 |
| 2 | 502 | 0 | 42 | 8 | 159660.80 | 3 | 1 | 0 | 113931.57 | 0 | 0 |
| 3 | 699 | 0 | 39 | 1 | 0.00 | 2 | 0 | 0 | 93826.63 | 0 | 0 |
| 4 | 850 | 0 | 43 | 2 | 125510.82 | 1 | 1 | 1 | 79084.10 | 0 | 1 |
| 5 | 645 | 1 | 44 | 8 | 113755.78 | 2 | 1 | 0 | 149756.71 | 0 | 1 |
| 6 | 822 | 1 | 50 | 7 | 0.00 | 2 | 1 | 1 | 10062.80 | 0 | 0 |
| 7 | 376 | 0 | 29 | 4 | 115046.74 | 4 | 1 | 0 | 119346.88 | 1 | 0 |
| 8 | 501 | 1 | 44 | 4 | 142051.07 | 2 | 0 | 1 | 74940.50 | 0 | 0 |
| 9 | 684 | 1 | 27 | 2 | 134603.88 | 1 | 1 | 1 | 71725.73 | 0 | 0 |
| 10 | 528 | 1 | 31 | 6 | 102016.72 | 2 | 0 | 0 | 80181.12 | 0 | 0 |
| 11 | 497 | 1 | 24 | 3 | 0.00 | 2 | 1 | 0 | 76390.01 | 0 | 1 |
| 12 | 476 | 0 | 34 | 10 | 0.00 | 2 | 1 | 0 | 26260.98 | 0 | 0 |
| 13 | 549 | 0 | 25 | 5 | 0.00 | 2 | 0 | 0 | 190857.79 | 0 | 0 |
| 14 | 635 | 0 | 35 | 7 | 0.00 | 2 | 1 | 1 | 65951.65 | 0 | 1 |
| 15 | 616 | 1 | 45 | 3 | 143129.41 | 2 | 0 | 1 | 64327.26 | 1 | 0 |
| 16 | 653 | 1 | 58 | 1 | 132602.88 | 1 | 1 | 0 | 5097.67 | 1 | 0 |
| 17 | 549 | 0 | 24 | 9 | 0.00 | 2 | 1 | 1 | 14406.41 | 0 | 1 |
| 18 | 587 | 1 | 45 | 6 | 0.00 | 1 | 0 | 0 | 158684.81 | 0 | 1 |
| 19 | 726 | 0 | 24 | 6 | 0.00 | 2 | 1 | 1 | 54724.03 | 0 | 0 |
| 20 | 732 | 1 | 41 | 8 | 0.00 | 2 | 1 | 1 | 170886.17 | 0 | 0 |
| 21 | 636 | 0 | 32 | 8 | 0.00 | 2 | 1 | 0 | 138555.46 | 0 | 1 |
| 22 | 510 | 0 | 38 | 4 | 0.00 | 1 | 1 | 0 | 118913.53 | 0 | 1 |
| 23 | 669 | 1 | 46 | 3 | 0.00 | 2 | 0 | 1 | 8487.75 | 0 | 0 |
| 24 | 846 | 0 | 38 | 5 | 0.00 | 1 | 1 | 1 | 187616.16 | 0 | 0 |
| 25 | 577 | 1 | 25 | 3 | 0.00 | 2 | 0 | 1 | 124508.29 | 0 | 0 |
| 26 | 756 | 1 | 36 | 2 | 136815.64 | 1 | 1 | 1 | 170041.95 | 1 | 0 |
| 27 | 571 | 1 | 44 | 9 | 0.00 | 2 | 0 | 0 | 38433.35 | 0 | 0 |
| 28 | 574 | 0 | 43 | 3 | 141349.43 | 1 | 1 | 1 | 100187.43 | 1 | 0 |
| 29 | 411 | 1 | 29 | 0 | 59697.17 | 2 | 1 | 1 | 53483.21 | 0 | 0 |
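With drop_first=True, pd.get_dummies drops the indicator column for the first category, so a row with Geography_1 = 0 and Geography_2 = 0 stands for code 0. Here is a small sketch on a toy frame with made-up values. dtype=int is passed only so the output shows 0/1 as in the table above, since newer pandas versions default to booleans.

```python
import pandas as pd

# Toy column of already label-encoded categories (0, 1, 2)
toy = pd.DataFrame({'Geography': [0, 2, 0, 1]})

# drop_first=True removes the Geography_0 column to avoid a redundant feature
encoded = pd.get_dummies(toy, columns=['Geography'], drop_first=True, dtype=int)
print(encoded)
```

Two indicator columns are enough to tell three categories apart, which is why Geography_0 can be dropped without losing information.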
Feature standardization
from sklearn.preprocessing import StandardScaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0, stratify=y)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
# Reuse the mean and std learned from the training set; do not refit on the test set
X_test = scaler.transform(X_test)
y_test
1344 1
8167 0
4747 0
5004 1
3124 1
..
9107 0
8249 0
8337 0
6279 1
412 0
Name: Exited, Length: 2000, dtype: int64
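Standardization is normally fitted on the training split only, and the same mean and standard deviation are then applied to the test split. The sketch below, on made-up numbers, checks that StandardScaler.transform is just (x - mean) / std using the training statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data: three training rows and one test row, two features each
train = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test = np.array([[2.0, 40.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# The same transformation written out by hand
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(test_scaled, manual)
```

Refitting the scaler on the test split would shift and scale it with different statistics, so the model would see test features on a slightly different scale than it was trained on.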
Building the ANN
model = Sequential()
model.add(Dense(X.shape[1],activation='relu',input_dim = X.shape[1]))
model.add(Dense(128,activation = 'relu'))
model.add(Dense(1,activation = 'sigmoid'))
WARNING:tensorflow:From F:\Anaconda3\lib\site-packages\tensorflow\python\ops\resource_variable_ops.py:435: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
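To get a feel for the size of this network, note that each Dense layer has inputs * units weights plus units biases. Here is a quick sketch in plain Python, assuming the 11 feature columns shown in the one-hot encoded table. The helper name dense_params is made up for illustration.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of one fully connected layer."""
    return n_inputs * n_units + n_units

n_features = 11                     # assumed: columns of X after one-hot encoding
layer_units = [n_features, 128, 1]  # units of the three Dense layers above

counts = []
n_inputs = n_features
for units in layer_units:
    counts.append(dense_params(n_inputs, units))
    n_inputs = units  # the output of this layer feeds the next one

print(counts, sum(counts))  # [132, 1536, 129] 1797
```

Calling model.summary() should report the same per-layer counts if the feature count matches.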
model.compile(optimizer = 'adam',loss ='binary_crossentropy',metrics=['accuracy'])
model.fit(X_train,y_train.to_numpy(),batch_size=10,epochs=10,verbose=1)
WARNING:tensorflow:From F:\Anaconda3\lib\site-packages\tensorflow\python\ops\math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
Epoch 1/10
8000/8000 [==============================] - 1s 94us/sample - loss: 0.4515 - acc: 0.8049
Epoch 2/10
8000/8000 [==============================] - 1s 80us/sample - loss: 0.4185 - acc: 0.8202
Epoch 3/10
8000/8000 [==============================] - 1s 80us/sample - loss: 0.4057 - acc: 0.8324
Epoch 4/10
8000/8000 [==============================] - 1s 77us/sample - loss: 0.3752 - acc: 0.8431
Epoch 5/10
8000/8000 [==============================] - 1s 79us/sample - loss: 0.3507 - acc: 0.8571
Epoch 6/10
8000/8000 [==============================] - 1s 78us/sample - loss: 0.3415 - acc: 0.8591
Epoch 7/10
8000/8000 [==============================] - 1s 79us/sample - loss: 0.3363 - acc: 0.8620
Epoch 8/10
8000/8000 [==============================] - 1s 84us/sample - loss: 0.3345 - acc: 0.8619
Epoch 9/10
8000/8000 [==============================] - 1s 74us/sample - loss: 0.3328 - acc: 0.8602
Epoch 10/10
8000/8000 [==============================] - 1s 74us/sample - loss: 0.3302 - acc: 0.8626
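The loss column in the log is binary cross-entropy averaged over the samples. Here is a minimal NumPy sketch of that formula on made-up labels and probabilities. The function name and the eps clipping (added to avoid log(0)) are choices made for this illustration.

```python
import numpy as np

def binary_crossentropy(y_true, p_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all samples."""
    p = np.clip(p_pred, eps, 1.0 - eps)  # keep log() finite
    return float(np.mean(-(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))))

y_true = np.array([1, 0, 1, 0])          # made-up labels
p_pred = np.array([0.9, 0.1, 0.6, 0.4])  # made-up sigmoid outputs

loss = binary_crossentropy(y_true, p_pred)
print(round(loss, 4))
```

Confident correct predictions (0.9 for a true 1) contribute little to the loss, while hesitant ones (0.6) contribute much more.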
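# predict_classes is deprecated in later TensorFlow 2.x releases; thresholding the sigmoid output at 0.5 gives the same labels
y_pred = (model.predict(X_test) > 0.5).astype("int32")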
y_pred
array([[0],
[0],
[0],
...,
[0],
[1],
[0]])
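The sigmoid output of the last layer is a probability for class 1, and class labels come from thresholding it, commonly at 0.5. A tiny sketch with made-up probabilities:

```python
import numpy as np

# Made-up sigmoid outputs, shaped (n_samples, 1) like the model's predictions
probabilities = np.array([[0.12], [0.50], [0.73]])

# Strictly greater than 0.5 counts as class 1
labels = (probabilities > 0.5).astype('int32')
print(labels.ravel().tolist())  # [0, 0, 1]
```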
y_test
1344 1
8167 0
4747 0
5004 1
3124 1
..
9107 0
8249 0
8337 0
6279 1
412 0
Name: Exited, Length: 2000, dtype: int64
model.evaluate(X_test, y_test.to_numpy())
2000/2000 [==============================] - 0s 34us/sample - loss: 0.3583 - acc: 0.8535
[0.3583366745710373, 0.8535]
from sklearn.metrics import confusion_matrix, accuracy_score
confusion_matrix(y_test,y_pred)
array([[1525, 68],
[ 225, 182]], dtype=int64)
accuracy_score(y_test,y_pred)
0.8535
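Accuracy alone hides how the errors are distributed. Using the confusion matrix printed above (rows are true labels, columns are predictions), precision and recall for the churn class can be worked out as a rough sketch:

```python
import numpy as np

# Confusion matrix from the output above: [[TN, FP], [FN, TP]]
cm = np.array([[1525, 68],
               [225, 182]])
tn, fp, fn, tp = (int(v) for v in cm.ravel())

accuracy = (tn + tp) / (tn + fp + fn + tp)
precision = tp / (tp + fp)  # of the predicted churners, how many really churned
recall = tp / (tp + fn)     # of the real churners, how many were found

print(round(accuracy, 4), round(precision, 4), round(recall, 4))
# 0.8535 0.728 0.4472
```

The accuracy matches accuracy_score, but a recall of about 0.45 means the model misses more than half of the customers who actually left, which is worth keeping in mind for step 9 (tuning the model).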