The iris dataset is a classic classification benchmark, collected and organized by Fisher in 1936. It contains 150 samples spread over 3 classes (Setosa, Versicolour, Virginica), 50 per class; each sample has 4 numeric attributes (sepal length, sepal width, petal length, petal width).
The iris dataset can be loaded from sklearn:
from sklearn.datasets import load_iris
data = load_iris()
data is a Bunch object. Bunch inherits from dict, so it stores data as key-value pairs, but unlike a plain dict its elements can also be accessed as attributes, i.e. Bunch.attribute_name.
from sklearn.utils import Bunch
a = Bunch(age='aa', set="sdf")
a.age
Out[9]: 'aa'
Note that Bunch is imported from sklearn.utils; the older path sklearn.datasets.base used in early sklearn versions has since been removed.
data has five attributes: data, target, target_names, DESCR and feature_names. Only the first two, data and target, are used here.
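A quick interactive check of these attributes (the shapes and names are fixed for iris, so the output is deterministic):

```python
from sklearn.datasets import load_iris

data = load_iris()
print(data.data.shape)           # (150, 4)
print(data.target.shape)         # (150,)
print(list(data.target_names))   # ['setosa', 'versicolor', 'virginica']
print(data.feature_names)        # sepal/petal length and width, in cm
```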
data.data has shape (150, 4).
data.target holds the raw class labels [0, 0, 0, ..., 1, 1, 1, ..., 2, 2, 2, ...]. We convert it to shape (150, 3), which is the so-called one-hot encoding:
iris_target = np.float32(tf.keras.utils.to_categorical(iris_target,num_classes=3))
Each label becomes a row of 3 columns, one per class.
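The transformation can be sketched in plain NumPy (a minimal illustration of what to_categorical produces, not its actual implementation):

```python
import numpy as np

labels = np.array([0, 1, 2, 1])
# Index rows of the 3x3 identity matrix by label: each label
# selects the row with a 1 in its own class position.
one_hot = np.eye(3, dtype=np.float32)[labels]
print(one_hot)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]
```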
TensorFlow reads data in batches: tf.data.Dataset.from_tensor_slices slices the arrays and batch packs them. We zip the data and labels into a tuple and feed them into the training set together:
train_data = tf.data.Dataset.from_tensor_slices((iris_data,iris_target)).batch(128)
With that, the training set train_data is ready. It is a BatchDataset with shapes ((None, 4), (None, 3)) and types (tf.float32, tf.float32).
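Iterating over the dataset confirms the batching: with 150 samples and a batch size of 128, the first batch has 128 rows and the second the remaining 22. A sketch using random stand-in data of the same shape as iris:

```python
import numpy as np
import tensorflow as tf

# Stand-in arrays shaped like iris_data (150, 4) and the one-hot targets (150, 3)
x = np.float32(np.random.rand(150, 4))
y = np.float32(np.eye(3)[np.random.randint(0, 3, 150)])

ds = tf.data.Dataset.from_tensor_slices((x, y)).batch(128)
for bx, by in ds:
    print(bx.shape, by.shape)
# (128, 4) (128, 3)
# (22, 4) (22, 3)
```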
import tensorflow as tf
import numpy as np
from sklearn.datasets import load_iris
data = load_iris()
iris_target = data.target
iris_data = np.float32(data.data)
iris_target = np.float32(tf.keras.utils.to_categorical(iris_target,num_classes=3))
train_data = tf.data.Dataset.from_tensor_slices((iris_data,iris_target)).batch(128)
# Method 1: build the model layer by layer with Sequential().add()
model = tf.keras.models.Sequential()
model.add(tf.keras.layers.Dense(32, activation="relu"))
model.add(tf.keras.layers.Dense(64, activation="relu"))
model.add(tf.keras.layers.Dense(3, activation="softmax"))
opt = tf.optimizers.Adam(1e-3)
model.compile(optimizer=opt, loss=tf.losses.categorical_crossentropy, metrics=['accuracy'])
model.fit(train_data, epochs=500)
score = model.evaluate(iris_data, iris_target)
print("last score:", score)
# Method 2: pass the layer list to the Sequential constructor
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax")
])
opt = tf.optimizers.Adam(1e-3)
model.compile(optimizer=opt, loss=tf.losses.categorical_crossentropy, metrics=['accuracy'])
model.fit(train_data, epochs=500)
score = model.evaluate(iris_data, iris_target)
print("last score:", score)
# Method 3: the functional API with an explicit Input
input_xs = tf.keras.Input(shape=(4,), name='input_xs')
out = tf.keras.layers.Dense(32, activation='relu', name='dense_1')(input_xs)
out = tf.keras.layers.Dense(64, activation='relu', name='dense_2')(out)
logits = tf.keras.layers.Dense(3, activation="softmax", name='predictions')(out)
model = tf.keras.Model(inputs=input_xs, outputs=logits)
opt = tf.optimizers.Adam(1e-3)
model.compile(optimizer=opt, loss=tf.losses.categorical_crossentropy, metrics=['accuracy'])
model.fit(train_data, epochs=500)
score = model.evaluate(iris_data, iris_target)
print("last score:", score)
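Whichever construction you choose, the resulting model is used the same way: calling the model (or model.predict) returns a (batch, 3) array of softmax probabilities, and argmax picks the class. A self-contained sketch with an untrained model, so the predicted class is arbitrary, but the output shape and the probabilities-sum-to-one property already hold:

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(3, activation="softmax"),
])

sample = np.float32([[5.1, 3.5, 1.4, 0.2]])  # one iris-like measurement
probs = model(sample).numpy()                # shape (1, 3); softmax row sums to 1
pred = int(np.argmax(probs, axis=1)[0])      # predicted class index: 0, 1 or 2
print(probs.shape, pred)
```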
All three approaches produce essentially the same result; model.evaluate returns [loss, accuracy]:
Epoch 1/500
2/2 [==============================] - 1s 535ms/step - loss: 2.0731 - accuracy: 0.3333
Epoch 2/500
2/2 [==============================] - 0s 8ms/step - loss: 1.3029 - accuracy: 0.3333
...
Epoch 500/500
2/2 [==============================] - 0s 7ms/step - loss: 0.0963 - accuracy: 0.9667
150/1 [==============================] - 0s 399us/sample - loss: 0.0685 - accuracy: 0.9667
last score: [0.09203936378161112, 0.96666664]