Study notes for TensorFlow 2.0, based on the open-source textbook https://github.com/dragen1860/Deep-Learning-with-TensorFlow-book.
Keeping notes on GitHub is inconvenient to consult frequently, so I am collecting various consolidated code snippets here. Source links are attached wherever possible; most of this is not original work.
This is knowledge preparation for an upcoming Chinese text classification task.
"""
Python 3.7.7
Tensorflow 2.1.0
"""
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import Sequential, layers, losses, optimizers, metrics
gpus = tf.config.experimental.list_physical_devices('GPU')
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
    except RuntimeError as e:
        print(e)
# Enable memory growth, as the textbook recommends, so TensorFlow does not grab all GPU memory
# at once and block other users of a shared machine.
# Not needed on a personal machine or with the CPU-only build of TensorFlow.
BATCH_SIZE = 128      # batch size
TOTAL_WORDS = 10000   # vocabulary size; words beyond the top 10000 are mapped to 0
MAX_REVIEW_LEN = 80   # maximum review length; shorter reviews are zero-padded, longer ones truncated
EMBEDDING_LEN = 100   # dimensionality of the word vectors
word_index = keras.datasets.imdb.get_word_index()
embedding_index = {}
# stores each word and its corresponding pretrained vector
GLOVE_DIR = 'glove.6B.100d.txt'
with open(GLOVE_DIR, encoding='utf-8') as f:
    for line in f:
        values = line.split()
        word = values[0]
        coefs = np.asarray(values[1:], dtype='float32')
        # Both np.array and np.asarray convert structured data to an ndarray;
        # the main difference is that when the input is already an ndarray,
        # np.array still makes a copy (allocating new memory), while np.asarray does not.
        embedding_index[word] = coefs
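# A minimal illustration of the array/asarray difference mentioned above (example only, not part of the pipeline):
# arr = np.zeros(3)
# np.asarray(arr) is arr   # True  -> no copy when the input is already an ndarray of the same dtype
# np.array(arr) is arr     # False -> np.array copies by default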
num_words = min(TOTAL_WORDS,len(word_index))
embedding_matrix = np.zeros((num_words,EMBEDDING_LEN))
for word, i in word_index.items():
    if i >= TOTAL_WORDS:
        # skip words outside the vocabulary limit
        continue
    embedding_vector = embedding_index.get(word)
    # look up the pretrained vector for this word
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector
(x_train, y_train), (x_test, y_test) = keras.datasets.imdb.load_data(num_words=TOTAL_WORDS)
x_train = keras.preprocessing.sequence.pad_sequences(x_train,maxlen=MAX_REVIEW_LEN)
x_test = keras.preprocessing.sequence.pad_sequences(x_test,maxlen=MAX_REVIEW_LEN)
# Truncate and pad the reviews to a common length: long reviews keep their trailing part,
# short reviews are padded at the front.
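# This relies on the Keras defaults; an equivalent explicit call (a sketch, assuming the TF 2.1
# defaults padding='pre' and truncating='pre') would be:
# x_train = keras.preprocessing.sequence.pad_sequences(x_train, maxlen=MAX_REVIEW_LEN,
#                                                      padding='pre', truncating='pre')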
train_db = tf.data.Dataset.from_tensor_slices((x_train,y_train))
train_db = train_db.shuffle(1000).batch(batch_size=BATCH_SIZE, drop_remainder=True)
test_db = tf.data.Dataset.from_tensor_slices((x_test,y_test))
test_db = test_db.batch(batch_size=BATCH_SIZE,drop_remainder=True)
# Because the batch size is fixed inside the model, drop_remainder=True must be set here.
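# (My reading of why: the initial states self.state0/self.state1 below are created with shape
#  [BATCH_SIZE, units], so a smaller final batch would not match those state tensors.)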
class MyGRU(keras.Model):
    def __init__(self, units):
        super(MyGRU, self).__init__()
        self.state0 = [tf.zeros([BATCH_SIZE, units])]
        self.state1 = [tf.zeros([BATCH_SIZE, units])]
        # [batch_size, units], initial state vectors for the GRU cells
        self.embedding = layers.Embedding(TOTAL_WORDS, EMBEDDING_LEN,
                                          input_length=MAX_REVIEW_LEN,
                                          trainable=False)
        self.embedding.build(input_shape=(None, MAX_REVIEW_LEN))
        self.embedding.set_weights([embedding_matrix])
        # word-embedding lookup: [b, 80] => [b, 80, 100]
        # If you do not want pretrained word vectors, comment out the build and set_weights
        # calls and set trainable=True in the Embedding layer instead.
        self.rnn_cell0 = layers.GRUCell(units, dropout=0.5)
        self.rnn_cell1 = layers.GRUCell(units, dropout=0.5)
        self.out_layer = Sequential([layers.Dense(units),
                                     layers.Dropout(rate=0.5),
                                     layers.ReLU(),
                                     layers.Dense(1)])
        # The textbook simply uses out_layer = layers.Dense(...), but Dense does not accept a
        # training argument, so x = self.out_layer(out1, training) in call() would raise an error.
        # Either drop the training argument and skip the dropout layer, or wrap Dropout inside a
        # Sequential as done here; a Sequential does accept the training argument.
        # self.rnn = Sequential([
        #     layers.GRU(units, dropout=0.5, return_sequences=True),
        #     layers.GRU(units, dropout=0.5)
        # ])
        # Layer-style alternative: no initial-state list is needed and the code is simpler;
        # every non-final GRU layer must set return_sequences=True (see the sketch after this comment).
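    # If the layer-style self.rnn above were used, call() would reduce to roughly the following
    # sketch (my assumption of the equivalent usage, not from the original notes):
    # def call(self, inputs, training=None):
    #     x = self.embedding(inputs)
    #     x = self.rnn(x, training=training)         # [b, 80, 100] => [b, 64]
    #     x = self.out_layer(x, training=training)   # [b, 64] => [b, 1]
    #     return tf.sigmoid(x)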
    def call(self, inputs, training=None):
        x = self.embedding(inputs)
        # unroll over the time dimension, feeding one word at a time through the stacked cells
        for word in tf.unstack(x, axis=1):
            out0, self.state0 = self.rnn_cell0(word, self.state0, training)
            out1, self.state1 = self.rnn_cell1(out0, self.state1, training)
        x = self.out_layer(out1, training)
        x = tf.sigmoid(x)
        return x
model = MyGRU(64)
model.compile(optimizer=optimizers.Adam(1e-3),
loss=losses.BinaryCrossentropy(),
metrics=['accuracy'],
experimental_run_tf_function=False)
# BinaryCrossentropy is intended for binary classification with a one-dimensional model output.
# Since the model output already goes through a sigmoid, BinaryCrossentropy does not need from_logits=True.
# For multi-class problems, or if you prefer a two-dimensional output for binary classification,
# use CategoricalCrossentropy instead.
# TF 2.x recommends not applying sigmoid at the model output and instead setting from_logits=True
# in the loss, which reduces numerical error.
# With dropout added in __init__, experimental_run_tf_function=False is required here, otherwise
# an error complains that a Keras tensor cannot take part in computations with eager tensors.
# The exact reason is unclear; the compile documentation lists no experimental_run_tf_function
# argument, so it is probably inherited from a lower-level class.
# Some say that at compile time Keras has to distinguish training from inference when the model
# contains dropout or batch-norm layers; but after compiling like this, does evaluate() handle the
# distinction automatically? Presumably it does, but to be safe I still prefer writing my own
# training and test loops.
# New finding: "Calling tf.config.experimental_run_functions_eagerly(True) will make all invocations of tf.function run eagerly instead of running as a traced graph function."
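# A hedged sketch of the from_logits variant mentioned above (my own example, not from the textbook):
# remove the tf.sigmoid call at the end of MyGRU.call() and pass the raw logits to the loss:
# loss=losses.BinaryCrossentropy(from_logits=True)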
model.fit(train_db,
epochs=5,
validation_data=test_db)
print('Evaluation:', model.evaluate(test_db))
criterion = losses.CategoricalCrossentropy(from_logits=True)
# Here the loss switches to CategoricalCrossentropy with from_logits=True; remember to remove the
# sigmoid from the model output.
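# Assumed adjustment implied by the above (my reading, not spelled out in the original notes):
# for this loop the final layer of MyGRU would output two logits, e.g.
#     self.out_layer = Sequential([layers.Dense(units), layers.Dropout(rate=0.5),
#                                  layers.ReLU(), layers.Dense(2)])
# so that the depth-2 one-hot labels and tf.argmax(out, axis=1) below line up.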
optimizer = optimizers.Adam(learning_rate=1e-3)
# optimizer with a learning rate of 0.001
loss_meter = metrics.Mean()
# tracks the running mean of the loss; you could also compute it by hand
acc_meter = metrics.Accuracy()
# tracks the running accuracy; you could also compute it by hand
for epoch in range(5):
    # train for 5 epochs; generally 5 is far from enough, on the IMDB data expect 20+ as a starting point
    for step, (x, y) in enumerate(train_db):
        with tf.GradientTape() as tape:
            out = model(x, training=True)
            y_onehot = tf.one_hot(y, depth=2)
            loss = criterion(y_onehot, out)  # CategoricalCrossentropy expects the one-hot labels
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        loss_meter.update_state(loss.numpy())
        pred = tf.argmax(out, axis=1)
        acc_meter.update_state(y, pred)
        # accuracy is computed at every step, which is certainly slow; adjust if you need to
        if (step + 1) % 10 == 0:
            # every 10 steps, report the mean loss and mean accuracy over those 10 steps;
            # the result() method of acc_meter and loss_meter averages all values accumulated so far
            print(f'Epoch:{epoch+1},Step:{step+1},Loss:{loss_meter.result().numpy()},Accuracy:{acc_meter.result().numpy()}')
            loss_meter.reset_states()
            acc_meter.reset_states()
            # clear the loss and accuracy meters so the next 10 steps start fresh
acc_meter.reset_states()  # clear leftover training statistics before measuring test accuracy
for _, (x, y) in enumerate(test_db):
    # compute accuracy on the test set
    out = model(x)
    pred = tf.argmax(out, axis=1)
    acc_meter.update_state(y, pred)
print(acc_meter.result())
# The same test-set accuracy computed without the acc_meter class:
accuracy = []
for _, (x, y) in enumerate(test_db):
    out = model(x)
    pred = tf.cast(tf.argmax(out, axis=1), dtype=y.dtype)  # match the label dtype for tf.equal
    acc = tf.reduce_mean(tf.cast(tf.equal(y, pred), dtype=tf.float32))
    accuracy.append(acc)
accuracy = tf.reduce_mean(accuracy)
print(accuracy.numpy())
The official docs describe the result() method of these two metric classes as follows:
Result computation is an idempotent operation that simply calculates the metric value using the state variables.
From the description it sounds a bit as if computing with the model's variables does not affect backpropagation, similar to an automatic version of PyTorch's detach(); but in practice the whole TF2 training step is wrapped in with tf.GradientTape(), so there is no need to worry separately about these variables, and the tutorial does not stress this point either.
When the network is not built from Cells, experimental_run_tf_function=False is not required, even with dropout.
One more entry found in the official API: Calling tf.config.experimental_run_functions_eagerly(True) will make all invocations of tf.function run eagerly instead of running as a traced graph function.
It seems that when the Cell modules are used, computation only works as a traced graph function; perhaps because using Cells involves tracking the hidden state manually? Either way, it is enough to remember the conclusion.
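For reference, the switch mentioned in that API note is a single call (shown here as a sketch; whether it actually helps with the Cell-based model above is something I have not verified):
tf.config.experimental_run_functions_eagerly(True)  # run every tf.function eagerly instead of as a traced graph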
————————————————
Copyright notice: this is an original article by CSDN blogger "quantum00549", released under the CC 4.0 BY-SA license. Please include the original source link and this notice when reposting.
原文链接:https://blog.csdn.net/ziyi9663/article/details/107041158