I have recently been studying meta-learning, which involves operations where parameters must be updated by hand. So I tried computing the loss first, taking gradients with respect to trainable_variables, and then applying the update myself, and ran into quite a few problems along the way. This post records them.
In this experiment the goal is to fit a quadratic function. The first step is to generate the corresponding training data:
from __future__ import absolute_import, division, print_function, unicode_literals
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.layers import Dropout
import tensorflow.keras as keras
#import keras
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import random
from sklearn.model_selection import train_test_split
tf.keras.backend.set_floatx('float64')
from sklearn.datasets import load_iris
x = np.random.uniform(-10, 10, 1000) # generate random x
print(len(x))
f = lambda x: 2*x*x - 7*x + 11
y = f(x)
# reshape to column vectors (N, 1): the Dense layers expect 2-D input, see issue (3) below
x = x.reshape(-1, 1)
y = y.reshape(-1, 1)
t = np.linspace(-10, 10, 50)
yy = f(t)
x_train, x_test, y_train, y_test = train_test_split(
x, y, test_size=0.25, random_state=42)
plt.figure()
plt.plot(t,yy)
plt.show()
print(len(x_train))
print(x_train.shape)
The plot of the function looks like this:
At first I used the following code:
from tensorflow.keras.layers import Dropout
import tensorflow.keras.backend as keras_backend
from keras import backend as k   # standalone keras backend, only used by the commented-out k.gradients attempts below

class MyModel(keras.Model):
    # Note: the class has to inherit from tf.keras.Model, otherwise training fails later
    # with an error saying the model has no trainable_variables.
    def __init__(self):
        super().__init__()
        self.layer1 = Dense(64, input_shape=(1, ))
        self.layer2 = Dense(32)
        self.layer3 = Dense(1)

    def call(self, x):
        y = keras.activations.relu(self.layer1(x))
        y = keras.activations.relu(self.layer2(y))
        y = self.layer3(y)
        return y

def loss_fun(y_true, y_pred):
    return keras_backend.mean(keras.losses.mean_squared_error(y_true, y_pred))

def compute_loss(model, x, y, loss_fun=loss_fun):
    logits = model.call(x)
    mse = loss_fun(y, logits)
    return mse, logits

def compute_gradients(model, x, y, loss_fun=loss_fun):
    with tf.GradientTape() as tape:
        loss, _ = compute_loss(model, x, y, loss_fun)
    return tape.gradient(loss, model.trainable_variables), loss
    # loss, _ = compute_loss(model, x, y, loss_fun)
    # return k.gradients(loss, model.trainable_variables), loss

def apply_gradients(optimizer, gradients, variables):
    optimizer.apply_gradients(zip(gradients, variables))

def train_batch(x, y, model, optimizer):
    '''
    Perform one gradient update. Keras itself already has a train_on_batch function
    that does a single gradient update (see the short sketch after this code block).
    '''
    gradients, loss = compute_gradients(model, x, y)
    apply_gradients(optimizer, gradients, model.trainable_variables)
    return loss

model2 = MyModel()
print(model2.trainable_variables)
model2.call(x_train)
print(model2.trainable_variables)
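As mentioned in the docstring of train_batch, Keras already ships a train_on_batch method that performs a single gradient update. A minimal sketch of that alternative, assuming the same MyModel and the (N, 1) shaped x_train/y_train from above (model3 is just a fresh instance for illustration):
# One gradient update using Keras's built-in train_on_batch instead of a manual tape.
model3 = MyModel()
model3.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss='mse')
batch_loss = model3.train_on_batch(x_train, y_train)  # single update, returns the batch loss
print(batch_loss)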
The training code looks like this:
epochs = 1000
lr = 0.01
optimizer = keras.optimizers.Adam(learning_rate=lr)  # note: Keras's default Adam learning rate is actually 0.001, so 0.01 is set explicitly here
loss = []
model = MyModel()
model.compile()
model.build(x_train.shape)
model.summary()
for i in range(epochs):
    # y_pred = model(x_train)
    # gradient = k.gradients(l, model.trainable_variables)  # this approach could not produce gradients for me, they all came back empty/None
    with tf.GradientTape() as tape:
        l = tf.reduce_mean(tf.losses.MSE(y_train, model(x_train)))  # MSE
        # print(f'loss = {l}')
    gradient = tape.gradient(l, model.trainable_variables)  # the result is simply a list of tensors, one per variable
    # print(f'model.trainable_weights = {model.trainable_weights}')
    # print(f'gradients = {gradient}')
    # new_weights = model.get_weights() + 0.001 * np.array(gradient)
    # model.set_weights(new_weights)
    optimizer.apply_gradients(zip(gradient, model.trainable_variables))
    if i % 10 == 0:
        loss.append(l)
        print(f'{i}th loss is: {l}')
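For the meta-learning use case that motivated all of this, the optimizer can be dropped entirely and the parameters updated by hand. A minimal sketch under the same setup (note that the commented-out get_weights attempt above adds the gradient instead of subtracting it):
# Manual SGD-style step: theta <- theta - lr * grad, applied in place to each variable.
with tf.GradientTape() as tape:
    l = tf.reduce_mean(tf.losses.MSE(y_train, model(x_train)))
grads = tape.gradient(l, model.trainable_variables)
for var, grad in zip(model.trainable_variables, grads):
    var.assign_sub(lr * grad)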
The problems I ran into along the way were the following:
(1) The first thing I noticed was that model.trainable_variables was empty. The reason is that the model had not been built yet; if call is implemented, calling model(x) directly builds it automatically. The exception that gets raised actually points this out.
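A minimal way to check this with the model class above:
m = MyModel()
print(len(m.trainable_variables))   # 0: nothing has been built yet
m.build(input_shape=(None, 1))      # building explicitly creates the variables ...
# m(x_train)                        # ... and so does a forward pass through the model
print(len(m.trainable_variables))   # 6: kernel + bias for each of the three Dense layers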
(2) The second problem was with how the loss is computed. If you compute the loss through the keras backend, the loss value itself comes out fine and you can even print it, but whether you then use keras's k.gradients or tf.GradientTape(), the gradients all come back as None. In my experience, None gradients usually mean the forward pass was never recorded by the tape (for example because it was computed outside the with tf.GradientTape() block), or that the standalone keras package was mixed with tf.keras.
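A small sketch that reproduces the None case and the working case with the model above; the only difference is whether the forward pass happens inside the tape:
# Loss computed before/outside the tape: nothing was recorded, so every gradient is None.
l_outside = tf.reduce_mean(tf.losses.MSE(y_train, model(x_train)))
with tf.GradientTape() as tape:
    pass
print(tape.gradient(l_outside, model.trainable_variables))   # [None, None, ...]

# Forward pass and loss computed inside the tape: the gradients are real tensors.
with tf.GradientTape() as tape:
    l_inside = tf.reduce_mean(tf.losses.MSE(y_train, model(x_train)))
print([g is None for g in tape.gradient(l_inside, model.trainable_variables)])   # all False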
(3) The third problem: when you build the model, input_shape has to match the data layout, otherwise the trained model may be wrong. For example, with x.shape = (750, 1) and input_dim = 1, model.summary() shows the structure you expect. But if x.shape = (1, 750) while input_shape = (1, ), the model's structure is wrong and will not be what you intended.
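A quick way to see this in model.summary(), assuming the (750, 1) shaped x_train from above:
# Correct layout: 750 samples with 1 feature, so layer1 gets a (1, 64) kernel -> 1*64 + 64 = 128 params.
good = MyModel()
good.build(x_train.shape)        # (750, 1)
good.summary()

# The same numbers transposed look like 1 sample with 750 features, so layer1 silently
# builds a (750, 64) kernel (750*64 + 64 = 48064 params), which is not what was intended.
bad = MyModel()
bad.build((1, 750))
bad.summary()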