Notes on Bugs When Manually Updating Parameters in Keras

I've recently been studying meta-learning, which involves operations that update model parameters by hand. The approach I tried was: compute the loss, take gradients with respect to trainable_variables, and then apply the update manually. I ran into quite a few problems along the way, so I'm recording them here.
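For context, the pattern I was trying to implement looks roughly like the sketch below. It uses a throwaway linear model purely for illustration, and the plain SGD step (w <- w - lr * g) stands in for whatever update rule the meta-learning algorithm actually prescribes:

import tensorflow as tf

# throwaway model and data, only to demonstrate the update pattern
model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(1,))])
x = tf.random.uniform((8, 1))
y = 3.0 * x + 1.0
lr = 0.01

with tf.GradientTape() as tape:
    loss = tf.reduce_mean(tf.losses.MSE(y, model(x)))  # forward pass + loss
grads = tape.gradient(loss, model.trainable_variables)

# the manual step: w <- w - lr * g, applied variable by variable
for w, g in zip(model.trainable_variables, grads):
    w.assign_sub(lr * g)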

 

In this experiment the goal is to fit a quadratic function. The first step is to generate the corresponding training data:

from __future__ import absolute_import, division, print_function, unicode_literals

from tensorflow.keras.layers import Dense
import tensorflow.keras as keras
#import keras
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

tf.keras.backend.set_floatx('float64')  # use float64 everywhere for numerical consistency

x = np.random.uniform(-10, 10, 1000)  # generate random x samples
x = x.reshape(-1, 1)                  # reshape to (1000, 1): one feature per sample, as Dense expects
print(len(x))
f = lambda x: 2*x*x - 7*x + 11        # the quadratic target function
y = f(x)

t = np.linspace(-10, 10, 50)
yy = f(t)

x_train, x_test, y_train, y_test = train_test_split(    
    x, y, test_size=0.25, random_state=42)

plt.figure()
plt.plot(t,yy)
plt.show()

print(len(x_train))
print(x_train.shape)

The plot of the function:

(Figure 1: plot of the quadratic function f(t) = 2*t*t - 7*t + 11)

At first I used the following code:

import tensorflow.keras.backend as keras_backend
from keras import backend as k  # note: this is the standalone keras package, not tf.keras; mixing the two causes trouble later

class MyModel(keras.Model):
    # Note: the class must subclass tf.keras.Model, otherwise training fails later
    # with an error saying the model has no trainable_variables
    def __init__(self):
        super().__init__()
        self.layer1 = Dense(64, input_shape=(1,))  # each input sample is a single scalar feature
        self.layer2 = Dense(32)
        self.layer3 = Dense(1)

    def call(self, x):
        y = keras.activations.relu(self.layer1(x))
        y = keras.activations.relu(self.layer2(y))
        y = self.layer3(y)
        return y
    
def loss_fun(y_true, y_pred):
    return keras_backend.mean(keras.losses.mean_squared_error(y_true, y_pred))

def compute_loss(model, x, y, loss_fun=loss_fun):
    logits = model(x)  # use model(x), not model.call(x), so Keras builds the model on first use
    mse = loss_fun(y, logits)
    return mse, logits

def compute_gradients(model, x, y, loss_fun=loss_fun):
    with tf.GradientTape() as tape:
        loss, _ = compute_loss(model, x, y, loss_fun)
    return tape.gradient(loss, model.trainable_variables), loss
    # The standalone-keras version below only ever returned None gradients:
    #loss, _ = compute_loss(model, x, y, loss_fun)
    #return k.gradients(loss, model.trainable_variables), loss

def apply_gradients(optimizer, gradients, variables):
    optimizer.apply_gradients(zip(gradients, variables))
    
def train_batch(x, y, model, optimizer):
    '''
    Performs a single gradient update. Keras itself also provides a
    train_on_batch method that does one gradient step at a time.
    '''
    gradients, loss = compute_gradients(model, x, y)
    apply_gradients(optimizer, gradients, model.trainable_variables)
    return loss

model2 = MyModel()
print(model2.trainable_variables)  # empty: the model has not been built yet
model2(x_train)                    # calling the model on data builds it
print(model2.trainable_variables)  # now populated
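Incidentally, the built-in train_on_batch mentioned in the docstring above gives you the same one-step update without any manual plumbing. A minimal sketch (model3 is just a fresh instance for illustration):

model3 = MyModel()
model3.compile(optimizer=keras.optimizers.Adam(learning_rate=0.01), loss='mse')

batch_loss = model3.train_on_batch(x_train, y_train)  # a single gradient step on this batch
print(batch_loss)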

The training code looks like this:

epochs = 1000
lr = 0.01
optimizer = keras.optimizers.Adam(learning_rate=lr)  # note: Keras' actual default learning rate for Adam is 0.001, not 0.01
loss = []
model = MyModel()

model.build(x_train.shape)  # build explicitly so summary() can run before training
model.summary()


for i in range(epochs):
    #y_pred = model(x_train)
    #gradient = k.gradients(l, model.trainable_variables)  # this approach produced no gradients at all: every entry came back None
    with tf.GradientTape() as tape:
        l = tf.reduce_mean(tf.losses.MSE(y_train, model(x_train)))  # MSE
    gradient = tape.gradient(l, model.trainable_variables)  # a list with one gradient tensor per variable
    #print(f'model.trainable_weights = {model.trainable_weights}')
    #print(f'gradients = {gradient}')
    # updating via get_weights/set_weights also works, but the step has to go
    # *against* the gradient (w - lr * g), not with it:
    #new_weights = [w - lr * g for w, g in zip(model.get_weights(), gradient)]
    #model.set_weights(new_weights)
    optimizer.apply_gradients(zip(gradient, model.trainable_variables))
    if i % 10 == 0:
        loss.append(float(l))  # store a plain Python float
        print(f'{i}th loss is: {l}')
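Once training finishes, a quick visual check of the fit (a small sketch reusing the t grid from the data-generation step; it is reshaped to a column because the model expects inputs of shape (N, 1)):

pred = model(t.reshape(-1, 1)).numpy().flatten()  # predictions on the dense grid

plt.figure()
plt.plot(t, yy, label='true f(t)')
plt.plot(t, pred, label='model prediction')
plt.legend()
plt.show()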

The problems I ran into along the way:

(1) The first thing I noticed was that model.trainable_variables was empty. The reason is that the model had not been built yet. If you have implemented call(), simply invoking model(x) builds the model automatically; the exception Keras raises actually tells you this.
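If you want the variables to exist before the first forward pass, you can also build the model explicitly instead. A minimal sketch (None stands for the batch dimension):

m = MyModel()
print(len(m.trainable_variables))  # 0: nothing has been built yet

m.build(input_shape=(None, 1))     # build explicitly for 1-feature inputs
print(len(m.trainable_variables))  # 6: kernel + bias for each of the three Dense layers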

(2) The second problem was the loss computation. If you compute the loss with the standalone keras package (the k imported above), you do get a value, and you can even print it, but when you then try to differentiate it, whether with keras' k.gradients or with TensorFlow's tf.GradientTape(), every gradient comes back as None. In my case the culprit was mixing the standalone keras package with tf.keras: ops executed through standalone keras are not recorded on a tf.GradientTape, so stick to tf.keras / tf ops end to end.
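The version that does produce gradients keeps everything inside tf.keras. A minimal sketch reusing MyModel and the training data from above:

m = MyModel()

with tf.GradientTape() as tape:
    loss_value = tf.reduce_mean(tf.keras.losses.MSE(y_train, m(x_train)))
grads = tape.gradient(loss_value, m.trainable_variables)

print(all(g is not None for g in grads))  # True: every variable got a gradient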

(3) Third: when you build the model, input_shape has to be right, otherwise the trained model can be silently wrong. For example, with x.shape = (750, 1) and input_dim = 1, model.summary() shows the model you expect. But if x.shape = (1, 750) while input_shape = (1,), Keras treats the input as a single sample with 750 features, so the first Dense layer is built with a (750, 64) kernel and the model structure is not what you intended.
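A quick way to catch this is to check the first layer's kernel shape right after the model is built; the shapes below follow directly from the Dense(64) definition above:

m_good = MyModel()
m_good(np.zeros((750, 1)))         # 750 samples with 1 feature each
print(m_good.layer1.kernel.shape)  # (1, 64): the model you want

m_bad = MyModel()
m_bad(np.zeros((1, 750)))          # 1 sample with 750 "features"
print(m_bad.layer1.kernel.shape)   # (750, 64): silently the wrong model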
