Dataset: CelebA (CelebFaces Attributes Dataset). It contains over 200,000 face images annotated with attributes such as gender, age, glasses, facial hair, and so on.
Here I train on gender only. I first sort the images into classes and normalize them, then write them out in TFRecord format (image shape (224, 224, 3)) so TensorFlow can train from them efficiently.
The prepared dataset is on Baidu Netdisk:
https://pan.baidu.com/s/1ptteUCu02TzHD1e-JXl6Ig , extraction code: hmpt
It contains two versions: the full data and a much smaller subset for quick generation and training runs. After all, with 200,000 images, generating the full records takes at least half a day on my battered machine o(╥﹏╥)o.
The generation and extraction code follows.
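The layout of the list files is not shown here; judging from how create_record parses them below, each line is assumed to be an image path, a space, and an integer label. The paths in this sample are purely illustrative:

H:/DATA/Gender/Gender_tiny/train/000001.jpg 0
H:/DATA/Gender/Gender_tiny/train/000002.jpg 1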
import random
import tensorflow as tf
from PIL import Image

# Dataset root directory
src_pic_dir = r'H:/DATA/Gender/Gender_tiny'
orig_picture_test = src_pic_dir + '/test/test.txt'
orig_picture_train = src_pic_dir + '/train/train.txt'
# Output paths for the generated TFRecord files
# (note the '/' separator; without it the files land next to, not inside, the directory)
record_test = src_pic_dir + '/tf_test_224.tfrecord'
record_train = src_pic_dir + '/tf_train_224.tfrecord'
# Class labels to recognize
classes = {'0', '1'}
# Target image size
IMG_SIZE = 224
# Create the TFRecord file
def create_record(record_path, orig_pic):
    writer = tf.python_io.TFRecordWriter(record_path)
    with open(orig_pic, 'r') as file_:
        file_list = file_.readlines()
    # Shuffle the whole dataset before writing (see the note below the code)
    random.shuffle(file_list)
    for name in file_list:
        name = name.strip('\n')
        spt = name.split(' ')
        img_path = spt[0]
        index = int(spt[-1])
        # print(name + ' ', str(index))
        img = Image.open(img_path)
        img = img.resize((IMG_SIZE, IMG_SIZE))  # resize to the target size
        # For grayscale input, uncomment:
        # img = img.convert("L")
        img_raw = img.tobytes()  # raw image bytes
        example = tf.train.Example(
            features=tf.train.Features(feature={
                "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[index])),
                'img_raw': tf.train.Feature(bytes_list=tf.train.BytesList(value=[img_raw]))
            }))
        writer.write(example.SerializeToString())
    writer.close()
# Read and decode TFRecord data
def read_and_decode(filename, image_size, is_batch=True, batch_size=3):
    # File-name queue (no limit on the number of read epochs)
    filename_queue = tf.train.string_input_producer([filename])
    # The reader pulls one serialized example at a time from the queue
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    # Parse the serialized example
    features = tf.parse_single_example(
        serialized_example,
        features={
            'label': tf.FixedLenFeature([], tf.int64),
            'img_raw': tf.FixedLenFeature([], tf.string)
        })
    # The pixels were written as unsigned bytes, so decode as tf.uint8;
    # decoding as tf.int8 would corrupt every pixel value above 127
    img = tf.decode_raw(features['img_raw'], tf.uint8)
    img = tf.reshape(img, [image_size, image_size, 3])
    # Scale pixels to [-0.5, 0.5]
    img = tf.cast(img, tf.float32) * (1. / 255) - 0.5
    label = tf.cast(features['label'], tf.int32)
    print('img:', img, ',label:', label)
    min_after_dequeue = 300
    capacity = min_after_dequeue + 3 * batch_size
    if is_batch:
        print('capacity:', str(capacity), ' ,min_after_dequeue:', str(min_after_dequeue))
        img, label = tf.train.shuffle_batch([img, label],
                                            batch_size=batch_size,
                                            num_threads=1,
                                            capacity=capacity,
                                            min_after_dequeue=min_after_dequeue)
    else:
        img, label = tf.train.batch([img, label], batch_size=batch_size, capacity=capacity)
    return img, label
if __name__ == '__main__':
    create_record(record_train, orig_picture_train)
    create_record(record_test, orig_picture_test)
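To check that the records round-trip correctly, here is a minimal sketch (assuming the paths and functions above) that decodes one small batch and prints its shape:

img_batch, label_batch = read_and_decode(record_train, IMG_SIZE, batch_size=4)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    coord = tf.train.Coordinator()
    threads = tf.train.start_queue_runners(sess, coord)
    imgs, labels = sess.run([img_batch, label_batch])
    print(imgs.shape, labels)  # expect (4, 224, 224, 3) and four 0/1 labels
    coord.request_stop()
    coord.join(threads)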
There are plenty of TFRecord tutorials online, so I won't repeat the basics here.
One point deserves emphasis, though: shuffle the data when creating the records. I do it with
random.shuffle(file_list)
This puzzled me for a long time during training, until the printed labels revealed that every batch I dequeued contained a single class. Reading up on
tf.train.shuffle_batch
made the cause clear: the function does shuffle, but only within its internal buffer of at most capacity elements, and records enter that buffer in file order. If the source list is sorted by class and each class run is far larger than the buffer, every dequeued batch still comes from a single class, which is exactly what I observed.
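To get a feel for the scale of the problem: with the settings above the buffer holds only a few hundred examples, while a class run in an unshuffled CelebA list can span tens of thousands. Relying on the queue alone would need a buffer on the order of the whole dataset, roughly like this hypothetical sketch:

# Hypothetical: mixing classes inside the queue alone would need
# min_after_dequeue ~ dataset size (over 100 GB of RAM at 224x224x3 float32)
min_after_dequeue = 200000
capacity = min_after_dequeue + 3 * batch_size
img, label = tf.train.shuffle_batch([img, label], batch_size=batch_size,
                                    capacity=capacity,
                                    min_after_dequeue=min_after_dequeue)

Shuffling the file list once at record-creation time avoids all of this.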
For the network I chose the classic AlexNet: https://cloud.tencent.com/developer/news/230380
Code:
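The network below logs every layer's output shape through a print_activations helper that the original snippet omits; a minimal definition (matching the helper in the classic TensorFlow AlexNet benchmark) is:

def print_activations(t):
    # Print the tensor's name and static shape, e.g. conv1 [32, 56, 56, 64]
    print(t.op.name, ' ', t.get_shape().as_list())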
def alexnet(datas, n_output, keep_prob, training):
    with tf.name_scope('conv1') as scope:
        kernel = tf.Variable(tf.random_normal([11, 11, 3, 64], dtype=tf.float32, stddev=0.01), name='weight1')
        conv = tf.nn.conv2d(datas, kernel, [1, 4, 4, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[64], dtype=tf.float32), trainable=True, name='biases1')
        conv1 = tf.nn.bias_add(conv, biases)
        conv1 = tf.layers.batch_normalization(conv1, training=training)
        conv1 = tf.nn.relu(conv1, name=scope)
        print_activations(conv1)
        tf.summary.histogram('weight1', kernel)
        tf.summary.histogram('biases1', biases)
        conv1 = tf.nn.lrn(conv1, bias=2.0, alpha=2e-04, beta=0.75, name='lrn1')
        pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool1')
        print_activations(pool1)
        tf.summary.histogram('pool1', pool1)
    with tf.name_scope('conv2') as scope:
        kernel = tf.Variable(tf.random_normal([5, 5, 64, 128], dtype=tf.float32, stddev=0.01), name='weight2')
        conv = tf.nn.conv2d(pool1, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32), trainable=True, name='biases2')
        conv2 = tf.nn.bias_add(conv, biases)
        conv2 = tf.layers.batch_normalization(conv2, training=training)
        conv2 = tf.nn.relu(conv2, name=scope)
        print_activations(conv2)
        tf.summary.histogram('weight2', kernel)
        tf.summary.histogram('biases2', biases)
        # lrn2 = tf.nn.lrn(conv2, bias=2.0, alpha=2e-05, beta=0.75, name='lrn2')
        pool2 = tf.nn.max_pool(conv2, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool2')
        print_activations(pool2)
        tf.summary.histogram('pool2', pool2)
    with tf.name_scope('conv3') as scope:
        kernel = tf.Variable(tf.random_normal([3, 3, 128, 256], dtype=tf.float32, stddev=0.01), name='weight3')
        conv = tf.nn.conv2d(pool2, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases3')
        conv3 = tf.nn.bias_add(conv, biases)
        # conv3 = tf.layers.batch_normalization(conv3, training=training)
        conv3 = tf.nn.relu(conv3, name=scope)
        print_activations(conv3)
        tf.summary.histogram('weight3', kernel)
        tf.summary.histogram('biases3', biases)
        tf.summary.histogram(scope, conv3)
    with tf.name_scope('conv4') as scope:
        kernel = tf.Variable(tf.random_normal([3, 3, 256, 384], dtype=tf.float32, stddev=0.01), name='weight4')
        conv = tf.nn.conv2d(conv3, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[384], dtype=tf.float32), trainable=True, name='biases4')
        conv4 = tf.nn.bias_add(conv, biases)
        # conv4 = tf.layers.batch_normalization(conv4, training=training)
        conv4 = tf.nn.relu(conv4, name=scope)
        print_activations(conv4)
        tf.summary.histogram('weight4', kernel)
        tf.summary.histogram('biases4', biases)
        tf.summary.histogram(scope, conv4)
    with tf.name_scope('conv5') as scope:
        kernel = tf.Variable(tf.random_normal([3, 3, 384, 256], dtype=tf.float32, stddev=0.01), name='weight5')
        conv = tf.nn.conv2d(conv4, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.Variable(tf.constant(0.0, shape=[256], dtype=tf.float32), trainable=True, name='biases5')
        conv5 = tf.nn.bias_add(conv, biases)
        # conv5 = tf.layers.batch_normalization(conv5, training=training)
        conv5 = tf.nn.relu(conv5, name=scope)
        print_activations(conv5)
        tf.summary.histogram('weight5', kernel)
        tf.summary.histogram('biases5', biases)
        pool5 = tf.nn.max_pool(conv5, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool5')
        print_activations(pool5)
        tf.summary.histogram('pool5', pool5)
    with tf.name_scope('fc6') as scope:
        kernel = tf.Variable(tf.random_normal([7 * 7 * 256, 4096], dtype=tf.float32, stddev=0.01), name='weight6')
        flat = tf.reshape(pool5, [-1, 7 * 7 * 256])
        biases = tf.Variable(tf.constant(0.0, shape=[4096], dtype=tf.float32), trainable=True, name='biases6')
        fc6 = tf.add(tf.matmul(flat, kernel), biases)
        # fc6 = tf.layers.batch_normalization(fc6, training=training)
        fc6 = tf.nn.relu(fc6, name='relu')
        # Dropout is applied unconditionally; feed keep_prob = 1.0 at test time.
        # (The original gated it on `if Is_train == 1`, but Is_train was a tf.int32
        # placeholder, and comparing a tensor to 1 with == is always False at
        # graph-construction time, so the dropout op was never actually built.)
        fc6 = tf.nn.dropout(fc6, keep_prob, name=scope)
        print_activations(fc6)
        tf.summary.histogram('weight6', kernel)
        tf.summary.histogram('biases6', biases)
        tf.summary.histogram(scope, fc6)
    with tf.name_scope('fc7') as scope:
        kernel = tf.Variable(tf.random_normal([4096, 4096], dtype=tf.float32, stddev=0.01), name='weight7')
        biases = tf.Variable(tf.constant(0.0, shape=[4096], dtype=tf.float32), trainable=True, name='biases7')
        fc7 = tf.add(tf.matmul(fc6, kernel), biases)
        # fc7 = tf.layers.batch_normalization(fc7, training=training)
        fc7 = tf.nn.relu(fc7, name='relu')
        fc7 = tf.nn.dropout(fc7, keep_prob, name=scope)  # same keep_prob switch as fc6
        print_activations(fc7)
        tf.summary.histogram('weight7', kernel)
        tf.summary.histogram('biases7', biases)
        tf.summary.histogram(scope, fc7)
    with tf.name_scope('fc8') as scope:
        kernel = tf.Variable(tf.random_normal([4096, n_output], dtype=tf.float32, stddev=0.01), name='weight8')
        biases = tf.Variable(tf.constant(0.0, shape=[n_output], dtype=tf.float32), trainable=True, name='biases8')
        fc8 = tf.nn.bias_add(tf.matmul(fc7, kernel), biases)
        print_activations(fc8)
        tf.summary.histogram('weight8', kernel)
        tf.summary.histogram('biases8', biases)
        tf.summary.histogram(scope, fc8)
    return fc8
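As a sanity check on the 7*7*256 flat size in fc6: with SAME padding, conv1's stride of 4 maps 224 to 56, and each of the three 3x3 max pools with stride 2 halves that again (rounding up): 56 -> 28 -> 14 -> 7. pool5 has 256 channels, giving 7*7*256 = 12544 inputs to fc6.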
Training code:
import os
import numpy as np
import tensorflow as tf
# The network and TFRecord code above are assumed to be importable
# as `net` and `make_record` (the module names used below).

def run_alexnet():
    epochs = 10000        # number of training epochs
    image_size = 224      # input image size
    batch_train = 32      # training batch size
    batch_test = 32       # test batch size
    total_batch = 200     # batches per epoch
    n_output = 2          # number of classes
    dropout_rate = 0.85   # dropout keep probability during training
    train_record = r'H:/DATA/Gender/Gender_tiny/tf_train_224.tfrecord'
    test_record = r'H:/DATA/Gender/Gender_tiny/tf_test_224.tfrecord'
    # TensorBoard log directory (create it first if it does not exist)
    train_dir = './tflearn_logs/test_Gender'
    # Checkpoint directory
    save_dir = './save_model/test_Gender'
    # Placeholders
    X = tf.placeholder(tf.float32, shape=[batch_train, image_size, image_size, 3])
    Y = tf.placeholder(tf.int32, shape=[batch_train])
    keep_prob = tf.placeholder(tf.float32)
    training = tf.placeholder_with_default(False, shape=(), name='training')
    # Network
    pred = net.alexnet(X, n_output, keep_prob, training)
    print('y_res:', Y, ',pred:', pred)
    top_k_op = tf.nn.in_top_k(pred, Y, 1)
    loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=Y, logits=pred), name='loss')
    tf.summary.scalar('loss', loss)
    # Exponentially decaying learning rate
    global_step = tf.Variable(0, trainable=False)
    learning_rate = tf.train.exponential_decay(0.01, global_step, 500, 0.99, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
    # Training and test input pipelines
    train_x, train_y = make_record.read_and_decode(train_record, image_size, batch_size=batch_train)
    test_x, test_y = make_record.read_and_decode(test_record, image_size, batch_size=batch_test)
    # Merge all summaries for TensorBoard
    summary_op = tf.summary.merge_all()
    # Batch-norm moving averages live in UPDATE_OPS and must run with each train step
    extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
    init = tf.global_variables_initializer()
    # Keep at most max_to_keep checkpoints on disk
    saver = tf.train.Saver(max_to_keep=3)
    # Whether to resume training from an existing checkpoint
    Retrain = True
    # To cap GPU memory usage at 20% (useful when the GPU is shared):
    # config = tf.ConfigProto()
    # config.gpu_options.per_process_gpu_memory_fraction = 0.2
    # sess = tf.InteractiveSession(config=config)
    with tf.Session() as sess:
        sess.run(init)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess, coord)
        summary_writer = tf.summary.FileWriter(train_dir, sess.graph)
        ckpt = tf.train.get_checkpoint_state(save_dir)
        if ckpt and ckpt.model_checkpoint_path and Retrain:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            saver.restore(sess, os.path.join(save_dir, ckpt_name))
        try:
            for epoch in range(epochs):
                print('epoch:', str(epoch))
                for i in range(total_batch):
                    feed_x, feed_y = sess.run([train_x, train_y])
                    step_val, train_loss, _, _ = sess.run(
                        [global_step, loss, train_step, extra_update_ops],
                        feed_dict={training: True, X: feed_x, Y: feed_y, keep_prob: dropout_rate})
                    if step_val % 20 == 0:  # evaluate one test batch every 20 steps
                        feed_test_x, feed_test_y = sess.run([test_x, test_y])
                        # keep_prob = 1.0 disables dropout for the test pass
                        train_top_k_op = sess.run(
                            top_k_op,
                            feed_dict={training: False, X: feed_test_x, Y: feed_test_y, keep_prob: 1.0})
                        learning_rate_val = sess.run(learning_rate)
                        predict = np.sum(train_top_k_op)
                        train_accuracy = predict / batch_test
                        print('step:', str(step_val), ' ,learning_rate:', str(learning_rate_val),
                              ' ,loss:', str(train_loss), ' , predictNum: ', str(predict),
                              ' ,acc:', str(train_accuracy))
                    if step_val % 100 == 0:
                        # Write summaries so parameters can be inspected in TensorBoard
                        summary_str = sess.run(
                            summary_op,
                            feed_dict={training: True, X: feed_x, Y: feed_y, keep_prob: dropout_rate})
                        summary_writer.add_summary(summary_str, step_val)
                    if step_val % 1000 == 0:  # save a checkpoint every 1000 steps
                        saver.save(sess, save_dir + '/model.ckpt', global_step=step_val)
        except tf.errors.OutOfRangeError:
            print('complete')
        finally:
            coord.request_stop()
            coord.join(threads)
def evaluate():
    batch_size = 64
    n_output = 2
    image_size = 224
    save_dir = './save_model/test_Gender'
    test_record = r'H:/DATA/CelebA_CelebFaces_Attributes_Dataset/norm/Gender_norm/tf_test_224.tfrecord'
    X = tf.placeholder(tf.float32, shape=[batch_size, image_size, image_size, 3])
    Y = tf.placeholder(tf.int32, shape=[batch_size])
    keep_prob = tf.placeholder(tf.float32)
    # Defaults to False, so batch norm runs in inference mode
    training = tf.placeholder_with_default(False, shape=(), name='training')
    test_x, test_y = make_record.read_and_decode(test_record, image_size, batch_size=batch_size)
    pred = net.alexnet(X, n_output, keep_prob, training)
    # Y holds integer labels, not one-hot vectors, so compare argmax(pred) with Y
    # directly (tf.argmax on a 1-D label tensor would reduce over the batch)
    accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.cast(tf.argmax(pred, -1), tf.int32), Y), tf.float32))
    top_k_op = tf.nn.in_top_k(pred, Y, 1)
    saver = tf.train.Saver(tf.global_variables())
    init = tf.global_variables_initializer()
    with tf.Session() as sess:
        sess.run(init)
        coord = tf.train.Coordinator()
        threads = tf.train.start_queue_runners(sess, coord)
        ckpt = tf.train.get_checkpoint_state(save_dir)
        if ckpt and ckpt.model_checkpoint_path:
            ckpt_name = os.path.basename(ckpt.model_checkpoint_path)
            saver.restore(sess, os.path.join(save_dir, ckpt_name))
        try:
            true_count = 0
            step = 0
            for n in range(200):
                if coord.should_stop():
                    break
                feed_x, feed_y = sess.run([test_x, test_y])
                # keep_prob = 1.0: no dropout during evaluation
                test_accuracy, predict = sess.run([accuracy, top_k_op],
                                                  feed_dict={X: feed_x, Y: feed_y, keep_prob: 1.0})
                true_count += np.sum(predict)
                step += 1
            rate = true_count / (step * batch_size)
            print('count = ', str(step * batch_size), ' , test accuracy = ', str(rate))
        except tf.errors.OutOfRangeError:
            print('complete')
        finally:
            coord.request_stop()
            coord.join(threads)
1. The loss hovered around 0.6. The same thing happened when I trained CIFAR-10 at (224, 224, 3): the loss always got stuck around 2.3. Those numbers are the cross-entropy of a network that just predicts uniform probabilities, -ln(1/10) ≈ 2.3 for ten classes and -ln(1/2) ≈ 0.69 for binary classification.
There is a detailed write-up here:
https://blog.csdn.net/weixin_34343689/article/details/88111552
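A two-line check of those plateau values:

import numpy as np
print(-np.log(1.0 / 2))   # ~0.693: the binary-classification plateau
print(-np.log(1.0 / 10))  # ~2.303: the CIFAR-10 plateau around 2.3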
But its suggestions did not fix it for me, so I made some changes of my own:
(1) I tried an exponentially decaying learning_rate:
# Exponentially decaying learning rate
global_step = tf.Variable(0, trainable=False)
learning_rate = tf.train.exponential_decay(0.002, global_step, 500, 0.99, staircase=True)
train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)
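With staircase=True, tf.train.exponential_decay computes lr = 0.002 * 0.99 ** (global_step // 500), a 1% cut every 500 steps; for example:

print(0.002 * 0.99 ** (10000 // 500))  # ~0.00164 after 10000 steps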
(2) I changed the weight initialization from tf.truncated_normal to tf.random_normal with a standard deviation of 0.01:
kernel = tf.Variable(tf.random_normal([11,11,3,64],dtype=tf.float32,stddev=0.01),name='weight1')
The difference between the two is explained here: https://blog.csdn.net/u014687582/article/details/78027061
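In short: both sample from N(0, stddev^2), but tf.truncated_normal discards and re-draws any sample beyond two standard deviations, so the initial weights have no extreme outliers:

w_trunc = tf.truncated_normal([11, 11, 3, 64], stddev=0.01)  # values within +/-0.02
w_norm = tf.random_normal([11, 11, 3, 64], stddev=0.01)      # unbounded tails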
What finally worked, after more searching, was batch normalization; background reading:
https://www.cnblogs.com/guoyaohua/p/8724433.html
Each layer becomes:
conv1 = tf.nn.bias_add(conv, biases)
conv1 = tf.layers.batch_normalization(conv1, training=training)
conv1 = tf.nn.relu(conv1, name=scope)
It generally goes before the ReLU. The official docs note that tf.layers.batch_normalization is deprecated and will be removed, so new code should use tf.keras.layers.BatchNormalization instead.
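For reference, a sketch of the same bias + BN + ReLU step written with the Keras layer (the documented replacement; the layer object is called on the tensor):

bn = tf.keras.layers.BatchNormalization()
conv1 = tf.nn.bias_add(conv, biases)
conv1 = bn(conv1, training=training)
conv1 = tf.nn.relu(conv1, name=scope)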
Honestly it felt a little magical: I had tried these same changes before with no effect, and then after poking at this and that it suddenly worked, o(╥﹏╥)o. Tuning hyperparameters really is a dark art; good luck to us all.