Training MNIST with Caffe is convenient, but how do you actually run prediction afterwards?
References:
http://blog.csdn.net/l691899397/article/details/52233454
http://www.jianshu.com/p/9644f7ec0a03
http://www.jianshu.com/p/9e30328a0a71
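For the theory, see the first blog post above, or read the paper.
I think prediction comes down to two key steps: 1. loading the trained caffemodel together with the deploy model description file; 2. feeding in the image to be predicted correctly, which requires understanding Caffe's Blob. Now to the main topic:
I. Creating the model description file
After reading through Caffe's overall architecture (I have only looked at the overall framework and have not dug deep into the source code), I found that Caffe is highly modular and easy to read and understand; writing an entire neural network is almost a visual process. My code can be downloaded from my GitHub: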
https://github.com/zefan7564/caffe
Below are the network layers I use for prediction.
name: "LeNet"
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 1 dim: 1 dim: 28 dim: 28 } }
}
layer {
name: "conv1"
type: "Convolution"
bottom: "data"
top: "conv1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 20
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool1"
type: "Pooling"
bottom: "conv1"
top: "pool1"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "conv2"
type: "Convolution"
bottom: "pool1"
top: "conv2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
convolution_param {
num_output: 50
kernel_size: 5
stride: 1
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "pool2"
type: "Pooling"
bottom: "conv2"
top: "pool2"
pooling_param {
pool: MAX
kernel_size: 2
stride: 2
}
}
layer {
name: "ip1"
type: "InnerProduct"
bottom: "pool2"
top: "ip1"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 500
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "relu1"
type: "ReLU"
bottom: "ip1"
top: "ip1"
}
layer {
name: "ip2"
type: "InnerProduct"
bottom: "ip1"
top: "ip2"
param {
lr_mult: 1
}
param {
lr_mult: 2
}
inner_product_param {
num_output: 10
weight_filler {
type: "xavier"
}
bias_filler {
type: "constant"
}
}
}
layer {
name: "prob"
type: "Softmax"
bottom: "ip2"
top: "prob"
}
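To make the layer parameters above more concrete, here is a minimal sketch that traces the spatial size of a 28x28 input through the convolution and pooling layers. The helper names (`conv_out`, `pool_out`, `lenet_shapes`) are made up for illustration and are not part of Caffe; the sketch assumes no padding, floor rounding for convolution, and ceil rounding for pooling.

```python
def conv_out(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution layer (rounds down)."""
    return (size + 2 * pad - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Spatial output size of a pooling layer (rounds up)."""
    return -((kernel - size) // stride) + 1


def lenet_shapes(side=28):
    """Return (layer name, channels, side length) for each layer up to pool2."""
    shapes = [("data", 1, side)]
    side = conv_out(side, kernel=5)            # conv1: kernel_size 5, stride 1
    shapes.append(("conv1", 20, side))
    side = pool_out(side, kernel=2, stride=2)  # pool1: MAX, 2x2, stride 2
    shapes.append(("pool1", 20, side))
    side = conv_out(side, kernel=5)            # conv2: kernel_size 5, stride 1
    shapes.append(("conv2", 50, side))
    side = pool_out(side, kernel=2, stride=2)  # pool2: MAX, 2x2, stride 2
    shapes.append(("pool2", 50, side))
    return shapes


shapes = lenet_shapes()
for name, channels, side in shapes:
    print("{}: {} x {} x {}".format(name, channels, side, side))

# ip1 sees the flattened pool2 output: channels * height * width values.
_, channels, side = shapes[-1]
ip1_inputs = channels * side * side
print("ip1 inputs: {}".format(ip1_inputs))
```

Under these assumptions the 28x28 image shrinks to 24, 12, 8 and finally 4 pixels per side, so ip1 receives 50 * 4 * 4 = 800 values per image.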
Note the input layer in particular:
layer {
name: "data"
type: "Input"
top: "data"
input_param { shape: { dim: 1 dim: 1 dim: 28 dim: 28 } }
}
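Caffe stores data as a four-dimensional array, [num, channels, height, width], which is why the data layer is declared this way. The other difference from training is the final layer, prob: prediction only needs a forward pass that outputs the result, so the loss and accuracy layers are not needed.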
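The four-dimensional layout can be illustrated with plain NumPy. This is only a sketch: the arrays below are stand-ins for real images, and none of the variable names come from Caffe.

```python
import numpy as np

# Stand-in for a grayscale image as an image library would return it: (height, width).
image = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# Add the batch and channel axes to get (num, channels, height, width).
blob = image[np.newaxis, np.newaxis, :, :]
print(blob.shape)

# A color image usually arrives as (height, width, channels) and needs a transpose first.
color = np.zeros((28, 28, 3), dtype=np.float32)
color_blob = color.transpose(2, 0, 1)[np.newaxis, ...]
print(color_blob.shape)
```

The pixel at row 3, column 5 of the image ends up at `blob[0, 0, 3, 5]`, which matches the `dim: 1 dim: 1 dim: 28 dim: 28` declaration in the input layer.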
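Next comes image preprocessing: subtracting the mean. This helps training converge faster, and whatever preprocessing was used in training has to be applied in the same way at prediction time.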
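As a rough illustration of mean subtraction, the sketch below computes a per-pixel mean image over a batch and subtracts it. The random `train_images` array is a placeholder for real training data, and `subtract_mean` is a hypothetical helper, not a Caffe function.

```python
import numpy as np

rng = np.random.RandomState(0)
# Placeholder for a training set in (num, channels, height, width) layout, values 0..255.
train_images = rng.randint(0, 256, size=(100, 1, 28, 28)).astype(np.float32)

# Per-pixel mean over the whole set: shape (channels, height, width).
mean_image = train_images.mean(axis=0)


def subtract_mean(images, mean):
    """Center images by subtracting the mean image (broadcasts over the batch axis)."""
    return images.astype(np.float32) - mean


centered = subtract_mean(train_images, mean_image)
print(mean_image.shape)
print(float(np.abs(centered.mean(axis=0)).max()))
```

After centering, the per-pixel average over the set is approximately zero. The same `mean_image` would be subtracted from any image passed to the network for prediction.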
Finally, the trained model itself; here I use lenet_iter_10000.caffemodel.
II. Predicting with the trained model. I will simply paste the two scripts here.
#coding=utf-8
#caffe and opencv test mnist (script 1: caffe.Classifier)
#test by yuzefan
import os
import sys
import numpy as np
import cv2
caffe_root='/home/ubuntu/caffe-master/'
sys.path.insert(0,caffe_root+'python') #add pycaffe to the path before importing caffe
import caffe
os.chdir(caffe_root)
MODEL_FILE=caffe_root+'mytest/my-mnist/classificat_net.prototxt'
WEIGTHS=caffe_root+'mytest/my-mnist/lenet_iter_10000.caffemodel'
caffe.set_mode_gpu()
net=caffe.Classifier(MODEL_FILE,WEIGTHS)
IMAGE_PATH=caffe_root+'mytest/smy-mnist/' #note: script 2 reads its images from mytest/my-mnist/
font = cv2.FONT_HERSHEY_SIMPLEX #normal size sans-serif font
for i in range(0,9): #reads 0.png through 8.png; use range(0,10) to include 9.png
    # astype() is a method provided by numpy to convert numpy dtype.
    input_image=cv2.imread(IMAGE_PATH+'{}.png'.format(i),cv2.IMREAD_GRAYSCALE).astype(np.float32)
    #resize Image to improve vision effect.
    resized=cv2.resize(input_image,(280,280),None,0,0,cv2.INTER_AREA)
    input_image = input_image[:, :, np.newaxis] # input_image.shape is (28, 28, 1), with dtype float32
    prediction = net.predict([input_image], oversample=False)
    cv2.putText(resized, str(prediction[0].argmax()), (200, 280), font, 4, (255,), 2)
    cv2.imshow("Prediction", resized)
    print('predicted class: {}'.format(prediction[0].argmax()))
    keycode = cv2.waitKey(0) & 0xFF
    if keycode == 27: #Esc quits
        break
#coding=utf-8
#caffe and opencv test mnist (script 2: caffe.Net + Transformer)
#test by yuzefan
import os
import sys
import numpy as np
import cv2
caffe_root='/home/ubuntu/caffe-master/'
sys.path.insert(0,caffe_root+'python') #add pycaffe to the path before importing caffe
import caffe
os.chdir(caffe_root)
MODEL_FILE=caffe_root+'mytest/my-mnist/classificat_net.prototxt'
WEIGTHS=caffe_root+'mytest/my-mnist/lenet_iter_10000.caffemodel'
MEAN_FILE=caffe_root+'mytest/my-mnist/mean.binaryproto'
print('Params loaded!')
cv2.waitKey(1000)
caffe.set_mode_gpu()
net=caffe.Net(MODEL_FILE,WEIGTHS,caffe.TEST)
#the mean file is loaded here but not applied to the input below
mean_blob=caffe.proto.caffe_pb2.BlobProto()
with open(MEAN_FILE, 'rb') as f:
    mean_blob.ParseFromString(f.read())
mean_npy = caffe.io.blobproto_to_array(mean_blob)
a=mean_npy[0, :, 0, 0]
print(net.blobs['data'].data.shape)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
#transformer.set_transpose('data', (2, 0, 1))
##transformer.set_raw_scale('data', 255)
#transformer.set_channel_swap('data', (2, 1, 0))
#read the class names once; each line is expected to hold an index, a space, then the name
names = []
with open('/home/ubuntu/caffe-master/mytest/my-mnist/words.txt', 'r') as f:
    for l in f.readlines():
        names.append(l.split(' ')[1].strip())
print(names)
for i in range(0,10):
    IMAGE_PATH=caffe_root+'mytest/my-mnist/{}.png'.format(i)
    #im = caffe.io.load_image(IMAGE_PATH)
    input_image=cv2.imread(IMAGE_PATH,cv2.IMREAD_GRAYSCALE).astype(np.float32)
    resized=cv2.resize(input_image,(280,280),None,0,0,cv2.INTER_AREA)
    net.blobs['data'].data[...] = transformer.preprocess('data', input_image)
    predict = net.forward()
    prob = net.blobs['prob'].data[0].flatten()
    print('prob: {}'.format(prob))
    print('class: {}'.format(names[np.argmax(prob)]))
    cv2.imshow("Prediction", resized)
    keycode = cv2.waitKey(0) & 0xFF
    if keycode == 27: #Esc quits
        break
From here on I will call these script 1 and script 2.
The key lines in script 1 are
net=caffe.Classifier(MODEL_FILE,WEIGTHS)
prediction = net.predict([input_image], oversample=False)
As you can see, it reads a grayscale image with OpenCV and passes it to the net for prediction.
The key lines in script 2 are
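What the final prob layer and the `argmax()` call do can be sketched without Caffe. The scores below are invented for illustration; in the real network they would be the 10 outputs of ip2.

```python
import numpy as np


def softmax(scores):
    """Numerically stable softmax over a 1-D array of class scores."""
    shifted = scores - np.max(scores)  # subtracting the max avoids overflow in exp
    exp = np.exp(shifted)
    return exp / exp.sum()


# Made-up ip2 outputs for one image, one score per digit 0..9.
ip2_scores = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 7.5, -1.0, 2.0, 0.3, 1.2], dtype=np.float32)

prob = softmax(ip2_scores)
predicted = int(np.argmax(prob))
print('prob: {}'.format(prob))
print('predicted class: {}'.format(predicted))
```

Softmax keeps the ordering of the scores, so the predicted class is the index of the largest ip2 output; the probabilities are mainly useful for judging how confident the network is.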
net=caffe.Net(MODEL_FILE,WEIGTHS,caffe.TEST)
transformer = caffe.io.Transformer({'data': net.blobs['data'].data.shape})
#transformer.set_transpose('data', (2, 0, 1))
##transformer.set_raw_scale('data', 255)
#transformer.set_channel_swap('data', (2, 1, 0))
#img = caffe.io.load_image(IMAGE_PATH)
input_image=cv2.imread(IMAGE_PATH,cv2.IMREAD_GRAYSCALE).astype(np.float32)
net.blobs['data'].data[...] = transformer.preprocess('data', input_image)
predict = net.forward()
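As you can see, I commented out a fair amount of code, but those lines matter, so I did not delete them. The transformer applies transformations to the data. However, when the image is read with caffe.io.load_image(IMAGE_PATH), the result is a [28,28,3] array every time, that is, three channels are always read, which leads to the following error: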
could not broadcast input array from shape (28,28,3) into shape (1,1,28,28)
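With no better option I went back to OpenCV, which returns (H,W,C) for a color image and just (H,W) when cv2.IMREAD_GRAYSCALE is passed... I hope someone more experienced can explain this.
For color images, however, caffe.io.load_image(IMAGE_PATH) should be the one to use.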
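The broadcast error can be reproduced with NumPy alone. In this sketch `data_blob` stands in for `net.blobs['data'].data` and `rgb_like` for a grayscale file loaded as three identical channels; both names are made up for the example.

```python
import numpy as np

data_blob = np.zeros((1, 1, 28, 28), dtype=np.float32)  # same shape as the net's input blob
rgb_like = np.ones((28, 28, 3), dtype=np.float32)       # (H, W, 3) image, channels identical

try:
    data_blob[...] = rgb_like          # trailing axes 3 and 28 do not match
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# One possible workaround when all three channels are identical: keep a single channel.
gray = rgb_like[:, :, 0]               # shape (28, 28)
data_blob[...] = gray                  # (28, 28) broadcasts into (1, 1, 28, 28)
print(data_blob.shape)
```

Taking one channel is only safe when the image really is grayscale; for a true color image the network's input layer would need three channels instead.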
Here is the result of running it:
[Screenshot from 2017-08-22 15:50:07; the image failed to upload]
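The question above has since been resolved:
cv2.imread() with cv2.IMREAD_GRAYSCALE returns a single-channel grayscale image whose values are already in 0~255, so neither rescaling to [0,255] nor the channel swap [2,1,0] is needed. In other words, transformer.set_raw_scale('data',255) and transformer.set_channel_swap('data',(2,1,0)) can be left out.
caffe.io.load_image(), by contrast, returns an RGB image with float values in 0~1 (with its default color=True, even a grayscale file is expanded to three channels), so before feature extraction the transformer has to be configured with transformer.set_raw_scale('data',255) (rescale to 0~255)
and transformer.set_channel_swap('data',(2,1,0)) (convert RGB to BGR).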
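The effect of those transformer settings can be imitated in NumPy. This is a sketch of the idea rather than Caffe's own code: `preprocess_like_transformer` is a hypothetical helper that applies a transpose, a channel swap and a raw scale in that order to an (H, W, C) RGB image with values in 0~1.

```python
import numpy as np


def preprocess_like_transformer(image_hwc, transpose=(2, 0, 1),
                                channel_swap=(2, 1, 0), raw_scale=255.0):
    """Convert an (H, W, C) RGB image in [0, 1] to a (C, H, W) BGR array in [0, 255]."""
    out = image_hwc.astype(np.float32)
    out = out.transpose(transpose)          # (H, W, C) -> (C, H, W)
    out = out[list(channel_swap), :, :]     # reorder channels: RGB -> BGR
    out = out * raw_scale                   # [0, 1] -> [0, 255]
    return out


# A tiny pure-red test image: R = 1.0, G = B = 0.0.
rgb = np.zeros((2, 2, 3), dtype=np.float32)
rgb[:, :, 0] = 1.0

result = preprocess_like_transformer(rgb)
print(result.shape)
print(result[:, 0, 0])   # channel values of the top-left pixel, now in BGR order
```

For the pure-red input the red value lands in the last channel and is scaled to 255, which is the layout a network trained on BGR images in the 0~255 range expects.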
Done!