Caffe Study Notes 3: Extending LeNet to Recognize '0-9' and 'A-Z'

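After the previous notes I can recognize the handwritten digits '0-9', but what needs to change to classify more categories?
First, a brief introduction to LeNet: the network consists of 2 convolutional layers, 2 max-pooling layers, 2 fully connected layers, 1 ReLU layer and 1 softmax layer.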


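The input volume has size $W_1 \times H_1 \times D_1$.

A convolutional layer has 4 hyperparameters: the number of filters $K$, the spatial size of the filters $F$, the stride $S$, and the amount of zero padding $P$.

The output volume has size $W_2 \times H_2 \times D_2$, where:

$$W_2 = (W_1 - F + 2P)/S + 1$$
$$H_2 = (H_1 - F + 2P)/S + 1$$
$$D_2 = K$$

(Width and height are computed the same way.)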
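To make the formula concrete, here is a minimal sketch that applies it layer by layer. The 28x28x1 input and the filter settings are assumptions taken from Caffe's stock LeNet example, not from my own prototxt, and the helper name `output_size` is made up for illustration. Pooling is treated with the same formula (F=2, S=2, P=0), with the depth passed through unchanged.

```python
def output_size(w1, h1, f, s, p, k):
    """Return (W2, H2, D2) for filter size f, stride s, zero padding p, k output channels."""
    # Integer division assumes (W1 - F + 2P) is divisible by S, as it is below.
    w2 = (w1 - f + 2 * p) // s + 1
    h2 = (h1 - f + 2 * p) // s + 1
    return w2, h2, k


# Assumed settings: 28x28x1 input, conv 5x5 stride 1, pool 2x2 stride 2.
conv1 = output_size(28, 28, f=5, s=1, p=0, k=20)
pool1 = output_size(conv1[0], conv1[1], f=2, s=2, p=0, k=conv1[2])  # pooling keeps depth
conv2 = output_size(pool1[0], pool1[1], f=5, s=1, p=0, k=50)
pool2 = output_size(conv2[0], conv2[1], f=2, s=2, p=0, k=conv2[2])

for name, shape in [("conv1", conv1), ("pool1", pool1), ("conv2", conv2), ("pool2", pool2)]:
    print(name, shape)
```

Under these assumptions the second pooling layer produces a 4x4x50 volume, which is what the first fully connected layer then flattens.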

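To classify '0-9' and 'A-Z', I went through the whole network. The data layer does not need to change; I only changed the last fully connected layer, setting num_output to 36 (10 digits plus 26 letters). Luckily this worked, but the accuracy after training was only 0.6. My training set may be too small, or the convolutional layers may need to extract more features.
The code follows. It is almost the same as before, so I only paste the parts I changed.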

layer {
  name: "ip2"
  type: "InnerProduct"
  bottom: "ip1"
  top: "ip2"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 36
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
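The value 36 for num_output is just the number of classes. A small sketch of how the class list and label indices could be built, assuming the labels are ordered digits first and then uppercase letters (my actual words.txt may use a different order):

```python
import string

# Assumed ordering: '0'..'9' followed by 'A'..'Z'.
labels = list(string.digits) + list(string.ascii_uppercase)
num_output = len(labels)  # must match num_output in the "ip2" layer

# Map each character to the integer label the network is trained on.
label_to_index = {ch: i for i, ch in enumerate(labels)}

print(num_output)
print(label_to_index["7"], label_to_index["A"], label_to_index["Z"])
```

If the class list changes (for example adding lowercase letters), num_output has to change with it, otherwise labels fall outside the range the softmax layer can represent.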
# Solver file (separate from the network definition)
# The train/test net protocol buffer definition
net: "mytest/chinese/lenet_plus_train_test.prototxt"
# test_iter specifies how many forward passes the test should carry out.
# test_iter times the test batch size should cover the full 370 testing images.
test_iter: 10
# Carry out testing every 300 training iterations.
test_interval: 300
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 20 iterations
display: 20
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "mytest/chinese/lenet"
# solver mode: CPU or GPU
solver_mode: GPU
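With lr_policy "inv", Caffe computes the learning rate as base_lr * (1 + gamma * iter) ^ (-power). The sketch below evaluates that formula with the values from the solver file to show how slowly the rate decays over 10000 iterations; the function name `inv_lr` is only for illustration.

```python
def inv_lr(iteration, base_lr=0.01, gamma=0.0001, power=0.75):
    """Learning rate under Caffe's "inv" policy at a given iteration."""
    return base_lr * (1.0 + gamma * iteration) ** (-power)


# Print the rate at a few points along the 10000-iteration run.
for it in (0, 1000, 5000, 10000):
    print(it, round(inv_lr(it), 6))
```

The rate starts at base_lr and by the last iteration has dropped to roughly 60% of it, so the schedule is fairly gentle for a run of this length.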

Create predict_plus.py:

# coding=utf-8
# by yuzefan
import os
import sys
from os.path import join, isdir

import cv2
import numpy as np

caffe_root = '/home/ubuntu/caffe-master/'
sys.path.insert(0, caffe_root + 'python')  # must come before "import caffe"
import caffe

os.chdir(caffe_root)  # change current dir
DEPLOY_FILE = caffe_root + 'mytest/chinese/classificat_net.prototxt'
MODEL_FILE = caffe_root + 'mytest/chinese/lenet_iter_10000.caffemodel'
caffe.set_mode_gpu()  # select GPU mode before the net is built
net = caffe.Classifier(DEPLOY_FILE, MODEL_FILE)
IMAGE_PATH = caffe_root + 'mytest/chinese/data/train'
font = cv2.FONT_HERSHEY_SIMPLEX  # normal size sans-serif font

# One sub-directory per class, processed in sorted order.
sd = sorted(d for d in os.listdir(IMAGE_PATH) if isdir(join(IMAGE_PATH, d)))
print('class directories: %s' % sd)

# words.txt: the second space-separated field of each line is the class name.
names = []
with open('/home/ubuntu/caffe-master/mytest/chinese/words.txt', 'r') as f:
    for l in f:
        names.append(l.split(' ')[1].strip())

stop = False
for d in sd:
    for x in os.listdir(join(IMAGE_PATH, d)):
        img = join(IMAGE_PATH, d, x)
        gray = cv2.imread(img, cv2.IMREAD_GRAYSCALE)
        if gray is None:  # skip files that are not readable images
            continue
        # Enlarged uint8 copy, used only for display.
        resized = cv2.resize(gray, (280, 280), interpolation=cv2.INTER_AREA)
        # The classifier expects a float H x W x 1 array.
        input_image = gray.astype(np.float32)[:, :, np.newaxis]
        prediction = net.predict([input_image], oversample=False)
        label = names[prediction[0].argmax()]
        cv2.putText(resized, label, (0, 280), font, 2, (0,), 2)
        cv2.imshow("Prediction", resized)
        print('predicted class: %s' % label)
        keycode = cv2.waitKey(50) & 0xFF
        if keycode == 27:  # Esc stops the whole run, not just this directory
            stop = True
            break
    if stop:
        break
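The script above only prints each predicted label, which makes it hard to see where an accuracy of 0.6 comes from. A minimal sketch of a per-class tally follows. It assumes you collect (true label, predicted label) pairs yourself, for instance by mapping each directory name to its class name; the function `per_class_accuracy` and the sample pairs are hypothetical and not part of my script.

```python
from collections import defaultdict


def per_class_accuracy(pairs):
    """Return {true_label: fraction predicted correctly} for (true, predicted) pairs."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for true_label, predicted_label in pairs:
        total[true_label] += 1
        if predicted_label == true_label:
            correct[true_label] += 1
    return {label: correct[label] / float(total[label]) for label in total}


# Hypothetical results, for illustration only.
sample = [("0", "0"), ("0", "O"), ("A", "A"), ("A", "A"), ("O", "0")]
acc = per_class_accuracy(sample)
for label in sorted(acc):
    print(label, acc[label])
```

A breakdown like this can show whether the errors are spread evenly or concentrated in a few look-alike classes, which would point to different fixes (more data versus more features).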

The earlier make_list.py and the training script only need their paths changed.
Result of the run:

[Figure: Paste_Image.png]
