The loss = NaN problem:
(1) The learning rate is too high.
(2) The LMDB was generated incorrectly: shuffle was not set to true when converting the image list. Since the list is grouped by label, each batch is then a poor estimate of the whole dataset and the loss diverges to NaN. Lowering the learning rate can mask the problem, but training becomes much harder; the real fix is to shuffle.
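As a first check for case (1), one sketch is to lower base_lr in the solver file (the value below is only illustrative, assuming the standard lenet_solver.prototxt):
base_lr: 0.001    # illustrative: reduced from the usual 0.01 when the loss blows up to NaN
The fix for case (2) is shown in create_lmdb.sh below: pass -shuffle=true to convert_imageset.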
create_list.sh
#!/usr/bin/env sh
# Build train.txt and test.txt: one "<image path> <label>" line per image.
# Note: the lists are grouped by label, so convert_imageset must later be run
# with -shuffle=true (see create_lmdb.sh), otherwise batches are not representative.
DATA="data/mnist.28x28"
cd $DATA
rm -f train.txt test.txt
for SPLIT in train test; do
  for LABEL in 0 1 2 3 4 5 6 7 8 9; do
    find $SPLIT/$LABEL -name "*" | grep -i -E "\.bmp|\.jpg|\.png" | sed "s/$/ $LABEL/" >> $SPLIT.txt
  done
done
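The resulting list files look like this (the file names here are only illustrative):
train/0/00001.png 0
train/7/01234.png 7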
create_lmdb.sh
#!/usr/bin/env sh
# This script converts the mnist data into lmdb/leveldb format,
# depending on the value assigned to $BACKEND.
set -e
EXAMPLE=examples/mnist.28x28
DATA=data/mnist.28x28
BUILD=build/tools
BACKEND="lmdb"
echo "Creating ${BACKEND}..."
rm -rf $EXAMPLE/mnist_train_${BACKEND}
rm -rf $EXAMPLE/mnist_test_${BACKEND}
$BUILD/convert_imageset -backend=$BACKEND -gray=true -shuffle=true $DATA/ $DATA/train.txt $EXAMPLE/mnist_train_${BACKEND}
$BUILD/convert_imageset -backend=$BACKEND -gray=true -shuffle=true $DATA/ $DATA/test.txt $EXAMPLE/mnist_test_${BACKEND}
echo "Done."
A base learning rate or momentum that is too large can make the loss blow up, or even become NaN.
net: the network definition to train and test (lenet_train_test.prototxt).
test_iter: covered_test_images_num / batch_size. The values of test_iter and the test-phase batch_size depend on the number of images in the test database; their product should cover the test set.
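For example, with the 10,000 MNIST test images and a test-phase batch_size of 100, test_iter should be 10000 / 100 = 100 so that one test pass covers the whole test set.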
test_interval: run a test pass (accuracy and loss on the test data) every test_interval training iterations.
base_lr: the base learning rate.
momentum: the momentum.
momentum2: the optimizer's second momentum parameter (the second moment decay rate used by Adam).
lr = base_lr * decay_factor
V(t+1) = momentum * V(t) - lr * g
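The weights are then updated with the accumulated velocity; for the standard SGD solver this is W(t+1) = W(t) + V(t+1), where g is the gradient of the loss with respect to W.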
lr_policy: the learning rate decay policy. The explanation below is taken from caffe.proto:
The learning rate decay policy. The currently implemented learning rate
policies are as follows:
  - fixed: always return base_lr.
  - step: return base_lr * gamma ^ (floor(iter / step))
  - exp: return base_lr * gamma ^ iter
  - inv: return base_lr * (1 + gamma * iter) ^ (- power)
  - multistep: similar to step but it allows non uniform steps defined by
    stepvalue
  - poly: the effective learning rate follows a polynomial decay, to be
    zero by the max_iter. return base_lr * (1 - iter/max_iter) ^ (power)
  - sigmoid: the effective learning rate follows a sigmoid decay
    return base_lr * ( 1/(1 + exp(-gamma * (iter - stepsize))))
where base_lr, max_iter, gamma, step, stepvalue and power are defined
in the solver parameter protocol buffer, and iter is the current iteration.
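For example, with lr_policy: "step", base_lr: 0.01, gamma: 0.1 and stepsize: 2500, the effective learning rate is 0.01 for iterations 0-2499, 0.001 for 2500-4999, 0.0001 for 5000-7499, and so on.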
weight_decay: 0.0005,
regularization types supported: L1 and L2
The weight_decay parameter governs the regularization term of the neural net.
During training a regularization term is added to the network's loss to compute the backprop gradient. The weight_decay value determines how dominant this regularization term is in the gradient computation.
As a rule of thumb, the more training examples you have, the weaker this term should be. The more parameters you have (i.e., a deeper net, larger filters, larger InnerProduct layers, etc.), the higher this term should be.
Caffe also allows you to choose between L2 regularization (the default) and L1 regularization, by setting
regularization_type: "L1"
While learning rate may (and usually does) change during training, the regularization weight is fixed throughout.
(Source of the weight_decay explanation above: susandebug, CSDN, https://blog.csdn.net/u010025211/article/details/50055815)
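Conceptually, with L2 regularization the objective being minimized is the data loss plus a penalty scaled by weight_decay:
E(W) = L_data(W) + weight_decay * (1/2) * ||W||^2
(with L1 regularization the penalty is weight_decay * ||W||_1 instead).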
display: print training output every this many iterations.
max_iter: the maximum number of training iterations.
The values of max_iter and the training-phase batch_size depend on how many epochs you want to train the net:
iterations per epoch: iter_num_per_epoch = training_images_num / batch_size
epochs_num = max_iter / iter_num_per_epoch
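For example, with the 60,000 MNIST training images and a training batch_size of 64 (the value in the standard LeNet example), iter_num_per_epoch ≈ 938, so max_iter: 10000 corresponds to roughly 10.7 epochs.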
snapshot: the iteration interval between snapshots.
snapshot_prefix: the path prefix for snapshot files.
type: the optimization algorithm (solver type).
solver_mode: CPU or GPU.
In the net definition, include { phase: TRAIN } / include { phase: TEST } selects which layers are active in each phase. A full solver file combining the parameters above is sketched below.
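This is only a sketch based on the standard examples/mnist/lenet_solver.prototxt; the examples/mnist.28x28 paths and the exact values are illustrative:
net: "examples/mnist.28x28/lenet_train_test.prototxt"
# 10000 test images / test batch_size 100 = 100 test iterations
test_iter: 100
# test every 500 training iterations
test_interval: 500
# base learning rate, momentum and weight decay
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# learning rate policy
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# display every 100 iterations
display: 100
# maximum number of training iterations (~10.7 epochs with batch_size 64)
max_iter: 10000
# snapshot every 5000 iterations
snapshot: 5000
snapshot_prefix: "examples/mnist.28x28/lenet"
solver_mode: GPU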
Train from scratch with the solver definition:
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt
Resume training from a saved solver state (snapshot):
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt --snapshot=examples/mnist/lenet_iter_1000.solverstate
Initialize training from previously trained weights (fine-tuning):
./build/tools/caffe train --solver=examples/mnist/lenet_solver.prototxt --weights=examples/mnist/lenet_iter_100000.caffemodel
Test a trained model:
./build/tools/caffe test -model examples/mnist/lenet_train_test.prototxt -weights examples/mnist/lenet_iter_10000.caffemodel
or
caffe test --model=examples/mnist/lenet_train_test.prototxt --weights=examples/mnist/lenet_iter_10000.caffemodel
1. Visualization with Netscope:
http://ethereon.github.io/netscope/quickstart.html
2. Visualization with Caffe's bundled tool draw_net.py.
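For example (assuming pydot and graphviz are installed; the output file name is arbitrary):
python python/draw_net.py examples/mnist/lenet_train_test.prototxt lenet.png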