1. Generating the log file
1.1 Create a .sh file in the caffe directory to start training
#!/usr/bin/env sh
set -e
TOOLS=./build/tools
$TOOLS/caffe train --solver=./models/VGGNet/VOC0712/SSD_300x300/solver.prototxt
1.2 Make the script executable with sudo, then run it to generate the log
sudo chmod u+x fan_train.sh
./fan_train.sh >& fan.log &
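2. Plotting the curves
2.1 Plot the curves from the log with the tool bundled with caffe
(The log actually used here is udacity.log, from a run of 10,000 training iterations.)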
0: Test accuracy vs. Iters
1: Test accuracy vs. Seconds
2: Test loss vs. Iters
3: Test loss vs. Seconds
4: Train learning rate vs. Iters
5: Train learning rate vs. Seconds
6: Train loss vs. Iters
7: Train loss vs. Seconds
./tools/extra/plot_training_log.py.example 0 learn_curve0.png ./udacity.log
./tools/extra/plot_training_log.py.example 2 learn_curve2.png ./udacity.log
./tools/extra/plot_training_log.py.example 3 learn_curve3.png ./udacity.log
./tools/extra/plot_training_log.py.example 4 learn_curve4.png ./udacity.log
./tools/extra/plot_training_log.py.example 5 learn_curve5.png ./udacity.log
./tools/extra/plot_training_log.py.example 6 learn_curve6.png ./udacity.log
./tools/extra/plot_training_log.py.example 7 learn_curve7.png ./udacity.log
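The commands above differ only in the chart type number and the output file name, so they can be generated in a loop. The following is a minimal sketch, not part of caffe: the helper name build_plot_commands and the learn_curve prefix argument are assumptions made for illustration, and the script only prints the commands rather than running them.

```python
import shlex

# Path to the plotting script, relative to the caffe directory (as used above).
PLOT_SCRIPT = "./tools/extra/plot_training_log.py.example"


def build_plot_commands(log_path, chart_types=range(8), prefix="learn_curve"):
    """Return one argv list per chart type; nothing is executed here.

    build_plot_commands is a hypothetical helper, not a caffe function.
    """
    return [
        [PLOT_SCRIPT, str(chart_type), f"{prefix}{chart_type}.png", log_path]
        for chart_type in chart_types
    ]


if __name__ == "__main__":
    for argv in build_plot_commands("./udacity.log"):
        # Print each command; replace with subprocess.run(argv, check=True)
        # to actually execute it from the caffe directory.
        print(shlex.join(argv))
```

Passing a smaller list such as chart_types=[0, 2, 6] restricts the output to the charts of interest.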
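On success, the PNG images are saved in the caffe directory. The figures below show the loss curves after 1,200 and 120,000 iterations; the loss falls as training proceeds, which is the expected effect of gradient descent.
2.2 How it works
Plotting with plot_training_log.py in caffe's tools/extra folder consists of two steps: parsing and drawing.
plot_training_log.py first calls parse_log.py to parse the log file, producing the parsed files log.train and log.test, whose contents are as follows:
Next, plot_training_log.py reads the parsed data with load_data(), normalizes it, and draws the curve with plt.plot().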
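The load-then-draw step can be sketched roughly as follows. This is an illustration under assumptions rather than the code shipped with caffe: load_columns and plot_curve are hypothetical names, the parsed file is assumed to hold whitespace-separated columns with '#' header lines, and the sample rows are made up instead of being taken from udacity.log.

```python
import io


def load_columns(lines, x_idx, y_idx):
    """Read two numeric columns from whitespace-separated rows.

    Blank lines and header lines starting with '#' are skipped.
    """
    xs, ys = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        xs.append(float(fields[x_idx]))
        ys.append(float(fields[y_idx]))
    return xs, ys


def plot_curve(xs, ys, path_to_png):
    """Draw one curve and save it as a PNG."""
    # Imported lazily so load_columns works even without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    plt.plot(xs, ys)
    plt.savefig(path_to_png)


# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "#Iters Seconds TrainingLoss LearningRate\n"
    "0 0.5 2.0 0.001\n"
    "100 12.0 1.5 0.001\n"
)
iters, loss = load_columns(sample, 0, 2)
```

With matplotlib installed, plot_curve(iters, loss, "learn_curve6.png") would then write the image.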
References:
https://www.cnblogs.com/ymjyqsx/p/7059280.html
https://blog.csdn.net/Running_J/article/details/51505715
2.3 Error & solution
Running ./tools/extra/plot_training_log.py.example 0 learn_curve0.png ./udacity.log directly may fail with an error:
wangyufan@USTC-176:~/caffe$ ./tools/extra/plot_training_log.py.example 2 learn_curve4.png ./udacity.log
10000,0.000601,0.485991
['10000', '0.000601', '0.485991']
Traceback (most recent call last):
  File "./tools/extra/plot_training_log.py.example", line 194, in <module>
    plot_chart(chart_type, path_to_png, path_to_logs)
  File "./tools/extra/plot_training_log.py.example", line 120, in plot_chart
    data = load_data(data_file, x, y)
  File "./tools/extra/plot_training_log.py.example", line 91, in load_data
    data[1].append(float(fields[field_idx1].strip()))
IndexError: list index out of range
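Used as-is, the script fails in load_data() while converting fields to float (the run above ends in an IndexError). The cause is that the parsed data described in section 2.2 contains spaces: when the original code splits each line on a space, the result is not a list of just the 4 numbers but a list that also holds many empty entries, and float() cannot convert those. The fix is to split away all of the whitespace so that the list has exactly 4 elements, the 4 numbers. In addition, the number of spaces between two numbers in the log file varies (sometimes 4, sometimes 5), so the code has to split correctly no matter how many spaces there are.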
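The difference between splitting on a single space and splitting on any run of separators can be sketched as below. This is a hedged illustration, not the author's actual patch: split_fields and to_floats are hypothetical helper names, and treating commas as separators is an added assumption based on the comma-separated line printed in the output above.

```python
import re


def split_fields(line):
    """Split on any run of spaces, tabs, or commas and drop empty tokens."""
    return [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]


def to_floats(line):
    """Convert every field on the line to float."""
    return [float(tok) for tok in split_fields(line)]


# Splitting on exactly one space leaves empty strings between the numbers,
# and float("") raises ValueError.
naive = "100    12.5     0.3 0.001".split(" ")

# Splitting on runs of separators yields only the numeric tokens,
# however many spaces sit between them.
clean = split_fields("100    12.5     0.3 0.001")
values = to_floats("10000,0.000601,0.485991")
```

For purely whitespace-separated data, calling line.split() with no argument has the same effect, since it also collapses runs of whitespace.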
Modify the function and print the key data:
The error no longer occurs, and the printed data contains no spaces.
wangyufan@USTC-176:~/caffe$ ./tools/extra/plot_training_log.py.example 0 learn_curve0.png ./udacity.log
10000,0.000601,0.485991
['10000', '0.000601', '0.485991']
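P.S. Plotting the data from the 110,000-iteration run still fails. My guess is that this is because it contains the value 5e-05; this remains to be verified.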
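As a starting point for that verification, the quick check below shows that Python's float() itself accepts scientific notation, so if 5e-05 is involved the problem would more likely sit in an earlier parsing stage. The digits-only pattern is purely hypothetical, written here to show how such a stage could reject the value; it is not taken from caffe's scripts.

```python
import re

# float() parses scientific notation without any special handling.
lr = float("5e-05")

# Hypothetical digits-only pattern, NOT taken from caffe: it accepts plain
# decimals but rejects exponent notation.
DECIMAL_ONLY = re.compile(r"\d+\.?\d*")

matches_plain = DECIMAL_ONLY.fullmatch("0.000601") is not None
matches_sci = DECIMAL_ONLY.fullmatch("5e-05") is not None
```

If the parsed .train and .test files turn out to be missing rows or columns once the learning rate drops to 5e-05, that would support the guess above.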