ZynqNet Walkthrough (2): Running and Debugging

Background: ZynqNet implements a compact deep CNN on Xilinx FPGAs.

Goal: run the ZynqNet code.

Source code: https://github.com/dgschwend/zynqnet

Contents

1. _TRAINED_MODEL

2. _FIRMWARE

2.1 Run output

3. _HLS_CODE

3.1 C simulation

3.2 Synthesis

3.3 Building the system and generating the bitstream


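Project code structure: for our project, we need to understand both the HLS code and the ARM-side code.

The code in _FIRMWARE is the reference for the ARM side; the code in _HLS_CODE is the reference for the FPGA side.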

1. _TRAINED_MODEL

This directory contains the trained Caffe model and the pretrained weights.

2. _FIRMWARE

This code targets the ARM processor of the Zynq 7Z035. After running make, we ran it on a server, where one iteration takes about 3590 ms.

make
./test CPU|FPGA indata.bin (-quiet)

2.1 Run output

gpu@gpu-SYS-7048GR-TR:~/datasets/xxr/zynqnet/_FIRMWARE$ ./test CPU indata.bin
 ______                  _   _      _
|___  /                 | \ | |    | |
   / / _   _ _ __   __ _|  \| | ___| |_
  / / | | | | '_ \ / _` | . ` |/ _ \ __|
./ /__| |_| | | | | (_| | |\  |  __/ |_
\_____/\__, |_| |_|\__, \_| \_/\___|\__|
        __/ |         | |
       |___/          |_| (c) 2016 davidgs


CPU: Load Network Configuration
c1    : 256x256 x   3 > 64 , CONV (3x3)/2p + ReLU, IN @mem(       0-  786432B), OUT @mem(  786432B), WEIGHTS @mem(       0-    7168B)
f2/s3 : 128x128 x  64 > 16 , CONV (3x3)/2p + ReLU, IN @mem(  786432- 4980736B), OUT @mem( 4980736B), WEIGHTS @mem(    7168-   44096B)
f2/e1 :  64x64  x  16 > 64 , CONV (1x1)/1  + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5242880B), WEIGHTS @mem(   44096-   48448B) (split1)
f2/e3 :  64x64  x  16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5243136B), WEIGHTS @mem(   48448-   85568B) (split2)
f3/s1 :  64x64  x 128 > 16 , CONV (1x1)/1  + ReLU, IN @mem( 5242880- 7340032B), OUT @mem( 7340032B), WEIGHTS @mem(   85568-   93824B)
f3/e1 :  64x64  x  16 > 64 , CONV (1x1)/1  + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602176B), WEIGHTS @mem(   93824-   98176B) (split1)
f3/e3 :  64x64  x  16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602432B), WEIGHTS @mem(   98176-  135296B) (split2)
f4/s3 :  64x64  x 128 > 32 , CONV (3x3)/2p + ReLU, IN @mem( 7602176- 9699328B), OUT @mem( 9699328B), WEIGHTS @mem(  135296-  282880B)
f4/e1 :  32x32  x  32 > 128, CONV (1x1)/1  + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830400B), WEIGHTS @mem(  282880-  299776B) (split1)
f4/e3 :  32x32  x  32 > 128, CONV (3x3)/1p + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830912B), WEIGHTS @mem(  299776-  447744B) (split2)
f5/s1 :  32x32  x 256 > 32 , CONV (1x1)/1  + ReLU, IN @mem( 9830400-10878976B), OUT @mem(10878976B), WEIGHTS @mem(  447744-  480640B)
f5/e1 :  32x32  x  32 > 128, CONV (1x1)/1  + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010048B), WEIGHTS @mem(  480640-  497536B) (split1)
f5/e3 :  32x32  x  32 > 128, CONV (3x3)/1p + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010560B), WEIGHTS @mem(  497536-  645504B) (split2)
f6/s3 :  32x32  x 256 > 64 , CONV (3x3)/2p + ReLU, IN @mem(11010048-12058624B), OUT @mem(12058624B), WEIGHTS @mem(  645504- 1235584B)
f6/e1 :  16x16  x  64 > 256, CONV (1x1)/1  + ReLU, IN @mem(12058624-12124160B), OUT @mem(12124160B), WEIGHTS @mem( 1235584- 1302144B) (split1)
f6/e3 :  16x16  x  64 > 256, CONV (3x3)/1p + ReLU, IN @mem(12058624-12124160B), OUT @mem(12125184B), WEIGHTS @mem( 1302144- 1892992B) (split2)
f7/s1 :  16x16  x 512 > 64 , CONV (1x1)/1  + ReLU, IN @mem(12124160-12648448B), OUT @mem(12648448B), WEIGHTS @mem( 1892992- 2024320B)
f7/e1 :  16x16  x  64 > 192, CONV (1x1)/1  + ReLU, IN @mem(12648448-12713984B), OUT @mem(12713984B), WEIGHTS @mem( 2024320- 2074240B) (split1)
f7/e3 :  16x16  x  64 > 192, CONV (3x3)/1p + ReLU, IN @mem(12648448-12713984B), OUT @mem(12714752B), WEIGHTS @mem( 2074240- 2517376B) (split2)
f8/s3 :  16x16  x 384 > 112, CONV (3x3)/2p + ReLU, IN @mem(12713984-13107200B), OUT @mem(13107200B), WEIGHTS @mem( 2517376- 4066112B)
f8/e1 :   8x8   x 112 > 256, CONV (1x1)/1  + ReLU, IN @mem(13107200-13135872B), OUT @mem(13135872B), WEIGHTS @mem( 4066112- 4181824B) (split1)
f8/e3 :   8x8   x 112 > 256, CONV (3x3)/1p + ReLU, IN @mem(13107200-13135872B), OUT @mem(13136896B), WEIGHTS @mem( 4181824- 5215040B) (split2)
f9/s1 :   8x8   x 512 > 112, CONV (1x1)/1  + ReLU, IN @mem(13135872-13266944B), OUT @mem(13266944B), WEIGHTS @mem( 5215040- 5444864B)
f9/e1 :   8x8   x 112 > 368, CONV (1x1)/1  + ReLU, IN @mem(13266944-13295616B), OUT @mem(13295616B), WEIGHTS @mem( 5444864- 5611200B) (split1)
f9/e3 :   8x8   x 112 > 368, CONV (3x3)/1p + ReLU, IN @mem(13266944-13295616B), OUT @mem(13297088B), WEIGHTS @mem( 5611200- 7096448B) (split2)
c10/p1:   8x8   x 736 > 512, CONV (1x1)/1        , IN @mem(13295616-13484032B), OUT @mem(13484032B), WEIGHTS @mem( 7096448- 8605824B) (split1) GLOBAL POOL
c10/p2:   8x8   x 736 > 512, CONV (1x1)/1        , IN @mem(13295616-13484032B), OUT @mem(13486080B), WEIGHTS @mem( 8605824-10115200B) (split2) GLOBAL POOL

CPU: FPGA DRAM Memory Allocation:
     Bytes allocated: 0B (config) + 9878KB (weights) + 13296KB (data)
     region: 140609957294096 - 140609981024400
CPU: Copy Weights: 9878KB (weights)
CPU: Load Input Data from file indata.bin (768KB)
CPU: Copy Input Image (768KB)
 ## Iteration 0000 ##
CPU: Offload CONV Layer c1    : 256x256 x   3 > 64 , CONV (3x3)/2p + ReLU, IN @mem(       0-  786432B), OUT @mem(  786432B), WEIGHTS @mem(       0-    7168B)
FPGA: Computing .........................................................................................
.........................................................................................
.............................................................................. done.
run time: 118ms
CPU: Offload CONV Layer f2/s3 : 128x128 x  64 > 16 , CONV (3x3)/2p + ReLU, IN @mem(  786432- 4980736B), OUT @mem( 4980736B), WEIGHTS @mem(    7168-   44096B)
FPGA: Computing ................................................................................................................................ done.
run time: 146ms
CPU: Offload CONV Layer f2/e1 :  64x64  x  16 > 64 , CONV (1x1)/1  + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5242880B), WEIGHTS @mem(   44096-   48448B) (split1)
FPGA: Computing ................................................................ done.
run time: 95ms
CPU: Offload CONV Layer f2/e3 :  64x64  x  16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 4980736- 5242880B), OUT @mem( 5243136B), WEIGHTS @mem(   48448-   85568B) (split2)
FPGA: Computing ................................................................ done.
run time: 102ms
CPU: Offload CONV Layer f3/s1 :  64x64  x 128 > 16 , CONV (1x1)/1  + ReLU, IN @mem( 5242880- 7340032B), OUT @mem( 7340032B), WEIGHTS @mem(   85568-   93824B)
FPGA: Computing ................................................................ done.
run time: 247ms
CPU: Offload CONV Layer f3/e1 :  64x64  x  16 > 64 , CONV (1x1)/1  + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602176B), WEIGHTS @mem(   93824-   98176B) (split1)
FPGA: Computing ................................................................ done.
run time: 113ms
CPU: Offload CONV Layer f3/e3 :  64x64  x  16 > 64 , CONV (3x3)/1p + ReLU, IN @mem( 7340032- 7602176B), OUT @mem( 7602432B), WEIGHTS @mem(   98176-  135296B) (split2)
FPGA: Computing ................................................................ done.
run time: 102ms
CPU: Offload CONV Layer f4/s3 :  64x64  x 128 > 32 , CONV (3x3)/2p + ReLU, IN @mem( 7602176- 9699328B), OUT @mem( 9699328B), WEIGHTS @mem(  135296-  282880B)
FPGA: Computing ................................................................ done.
run time: 106ms
CPU: Offload CONV Layer f4/e1 :  32x32  x  32 > 128, CONV (1x1)/1  + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830400B), WEIGHTS @mem(  282880-  299776B) (split1)
FPGA: Computing ................................ done.
run time: 90ms
CPU: Offload CONV Layer f4/e3 :  32x32  x  32 > 128, CONV (3x3)/1p + ReLU, IN @mem( 9699328- 9830400B), OUT @mem( 9830912B), WEIGHTS @mem(  299776-  447744B) (split2)
FPGA: Computing ................................ done.
run time: 98ms
CPU: Offload CONV Layer f5/s1 :  32x32  x 256 > 32 , CONV (1x1)/1  + ReLU, IN @mem( 9830400-10878976B), OUT @mem(10878976B), WEIGHTS @mem(  447744-  480640B)
FPGA: Computing ................................ done.
run time: 191ms
CPU: Offload CONV Layer f5/e1 :  32x32  x  32 > 128, CONV (1x1)/1  + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010048B), WEIGHTS @mem(  480640-  497536B) (split1)
FPGA: Computing ................................ done.
run time: 90ms
CPU: Offload CONV Layer f5/e3 :  32x32  x  32 > 128, CONV (3x3)/1p + ReLU, IN @mem(10878976-11010048B), OUT @mem(11010560B), WEIGHTS @mem(  497536-  645504B) (split2)
FPGA: Computing ................................ done.
run time: 98ms
CPU: Offload CONV Layer f6/s3 :  32x32  x 256 > 64 , CONV (3x3)/2p + ReLU, IN @mem(11010048-12058624B), OUT @mem(12058624B), WEIGHTS @mem(  645504- 1235584B)
FPGA: Computing ................................ done.
run time: 106ms
CPU: Offload CONV Layer f6/e1 :  16x16  x  64 > 256, CONV (1x1)/1  + ReLU, IN @mem(12058624-12124160B), OUT @mem(12124160B), WEIGHTS @mem( 1235584- 1302144B) (split1)
FPGA: Computing ................ done.
run time: 94ms
CPU: Offload CONV Layer f6/e3 :  16x16  x  64 > 256, CONV (3x3)/1p + ReLU, IN @mem(12058624-12124160B), OUT @mem(12125184B), WEIGHTS @mem( 1302144- 1892992B) (split2)
FPGA: Computing ................ done.
run time: 98ms
CPU: Offload CONV Layer f7/s1 :  16x16  x 512 > 64 , CONV (1x1)/1  + ReLU, IN @mem(12124160-12648448B), OUT @mem(12648448B), WEIGHTS @mem( 1892992- 2024320B)
FPGA: Computing ................ done.
run time: 181ms
CPU: Offload CONV Layer f7/e1 :  16x16  x  64 > 192, CONV (1x1)/1  + ReLU, IN @mem(12648448-12713984B), OUT @mem(12713984B), WEIGHTS @mem( 2024320- 2074240B) (split1)
FPGA: Computing ................ done.
run time: 66ms
CPU: Offload CONV Layer f7/e3 :  16x16  x  64 > 192, CONV (3x3)/1p + ReLU, IN @mem(12648448-12713984B), OUT @mem(12714752B), WEIGHTS @mem( 2074240- 2517376B) (split2)
FPGA: Computing ................ done.
run time: 73ms
CPU: Offload CONV Layer f8/s3 :  16x16  x 384 > 112, CONV (3x3)/2p + ReLU, IN @mem(12713984-13107200B), OUT @mem(13107200B), WEIGHTS @mem( 2517376- 4066112B)
FPGA: Computing ................ done.
run time: 67ms
CPU: Offload CONV Layer f8/e1 :   8x8   x 112 > 256, CONV (1x1)/1  + ReLU, IN @mem(13107200-13135872B), OUT @mem(13135872B), WEIGHTS @mem( 4066112- 4181824B) (split1)
FPGA: Computing ........ done.
run time: 38ms
CPU: Offload CONV Layer f8/e3 :   8x8   x 112 > 256, CONV (3x3)/1p + ReLU, IN @mem(13107200-13135872B), OUT @mem(13136896B), WEIGHTS @mem( 4181824- 5215040B) (split2)
FPGA: Computing ........ done.
run time: 44ms
CPU: Offload CONV Layer f9/s1 :   8x8   x 512 > 112, CONV (1x1)/1  + ReLU, IN @mem(13135872-13266944B), OUT @mem(13266944B), WEIGHTS @mem( 5215040- 5444864B)
FPGA: Computing ........ done.
run time: 78ms
CPU: Offload CONV Layer f9/e1 :   8x8   x 112 > 368, CONV (1x1)/1  + ReLU, IN @mem(13266944-13295616B), OUT @mem(13295616B), WEIGHTS @mem( 5444864- 5611200B) (split1)
FPGA: Computing ........ done.
run time: 55ms
CPU: Offload CONV Layer f9/e3 :   8x8   x 112 > 368, CONV (3x3)/1p + ReLU, IN @mem(13266944-13295616B), OUT @mem(13297088B), WEIGHTS @mem( 5611200- 7096448B) (split2)
FPGA: Computing ........ done.
run time: 63ms
CPU: Offload CONV Layer c10/p1:   8x8   x 736 > 512, CONV (1x1)/1        , IN @mem(13295616-13484032B), OUT @mem(13484032B), WEIGHTS @mem( 7096448- 8605824B) (split1) GLOBAL POOL
FPGA: Computing ........ done.
run time: 501ms
CPU: Offload CONV Layer c10/p2:   8x8   x 736 > 512, CONV (1x1)/1        , IN @mem(13295616-13484032B), OUT @mem(13486080B), WEIGHTS @mem( 8605824-10115200B) (split2) GLOBAL POOL
FPGA: Computing ........ done.
run time: 499ms
CPU: Copy Results from FPGA DRAM (4096 Bytes)

Total run time: 3590ms

Result (top-5):
====================
    88.38%: class 207 (output  18.94)
     4.42%: class 852 (output  15.95)
     4.25%: class 208 (output  15.91)
     1.65%: class 219 (output  14.97)
     0.20%: class 929 (output  12.85)

TestBench Result: SUCCESS
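
The memory offsets in the layer table of this log can be reproduced by hand. The sketch below assumes that every value is a 4-byte float and that each layer stores one bias per output channel; both assumptions are inferred from the numbers in the log, not taken from the ZynqNet source. The struct and function names are hypothetical.

```cpp
// Hypothetical layer descriptor holding only the fields needed for the arithmetic.
struct Layer {
    int width, height;  // input feature-map size
    int ch_in, ch_out;  // input / output channels
    int kernel;         // 1 for a 1x1 convolution, 3 for 3x3
    int stride;         // 1 or 2
};

// Assumption: every weight and activation is a 4-byte float.
constexpr long kBytesPerValue = 4;

// Kernel weights plus one bias per output channel (assumption inferred from the log).
constexpr long weight_bytes(const Layer& l) {
    return (static_cast<long>(l.ch_in) * l.ch_out * l.kernel * l.kernel + l.ch_out)
           * kBytesPerValue;
}

// Size of the input feature map in bytes.
constexpr long input_bytes(const Layer& l) {
    return static_cast<long>(l.width) * l.height * l.ch_in * kBytesPerValue;
}

// Size of the output feature map in bytes; the stride divides width and height.
constexpr long output_bytes(const Layer& l) {
    return static_cast<long>(l.width / l.stride) * (l.height / l.stride)
           * l.ch_out * kBytesPerValue;
}

// The first three rows of the layer table in the log.
constexpr Layer c1    {256, 256,  3, 64, 3, 2};
constexpr Layer f2_s3 {128, 128, 64, 16, 3, 2};
constexpr Layer f2_e1 { 64,  64, 16, 64, 1, 1};

// c1: WEIGHTS @mem(0-7168B), IN @mem(0-786432B)
static_assert(weight_bytes(c1) == 7168, "c1 weights end at 7168 B");
static_assert(input_bytes(c1) == 786432, "c1 input ends at 786432 B");
// f2/s3: IN @mem(786432-4980736B), WEIGHTS @mem(7168-44096B)
static_assert(input_bytes(c1) + output_bytes(c1) == 4980736, "c1 output ends at 4980736 B");
static_assert(weight_bytes(c1) + weight_bytes(f2_s3) == 44096, "f2/s3 weights end at 44096 B");
```

For c1, weight_bytes gives (3*64*3*3 + 64) * 4 = 7168, which matches the end of the first WEIGHTS range. The (split2) OUT offsets sit ch_out * 4 bytes past the corresponding (split1) offsets (for example 5243136 - 5242880 = 256 for f2), which suggests that the two expand outputs are interleaved per pixel; this is again an inference from the log.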

If FPGA is passed as the argument instead of CPU, the program prints the following after the memory allocation:

XFPGA Driver: Initialize
test: could not open /dev/mem. need to be root: Permission denied
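
The error comes from opening /dev/mem, which on most Linux systems requires root. The sketch below is not the ZynqNet driver code; it is a minimal illustration, under that assumption, of how a userspace program might open a device file and report why the open failed. The function name try_open_device is hypothetical.

```cpp
#include <cassert>
#include <cerrno>
#include <cstring>
#include <fcntl.h>
#include <string>
#include <unistd.h>

// Hypothetical helper: tries to open a device file for read/write access.
// Returns an empty string on success, otherwise a human-readable reason.
std::string try_open_device(const char* path) {
    assert(path != nullptr);
    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0) {
        // For /dev/mem opened by a non-root user, errno is typically EACCES,
        // which strerror renders as "Permission denied".
        return std::string("could not open ") + path + ": " + std::strerror(errno);
    }
    close(fd);
    return std::string();
}
```

Calling try_open_device("/dev/mem") as a normal user should return a message ending in "Permission denied". Running the test program as root on the Zynq board should avoid this error; on a machine without the FPGA fabric, the FPGA mode would presumably still not work.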

 

3. _HLS_CODE

3.1 C simulation

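This directory holds the C code used for HLS: fpga_top is the top-level function and cpu_top is its test bench. When we ran the C simulation, the earlier layers printed their output normally, but at c10/p1, during the "FPGA: Computing" step, the run failed with SIGSEGV, which is probably a memory-related problem.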
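
The cause of the SIGSEGV was not identified here. One common cause of segmentation faults in C simulation test benches is a very large array placed on the stack; whether that applies to this crash is an assumption. The sketch below shows the general remedy of allocating a large buffer on the heap. The buffer size is taken from the 13296 KB data region reported in the run log, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Data region from the run log: 13296 KB, assumed to hold 4-byte floats.
constexpr std::size_t kDataFloats = 13296u * 1024u / 4u;

// Hypothetical helper: allocates a zero-initialized buffer on the heap.
// A local array such as `float buf[kDataFloats];` would need about 13 MB of
// stack, more than the 8 MB default stack limit that is common on Linux.
std::vector<float> make_data_buffer(std::size_t count) {
    assert(count > 0);
    return std::vector<float>(count, 0.0f);
}
```

If the crash persists with heap-allocated buffers, the stack is not the culprit and the indexing in the c10/p1 layer would be the next thing to inspect.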

3.2 Synthesis

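We fixed two bugs in this step. The first is in unittests.cpp, which tests the functionality of the individual units. The build reported that setiosflags was not declared on the line shown below; since this line matters little, we simply deleted it.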

// unittests.cpp    line 50
std::cout << std::setiosflags(std::ios::fixed) << std::setprecision(2)
          << "ERROR: " << acquired << " != " << expected << " in " << fn
          << " (" << file << ", line " << line << ")" << std::endl;

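The second bug is in netconfig.cpp, where the build reported that functions such as fopen and printf were not declared. We simply added the missing #include to that file (fopen and printf are declared in <stdio.h>). The remaining issues in the code will have to wait until we have read and understood more of it.

Synthesis then passes, after which we export the RTL.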
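
Both errors are missing-header problems, so an alternative to deleting the line in unittests.cpp is to include the header that declares the missing names: std::setiosflags and std::setprecision are declared in <iomanip>, and fopen and printf in <cstdio>. The sketch below is a standalone illustration, not the actual ZynqNet files; the function names are hypothetical and the parameters only mirror the names used in the snippet from unittests.cpp.

```cpp
#include <cassert>
#include <cstdio>    // std::fopen, std::fclose, std::printf
#include <iomanip>   // std::setiosflags, std::setprecision
#include <sstream>
#include <string>

// Hypothetical stand-in for the error message built in unittests.cpp.
// It returns the text instead of printing it so the formatting can be checked.
std::string format_mismatch(float acquired, float expected,
                            const char* fn, const char* file, int line) {
    assert(fn != nullptr && file != nullptr);
    std::ostringstream os;
    os << std::setiosflags(std::ios::fixed) << std::setprecision(2)
       << "ERROR: " << acquired << " != " << expected << " in " << fn
       << " (" << file << ", line " << line << ")";
    return os.str();
}

// With <cstdio> included, fopen and fclose are declared.
bool file_exists(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (f == nullptr) return false;
    std::fclose(f);
    return true;
}
```

Keeping the line preserves the fixed two-decimal formatting of the unit-test error messages, which is lost when the line is removed.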

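3.3 Building the system and generating the bitstream

[Figure 1]

We add the PS to the design, customize it to enable HP0, set the clock frequency to 200 MHz, add the interrupt and connect it, and then run the automatic connection.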

 

 

 

 

 
