Background: ZynqNet runs a deeply compressed CNN on a Xilinx FPGA.
Goal: understand the CPU-side code of ZynqNet.
源码地址:https://github.com/dgschwend/zynqnet
Table of Contents
cpu_top
The program covers:
1 Building the network on the CPU
1.1 The structs that store the network structure
1.2 The function that builds the network
1.3 Printing per-layer information
1.4 The constructors
2 Setting up the network on the FPGA side
2.1 AXI-Lite communication
2.2 Loading data into DRAM
2.3 FPGA reads from DRAM
3 Controlling the FPGA computation
4 Fetching the results and printing the output
4.1 Comparing the result
The cpu_top program under _HLS_CODE is the test bench used to verify the HLS code.
The cpu_top under _FIRMWARE is the CPU code for the 7z035 board; this is the code we study here.
We need to work out which functions run on the CPU and which are FPGA-side functions meant for HLS compilation.
Network information on the CPU side
//cpu_top main
// = Setup Network on CPU =
// Generate + Load Network Config from network.hpp/network.cpp
network_t *net_CPU = get_network_config();
printf("\nCPU: Load Network Configuration\n");
print_layers(net_CPU);
printf("\n");
This creates a network_t pointer, net_CPU, which points to the network on the CPU side and holds all of its configuration.
network_t and layer_t are structs defined in netconfig.hpp; they store the neural network's information. From here on, everything the CPU knows about the network is accessed through these two structs.
network_t *net_CPU = get_network_config(); is defined in network.cpp: it adds each layer's information with addLayer, layer by layer, then loads the weights with loadWeightsFromFile.
// network.cpp
network_t *get_network_config() {
network_t *net = new network_t(27, 2528800);
// Layer Attributes: ( NAME , W, H, CI, CO, K, P, S, R, S1, S2, GP)
addLayer(net, layer_t("c1 ", 256, 256, 3, 64, 3, 1, 2, 1, 0, 0, 0));
addLayer(net, layer_t("f2/s3 ", 128, 128, 64, 16, 3, 1, 2, 1, 0, 0, 0));
addLayer(net, layer_t("f2/e1 ", 64, 64, 16, 64, 1, 0, 1, 1, 1, 0, 0));
...
net->num_weights = 2528800;
const char* filename = "weights.bin";
loadWeightsFromFile(net, filename);
return net;
}
The addLayer function is in netconfig.cpp. Its parameters are, in order: name, width, height, channels in, channels out, kernel, pad, stride, ReLU, split1, split2, global pooling.
The loadWeightsFromFile function is also in netconfig.cpp; it loads the weights layer by layer.
print_layers(net_CPU) prints each layer's information; its helper function is print_layer. Both are in netconfig.cpp.
Note that the struct below uses a language feature worth a closer look.
struct layer_t {
char name[NET_NAME_MAX_LEN + 1];
dimension_t width; // input dimensions
dimension_t height;
channel_t channels_in;
channel_t channels_out;
kernel_t kernel; // kernel sizes supported: 3 or 1
stride_t stride; // only stride 1 or 2 supported
bool pad; // padding is either 0 or 1 pixel
bool relu;
bool is_first_split_layer;
bool is_second_split_layer;
bool global_pool;
memaddr_t mem_addr_input;
memaddr_t mem_addr_output;
memaddr_t mem_addr_weights;
// full constructor, used to define network in network.cpp
layer_t(const char *n, int w, int h, int ci, int co, int k, int p, int s,
int r, bool is_split_1 = false, bool is_split_2 = false,
bool global_pool = false, int mem_i = 0, int mem_o = 0, int mem_w = 0)
: width(w),
height(h),
channels_in(ci),
channels_out(co),
kernel(k),
pad(p),
stride(s),
relu(r),
mem_addr_input(mem_i),
mem_addr_output(mem_o),
mem_addr_weights(mem_w),
is_first_split_layer(is_split_1),
is_second_split_layer(is_split_2),
global_pool(global_pool) {
for (int i = 0; i < NET_NAME_MAX_LEN; i++) {
name[i] = n[i];
if (n[i] == 0) break;
}
name[NET_NAME_MAX_LEN] = 0;
};
// empty constructor, needed for empty array of layer_t in FPGA BRAM
layer_t()
: width(0),
height(0),
channels_in(0),
channels_out(0),
kernel(0),
pad(0),
stride(0),
relu(0),
mem_addr_input(0),
mem_addr_output(0),
mem_addr_weights(0),
is_first_split_layer(0),
is_second_split_layer(0),
global_pool(0) {
name[0] = 0;
};
};
In the code above, the two layer_t members are constructors: one full constructor and one empty (default) constructor.
https://blog.csdn.net/alex1997222/article/details/81219663?utm_source=blogxgwz0
Related reading on class constructors and destructors: http://www.runoob.com/cplusplus/cpp-constructor-destructor.html
//network.cpp in network_t *get_network_config() function
network_t *get_network_config() {
network_t *net = new network_t(27, 2528800);
// Layer Attributes: ( NAME , W, H, CI, CO, K, P, S, R, S1, S2, GP)
addLayer(net, layer_t("c1 ", 256, 256, 3, 64, 3, 1, 2, 1, 0, 0, 0));
addLayer(net, layer_t("f2/s3 ", 128, 128, 64, 16, 3, 1, 2, 1, 0, 0, 0));
...
The layer_t calls here invoke the full constructor: each layer is first constructed, then handed to addLayer, which stores it in net.
// in addLayer function, netconfig.cpp
// Add Layer to network
net->layers[net->num_layers] = layer;
net->num_layers++;
net->total_pixel_mem = current_output_addr + output_data_pixels;
In effect, the CPU allocates the addresses in DRAM and passes them on to the FPGA.
// main
// Initialize AXILITE Configuration Bus + Shared DRAM region
if (USE_FPGA_BLOCK) XFPGA_Initialize();
This initializes the AXI-Lite communication. The driver code is auto-generated; the IP core's driver functions live in xfpga.cpp.
Later, the XFPGA_set... functions are used to write the corresponding configuration values.
//main
// Allocate Shared Memory in DRAM for Weights + Data.
allocate_DRAM_memory(net_CPU);
// Copy Layer Weights to DRAM.
copy_weights_to_DRAM(net_CPU);
// ===========================
// = Load + Copy Input Image =
// ===========================
layer_t input_layer = net_CPU->layers[0];
// Allocate Memory for Input Image:
data_t *input_image = allocate_image_memory(input_layer);
// Load Input Image
load_prepared_input_image(input_layer, input_image, input_filename);
// Copy Input Image into shared DRAM
copy_input_image_to_DRAM(input_layer, input_image);
All of these functions are in cpu_top.cpp.
allocate_DRAM_memory(net_CPU) assigns the DRAM addresses; copy_weights_to_DRAM then copies the weights from net_CPU->weights into DRAM. Note that the weights were first read from file into a buffer, and only then copied from the buffer into DRAM (see loadWeightsFromFile in 1.1.2).
load_prepared_input_image and copy_input_image_to_DRAM respectively load the image data from file into a buffer and copy it from the buffer into DRAM.
//cpu_top.cpp in function allocate_DRAM_memory
// Memory Allocation
if (USE_FPGA_BLOCK) {
// Get Pointer to SHARED DRAM from XFPGA wrapper
SHARED_DRAM = (volatile char *)XFPGA_shared_DRAM_virtual();
} else {
// Allocate SHARED DRAM on Heap
SHARED_DRAM = (volatile char *)malloc(total_size);
}
XFPGA_shared_DRAM_virtual is defined in xfpga.cpp; SHARED_DRAM = (volatile char *)XFPGA_shared_DRAM_virtual(); returns the SHARED_DRAM pointer.
allocate_DRAM_memory allocates the DRAM addresses based on the network's configuration. The pointers are stored in SHARED_DRAM, SHARED_DRAM_WEIGHTS, and SHARED_DRAM_DATA, three volatile pointer variables defined outside of main, so they behave as globals.
Two nested for loops iterate over the runs and the layers; inside them, AXI-Lite communication transfers the configuration to the FPGA and triggers the computation.
// main for iteration for num_layers
if (USE_FPGA_BLOCK) {
// FPGA Accelerator Block
XFPGA_setLayerConfig(layer);
XFPGA_Start();
while (!XFPGA_IsDone()) { // busy-wait
usleep(1); // sleep 1us between status polls
if (!BE_QUIET)
LOG("XFPGA Status: Done = %d, Idle = %d, Ready = %d\n", XFPGA_IsDone(),
XFPGA_IsIdle(), XFPGA_IsReady());
}
} else {
// CPU Simulation
// Precalculate some Layer Parameters on CPU
numfilterelems_t w_per_f = (layer.kernel == 3) ? 9 : 1;
weightaddr_t n_weights = layer.channels_in * layer.channels_out * w_per_f;
fpga_top(layer, (data_t *)SHARED_DRAM, weights_offset, n_weights,
input_offset);
}
XFPGA_setLayerConfig(layer); transfers each layer's configuration to the FPGA, which then runs the computation. This function is also in xfpga.cpp.
// Fetch Results from FPGA
data_t *results = (data_t *)malloc(ch_out * sizeof(data_t));
copy_results_from_DRAM(results, ch_out);
// = Release FPGA core =
if (USE_FPGA_BLOCK) XFPGA_Release();
// = Calculate Softmax =
std::vector<std::pair<data_t, int> > probabilities(ch_out);
calculate_softmax(net_CPU, results, probabilities);
copy_results_from_DRAM(results, ch_out); reads the results back from DRAM; a softmax is then applied to obtain the predicted probabilities.
// main
// Check if output is AS EXPECTED (+- 0.5%) (defined in network.hpp)
if (fabs(100 * probabilities[0].first - TEST_RESULT_EXPECTED) < 0.1) {
printf("\nTestBench Result: SUCCESS\n");
return 0;
} else {
printf("\nTestBench Result: FAILURE\n");
printf("Actual: %5.2f, Expected: %5.2f\n", 100 * probabilities[0].first,
TEST_RESULT_EXPECTED);
return -1;
}
The output is compared against TEST_RESULT_EXPECTED, which is defined in network.hpp as a const float global.