FPGA Implementation of the Convolution Function (4): HLS for the Function Interface

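Background: the IP core has been written and verified, but its interface still has to go through HLS.

Goal: run HLS on the convolution IP core's interface so that the weights, input, and output are passed as DRAM addresses and the data itself moves over an AXI master interface (m_axi, as used in the sections below), while the neural-network parameters are passed over axi-lite.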

References:

 Background on accessing DDR3 from an IP core https://blog.csdn.net/weixin_36474809/article/details/81018040

 PS-PL communication over AXI-Lite https://blog.csdn.net/weixin_36474809/article/details/81206660

 FPGA hands-on tutorial (5): accessing DDR from the PS through MIG https://blog.csdn.net/weixin_36474809/article/details/80997945#%E4%BA%94%E3%80%81SDK

 Walkthrough of the C program for accessing DDR3 from ARM through MIG https://blog.csdn.net/weixin_36474809/article/details/81012267

 FPGA hands-on tutorial (7): accessing DDR from an IP core https://blog.csdn.net/weixin_36474809/article/details/84942607

UG1037 (v4.0), July 15, 2017, AXI Reference Guide

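Contents

1. Interfaces in the reference code

1.1 axi-lite

1.2 m_axi

2. Adding the directives

2.1 Parameters to pass (for reference)

2.2 Passing parameters into the IP core (for reference)

2.3 Adding the volatile qualifier

2.4 Changing the function arguments

2.5 The final interface HLS

3. Running HLS

4. A return value is required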


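1. Interfaces in the reference code

The original interface takes structs as input, and those structs hold both the network parameters and pointers into DRAM. That makes interface HLS difficult, so the DRAM pointers and the network parameters need to be passed into the convolution separately.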

1.1 axi-lite

void AxiLiteTest(int * tenNum, int * oneNum, int * outNum)
{
#pragma HLS INTERFACE s_axilite port=outNum
#pragma HLS INTERFACE s_axilite port=oneNum
#pragma HLS INTERFACE s_axilite port=tenNum

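Applying axi-lite directly is enough. port names the argument that gets the axi-lite interface, and bundle names a group: every port given the same bundle name ends up in that one interface.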
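To make the snippet above concrete, here is a minimal complete version. The signature comes from the reference code; the function body, the bundle name CTRL, and the added port=return line are assumptions made for illustration. An ordinary C++ compiler ignores the HLS pragmas, so the function can be checked in software first.

```cpp
// Minimal AXI4-Lite sketch. Only the signature and the three argument pragmas
// come from the reference snippet; the body, the bundle name CTRL, and the
// port=return pragma are illustrative assumptions.
void AxiLiteTest(int *tenNum, int *oneNum, int *outNum) {
#pragma HLS INTERFACE s_axilite port=outNum bundle=CTRL
#pragma HLS INTERFACE s_axilite port=oneNum bundle=CTRL
#pragma HLS INTERFACE s_axilite port=tenNum bundle=CTRL
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL
    // Combine a tens digit and a ones digit into one number.
    *outNum = (*tenNum) * 10 + (*oneNum);
}
```

Putting every port in one bundle should give a single AXI4-Lite slave with one register map, which keeps the generated driver small.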

int migTester(int size, volatile int *migPtr ,int totalNumDDR){
#pragma HLS INTERFACE s_axilite port=totalNumDDR
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE m_axi depth=512 port=migPtr offset=slave
#pragma HLS INTERFACE s_axilite port=size

unsigned int memDDR3Tester(unsigned int start, unsigned int size,
		unsigned int mode, unsigned int data, 
		volatile unsigned int *memPtr, unsigned int *expectedVal, 
		unsigned int *failedAddr, unsigned int *numErrors)
{
#pragma HLS INTERFACE s_axilite port=numErrors bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=failedAddr bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=expectedVal bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=start bundle=CRTL_BUS
#pragma HLS INTERFACE m_axi depth=512 port=memPtr offset=slave
#pragma HLS INTERFACE s_axilite port=data bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=mode bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=size bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=return bundle=CRTL_BUS

 

void fpga_top(layer_t layer, data_t *SHARED_DRAM, unsigned int weights_offset,
              weightaddr_t num_weights, unsigned int input_offset) {
#pragma HLS INTERFACE m_axi depth = DRAM_DEPTH port = SHARED_DRAM offset = \
    slave bundle = memorybus register
#pragma HLS INTERFACE s_axilite port = layer bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = num_weights bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = weights_offset bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = input_offset bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = return bundle = axilite  register

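The register option is not explored in depth here; the AXI interface details still need to be looked up in the documentation, UG1037 (v4.0), July 15, 2017.

So when implementing the convolution, the axi-lite side needs the following:

  • INTERFACE s_axilite
  • port set to the corresponding function argument
  • bundle naming the group the port belongs to
  • the register option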

1.2 m_axi

With this interface the IP core talks to DRAM over the AXI protocol. The m prefix means the IP core is the master and drives the DDR accesses.

unsigned int memDDR3Tester(unsigned int start, unsigned int size,
		unsigned int mode, unsigned int data, 
		volatile unsigned int *memPtr, unsigned int *expectedVal, 
		unsigned int *failedAddr, unsigned int *numErrors)
{
#pragma HLS INTERFACE s_axilite port=numErrors bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=failedAddr bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=expectedVal bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=start bundle=CRTL_BUS
#pragma HLS INTERFACE m_axi depth=512 port=memPtr offset=slave
#pragma HLS INTERFACE s_axilite port=data bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=mode bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=size bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=return bundle=CRTL_BUS

int migTester(int size, volatile int *migPtr ,int totalNumDDR){
#pragma HLS INTERFACE s_axilite port=totalNumDDR
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE m_axi depth=512 port=migPtr offset=slave
#pragma HLS INTERFACE s_axilite port=size	

 

void fpga_top(layer_t layer, data_t *SHARED_DRAM, unsigned int weights_offset,
              weightaddr_t num_weights, unsigned int input_offset) {
#pragma HLS INTERFACE m_axi depth = DRAM_DEPTH port = SHARED_DRAM offset = \
    slave bundle = memorybus register
#pragma HLS INTERFACE s_axilite port = layer bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = num_weights bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = weights_offset bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = input_offset bundle = axilite  register
#pragma HLS INTERFACE s_axilite port = return bundle = axilite  register

Below is the pragma that appeared in the source after we added the directive ourselves in HLS.

#pragma HLS INTERFACE m_axi depth=512 port=weightIn->pdata offset=slave bundle=memorybus

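We were not sure at first what depth means. In Vivado HLS it is not a bus width: it states the maximum number of elements the test bench accesses through the port, and it is used to size the adapter for C/RTL co-simulation. zynqNet sets a fairly large value, const int DRAM_DEPTH = 5932576;.

offset=slave means the pointer's base address is supplied through an AXI4-Lite register, so software has to set it before starting the core.

bundle names a group of signals; m_axi ports with the same bundle name share one AXI master interface.

So the directive for m_axi needs:

  • INTERFACE m_axi
  • port=the corresponding function argument
  • depth, the maximum number of elements accessed (used for co-simulation), not the bus width
  • offset=slave
  • bundle=memorybus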
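The items in the list above can be put together in a small sketch. The function addOneInDram, its arguments, and the bundle names are hypothetical and not taken from the reference projects; the pragma options are the ones discussed above.

```cpp
// Hypothetical top function (not from the article): adds 1 to `size` words of
// DRAM starting at word index `start`, and returns how many words it touched.
int addOneInDram(volatile int *mem, int start, int size) {
// depth=512 is an assumed value; it only bounds what the co-simulation
// test bench may access through this port.
#pragma HLS INTERFACE m_axi depth=512 port=mem offset=slave bundle=memorybus
#pragma HLS INTERFACE s_axilite port=start bundle=axilite
#pragma HLS INTERFACE s_axilite port=size bundle=axilite
#pragma HLS INTERFACE s_axilite port=return bundle=axilite
    int touched = 0;
    for (int i = 0; i < size; ++i) {
        // Read a word through the m_axi port, then write it back incremented.
        mem[start + i] = mem[start + i] + 1;
        ++touched;
    }
    return touched;
}
```

In software the pointer is just an array; in hardware the base address of mem would be written by the PS into the AXI4-Lite register that offset=slave creates.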

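2. Adding the directives

2.1 Parameters to pass (for reference)

This approach was later dropped because of the nested-pointer problem (pointers stored inside structs that are themselves passed by pointer).

Inside the function, the parameters that would need to be passed with axi-lite are:

//current variables for the loops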
int cur_channel_out,cur_channel_in,cur_row_out,cur_col_out;
int filter_col,filter_row;
//network parameters
int stride = weightIn->stride;
int kernelSize=weightIn->kernelSize,kernelSize_2D=weightIn->kernelSize*weightIn->kernelSize;//kernel


//DRAM location offset variable
int output_loc,weight_pre_loc,input_pre_loc,weight_loc,input_loc;
//DRAM three variable pointer
float* weight_ptr=weightIn->pdata;float *input_ptr=pboxIn->pdata;float *output_ptr=outpBox->pdata;


layer_setup:{
	MemoryController::setLayerConfig(weightIn,pboxIn,outpBox);
	ImageCache::setLayerConfig(weightIn,pboxIn);
	WeightsCache::setLayerConfig(weightIn);
};

The structs involved:

struct Weight
{
    mydataFmt *pdata;
    mydataFmt *pbias;
    int out_ChannelNum;
    int in_ChannelNum;
    int kernelSize;
    int stride;
    int leftPad;
    int rightPad;
};

struct pBox
{
	mydataFmt *pdata;
	int width;
	int height;
	int channel;
};

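To simplify the later implementation, all of the parameters are passed into the FPGA in one go over axilite.

2.2 Passing parameters into the IP core (for reference)

This step runs into the nested-pointer problem and was later dropped.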

//----------------convolution in FPGA-----------------------------------
void convolution_3x3(const Weight *weightIn, const pBox *pboxIn, pBox *outpBox){
//axilite interface	
#pragma HLS INTERFACE s_axilite register port=weightIn->out_ChannelNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->in_ChannelNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->kernelSize bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->stride bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->leftPad bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->rightPad bundle=axilite //weight
#pragma HLS INTERFACE s_axilite register port=pboxIn->width bundle=axilite
#pragma HLS INTERFACE s_axilite register port=pboxIn->height bundle=axilite
#pragma HLS INTERFACE s_axilite register port=pboxIn->channel bundle=axilite  //pboxIn
#pragma HLS INTERFACE s_axilite register port=outpBox->width bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outpBox->height bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outpBox->channel bundle=axilite  //outpBox
//m_axi interface
#pragma HLS INTERFACE m_axi depth=512 port=weightIn->pdata offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=512 port=pboxIn->pdata offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=512 port=outpBox->pdata offset=slave bundle=memorybus

Following the patterns above, we write the corresponding pragmas.

2.3 Adding the volatile qualifier

https://baike.baidu.com/item/volatile/10606957?fr=aladdin

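This is the volatile keyword from C. It tells the compiler that every access to a volatile-qualified value must actually read it, rather than reuse a previously loaded copy.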
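A small sketch of what that guarantee means in practice. The function name countNonZeroReads is hypothetical; the point is only that each loop iteration performs a real read, which is the behaviour wanted when the pointer refers to memory that hardware may change.

```cpp
// Sketch: count how many of `n` reads of *status returned nonzero.
// Because `status` points to volatile data, the compiler must perform all
// `n` reads; without volatile it would be free to read once and reuse the value.
int countNonZeroReads(const volatile int *status, int n) {
    int hits = 0;
    for (int i = 0; i < n; ++i) {
        if (*status != 0) {
            ++hits;
        }
    }
    return hits;
}
```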

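Variables used to access DDR need the volatile qualifier. We first ran the experiment without it and still got two errors:

  • ERROR: [SYNCHK 200-11] src/fpgaAcc.cpp:259: Argument 'weightIn.pdata' of function 'convolution_3x3' (src/fpgaAcc.cpp:45) has an unsynthesizable type (possible cause(s): pointer to pointer or global pointer).
  • That is, weightIn.pdata has a type HLS cannot synthesize, such as a pointer to a pointer or a global pointer.
  • ERROR: [SYNCHK 200-61] src/fpgaAcc.cpp:174: unsupported memory access on variable 'weightIn.pdata' which is (or contains) an array with unknown size at compile time.
  • That is, weightIn.pdata is (or contains) an array whose size is not known at compile time.

So we add volatile to specify the corresponding interface type (as section 2.4 shows, the struct arguments also had to be flattened).

Where to add it: the compiler reports a large number of errors along the way; fix them one at a time as they are reported. Most of the fixes are explicit casts.

  • In pBox.h, the data members of the Weight and pBox structs become volatile float
  • In network.cpp and network.h, the arguments of addbias and prelu, and of every other function
  • In mtcnn.cpp, everything related to memset, memcpy, and fread
  • In initconvandfc and initprelu
  • In fpgaAcc, a very large number of changes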
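The memset and memcpy call sites are a typical source of those errors, because both functions take plain void pointers and a volatile float pointer does not convert to them implicitly. Besides casting, one alternative is an element-wise loop. The helper names zeroVolatile and copyToVolatile below are hypothetical and do not appear in the project.

```cpp
#include <cstddef>

// Hypothetical helper: element-wise replacement for memset(dst, 0, ...)
// that accepts a volatile buffer without any cast.
void zeroVolatile(volatile float *dst, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = 0.0f;
    }
}

// Hypothetical helper: element-wise replacement for memcpy into a
// volatile buffer from an ordinary one.
void copyToVolatile(volatile float *dst, const float *src, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        dst[i] = src[i];
    }
}
```

On the PS side this is slower than memcpy, so a cast may still be preferable for large buffers; the loop form mainly avoids scattering casts through the code.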

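2.4 Changing the function arguments

The argument was a pointer to a struct, which is relatively complex. Experiments showed that HLS struggles to compile such a struct, so the function's arguments have to change.

The hard part of implementing a neural network on an FPGA is that one change ripples everywhere: changing one variable means changing every variable related to it.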

void convolution_3x3(int inHight,int inWidth,int inChanNum,int outHight,int outWidth,int OutChanNum,
			int stride,
			volatile float *weight_ptr,volatile float *input_ptr,volatile float *output_ptr)

The change was first made successfully in fpga.cpp, and then the HLS test bench was updated and passed:

	//conv in PL
	convolution_3x3(featureIn.height, featureIn.width ,featureIn.channel,
						 conv_PL_out.height,conv_PL_out.width,conv_PL_out.channel,
						 weightIn.stride,
						 weightIn.pdata, featureIn.pdata,conv_PL_out.pdata);

Then the code in mtcnn.cpp was changed and also passed. Every conv3*3 call has to be replaced with this function.

All calls involving the 3*3 convolution take this form:

	convolution_3x3(this->pooling1_out->height,this->pooling1_out->width,this->pooling1_out->channel,
					this->conv2_out->height,this->conv2_out->width,this->conv2_out->channel,
					this->conv2_wb->stride,
					this->conv2_wb->pdata,this->pooling1_out->pdata,this->conv2_out->pdata);

After these extensive changes, the function ran successfully when embedded back into the original program.
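One way to limit how far such a change ripples is a thin PS-side adapter that unpacks the structs and forwards to the flat function, so existing call sites can keep passing structs. This is a sketch and not what the project does: convolution_3x3_boxed is a hypothetical name, the structs are trimmed copies of the ones in section 2.1 with volatile data pointers, and the body of convolution_3x3 here is a stand-in that only fills the output so the forwarding can be checked.

```cpp
typedef float mydataFmt;

struct Weight {
    volatile mydataFmt *pdata;
    volatile mydataFmt *pbias;
    int out_ChannelNum, in_ChannelNum, kernelSize, stride, leftPad, rightPad;
};

struct pBox {
    volatile mydataFmt *pdata;
    int width, height, channel;
};

// Stand-in for the real PL kernel (assumption): writes `stride` into every
// output element so a test can see that the arguments arrived.
void convolution_3x3(int inHight, int inWidth, int inChanNum,
                     int outHight, int outWidth, int OutChanNum, int stride,
                     volatile float *weight_ptr, volatile float *input_ptr,
                     volatile float *output_ptr) {
    (void)inHight; (void)inWidth; (void)inChanNum;
    (void)weight_ptr; (void)input_ptr;
    for (int i = 0; i < outHight * outWidth * OutChanNum; ++i) {
        output_ptr[i] = static_cast<float>(stride);
    }
}

// Hypothetical adapter: runs on the PS only and is never synthesized.
void convolution_3x3_boxed(const Weight *w, const pBox *in, pBox *out) {
    convolution_3x3(in->height, in->width, in->channel,
                    out->height, out->width, out->channel,
                    w->stride, w->pdata, in->pdata, out->pdata);
}
```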

2.5 The final interface HLS

//----------------convolution in FPGA-----------------------------------
void convolution_3x3(int inHight,int inWidth,int inChanNum,int outHight,int outWidth,int OutChanNum,
					 int stride,
					 volatile float *weight_ptr,volatile float *input_ptr,volatile float *output_ptr){
#pragma HLS INTERFACE s_axilite register port=inHight bundle=axilite
#pragma HLS INTERFACE s_axilite register port=inWidth bundle=axilite
#pragma HLS INTERFACE s_axilite register port=inChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outHight bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outWidth bundle=axilite
#pragma HLS INTERFACE s_axilite register port=OutChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=stride bundle=axilite
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=weight_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=input_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=output_ptr offset=slave bundle=memorybus

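The parameters are passed directly over s_axilite with the register option, and bundle is set as in the pragmas above.

3. Running HLS

The program passes when tested inside the mtcnn main program.

It then passes in the HLS test bench.

It also passes with the interface directives applied.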

Starting C synthesis ...
/mnt/workspace/Xilinx/Vivado/2017.4/bin/vivado_hls /home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/csynth.tcl
INFO: [HLS 200-10] Running '/mnt/workspace/Xilinx/Vivado/2017.4/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'osrc' on host 'osrc-virtual-machine' (Linux_x86_64 version 4.13.0-32-generic) on Tue Dec 11 16:53:16 CST 2018
INFO: [HLS 200-10] On os Ubuntu 16.04.3 LTS
INFO: [HLS 200-10] In directory '/home/osrc/Desktop/document/conv_Core/HLS_Conv'
INFO: [HLS 200-10] Opening project '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore'.
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.hpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.h' to the project
INFO: [HLS 200-10] Adding test bench file 'src/test_convBench.cpp' to the project
INFO: [HLS 200-10] Opening solution '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
INFO: [HLS 200-10] Setting target device to 'xc7z035ffg676-2'
INFO: [HLS 200-10] Analyzing design file 'src/pBox.cpp' ...
INFO: [HLS 200-10] Analyzing design file 'src/fpgaAcc.cpp' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:42 ; elapsed = 00:01:18 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 337 ; free virtual = 32673
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:44 ; elapsed = 00:01:20 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 335 ; free virtual = 32673
INFO: [HLS 200-10] Starting code transformations ...
INFO: [XFORM 203-603] Inlining function 'MemoryController::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:77).
INFO: [XFORM 203-603] Inlining function 'ImageCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:78).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:79).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::get_9_weights_to_buffer' (src/fpgaAcc.cpp:307).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:284).
INFO: [XFORM 203-603] Inlining function 'MemoryController::load_weight_2_reg' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:291).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::load_WBRAM_from_DRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:83).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:94).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:87).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:85).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadOffset' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:330).
INFO: [XFORM 203-603] Inlining function 'MemoryController::loadInputChannelPixel' into 'ImageCache::loadPixelDRAM_2_IBRAM' (src/fpgaAcc.cpp:339).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadPixelDRAM_2_IBRAM' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:331).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:95).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:88).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:86).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelOutOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:99).
INFO: [XFORM 203-603] Inlining function 'ImageCache::calcu_IBRAM_row_offset' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:209).
INFO: [XFORM 203-603] Inlining function 'ImageCache::get_IBRAM_Pixel' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:213).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::loadPixel_buffer' into 'ProcessingElement::processInputChannel' (src/fpgaAcc.cpp:230).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_9_weights_to_buffer' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:247).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::macc2d' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:249).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:384).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:252).
INFO: [XFORM 203-603] Inlining function 'OutputCache::getOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:382).
INFO: [XFORM 203-603] Inlining function 'OutputCache::accumulateChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:254).
INFO: [XFORM 203-603] Inlining function 'MemoryController::writeBackOutputChannel' into 'convolution_3x3' (src/fpgaAcc.cpp:109).
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:45 ; elapsed = 00:01:22 . Memory (MB): peak = 361.922 ; gain = 13.660 ; free physical = 324 ; free virtual = 32664
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:46 ; elapsed = 00:01:22 . Memory (MB): peak = 361.922 ; gain = 13.660 ; free physical = 320 ; free virtual = 32661
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' for pipelining.
INFO: [XFORM 203-501] Unrolling loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' partially with a factor of 8.
INFO: [XFORM 203-501] Unrolling loop 'Loop-1.1' (src/fpgaAcc.cpp:308) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_multiply' (src/fpgaAcc.cpp:190) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_accumulate' (src/fpgaAcc.cpp:195) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-101] Partitioning array 'pixel_buffer' (src/fpgaAcc.cpp:228) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'weights_local' (src/fpgaAcc.cpp:244) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM'  in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'multresult' (src/fpgaAcc.cpp:187) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'OutputCache::OBRAM'  in dimension 1 with a cyclic factor 8.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.0'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.1'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.2'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.3'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.4'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.5'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.6'  in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.7'  in dimension 2 completely.
INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
INFO: [XFORM 203-622] Instantiating function 'ProcessingElement::processInputChannel'(src/fpgaAcc.cpp:221) to 'ProcessingElement::processInputChannel.0' at call site (src/fpgaAcc.cpp:103) by setting 'cur_ci' to 'cur_channel_in'.
INFO: [XFORM 203-721] Changing loop 'Loop_load_pixel_2_PE_row_loop_proc' (src/fpgaAcc.cpp:207) to a process function for dataflow in function 'ProcessingElement::processInputChannel.0'.
INFO: [XFORM 203-712] Applying dataflow to function 'ProcessingElement::processInputChannel.0' (src/fpgaAcc.cpp:224:1), detected/extracted 2 process function(s): 
	 'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5'
	 'ProcessingElement::processAll_channelOut'.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:49 ; elapsed = 00:01:25 . Memory (MB): peak = 489.633 ; gain = 141.371 ; free physical = 291 ; free virtual = 32635
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1.1' (src/fpgaAcc.cpp:283:18) in function 'convolution_3x3' : 


the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1' (src/fpgaAcc.cpp:280:18) in function 'convolution_3x3' : 


the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : 


the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : 


the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' : 


the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-4.1' (src/fpgaAcc.cpp:93:3) in function 'convolution_3x3' : 


the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-4' (src/fpgaAcc.cpp:91:6) in function 'convolution_3x3' : 


more than one sub loop.
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5' to 'processInputChannel.' (src/fpgaAcc.cpp:207:3)
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0' to 'processInputChannel..1' (src/fpgaAcc.cpp:226:1)
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processAll_channelOut' to 'processAll_channelOu' (src/fpgaAcc.cpp:192:43)
INFO: [XFORM 203-811] Inferring bus burst read of variable length on port 'memorybus' (src/fpgaAcc.cpp:178:15).
WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
WARNING: [XFORM 203-713] Function 'processInputChannel..1' (src/fpgaAcc.cpp:226:1) failed dataflow checking:  A dataflow region cannot be instantiated from with a pipelined loop  (src/fpgaAcc.cpp:226:1). Ignoring pipeline directive to allow the dataflow directive to take precedence. This behavior can be disabled by using 'config_compile -disable_dataflow_pipeline_check'.
Instruction does not dominate all uses!
  %tmp_57 = add i32 %WeightsCache_inChan_1, %tmp_56
  %memorybus_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.floatP(float* %memorybus_addr, i32 %tmp_57), !dbg !1031
Broken module found, compilation aborted!
Stack dump:
0.	Running pass 'Function Pass Manager' on module '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/.autopilot/db/a.o.2.bc'.
1.	Running pass 'Module Verifier' on function '@convolution_3x3'
/mnt/workspace/Xilinx/Vivado/2017.4/bin/loader: line 194: 13582 Aborted                 (core dumped) "$RDI_PROG" "$@"
Finished C synthesis.

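Other errors remain (the run above still aborts in a later pass), but the interface-related problems have been debugged. The interface HLS on the IP core side is complete.

4. A return value is required

While testing on the FPGA we found a bug: the function must be given a return value, otherwise there is no way to tell whether the IP core has finished.

Giving the convolution a return value produces driver functions like these:

while (!XMigtester_IsDone(&XMigtesterCore));
    result=XMigtester_Get_return(&XMigtesterCore);

We therefore give the convolution a return value.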
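A sketch of what the top function might look like with a return value. Everything inside the body is an assumption for illustration: a naive 3x3 convolution with channel-major layout, no padding, and no bias, returning the number of output elements written. Only the argument list and the pragma style come from section 2.5. Note also that the void fpga_top in section 1.1 maps port=return onto its axilite bundle, so it may be the port=return pragma rather than the return value itself that makes the IsDone driver call appear; that was not verified on this design.

```cpp
// Assumed layout: input [channel][row][col], weights [out][in][3][3].
int convolution_3x3(int inHight, int inWidth, int inChanNum,
                    int outHight, int outWidth, int OutChanNum, int stride,
                    volatile float *weight_ptr, volatile float *input_ptr,
                    volatile float *output_ptr) {
#pragma HLS INTERFACE s_axilite register port=return bundle=axilite
#pragma HLS INTERFACE s_axilite register port=inHight bundle=axilite
#pragma HLS INTERFACE s_axilite register port=inWidth bundle=axilite
#pragma HLS INTERFACE s_axilite register port=inChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outHight bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outWidth bundle=axilite
#pragma HLS INTERFACE s_axilite register port=OutChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=stride bundle=axilite
#pragma HLS INTERFACE m_axi depth=512 port=weight_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=512 port=input_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=512 port=output_ptr offset=slave bundle=memorybus
    int written = 0;
    for (int co = 0; co < OutChanNum; ++co) {
        for (int r = 0; r < outHight; ++r) {
            for (int c = 0; c < outWidth; ++c) {
                float acc = 0.0f;
                for (int ci = 0; ci < inChanNum; ++ci) {
                    for (int kr = 0; kr < 3; ++kr) {
                        for (int kc = 0; kc < 3; ++kc) {
                            int in_idx = (ci * inHight + r * stride + kr) * inWidth
                                         + c * stride + kc;
                            int w_idx = ((co * inChanNum + ci) * 3 + kr) * 3 + kc;
                            acc += input_ptr[in_idx] * weight_ptr[w_idx];
                        }
                    }
                }
                output_ptr[(co * outHight + r) * outWidth + c] = acc;
                ++written;
            }
        }
    }
    return written;  // read by the PS after the core reports done
}
```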
