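Background: the IP core has been written and verified, but its interface still has to be synthesized with HLS.
Goal: synthesize the interface of the convolution IP core with HLS. The weights, input, and output are passed as DRAM addresses and the data is moved over an AXI master (m_axi) interface; the neural network parameters are passed over AXI-Lite.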
References:
Calling DDR3 from an IP core: background knowledge https://blog.csdn.net/weixin_36474809/article/details/81018040
PS-PL communication with AXI-Lite https://blog.csdn.net/weixin_36474809/article/details/81206660
FPGA Practice Tutorial (5): calling DDR from the PS through MIG https://blog.csdn.net/weixin_36474809/article/details/80997945#%E4%BA%94%E3%80%81SDK
Walkthrough of the C program the ARM uses to call DDR3 through MIG https://blog.csdn.net/weixin_36474809/article/details/81012267
FPGA Practice Tutorial (7): calling DDR from an IP core https://blog.csdn.net/weixin_36474809/article/details/84942607
UG1037 (v4.0), July 15, 2017, AXI Reference Guide
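Contents
1. Interfaces in the reference projects
1.1 axi-lite
1.2 m_axi
2. Adding the directives
2.1 Parameters that need to be passed (reference)
2.2 Passing parameters into the IP core (reference)
2.3 Adding volatile
2.4 Changing the function arguments
2.5 The final interface used for HLS
3. Running HLS
4. A return value is required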
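The original interface takes structs as input. Each struct holds both network parameters and a pointer into DRAM, which makes interface synthesis in HLS difficult, so we need to pass the DRAM pointers and the network parameters into the convolution separately.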
void AxiLiteTest(int * tenNum, int * oneNum, int * outNum)
{
#pragma HLS INTERFACE s_axilite port=outNum
#pragma HLS INTERFACE s_axilite port=oneNum
#pragma HLS INTERFACE s_axilite port=tenNum
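Applying axi-lite directly is enough. port names the argument that is exposed through the AXI-Lite interface, and bundle names a group: every port given the same bundle name is placed in the same AXI-Lite interface.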
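To try out the port and bundle options in isolation, a minimal function can look like the sketch below. The function name, argument names, and bundle name are made up for illustration and do not come from the referenced projects.

```cpp
// Hypothetical example, not from the original project: the function and
// argument names are made up for illustration.
// A regular C++ compiler ignores the HLS pragmas, so the same file can be
// compiled and checked on a PC before synthesis.
void combine_digits(int tens, int ones, int *result) {
    // All three arguments and the block-level control signals share one
    // AXI-Lite interface because they use the same bundle name.
#pragma HLS INTERFACE s_axilite port=tens   bundle=CTRL_BUS
#pragma HLS INTERFACE s_axilite port=ones   bundle=CTRL_BUS
#pragma HLS INTERFACE s_axilite port=result bundle=CTRL_BUS
#pragma HLS INTERFACE s_axilite port=return bundle=CTRL_BUS
    *result = 10 * tens + ones;
}
```

Calling combine_digits(4, 2, &r) on the PC stores 42 in r, which is a quick way to check the C behaviour before looking at the generated registers.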
int migTester(int size, volatile int *migPtr ,int totalNumDDR){
#pragma HLS INTERFACE s_axilite port=totalNumDDR
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE m_axi depth=512 port=migPtr offset=slave
#pragma HLS INTERFACE s_axilite port=size
unsigned int memDDR3Tester(unsigned int start, unsigned int size,
unsigned int mode, unsigned int data,
volatile unsigned int *memPtr, unsigned int *expectedVal,
unsigned int *failedAddr, unsigned int *numErrors)
{
#pragma HLS INTERFACE s_axilite port=numErrors bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=failedAddr bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=expectedVal bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=start bundle=CRTL_BUS
#pragma HLS INTERFACE m_axi depth=512 port=memPtr offset=slave
#pragma HLS INTERFACE s_axilite port=data bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=mode bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=size bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=return bundle=CRTL_BUS
void fpga_top(layer_t layer, data_t *SHARED_DRAM, unsigned int weights_offset,
weightaddr_t num_weights, unsigned int input_offset) {
#pragma HLS INTERFACE m_axi depth = DRAM_DEPTH port = SHARED_DRAM offset = \
slave bundle = memorybus register
#pragma HLS INTERFACE s_axilite port = layer bundle = axilite register
#pragma HLS INTERFACE s_axilite port = num_weights bundle = axilite register
#pragma HLS INTERFACE s_axilite port = weights_offset bundle = axilite register
#pragma HLS INTERFACE s_axilite port = input_offset bundle = axilite register
#pragma HLS INTERFACE s_axilite port = return bundle = axilite register
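We do not look into the register option for now; the AXI interface details need to be checked later in the documentation, UG1037 (v4.0) July 15, 2017.
So when implementing the convolution we need to set up the following with axi-lite:
This interface is the AXI protocol between the IP core and DRAM. The prefix m means the IP core is the master and drives the DDR.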
unsigned int memDDR3Tester(unsigned int start, unsigned int size,
unsigned int mode, unsigned int data,
volatile unsigned int *memPtr, unsigned int *expectedVal,
unsigned int *failedAddr, unsigned int *numErrors)
{
#pragma HLS INTERFACE s_axilite port=numErrors bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=failedAddr bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=expectedVal bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=start bundle=CRTL_BUS
#pragma HLS INTERFACE m_axi depth=512 port=memPtr offset=slave
#pragma HLS INTERFACE s_axilite port=data bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=mode bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=size bundle=CRTL_BUS
#pragma HLS INTERFACE s_axilite port=return bundle=CRTL_BUS
int migTester(int size, volatile int *migPtr ,int totalNumDDR){
#pragma HLS INTERFACE s_axilite port=totalNumDDR
#pragma HLS INTERFACE s_axilite port=return
#pragma HLS INTERFACE m_axi depth=512 port=migPtr offset=slave
#pragma HLS INTERFACE s_axilite port=size
void fpga_top(layer_t layer, data_t *SHARED_DRAM, unsigned int weights_offset,
weightaddr_t num_weights, unsigned int input_offset) {
#pragma HLS INTERFACE m_axi depth = DRAM_DEPTH port = SHARED_DRAM offset = \
slave bundle = memorybus register
#pragma HLS INTERFACE s_axilite port = layer bundle = axilite register
#pragma HLS INTERFACE s_axilite port = num_weights bundle = axilite register
#pragma HLS INTERFACE s_axilite port = weights_offset bundle = axilite register
#pragma HLS INTERFACE s_axilite port = input_offset bundle = axilite register
#pragma HLS INTERFACE s_axilite port = return bundle = axilite register
Below is the pragma that HLS generated in the source when we added the directive ourselves in the tool.
#pragma HLS INTERFACE m_axi depth=512 port=weightIn->pdata offset=slave bundle=memorybus
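We were not sure what depth means at the time. According to the Vivado HLS documentation (UG902), it tells RTL co-simulation how many elements the test bench may access through the port. ZynqNet uses a large value, const int DRAM_DEPTH = 5932576;.
offset=slave means the base (offset) address of the pointer is set through a register in the AXI-Lite slave interface.
bundle names a group of signals; ports with the same bundle name share one interface.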
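The three options can be seen together in a small copy kernel. This is only a sketch under assumptions: copy_block, BUF_DEPTH, and the bundle names are hypothetical, and the depth value is chosen simply to match the buffer a test bench would allocate.

```cpp
// Hypothetical sketch; names and sizes are assumptions, not taken from the project.
const int BUF_DEPTH = 512;  // number of elements the test bench buffers hold

// Copies n floats from one DRAM region to another through one AXI master port.
void copy_block(volatile float *src, volatile float *dst, int n) {
    // Same bundle name: both pointers share a single m_axi port.
    // offset=slave: the base address of each pointer becomes an AXI-Lite register.
#pragma HLS INTERFACE m_axi depth=BUF_DEPTH port=src offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=BUF_DEPTH port=dst offset=slave bundle=memorybus
#pragma HLS INTERFACE s_axilite port=n      bundle=axilite
#pragma HLS INTERFACE s_axilite port=return bundle=axilite
    for (int i = 0; i < n; ++i) {
        dst[i] = src[i];
    }
}
```

In a test bench the two buffers would be ordinary arrays of BUF_DEPTH floats. On the board, the host writes the physical addresses of the two regions into the offset registers before starting the core.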
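So the directives needed to use m_axi are:
This step was later abandoned because it runs into the multi-pointer problem.
Inside the function, the parameters that would have to be passed with axi-lite are: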
//current variables for the loops
int cur_channel_out,cur_channel_in,cur_row_out,cur_col_out;
int filter_col,filter_row;
//network parameters
int stride = weightIn->stride;
int kernelSize=weightIn->kernelSize,kernelSize_2D=weightIn->kernelSize*weightIn->kernelSize;//kernel
//DRAM location offset variable
int output_loc,weight_pre_loc,input_pre_loc,weight_loc,input_loc;
//DRAM three variable pointer
float* weight_ptr=weightIn->pdata;float *input_ptr=pboxIn->pdata;float *output_ptr=outpBox->pdata;
layer_setup:{
MemoryController::setLayerConfig(weightIn,pboxIn,outpBox);
ImageCache::setLayerConfig(weightIn,pboxIn);
WeightsCache::setLayerConfig(weightIn);
};
The structs involved:
struct Weight
{
mydataFmt *pdata;
mydataFmt *pbias;
int out_ChannelNum;
int in_ChannelNum;
int kernelSize;
int stride;
int leftPad;
int rightPad;
};
struct pBox
{
mydataFmt *pdata;
int width;
int height;
int channel;
};
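To make the later implementation easier, we pass all parameters into the FPGA at once over AXI-Lite.
This step also runs into the multi-pointer problem and was later abandoned.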
//----------------convolution in FPGA-----------------------------------
void convolution_3x3(const Weight *weightIn, const pBox *pboxIn, pBox *outpBox){
//axilite interface
#pragma HLS INTERFACE s_axilite register port=weightIn->out_ChannelNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->in_ChannelNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->kernelSize bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->stride bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->leftPad bundle=axilite
#pragma HLS INTERFACE s_axilite register port=weightIn->rightPad bundle=axilite //weight
#pragma HLS INTERFACE s_axilite register port=pboxIn->width bundle=axilite
#pragma HLS INTERFACE s_axilite register port=pboxIn->height bundle=axilite
#pragma HLS INTERFACE s_axilite register port=pboxIn->channel bundle=axilite //pboxIn
#pragma HLS INTERFACE s_axilite register port=outpBox->width bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outpBox->height bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outpBox->channel bundle=axilite //outpBox
//m_axi interface
#pragma HLS INTERFACE m_axi depth=512 port=weightIn->pdata offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=512 port=pboxIn->pdata offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=512 port=outpBox->pdata offset=slave bundle=memorybus
The pragmas above were written to match the statements listed earlier.
https://baike.baidu.com/item/volatile/10606957?fr=aladdin
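This is the volatile qualifier in C. Marking a variable volatile tells the compiler that its value must be read directly on every access instead of being cached or optimized away.
Accesses to DDR need volatile in front of the variable. We first ran the experiment without it and still got two errors,
so we need to add volatile to specify the corresponding interface type.
Where to add it: during the change the compiler reports a large number of errors; fix them one by one following the compiler messages. Most of the fixes are added explicit type casts.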
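The kind of cast the compiler asks for can be illustrated with two small helpers. They are hypothetical and not part of the project; they only show that adding volatile to a pointer is implicit while removing it needs an explicit cast, and that a plain loop is the simplest way to read from a volatile pointer.

```cpp
// Hypothetical helpers for illustration; not code from the original project.

// Reads n values from a volatile DRAM pointer into an ordinary local buffer.
// A plain loop is used because memcpy() does not accept volatile pointers
// without a cast.
void load_to_local(volatile float *dram, float *local, int n) {
    for (int i = 0; i < n; ++i) {
        local[i] = dram[i];  // each iteration performs a real read
    }
}

// Shows which direction of the conversion needs an explicit cast.
float first_element_after_round_trip(float *plain) {
    volatile float *as_volatile = plain;             // adding volatile: implicit
    float *back = const_cast<float *>(as_volatile);  // removing it: explicit cast
    return back[0];
}
```

This is why passing an existing float* such as pdata into a volatile float* argument compiles without changes, while code that stores the volatile pointer back into a plain float* member triggers the errors mentioned above.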
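The arguments were pointers to structs, which is relatively complex. Our HLS experiments showed that these structs are hard for HLS to compile, so the function arguments have to be changed.
The hard part of putting a neural network on an FPGA is that one change ripples through everything: each time a variable is changed, every related variable has to be changed as well.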
void convolution_3x3(int inHight,int inWidth,int inChanNum,int outHight,int outWidth,int OutChanNum,
int stride,
volatile float *weight_ptr,volatile float *input_ptr,volatile float *output_ptr)
The change was first made successfully in fpga.cpp, and then the HLS test bench was updated and passed:
//conv in PL
convolution_3x3(featureIn.height, featureIn.width ,featureIn.channel,
conv_PL_out.height,conv_PL_out.width,conv_PL_out.channel,
weightIn.stride,
weightIn.pdata, featureIn.pdata,conv_PL_out.pdata);
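Then the code in mtcnn.cpp was changed, and it also passes inside mtcnn. Every conv3*3 call has to be replaced with this function.
All functions that involve the 3*3 convolution were changed to this form.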
convolution_3x3(this->pooling1_out->height,this->pooling1_out->width,this->pooling1_out->channel,
this->conv2_out->height,this->conv2_out->width,this->conv2_out->channel,
this->conv2_wb->stride,
this->conv2_wb->pdata,this->pooling1_out->pdata,this->conv2_out->pdata);
After a large number of changes, the function runs successfully when embedded in the original program.
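An alternative to editing every call site, which is not what was done here, is a thin host-side wrapper that keeps the struct-based call and unpacks it into the flat argument list. The sketch below assumes mydataFmt is float; convolution_3x3_structs, RecordedCall, and last_call are hypothetical names, and the convolution_3x3 body is only a stand-in that records its arguments so the wrapper can be checked on a PC.

```cpp
#include <cassert>

typedef float mydataFmt;  // assumption: the data type used by the project is float

// Structs as listed earlier in the post.
struct Weight {
    mydataFmt *pdata;
    mydataFmt *pbias;
    int out_ChannelNum;
    int in_ChannelNum;
    int kernelSize;
    int stride;
    int leftPad;
    int rightPad;
};
struct pBox {
    mydataFmt *pdata;
    int width;
    int height;
    int channel;
};

// Hypothetical record of the last call, used only to make this sketch checkable.
struct RecordedCall {
    int inHight, inWidth, inChanNum;
    int outHight, outWidth, OutChanNum;
    int stride;
    volatile float *weight_ptr, *input_ptr, *output_ptr;
};
static RecordedCall last_call;

// Stand-in for the real flattened kernel: same signature as in the post, but
// the body only stores the arguments instead of computing a convolution.
void convolution_3x3(int inHight, int inWidth, int inChanNum,
                     int outHight, int outWidth, int OutChanNum,
                     int stride,
                     volatile float *weight_ptr, volatile float *input_ptr,
                     volatile float *output_ptr) {
    last_call = RecordedCall{inHight, inWidth, inChanNum,
                             outHight, outWidth, OutChanNum,
                             stride, weight_ptr, input_ptr, output_ptr};
}

// Hypothetical host-side wrapper that keeps the old struct-based call style.
// It belongs on the PS or test bench side, not in the HLS top function.
void convolution_3x3_structs(const Weight *weightIn, const pBox *pboxIn,
                             pBox *outpBox) {
    assert(weightIn && pboxIn && outpBox);
    convolution_3x3(pboxIn->height, pboxIn->width, pboxIn->channel,
                    outpBox->height, outpBox->width, outpBox->channel,
                    weightIn->stride,
                    weightIn->pdata, pboxIn->pdata, outpBox->pdata);
}
```

With such a wrapper the struct types stay out of the synthesized interface, while existing callers can keep passing Weight and pBox pointers.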
//----------------convolution in FPGA-----------------------------------
void convolution_3x3(int inHight,int inWidth,int inChanNum,int outHight,int outWidth,int OutChanNum,
int stride,
volatile float *weight_ptr,volatile float *input_ptr,volatile float *output_ptr){
#pragma HLS INTERFACE s_axilite register port=inHight bundle=axilite
#pragma HLS INTERFACE s_axilite register port=inWidth bundle=axilite
#pragma HLS INTERFACE s_axilite register port=inChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outHight bundle=axilite
#pragma HLS INTERFACE s_axilite register port=outWidth bundle=axilite
#pragma HLS INTERFACE s_axilite register port=OutChanNum bundle=axilite
#pragma HLS INTERFACE s_axilite register port=stride bundle=axilite
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=weight_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=input_ptr offset=slave bundle=memorybus
#pragma HLS INTERFACE m_axi depth=DRAM_DEPTH port=output_ptr offset=slave bundle=memorybus
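The parameters are passed in directly over the s_axilite protocol with the register option, and the bundle is set as in the pragmas above.
The program passes the test inside the main mtcnn program,
then passes in the HLS test bench,
and finally passes with the interface in place.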
Starting C synthesis ...
/mnt/workspace/Xilinx/Vivado/2017.4/bin/vivado_hls /home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/csynth.tcl
INFO: [HLS 200-10] Running '/mnt/workspace/Xilinx/Vivado/2017.4/bin/unwrapped/lnx64.o/vivado_hls'
INFO: [HLS 200-10] For user 'osrc' on host 'osrc-virtual-machine' (Linux_x86_64 version 4.13.0-32-generic) on Tue Dec 11 16:53:16 CST 2018
INFO: [HLS 200-10] On os Ubuntu 16.04.3 LTS
INFO: [HLS 200-10] In directory '/home/osrc/Desktop/document/conv_Core/HLS_Conv'
INFO: [HLS 200-10] Opening project '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore'.
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/fpgaAcc.hpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.cpp' to the project
INFO: [HLS 200-10] Adding design file 'src/pBox.h' to the project
INFO: [HLS 200-10] Adding test bench file 'src/test_convBench.cpp' to the project
INFO: [HLS 200-10] Opening solution '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1'.
INFO: [SYN 201-201] Setting up clock 'default' with a period of 10ns.
INFO: [HLS 200-10] Setting target device to 'xc7z035ffg676-2'
INFO: [HLS 200-10] Analyzing design file 'src/pBox.cpp' ...
INFO: [HLS 200-10] Analyzing design file 'src/fpgaAcc.cpp' ...
INFO: [HLS 200-10] Validating synthesis directives ...
INFO: [HLS 200-111] Finished Checking Pragmas Time (s): cpu = 00:00:42 ; elapsed = 00:01:18 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 337 ; free virtual = 32673
INFO: [HLS 200-111] Finished Linking Time (s): cpu = 00:00:44 ; elapsed = 00:01:20 . Memory (MB): peak = 361.637 ; gain = 13.375 ; free physical = 335 ; free virtual = 32673
INFO: [HLS 200-10] Starting code transformations ...
INFO: [XFORM 203-603] Inlining function 'MemoryController::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:77).
INFO: [XFORM 203-603] Inlining function 'ImageCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:78).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::setLayerConfig' into 'convolution_3x3' (src/fpgaAcc.cpp:79).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::get_9_weights_to_buffer' (src/fpgaAcc.cpp:307).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_WBRAM_addr' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:284).
INFO: [XFORM 203-603] Inlining function 'MemoryController::load_weight_2_reg' into 'WeightsCache::load_WBRAM_from_DRAM' (src/fpgaAcc.cpp:291).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::load_WBRAM_from_DRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:83).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:94).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:87).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadRowOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:85).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelLoadOffset' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:330).
INFO: [XFORM 203-603] Inlining function 'MemoryController::loadInputChannelPixel' into 'ImageCache::loadPixelDRAM_2_IBRAM' (src/fpgaAcc.cpp:339).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadPixelDRAM_2_IBRAM' into 'ImageCache::loadRowDRAM_2_IBRAM' (src/fpgaAcc.cpp:331).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:95).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:88).
INFO: [XFORM 203-603] Inlining function 'ImageCache::loadRowDRAM_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:86).
INFO: [XFORM 203-603] Inlining function 'MemoryController::setPixelOutOffset' into 'convolution_3x3' (src/fpgaAcc.cpp:99).
INFO: [XFORM 203-603] Inlining function 'ImageCache::calcu_IBRAM_row_offset' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:209).
INFO: [XFORM 203-603] Inlining function 'ImageCache::get_IBRAM_Pixel' into 'ProcessingElement::loadPixel_buffer' (src/fpgaAcc.cpp:213).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::loadPixel_buffer' into 'ProcessingElement::processInputChannel' (src/fpgaAcc.cpp:230).
INFO: [XFORM 203-603] Inlining function 'WeightsCache::get_9_weights_to_buffer' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:247).
INFO: [XFORM 203-603] Inlining function 'ProcessingElement::macc2d' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:249).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:384).
INFO: [XFORM 203-603] Inlining function 'OutputCache::setOutChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:252).
INFO: [XFORM 203-603] Inlining function 'OutputCache::getOutChannel' into 'OutputCache::accumulateChannel' (src/fpgaAcc.cpp:382).
INFO: [XFORM 203-603] Inlining function 'OutputCache::accumulateChannel' into 'ProcessingElement::processAll_channelOut' (src/fpgaAcc.cpp:254).
INFO: [XFORM 203-603] Inlining function 'MemoryController::writeBackOutputChannel' into 'convolution_3x3' (src/fpgaAcc.cpp:109).
INFO: [HLS 200-111] Finished Standard Transforms Time (s): cpu = 00:00:45 ; elapsed = 00:01:22 . Memory (MB): peak = 361.922 ; gain = 13.660 ; free physical = 324 ; free virtual = 32664
INFO: [HLS 200-10] Checking synthesizability ...
INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
INFO: [HLS 200-111] Finished Checking Synthesizability Time (s): cpu = 00:00:46 ; elapsed = 00:01:22 . Memory (MB): peak = 361.922 ; gain = 13.660 ; free physical = 320 ; free virtual = 32661
INFO: [XFORM 203-502] Unrolling all sub-loops inside loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' for pipelining.
INFO: [XFORM 203-501] Unrolling loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'ProcessingElement::processAll_channelOut' partially with a factor of 8.
INFO: [XFORM 203-501] Unrolling loop 'Loop-1.1' (src/fpgaAcc.cpp:308) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_multiply' (src/fpgaAcc.cpp:190) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-501] Unrolling loop 'L_MACC_accumulate' (src/fpgaAcc.cpp:195) in function 'ProcessingElement::processAll_channelOut' completely.
INFO: [XFORM 203-101] Partitioning array 'pixel_buffer' (src/fpgaAcc.cpp:228) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'weights_local' (src/fpgaAcc.cpp:244) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM' in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'multresult' (src/fpgaAcc.cpp:187) in dimension 1 completely.
INFO: [XFORM 203-101] Partitioning array 'OutputCache::OBRAM' in dimension 1 with a cyclic factor 8.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.0' in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.1' in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.2' in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.3' in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.4' in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.5' in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.6' in dimension 2 completely.
INFO: [XFORM 203-101] Partitioning array 'WeightsCache::WBRAM.7' in dimension 2 completely.
INFO: [XFORM 203-602] Inlining function 'ImageCache::writeNextChannelPixel_2_IBRAM' into 'convolution_3x3' (src/fpgaAcc.cpp:340->src/fpgaAcc.cpp:331->src/fpgaAcc.cpp:86) automatically.
INFO: [XFORM 203-622] Instantiating function 'ProcessingElement::processInputChannel'(src/fpgaAcc.cpp:221) to 'ProcessingElement::processInputChannel.0' at call site (src/fpgaAcc.cpp:103) by setting 'cur_ci' to 'cur_channel_in'.
INFO: [XFORM 203-721] Changing loop 'Loop_load_pixel_2_PE_row_loop_proc' (src/fpgaAcc.cpp:207) to a process function for dataflow in function 'ProcessingElement::processInputChannel.0'.
INFO: [XFORM 203-712] Applying dataflow to function 'ProcessingElement::processInputChannel.0' (src/fpgaAcc.cpp:224:1), detected/extracted 2 process function(s):
'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5'
'ProcessingElement::processAll_channelOut'.
INFO: [HLS 200-111] Finished Pre-synthesis Time (s): cpu = 00:00:49 ; elapsed = 00:01:25 . Memory (MB): peak = 489.633 ; gain = 141.371 ; free physical = 291 ; free virtual = 32635
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1.1' (src/fpgaAcc.cpp:283:18) in function 'convolution_3x3' :
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-1' (src/fpgaAcc.cpp:280:18) in function 'convolution_3x3' :
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' :
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' :
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'L_DRAM_PRELOADROW_X' (src/fpgaAcc.cpp:329:77) in function 'convolution_3x3' :
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-4.1' (src/fpgaAcc.cpp:93:3) in function 'convolution_3x3' :
the outer loop is not a perfect loop.
WARNING: [XFORM 203-542] Cannot flatten a loop nest 'Loop-4' (src/fpgaAcc.cpp:91:6) in function 'convolution_3x3' :
more than one sub loop.
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0_Loop_load_pixel_2_PE_row_loop_proc5' to 'processInputChannel.' (src/fpgaAcc.cpp:207:3)
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processInputChannel.0' to 'processInputChannel..1' (src/fpgaAcc.cpp:226:1)
WARNING: [XFORM 203-631] Renaming function 'ProcessingElement::processAll_channelOut' to 'processAll_channelOu' (src/fpgaAcc.cpp:192:43)
INFO: [XFORM 203-811] Inferring bus burst read of variable length on port 'memorybus' (src/fpgaAcc.cpp:178:15).
WARNING: [XFORM 203-562] Loop 'L_CH_OUT' (src/fpgaAcc.cpp:241) in function 'processAll_channelOu' has unknown bound because it has multiple exiting blocks.
WARNING: [XFORM 203-713] Function 'processInputChannel..1' (src/fpgaAcc.cpp:226:1) failed dataflow checking: A dataflow region cannot be instantiated from with a pipelined loop (src/fpgaAcc.cpp:226:1). Ignoring pipeline directive to allow the dataflow directive to take precedence. This behavior can be disabled by using 'config_compile -disable_dataflow_pipeline_check'.
Instruction does not dominate all uses!
%tmp_57 = add i32 %WeightsCache_inChan_1, %tmp_56
%memorybus_addr_rd_re = call i1 @_ssdm_op_ReadReq.m_axi.floatP(float* %memorybus_addr, i32 %tmp_57), !dbg !1031
Broken module found, compilation aborted!
Stack dump:
0. Running pass 'Function Pass Manager' on module '/home/osrc/Desktop/document/conv_Core/HLS_Conv/conv3x3_IPcore/solution1/.autopilot/db/a.o.2.bc'.
1. Running pass 'Module Verifier' on function '@convolution_3x3'
/mnt/workspace/Xilinx/Vivado/2017.4/bin/loader: line 194: 13582 Aborted (core dumped) "$RDI_PROG" "$@"
Finished C synthesis.
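Other errors remain, but the interface problems have been debugged. The HLS of the interface on the IP core side is done.
During FPGA testing we found a bug: the function must have a return value, otherwise there is no way to tell whether the IP core has finished.
So we need to give the convolution a return value. Only then are driver functions like the following generated: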
while (!XMigtester_IsDone(&XMigtesterCore));
result=XMigtester_Get_return(&XMigtesterCore);
So we add a return value to the convolution.
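On the HLS side, the change can be sketched with a small kernel that returns a status value. fill_dram and its arguments are hypothetical. As far as we understand, it is the s_axilite port=return pragma that maps the block-level control signals (and the return value, when there is one) into the AXI-Lite register map, so that line is included explicitly.

```cpp
// Hypothetical kernel for illustration; names are assumptions.
// Returns the number of elements written so the host can both wait for
// completion and read back a status value.
int fill_dram(volatile float *out, int n, float value) {
#pragma HLS INTERFACE m_axi depth=512 port=out offset=slave bundle=memorybus
#pragma HLS INTERFACE s_axilite port=n      bundle=axilite
#pragma HLS INTERFACE s_axilite port=value  bundle=axilite
    // port=return places the control signals and the return value in the
    // AXI-Lite register map.
#pragma HLS INTERFACE s_axilite port=return bundle=axilite
    int written = 0;
    for (int i = 0; i < n; ++i) {
        out[i] = value;
        ++written;
    }
    return written;
}
```

The host would then poll and read the result in the same way as the XMigtester_IsDone and XMigtester_Get_return calls shown above; the actual driver function names depend on the name of the top function.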