[XRT Vitis-Tutorials] OpenCL Scheduling Optimization

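1 Introduction

Previous articles in this series:
ZCU106 XRT Environment Setup
ZCU106 XRT Vivado Project Analysis
ZCU106 XRT PetaLinux Project Analysis
[XRT Vitis-Tutorials] RTL Kernels Test
[XRT Vitis-Tutorials] C++/RTL Kernel Mixed Programming Test
[XRT Vitis-Tutorials] Parallel Image Computation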

Official documentation:
2019.2 Vitis™ Application Acceleration Development Flow Tutorials
Vitis Unified Software Platform Documentation Application Acceleration Development
Vitis Unified Software Platform Documentation Embedded Software Development

Vitis ZCU106 Platform
ZCU106 Vitis Platform

Pre-built image; download it and copy it to the SD card to test directly:
ZCU106 Test Image

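2 Creating the Vitis Project

This article tests the fourth example in the Tutorials: Host Code Optimization.
The example contains several experiments that optimize how the host schedules the pipeline.
The following test is carried out here:

  • Run directly on hardware and measure each of the modes on the board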

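2.1 Project Creation

For convenience, I use the GUI flow again.

2.1.1 Creating a New Project

Create a new Application Project in Vitis and select zcu106vcu_base as the platform.

2.1.2 Adding the Source Code

Add the sources to be compiled directly under the src directory, including:
src/srcCommon
src/srcKernel
src/srcPipeline
The final project directory structure is shown in the figure below: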
[Figure 1: project directory structure]

2.1.3 Code Analysis

The code is fairly simple, so it is not analyzed here.

Task.h

ApiHandle.h

host.cpp

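2.3 On-Board Testing

2.3.1 Build

For this project, simply select the Hardware configuration for the build, and set the number of compute units (CUs) in the binary container to 3.

Setting it to 3 seems to be the reason the bandwidth figures in the tests below show no obvious difference between modes; I will look into this another time.

2.3.2 Test and Verification

Copy the build output to the SD card, then run the commands below to test.

Run the tests: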

SCRIPTPATH4=$(dirname $BASH_SOURCE)
echo ${SCRIPTPATH4}
cp ${SCRIPTPATH4}/platform_desc.txt /etc/xocl.
export XILINX_XRT=/usr
root@zcu106vcu_base:~#
root@zcu106vcu_base:~# /mnt/host_opt_pipe_order.exe /mnt/pass_containe.xclbin
DEVICE: zcu106vcu_base
Loading Bitstream: /mnt/pass_containe.xclbin
INFO: Loaded file
Create Kernel: pass
Create Sequential Queue
Setup Complete


 Total number of buffers: 10
              BufferSize: 16384
        Bits per Element: 512
      Bytes per Transfer: 1048576
            processDelay: 1
      Out of Order Queue: false

Running FPGA

          Total data: 80 MBits
           FPGA Time: 0.0283075 s
     FPGA Throughput: 2826.11 MBits/s
FPGA PCIe Throughput: 5652.22 MBits/s

PASS: Simulation
root@zcu106vcu_base:~#
root@zcu106vcu_base:~#
root@zcu106vcu_base:~# /mnt/host_opt_pipe_ooo.exe /mnt/pass_containe.xclbin
DEVICE: zcu106vcu_base
Loading Bitstream: /mnt/pass_containe.xclbin
INFO: Loaded file
Create Kernel: pass
Create Out of Order Queue
Setup Complete


 Total number of buffers: 10
              BufferSize: 16384
        Bits per Element: 512
      Bytes per Transfer: 1048576
            processDelay: 1
      Out of Order Queue: true

Running FPGA

          Total data: 80 MBits
           FPGA Time: 0.0282436 s
     FPGA Throughput: 2832.5 MBits/s
FPGA PCIe Throughput: 5665 MBits/s

PASS: 
root@zcu106vcu_base
root@zcu106vcu_base:~# /mnt/host_opt_sync_period.exe /mnt/pass_containe.xclbin
DEVICE: zcu106vcu_base
Loading Bitstream: /mnt/pass_containe.xclbin
INFO: Loaded file
Create Kernel: pass
Create Out of Order Queue
Setup Complete


 Total number of buffers: 10
              BufferSize: 16384
        Bits per Element: 512
      Bytes per Transfer: 1048576
            processDelay: 1
      Out of Order Queue: true

Running FPGA

          Total data: 80 MBits
           FPGA Time: 0.0327634 s
     FPGA Throughput: 2441.75 MBits/s
FPGA PCIe Throughput: 4883.5 MBits/s

PASS: Simulation
root@zcu106vcu_base:~# /mnt/host_opt_sync_pre.exe /mnt/pass_containe.xclbin
DEVICE: zcu106vcu_base
Loading Bitstream: /mnt/pass_containe.xclbin
INFO: Loaded file
Create Kernel: pass
Create Out of Order Queue
Setup Complete


 Total number of buffers: 10
              BufferSize: 16384
        Bits per Element: 512
      Bytes per Transfer: 1048576
            processDelay: 1
      Out of Order Queue: true

Running FPGA

          Total data: 80 MBits
           FPGA Time: 0.027947 s
     FPGA Throughput: 2862.56 MBits/s
FPGA PCIe Throughput: 5725.12 MBits/s

PASS: Simulation
root@zcu106vcu_base:~#
root@zcu106vcu_base:~#
root@zcu106vcu_base:~#
root@zcu106vcu_base:~# /mnt/host_opt_buf.exe /mnt/pass_containe.xclbin 10
DEVICE: zcu106vcu_base
Loading Bitstream: /mnt/pass_containe.xclbin
INFO: Loaded file
Create Kernel: pass
Create Out of Order Queue
Setup Complete


 Total number of buffers: 100
              BufferSize: 1024
        Bits per Element: 512
      Bytes per Transfer: 65536
            processDelay: 1
      Out of Order Queue: true

Running FPGA

          Total data: 50 MBits
           FPGA Time: 0.0222115 s
     FPGA Throughput: 2251.08 MBits/s
FPGA PCIe Throughput: 4502.17 MBits/s

PASS: Simulation
root@zcu106vcu_base:~#
root@zcu106vcu_base:~#
root@zcu106vcu_base:~# /mnt/host_opt_buf.exe /mnt/pass_containe.xclbin 15
DEVICE: zcu106vcu_base
Loading Bitstream: /mnt/pass_containe.xclbin
INFO: Loaded file
Create Kernel: pass
Create Out of Order Queue
Setup Complete


 Total number of buffers: 100
              BufferSize: 32768
        Bits per Element: 512
      Bytes per Transfer: 2097152
            processDelay: 1
      Out of Order Queue: true

Running FPGA

          Total data: 1600 MBits
           FPGA Time: 0.437819 s
     FPGA Throughput: 3654.47 MBits/s
FPGA PCIe Throughput: 7308.95 MBits/s

PASS: Simulation
root@zcu106vcu_base:~#

3 Summary

Using Vitis and a custom ZCU106 XRT platform, I completed functional testing of the Host Code Optimization example from Vitis-Tutorials.
