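By Qin Hai (秦海), Toradex
1). Introduction
The NXP iMX8 series of ARM processors is a product line NXP has released in recent years, with the architecture upgraded to 64-bit ARMv8 throughout. It includes the iMX8, iMX8X, iMX8M Mini, iMX8M Plus and other processors; their basic specifications are compared in the table below, taken from the NXP website. This article runs a simple comparison of the different iMX8 family members in terms of CPU cores, GPU cores and memory performance, for reference. Note that benchmark tests cannot be fully complete or objective: results usually depend heavily on the hardware and software configuration and on the benchmark tool used, so a different configuration or tool may lead to a different conclusion. The figures here are therefore reference data that apply only to the test environment and benchmark tools used in this article.
All test platforms used in this article are Toradex iMX8 series ARM modules with their matching carrier boards.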
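2). Preparation
a). Apalis iMX8QM 4GB WB IT ARM module on an Ixora carrier board, with debug serial port UART1 (carrier board X22) connected to the development host for debugging.
b). Colibri iMX8QXP 2GB WB IT ARM module on a Colibri Evaluation Board carrier board, with debug serial port UART1 (carrier board X27) connected to the development host for debugging.
c). Verdin iMX8M Mini Quad 2GB WB IT ARM module on a Verdin Development Board carrier board, with debug serial port UART1 (carrier board X66) connected to the development host for debugging.
d). Verdin iMX8M Plus Quad 4GB WB IT ARM module on a Verdin Development Board carrier board, with debug serial port UART1 (carrier board X66) connected to the development host for debugging.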
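3). Software test environment setup
a). All test modules were updated to the Toradex Yocto Linux Reference Multimedia Image 5.3.0 Quarterly release.
b). Following the instructions here, set the CPU governor on all test modules under Linux to "performance", so that the cores run continuously at their highest clock frequency.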
-------------------------------
$ echo performance > /sys/devices/system/cpu/
-------------------------------
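As a sketch only, assuming the standard Linux cpufreq sysfs layout (a `cpuN/cpufreq/scaling_governor` node per core under `/sys/devices/system/cpu`), the governor could be applied to every core with a loop like the one below. The function name `set_governor` and its optional root-directory argument are hypothetical, introduced here so the loop can first be tried against a scratch directory.

```shell
#!/bin/sh
# Hypothetical helper: write a governor name to every per-core cpufreq node.
# Assumption: nodes follow the layout <root>/cpuN/cpufreq/scaling_governor.
set_governor() {
    governor="$1"
    root="${2:-/sys/devices/system/cpu}"
    for node in "$root"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded glob pattern when no cpufreq nodes exist
        [ -e "$node" ] || continue
        echo "$governor" > "$node"
    done
}

# Example (run as root on the target):
#   set_governor performance
#   cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
```

Reading the nodes back afterwards, as in the commented example, is a quick way to confirm that every core reports the expected governor before starting a benchmark.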
4). CPU single-core performance comparison
a). Test tool: nbench, included in the Linux BSP under test
b). Test method
-------------------------------
### Change to /usr/bin, because the test needs the NNET.DAT file
$ cd /usr/bin
### On Apalis iMX8QM, test the A72 cores and the A53 cores separately
$ taskset -c 4,5 nbench && taskset -c 0-3 nbench
### On the other modules
$ nbench
-------------------------------
c). Test results
./ Apalis iMX8QM A72
-------------------------------
root@apalis-imx8:/usr/bin# taskset -c 4,5 nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 695.3 : 17.83 : 5.86
STRING SORT : 345.36 : 154.32 : 23.89
BITFIELD : 2.2166e+08 : 38.02 : 7.94
FP EMULATION : 198.55 : 95.27 : 21.98
FOURIER : 47074 : 53.54 : 30.07
ASSIGNMENT : 23.827 : 90.67 : 23.52
IDEA : 6381.3 : 97.60 : 28.98
HUFFMAN : 1921.2 : 53.27 : 17.01
NEURAL NET : 19.353 : 31.09 : 13.08
LU DECOMPOSITION : 1125.2 : 58.29 : 42.09
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 64.608
FLOATING-POINT INDEX: 45.949
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
-------------------------------
./ Apalis iMX8QM A53
-------------------------------
root@apalis-imx8:/usr/bin# taskset -c 0-3 nbench
BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 432.04 : 11.08 : 3.64
STRING SORT : 142.03 : 63.46 : 9.82
BITFIELD : 1.8588e+08 : 31.88 : 6.66
FP EMULATION : 94.362 : 45.28 : 10.45
FOURIER : 19599 : 22.29 : 12.52
ASSIGNMENT : 11.424 : 43.47 : 11.28
IDEA : 2943.3 : 45.02 : 13.37
HUFFMAN : 998.72 : 27.69 : 8.84
NEURAL NET : 6.9556 : 11.17 : 4.70
LU DECOMPOSITION : 462.88 : 23.98 : 17.32
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 34.226
FLOATING-POINT INDEX: 18.143
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
-------------------------------
./ Colibri iMX8X A35
-------------------------------
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 370.76 : 9.51 : 3.12
STRING SORT : 103.29 : 46.15 : 7.14
BITFIELD : 1.3321e+08 : 22.85 : 4.77
FP EMULATION : 70.273 : 33.72 : 7.78
FOURIER : 16962 : 19.29 : 10.83
ASSIGNMENT : 9.5353 : 36.28 : 9.41
IDEA : 2319.5 : 35.48 : 10.53
HUFFMAN : 814.55 : 22.59 : 7.21
NEURAL NET : 5.9326 : 9.53 : 4.01
LU DECOMPOSITION : 408.75 : 21.18 : 15.29
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 26.762
FLOATING-POINT INDEX: 15.731
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
-------------------------------
./ Verdin iMX8M Mini A53
-------------------------------
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 576.19 : 14.78 : 4.85
STRING SORT : 189.54 : 84.69 : 13.11
BITFIELD : 2.4316e+08 : 41.71 : 8.71
FP EMULATION : 125.9 : 60.41 : 13.94
FOURIER : 26154 : 29.74 : 16.71
ASSIGNMENT : 15.263 : 58.08 : 15.06
IDEA : 3927 : 60.06 : 17.83
HUFFMAN : 1332.7 : 36.96 : 11.80
NEURAL NET : 9.2744 : 14.90 : 6.27
LU DECOMPOSITION : 621.37 : 32.19 : 23.24
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 45.545
FLOATING-POINT INDEX: 24.252
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
-------------------------------
./ Verdin iMX8M Plus A53
-------------------------------
TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 576.66 : 14.79 : 4.86
STRING SORT : 189.55 : 84.69 : 13.11
BITFIELD : 2.429e+08 : 41.67 : 8.70
FP EMULATION : 125.89 : 60.41 : 13.94
FOURIER : 26154 : 29.74 : 16.71
ASSIGNMENT : 15.209 : 57.87 : 15.01
IDEA : 3927.1 : 60.06 : 17.83
HUFFMAN : 1332.7 : 36.96 : 11.80
NEURAL NET : 9.2744 : 14.90 : 6.27
LU DECOMPOSITION : 615.61 : 31.89 : 23.03
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 45.520
FLOATING-POINT INDEX: 24.177
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
-------------------------------
d). Results chart
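To chart these results, the two summary indices can be pulled out of saved nbench output. A minimal sketch, assuming the output of each run is available as text on standard input; the function name `nbench_indices` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read nbench output on stdin, print "INTEGER,FLOAT".
# Only the first occurrence of each index is kept, i.e. the
# ORIGINAL BYTEMARK RESULTS section quoted above.
nbench_indices() {
    awk -F':' '
        /^INTEGER INDEX/ && int_idx == ""       { gsub(/ /, "", $2); int_idx = $2 }
        /^FLOATING-POINT INDEX/ && fp_idx == "" { gsub(/ /, "", $2); fp_idx = $2 }
        END { print int_idx "," fp_idx }
    '
}

# Example with the Apalis iMX8QM A72 figures from this article
nbench_indices <<'EOF'
INTEGER INDEX : 64.608
FLOATING-POINT INDEX: 45.949
EOF
# prints 64.608,45.949
```

Running the helper once per module gives one comma-separated row per platform, which is enough to feed a spreadsheet or plotting tool.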
5). CPU multi-core performance comparison, test 1
a). Test tool: 7zip, which must be built in the Yocto environment
-------------------------------
### option-1 IPK package ###
### on compiling server
$ bitbake p7zip
$ scp deploy/ipk/aarch64/p7zip_16.02-r0_aarch64.ipk root@
### on target device
$ cd /home/root && opkg install p7zip_16.02-r0_aarch64.ipk
### option-2 modify conf/local.conf to include 7zip into image
IMAGE_INSTALL_append = " p7zip"
-------------------------------
b). Test method
-------------------------------
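### To load all cores as fully as possible, the number of benchmark threads is set to 1.5x or 2x the number of CPU cores, depending on memory capacity
### On Apalis iMX8QM
$ 7z b -mmt12
### On the other modules
$ 7z b -mmt6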
-------------------------------
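The thread counts used here follow from the core counts that 7z reports in its output (6 hardware threads on Apalis iMX8QM, 4 on the other modules): 6 x 2 = 12 and 4 x 1.5 = 6. A small sketch of that arithmetic follows; `bench_threads` is a hypothetical helper, and the factor is passed in tenths so the calculation stays within integer shell arithmetic.

```shell
#!/bin/sh
# Hypothetical helper: bench_threads CORES FACTOR_TENTHS
# FACTOR_TENTHS is the multiplier times ten (15 means 1.5x, 20 means 2x).
bench_threads() {
    echo $(( $1 * $2 / 10 ))
}

bench_threads 6 20   # Apalis iMX8QM: prints 12
bench_threads 4 15   # other modules: prints 6

# On the target, the core count could be taken from nproc instead:
#   7z b -mmt"$(bench_threads "$(nproc)" 15)"
```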
c). Test results
./ Apalis iMX8 2xA72+4xA53
-------------------------------
root@apalis-imx8:~# 7z b -mmt12
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,6 CPUs LE)
LE
CPU Freq: - - - - - - - - -
RAM size: 3713 MB, # CPU hardware threads: 6
RAM usage: 2647 MB, # Benchmark threads: 12
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 4431 528 817 4311 | 85954 583 1257 7331
23: 4342 540 819 4424 | 84676 586 1251 7326
24: 4534 576 846 4876 | 83469 588 1246 7326
25: 4478 580 882 5114 | 82760 591 1247 7365
---------------------------------- | ------------------------------
Avr: 556 841 4681 | 587 1250 7337
Tot: 571 1046 6009
-------------------------------
./ Colibri iMX8X 4xA35
-------------------------------
root@colibri-imx8x:~# 7z b -mmt6
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)
LE
CPU Freq: 64000000 - - - - - - - -
RAM size: 1775 MB, # CPU hardware threads: 4
RAM usage: 1323 MB, # Benchmark threads: 6
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 1885 362 506 1835 | 35207 359 837 3003
23: 1860 364 521 1896 | 36510 377 838 3159
24: 1890 378 538 2033 | 35043 368 836 3076
25: 1870 378 566 2136 | 33807 359 838 3009
---------------------------------- | ------------------------------
Avr: 371 533 1975 | 366 837 3062
Tot: 368 685 2518
-------------------------------
./ Verdin iMX8M Mini 4xA53
-------------------------------
root@verdin-imx8mm:~# 7z b -mmt6
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)
LE
CPU Freq: - - - - - - - - -
RAM size: 1982 MB, # CPU hardware threads: 4
RAM usage: 1323 MB, # Benchmark threads: 6
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 2587 368 683 2517 | 62071 380 1393 5293
23: 2575 376 698 2624 | 58488 364 1389 5061
24: 2608 384 730 2805 | 56726 360 1384 4979
25: 2547 378 769 2908 | 56061 359 1390 4989
---------------------------------- | ------------------------------
Avr: 377 720 2713 | 366 1389 5081
Tot: 371 1054 3897
-------------------------------
./ Verdin iMX8M Plus 4xA53
-------------------------------
root@verdin-imx8mp:~# 7z b -mmt6
7-Zip (a) [64] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=C,Utf16=off,HugeFiles=on,64 bits,4 CPUs LE)
LE
CPU Freq: 64000000 - - 64000000 - 256000000 - - -
RAM size: 3635 MB, # CPU hardware threads: 4
RAM usage: 1323 MB, # Benchmark threads: 6
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 3100 377 801 3016 | 61989 371 1426 5286
23: 3035 375 824 3093 | 63639 386 1427 5507
24: 3035 379 862 3264 | 63178 389 1427 5545
25: 3004 377 910 3431 | 57788 360 1430 5143
---------------------------------- | ------------------------------
Avr: 377 849 3201 | 376 1427 5370
Tot: 377 1138 4286
-------------------------------
d). Results chart
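For a one-number comparison between modules, the overall rating on the "Tot:" line of each run is a convenient value to chart. A minimal sketch, assuming the `7z b` output is available as text on standard input; `p7zip_total` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "7z b" output on stdin and print the overall
# rating, i.e. the last column of the "Tot:" line.
p7zip_total() {
    awk '$1 == "Tot:" { print $NF }'
}

# Example with the Apalis iMX8 figures from this article
p7zip_total <<'EOF'
Avr: 556 841 4681 | 587 1250 7337
Tot: 571 1046 6009
EOF
# prints 6009
```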
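6). CPU multi-core performance comparison, test 2
a). Test tool: smallpt, which must be downloaded as source and compiled; for details on downloading, compiling and using it, see here
b). Test method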
-------------------------------
### On Apalis iMX8QM
$ taskset -c 0-5 time ./smallpt 100
### On the other modules
$ taskset -c 0-3 time ./smallpt 100
-------------------------------
c). Test results
./ Apalis iMX8QM 2xA72+4xA53
-------------------------------
root@apalis-imx8:~# taskset -c 0-5 time ./smallpt 100
Rendering (100 spp) 100.00%
real 1m 34.80s
user 9m 23.44s
sys 0m 0.03s
-------------------------------
./ Colibri iMX8X 4xA35
-------------------------------
root@colibri-imx8x-06748681:~# taskset -c 0-3 time ./smallpt 100
Rendering (100 spp) 100.00%
real 3m 52.67s
user 15m 18.40s
sys 0m 0.13s
-------------------------------
./ Verdin iMX8M Mini 4xA53
-------------------------------
root@verdin-imx8mm:~# taskset -c 0-3 time ./smallpt 100
Rendering (100 spp) 100.00%
real 2m 30.11s
user 9m 56.64s
sys 0m 0.06s
-------------------------------
./ Verdin iMX8M Plus 4xA53
-------------------------------
root@verdin-imx8mp:~# taskset -c 0-3 time ./smallpt 100
Rendering (100 spp) 100.00%
real 2m 30.05s
user 9m 56.26s
sys 0m 0.07s
-------------------------------
d). Results chart
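The `time` output is easier to compare once converted to seconds. The sketch below, under the assumption that the lines have the "real 1m 34.80s" form shown in the results, prints the wall-clock time in seconds followed by the user/real ratio, which roughly indicates how many cores were kept busy. `smallpt_summary` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read "time" output in the "real 1m 34.80s" form and
# print wall-clock seconds plus the user/real ratio.
smallpt_summary() {
    awk '
        # "1m" and "34.80s" become 1 and 34.8 when forced to numbers
        $1 == "real" { real = ($2 + 0) * 60 + ($3 + 0) }
        $1 == "user" { user = ($2 + 0) * 60 + ($3 + 0) }
        END { if (real > 0) printf "%.2f %.2f\n", real, user / real }
    '
}

# Example with the Apalis iMX8QM figures from this article
smallpt_summary <<'EOF'
real 1m 34.80s
user 9m 23.44s
sys 0m 0.03s
EOF
# prints 94.80 5.94
```

A ratio of about 5.94 on the six-core Apalis iMX8QM suggests that nearly all cores were busy for the whole run.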
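7). Memory performance test
a). Test tool: stream, included in the Linux BSP under test. The BSP also ships other memory test tools, such as tinymembench, which interested readers can try for themselves.
b). Test method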
-------------------------------
### On Apalis iMX8QM, test the A72 cores and the A53 cores separately
$ taskset -c 4-5 stream && taskset -c 0-3 stream
### On the other modules
$ taskset -c 0-3 stream
-------------------------------
c). Test results
./ Apalis iMX8QM 2xA72
-------------------------------
root@apalis-imx8:~# taskset -c 4-5 stream
STREAM copy latency: 2.60 nanoseconds
STREAM copy bandwidth: 6159.37 MB/sec
STREAM scale latency: 2.57 nanoseconds
STREAM scale bandwidth: 6232.96 MB/sec
STREAM add latency: 4.52 nanoseconds
STREAM add bandwidth: 5312.67 MB/sec
STREAM triad latency: 4.52 nanoseconds
STREAM triad bandwidth: 5305.04 MB/sec
-------------------------------
./ Apalis iMX8QM 4xA53
-------------------------------
root@apalis-imx8:~# taskset -c 0-3 stream
STREAM copy latency: 5.88 nanoseconds
STREAM copy bandwidth: 2722.48 MB/sec
STREAM scale latency: 7.44 nanoseconds
STREAM scale bandwidth: 2149.96 MB/sec
STREAM add latency: 13.36 nanoseconds
STREAM add bandwidth: 1796.68 MB/sec
STREAM triad latency: 15.41 nanoseconds
STREAM triad bandwidth: 1557.53 MB/sec
-------------------------------
./ Colibri iMX8X 4xA35
-------------------------------
root@colibri-imx8x-06748681:~# taskset -c 0-3 stream
STREAM copy latency: 5.94 nanoseconds
STREAM copy bandwidth: 2695.42 MB/sec
STREAM scale latency: 8.78 nanoseconds
STREAM scale bandwidth: 1821.91 MB/sec
STREAM add latency: 13.61 nanoseconds
STREAM add bandwidth: 1763.93 MB/sec
STREAM triad latency: 15.72 nanoseconds
STREAM triad bandwidth: 1526.23 MB/sec
-------------------------------
./ Verdin iMX8M Mini 4xA53
-------------------------------
root@verdin-imx8mm:~# taskset -c 0-3 stream
STREAM copy latency: 4.17 nanoseconds
STREAM copy bandwidth: 3839.23 MB/sec
STREAM scale latency: 5.46 nanoseconds
STREAM scale bandwidth: 2928.26 MB/sec
STREAM add latency: 9.13 nanoseconds
STREAM add bandwidth: 2628.41 MB/sec
STREAM triad latency: 10.70 nanoseconds
STREAM triad bandwidth: 2242.15 MB/sec
-------------------------------
./ Verdin iMX8M Plus 4xA53
-------------------------------
root@verdin-imx8mp:~# taskset -c 0-3 stream
STREAM copy latency: 3.06 nanoseconds
STREAM copy bandwidth: 5226.20 MB/sec
STREAM scale latency: 5.30 nanoseconds
STREAM scale bandwidth: 3020.58 MB/sec
STREAM add latency: 8.58 nanoseconds
STREAM add bandwidth: 2797.53 MB/sec
STREAM triad latency: 10.22 nanoseconds
STREAM triad bandwidth: 2348.11 MB/sec
-------------------------------
d). Results chart
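To tabulate the bandwidth figures, the four "bandwidth" lines of each run can be reduced to kernel name and value. A minimal sketch, assuming the stream output was captured as text in the format shown in the results; `stream_bandwidth` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read stream output on stdin and print one
# "kernel MB/sec" pair per bandwidth line.
stream_bandwidth() {
    awk '$1 == "STREAM" && $3 == "bandwidth:" { print $2, $4 }'
}

# Example with two of the Apalis iMX8QM 2xA72 kernels from this article
stream_bandwidth <<'EOF'
STREAM copy latency: 2.60 nanoseconds
STREAM copy bandwidth: 6159.37 MB/sec
STREAM triad latency: 4.52 nanoseconds
STREAM triad bandwidth: 5305.04 MB/sec
EOF
# prints:
# copy 6159.37
# triad 5305.04
```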
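8). GPU performance test
a). Test tool: glmark2-es2-wayland, included in the Linux BSP under test. Apalis iMX8/Colibri iMX8X are connected to an LVDS LCD panel with a resolution of 1280x800; Verdin iMX8MM/iMX8MP are connected to an HDMI monitor with a resolution of 1920x1080.
b). Test method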
-------------------------------
### Keep the test output resolution at 1280x800 throughout ###
### Apalis iMX8/Colibri iMX8X
$ glmark2-es2-wayland --fullscreen
### Verdin iMX8MM/iMX8MP
$ glmark2-es2-wayland --size 1280x800
-------------------------------
c). Test results
./ Apalis iMX8QM 2x GC7000XSVX GPU
-------------------------------
root@apalis-imx8:~# glmark2-es2-wayland --fullscreen
EGL: Warning: No default display support on wayland
=======================================================
glmark2 2017.07
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC7000XSVX
GL_VERSION: OpenGL ES 3.2 V6.4.3.p1.305572
=======================================================
......
=======================================================
glmark2 Score: 1308
=======================================================
-------------------------------
./ Colibri iMX8X 1x GC7000L GPU
-------------------------------
root@colibri-imx8x-06748681:~# glmark2-es2-wayland --fullscreen
EGL: Warning: No default display support on wayland
=======================================================
glmark2 2017.07
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC7000L
GL_VERSION: OpenGL ES 3.1 V6.4.3.p1.305572
=======================================================
......
=======================================================
glmark2 Score: 516
=======================================================
-------------------------------
./ Verdin iMX8M Mini 1x GC7000NanoUltra GPU
-------------------------------
root@verdin-imx8mm:~# glmark2-es2-wayland --size 1280x800
EGL: Warning: No default display support on wayland
=======================================================
glmark2 2017.07
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC7000NanoUltra
GL_VERSION: OpenGL ES 2.0 V6.4.3.p1.305572
=======================================================
......
=======================================================
glmark2 Score: 165
=======================================================
-------------------------------
./ Verdin iMX8M Plus 1x GC7000UL GPU
-------------------------------
root@verdin-imx8mp:~# glmark2-es2-wayland --size 1280x800
EGL: Warning: No default display support on wayland
=======================================================
glmark2 2017.07
=======================================================
OpenGL Information
GL_VENDOR: Vivante Corporation
GL_RENDERER: Vivante GC7000UL
GL_VERSION: OpenGL ES 3.1 V6.4.3.p1.305572
=======================================================
......
=======================================================
glmark2 Score: 521
=======================================================
-------------------------------
d). Results chart
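The final score is the only line needed for a chart. A minimal sketch for extracting it, assuming the glmark2 output was captured as text; `glmark2_score` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read glmark2 output on stdin and print the final
# score, i.e. the last field of the "glmark2 Score:" line.
glmark2_score() {
    awk '/glmark2 Score:/ { print $NF }'
}

# Example with the Apalis iMX8QM result from this article
glmark2_score <<'EOF'
=======================================================
glmark2 Score: 1308
=======================================================
EOF
# prints 1308
```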
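9). Summary
This article ran a few simple benchmarks on the NXP iMX8 family of embedded ARM processors, covering CPU, GPU and memory. The results show that iMX8QM leads clearly in every respect; iMX8X emphasizes performance per watt and is well balanced; iMX8M Mini and iMX8M Plus are roughly on par in CPU performance, while in GPU performance iMX8M Plus is far ahead (glmark2 score 521 versus 165). iMX8M Plus also has an NPU core for accelerating neural network algorithms, so overall it still outperforms iMX8M Mini.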
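The comparisons in the summary can be put into numbers using the results listed in this article. The sketch below is only a worked calculation; `ratio` is a hypothetical helper that divides two measured values and rounds to two decimals.

```shell
#!/bin/sh
# Hypothetical helper: ratio A B prints A/B rounded to two decimals.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

ratio 521 165        # glmark2, iMX8M Plus vs iMX8M Mini: prints 3.16
ratio 45.520 45.545  # nbench integer index, Plus vs Mini: prints 1.00
ratio 64.608 34.226  # nbench integer index on iMX8QM, A72 vs A53: prints 1.89
```

These ratios only restate the measurements above and carry the same dependence on test environment and tool version.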
References
https://developer.toradex.cn/knowledge-base/board-support-package/openembedded-core
https://developer.toradex.cn/knowledge-base/toradex-easy-installer-os-and-demo-images