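Over the past couple of days I noticed that floating-point performance on the PXA310 is worse than on the OMAP2420. After some digging, it turns out that the OMAP2420 has hardware VFP, while the PXA310 has no hardware floating-point support.
The traditional approach is to use the kernel's nwfpe (or fastfpe) floating-point emulator: when the CPU hits a floating-point instruction it does not support, it takes an exception, the kernel's trap handler jumps into nwfpe's software emulation routine, the operation is performed there, and control returns to the program.
Newer gcc (the EABI version) can instead build the floating-point emulation directly into the program as ordinary library calls, which saves the cost of those mode switches.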
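To make the difference concrete, here is a minimal sketch of what the two approaches do with an ordinary division. It assumes an ARM EABI toolchain, where the run-time ABI names the double-precision divide helper `__aeabi_ddiv`; the function name `fdiv` is made up for illustration and is not part of the benchmark.

```c
/* Illustrative only: fdiv is a hypothetical name, not part of the benchmark. */
double fdiv(double a, double b)
{
    /*
     * How this one expression is typically compiled:
     *
     *   EABI, -mfloat-abi=soft : an ordinary call to the run-time helper
     *                            __aeabi_ddiv; no exception is taken.
     *   VFP hardware build     : a VFP divide instruction.
     *   old FPA-style build    : an FPA instruction, which traps into the
     *                            kernel emulator (nwfpe) on CPUs without FPA.
     */
    return a / b;
}
```

Compiling a file like this with `-S` and searching the assembly for `__aeabi_ddiv` is a quick way to see which variant a given toolchain and option set actually produced.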
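gcc's -mfloat-abi=soft selects gcc's built-in software emulation. softfp and hard both allow the compiler to generate hardware VFP instructions. Code built with softfp keeps the soft-float calling convention, so it can be linked with binaries built with soft, whereas hard changes the calling convention and therefore requires all code to be built with hard.
So: if the hardware has VFP, use -mfloat-abi=softfp; if the hardware has no VFP, use -mfloat-abi=soft.
In addition: recent gcc versions can generate better-optimized code for floating-point arithmetic on PXA CPUs; this requires the -march=iwmmxt compiler option.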
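When several builds with different options are floating around, it helps to have the binary report how it was compiled. The sketch below relies on macros that GCC predefines for ARM targets (`__SOFTFP__`, `__VFP_FP__`, `__ARM_PCS_VFP`, `__IWMMXT__`); which of them exist depends on the gcc version, so treat the mapping as an assumption to verify against your toolchain (for example with `gcc -dM -E - < /dev/null`). The function names are hypothetical.

```c
/* Hypothetical helper: describe the floating-point mode this file was
 * compiled for, based on GCC's ARM predefined macros. On a non-ARM host
 * none of the macros is defined and the result is "unknown". */
const char *float_build_mode(void)
{
#if defined(__SOFTFP__)
    /* -mfloat-abi=soft: arithmetic goes through library helpers. */
    return "soft";
#elif defined(__ARM_PCS_VFP)
    /* -mfloat-abi=hard: VFP instructions and VFP calling convention
     * (this macro only exists in newer gcc releases). */
    return "hard";
#elif defined(__arm__) && defined(__VFP_FP__)
    /* Not soft, not hard, VFP format in use: assumed to be softfp. */
    return "softfp";
#else
    return "unknown";
#endif
}

/* Hypothetical helper: 1 if built with iWMMXt enabled (for example via
 * -march=iwmmxt), 0 otherwise. */
int built_for_iwmmxt(void)
{
#if defined(__IWMMXT__)
    return 1;
#else
    return 0;
#endif
}
```

Printing both values at the start of a benchmark run makes it harder to mix up binaries such as float1 and float3 later.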
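Note: the latest kernels no longer contain the /arm/arm/fastfpe directory, and nwfpe is probably obsolete for EABI as well: it emulates FPA, so it most likely cannot correctly handle the VFP that EABI uses.
Reference: http://wiki.debian.org/ArmEabiPort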
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <time.h>

#define MAX_DIVIDEND 1000000.231
#define MIN_DIVIDEND 0.29
#define STEP_DIVIDEND 0.33
#define DIVISOR 23.0
#define BUFFER_SIZE 200

/* Write the time elapsed since the first call into buffer as "sec.usec". */
static void timestamp(char *buffer) {
    static long startSecond = 0;
    static long startUs = 0;
    struct timeval tv;
    long deltaSecond, deltaUs;

    gettimeofday(&tv, NULL);
    /* Running for the first time? */
    if (startSecond == 0) {
        /* Record the start time so that the first delta is 0. */
        startSecond = tv.tv_sec;
        startUs = tv.tv_usec;
    }
    /* Calculate the delta; borrow one second if the microsecond part
       would otherwise be negative. */
    deltaSecond = tv.tv_sec - startSecond;
    deltaUs = tv.tv_usec - startUs;
    if (deltaUs < 0) {
        deltaSecond -= 1;
        deltaUs += 1000000;
    }
    /* Create the string giving offset from start in seconds. */
    snprintf(buffer, BUFFER_SIZE, "%ld.%06ld", deltaSecond, deltaUs);
}

int main(int argc, char *argv[])
{
    /* result is intentionally unused; build without optimization so the
       loops are not removed. */
    double dividend, result;
    char buffer[BUFFER_SIZE];

    timestamp(buffer);
    printf("Start time is: %s\n", buffer);

    for (dividend = MIN_DIVIDEND; dividend < MAX_DIVIDEND; dividend += STEP_DIVIDEND)
        result = dividend / DIVISOR;
    timestamp(buffer);
    printf("DIV End time is: %s\n", buffer);

    for (dividend = MIN_DIVIDEND; dividend < MAX_DIVIDEND; dividend += STEP_DIVIDEND)
        result = dividend * DIVISOR;
    timestamp(buffer);
    printf("MUL End time is: %s\n", buffer);

    for (dividend = MIN_DIVIDEND; dividend < MAX_DIVIDEND; dividend += STEP_DIVIDEND)
        result = dividend + DIVISOR;
    timestamp(buffer);
    printf("ADD End time is: %s\n", buffer);

    for (dividend = MIN_DIVIDEND; dividend < MAX_DIVIDEND; dividend += STEP_DIVIDEND)
        result = dividend - DIVISOR;
    timestamp(buffer);
    printf("SUB End time is: %s\n", buffer);

    return 0;
}
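The delicate part of a benchmark like this is the timestamp arithmetic: the `tv_sec` and `tv_usec` fields of two `struct timeval` values cannot be subtracted and printed independently, because the microsecond difference goes negative whenever the later sample has a smaller `tv_usec`. A minimal sketch of one way around this is to fold both fields into a single 64-bit microsecond count first; `elapsed_us` is a hypothetical helper name, not something from the original program.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: microseconds elapsed from *start to *end.
 * Both fields are combined into one signed 64-bit value, so a smaller
 * tv_usec in *end is handled by plain arithmetic instead of a manual borrow. */
int64_t elapsed_us(const struct timeval *start, const struct timeval *end)
{
    int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
    int64_t usec = (int64_t)end->tv_usec - (int64_t)start->tv_usec;
    return sec * 1000000 + usec;
}
```

The result can be printed as seconds with `printf("%lld.%06lld", (long long)(us / 1000000), (long long)(us % 1000000))` for non-negative intervals; the `%06` padding matters, since without it 14588 microseconds would read as ".14588".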
Compiler 1: the maemo gcc information follows:
[sbox-CHINOOK_ARMEL: ~] > gcc --version
sbox-arm-linux-gcc (GCC) 3.4.4 (release) (CodeSourcery ARM 2005q3-2)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[sbox-CHINOOK_ARMEL: ~] > gcc -v
Reading specs from /scratchbox/compilers/cs2005q3.2-glibc2.5-arm/bin/../lib/gcc/arm-none-linux-gnueabi/3.4.4/specs
Reading specs from /scratchbox/compilers/cs2005q3.2-glibc2.5-arm/gcc.specs
rename spec cpp to old_cpp
Configured with: /home/kl/cs2005q3-2_toolchain/gcc/glibc/work/gcc-2005q3-2/configure --build=i386-linux --host=i386-linux --target=arm-none-linux-gnueabi --prefix=/scratchbox/compilers/cs2005q3.2-glibc-arm --with-headers=/scratchbox/compilers/cs2005q3.2-glibc-arm/usr/include --enable-languages=c,c++ --enable-shared --enable-threads --disable-checking --enable-symvers=gnu --program-prefix=arm-linux- --with-gnu-ld --enable-__cxa_atexit --disable-libssp --disable-libstdcxx-pch --with-cpu= --enable-interwork
Thread model: posix
gcc version 3.4.4 (release) (CodeSourcery ARM 2005q3-2)
Compiler 2: the Marvell gcc information follows:
tmp>arm-iwmmxt-linux-gnueabi-gcc --version
arm-iwmmxt-linux-gnueabi-gcc (GCC) 4.1.1
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
tmp>arm-iwmmxt-linux-gnueabi-gcc -v
Using built-in specs.
Target: arm-iwmmxt-linux-gnueabi
Configured with: /home1/bridge/toolchain/crosstool/toolchain-2007-03-19/build/arm-iwmmxt-linux-gnueabi/gcc-4.1.1-glibc-2.5/gcc-4.1.1/configure --target=arm-iwmmxt-linux-gnueabi --host=i686-host_pc-linux-gnu --prefix=/usr/local/bridge/arm-iwmmxt-linux-gnueabi --with-cpu=iwmmxt --with-float=soft --enable-cxx-flags=-msoft-float --with-headers=/usr/local/bridge/arm-iwmmxt-linux-gnueabi/arm-iwmmxt-linux-gnueabi/include --with-local-prefix=/usr/local/bridge/arm-iwmmxt-linux-gnueabi/arm-iwmmxt-linux-gnueabi --disable-nls --enable-threads=posix --enable-symvers=gnu --enable-__cxa_atexit --enable-languages=c,c++ --enable-shared --enable-c99 --enable-long-long
Thread model: posix
gcc version 4.1.1
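The test program was built with different compilers and different options, and the resulting binaries were run on the OMAP2420 and on the PXA310. The first three builds use compiler 1 and the last one uses compiler 2; note that the first three are compiled inside scratchbox, so the commands carry no cross-compiler prefix.
In the recorded output, runs at the OMAP2420:/tmp# prompt are from the OMAP2420, and runs at the / # prompt are presumably from the PXA310. Each time was printed as the second delta, a dot, and the microsecond delta without zero padding, so 7.14588 means 7.014588 s; a fraction close to 4294967296 is a negative microsecond delta printed as an unsigned value, so 8.4294827617 means 8 s minus 139679 microseconds, about 7.86 s.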
gcc -mfloat-abi=soft float.c -o float1
gcc -mfloat-abi=softfp float.c -o float2
gcc -march=iwmmxt float.c -o float3
arm-iwmmxt-linux-gnueabi-gcc float.c -o float4
OMAP2420:/tmp# ./float1
Start time is: 0.0
DIV End time is: 8.4294827617
MUL End time is: 10.303344
ADD End time is: 13.4294875774
SUB End time is: 16.4294558757
OMAP2420:/tmp# ./float1
Start time is: 0.0
DIV End time is: 8.4294494517
MUL End time is: 10.4294921030
ADD End time is: 13.4294482493
SUB End time is: 15.133392
OMAP2420:/tmp# ./float1
Start time is: 0.0
DIV End time is: 7.579528
MUL End time is: 10.4294947215
ADD End time is: 12.556763
SUB End time is: 15.201508
OMAP2420:/tmp# ./float1
Start time is: 0.0
DIV End time is: 8.4294515698
MUL End time is: 10.4294934185
ADD End time is: 13.4294495892
SUB End time is: 16.4294132915
OMAP2420:/tmp# ./float2
Start time is: 0.0
DIV End time is: 1.4294907969
MUL End time is: 2.4294625102
ADD End time is: 3.4294333079
SUB End time is: 4.4294033336
OMAP2420:/tmp# ./float2
Start time is: 0.0
DIV End time is: 1.4294897350
MUL End time is: 2.4294642314
ADD End time is: 3.4294335918
SUB End time is: 4.4294029795
OMAP2420:/tmp# ./float2
Start time is: 0.0
DIV End time is: 1.4294897563
MUL End time is: 1.633240
ADD End time is: 2.331757
SUB End time is: 3.21210
OMAP2420:/tmp# ./float2
Start time is: 0.0
DIV End time is: 1.4294896984
MUL End time is: 1.633728
ADD End time is: 2.328186
SUB End time is: 3.20905
/ # ./float1
Start time is: 0.0
DIV End time is: 4.49465
MUL End time is: 6.4294450290
ADD End time is: 7.14588
SUB End time is: 9.4294547088
/ # ./float1
Start time is: 0.0
DIV End time is: 4.52069
MUL End time is: 5.486351
ADD End time is: 7.17117
SUB End time is: 8.581988
/ # ./float1
Start time is: 0.0
DIV End time is: 4.49788
MUL End time is: 5.483496
ADD End time is: 7.17022
SUB End time is: 9.4294549453
/ # ./float1
Start time is: 0.0
DIV End time is: 4.49902
MUL End time is: 6.4294450916
ADD End time is: 7.14907
SUB End time is: 9.4294547965
/ # ./float3
Start time is: 0.0
DIV End time is: 4.4294864860
MUL End time is: 5.257107
ADD End time is: 7.4294684639
SUB End time is: 8.171667
/ # ./float3
Start time is: 0.0
DIV End time is: 4.4294864869
MUL End time is: 5.257758
ADD End time is: 7.4294682952
SUB End time is: 8.168985
/ # ./float3
Start time is: 0.0
DIV End time is: 4.4294864656
MUL End time is: 5.257443
ADD End time is: 7.4294682639
SUB End time is: 8.168756
/ # ./float3
Start time is: 0.0
DIV End time is: 4.4294863772
MUL End time is: 5.256900
ADD End time is: 6.714551
SUB End time is: 8.169785
/ # ./float4
Start time is: 0.0
DIV End time is: 3.597009
MUL End time is: 5.4294619794
ADD End time is: 6.4294696892
SUB End time is: 7.4294807493
/ # ./float4
Start time is: 0.0
DIV End time is: 4.4294563947
MUL End time is: 5.4294619198
ADD End time is: 6.4294696044
SUB End time is: 7.4294806699
/ # ./float4
Start time is: 0.0
DIV End time is: 4.4294564235
MUL End time is: 5.4294620202
ADD End time is: 6.4294697228
SUB End time is: 7.4294807689
/ # ./float4
Start time is: 0.0
DIV End time is: 4.4294564363
MUL End time is: 5.4294619851
ADD End time is: 6.4294696876
SUB End time is: 7.4294807901
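The timestamps in these logs are easier to compare once converted to plain seconds. The sketch below assumes the logs were produced by printing a signed 32-bit second delta and a signed 32-bit microsecond delta with an unpadded, unsigned format, so that a fraction larger than 2^31 is really a negative number of microseconds. The function name `logged_to_seconds` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: convert one logged "sec.frac" pair to seconds.
 * sec  is the number before the dot, frac the number after it, both read
 * as unsigned decimal integers exactly as they appear in the log. */
double logged_to_seconds(uint32_t sec, uint32_t frac)
{
    int64_t us = (int64_t)frac;

    /* Values above INT32_MAX are assumed to be wrapped negative deltas:
     * undo the wrap by subtracting 2^32. */
    if (us > INT32_MAX)
        us -= 4294967296LL;

    return (double)sec + (double)us / 1000000.0;
}
```

For example, the logged value 8.4294827617 corresponds to `logged_to_seconds(8, 4294827617u)`, which is 8 s minus 139679 microseconds, and the logged value 7.14588 corresponds to `logged_to_seconds(7, 14588)`, which is 7.014588 s.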
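The PXA310 platform has no hardware floating-point support, so we should get as much floating-point performance as we can by adding compiler options such as -mfloat-abi=soft and -march=iwmmxt.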