gcc auto-vectorization options
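At -O3, gcc already enables the “-ftree-vectorize” option, which auto-vectorizes the program. Vectorization can be turned off with -fno-tree-vectorize. The option -ftree-vectorizer-verbose=n prints the results of auto-vectorization, where n ranges from 0 to 9. For the meaning of each value of n, see: http://blog.csdn.net/waverider2012/article/details/8529257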
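As a minimal sketch (the function name `scale_add` is made up for illustration), the loop below has the shape the vectorizer handles well: unit-stride accesses, no dependence between iterations, and `restrict`-qualified pointers so gcc need not assume the arrays overlap. One simple way to see the effect of -O3 is to compile it once with `gcc -O3 -S` and once with `gcc -O3 -fno-tree-vectorize -S` and compare the two `.s` files; on x86-64 the vectorized version typically contains packed SSE instructions such as `mulps` and `addps`. The code assumes C99 or later (add `-std=c99` on gcc versions before 5, whose default is gnu90).

```c
#include <stddef.h>

/* y[i] += s * x[i] for i in [0, n).
   Unit stride, no loop-carried dependence, and restrict-qualified
   pointers (no aliasing between y and x): a typical candidate for
   gcc's loop vectorizer at -O3. */
void scale_add(float *restrict y, const float *restrict x, float s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += s * x[i];
}
```

The result is the same with or without vectorization; only the generated instructions differ, which is why comparing the assembly output is a convenient check.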
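In addition, the options -mavx, -msse, -msse2 and -msse3 make gcc generate code for the corresponding instruction sets; the -m prefix marks machine-dependent (hardware-specific) options. To check whether the CPU supports these instruction sets, run cat /proc/cpuinfo; to see the cache sizes, run lscpu. The following blog post gives a fairly complete summary of ways to query system information on Linux: http://www.cnblogs.com/lhj588/archive/2012/05/15/2501007.html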
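Besides reading /proc/cpuinfo, gcc on x86 provides the built-ins `__builtin_cpu_init` and `__builtin_cpu_supports` (available since GCC 4.8), which let a program check at run time whether the CPU supports an instruction set. The sketch below is one way to use them; the struct `simd_support` and the function `query_simd_support` are hypothetical names, and on non-x86 targets the sketch simply reports -1 (unknown).

```c
/* 1 = supported, 0 = not supported, -1 = unknown (non-x86 target). */
struct simd_support {
    int sse, sse2, sse3, avx;
};

struct simd_support query_simd_support(void)
{
    struct simd_support s = { -1, -1, -1, -1 };
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    /* Only strictly required when called before constructors run,
       but calling it here is harmless. */
    __builtin_cpu_init();
    s.sse  = __builtin_cpu_supports("sse")  != 0;
    s.sse2 = __builtin_cpu_supports("sse2") != 0;
    s.sse3 = __builtin_cpu_supports("sse3") != 0;
    s.avx  = __builtin_cpu_supports("avx")  != 0;
#endif
    return s;
}
```

A check like this matters because a binary built with, for example, -mavx may be killed with an illegal-instruction signal (SIGILL) on a CPU without AVX; a program can use the result to choose between a generic code path and one compiled for the extension.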
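How to fix the error “relocation truncated to fit: R_X86_64_32S against `.bss'”:
According to material found online, the link fails because an array is too large (over 2 GB): the default code model (-mcmodel=small) requires the program and its static data to fit in the lower 2 GB of the address space. The fix is to add the compiler option “-mcmodel=medium”.
For an explanation of this option, see: http://stackoverflow.com/questions/12916176/gfortran-for-dummies-what-does-mcmodel-medium-do-exactly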
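A minimal sketch for reproducing the error and the fix; the macro `BIG_N` and the function `touch_big_array` are hypothetical names, not from the original. By default the array is small, so the file builds anywhere. Building a program that calls `touch_big_array` from `main` with, for example, `-DBIG_N=700000000L` (about 2.8 GB of `int`) should fail to link under the default code model with a “relocation truncated to fit” error, and should link once `-mcmodel=medium` is added. The relocation named in the message may differ from R_X86_64_32S (for example R_X86_64_PC32 when gcc builds position-independent executables by default).

```c
/* Hypothetical size knob: small by default; override with -DBIG_N=...
   to push the array past the 2 GB limit of the small code model. */
#ifndef BIG_N
#define BIG_N (1L << 20)
#endif

/* Zero-initialized static storage, so the array is placed in .bss,
   the section named in the linker error. */
static int big[BIG_N];

/* Write the first and last elements and return their sum. Accessing
   big[BIG_N - 1] is what needs an address beyond 2 GB when BIG_N is huge. */
long touch_big_array(void)
{
    big[0] = 1;
    big[BIG_N - 1] = 2;
    return (long)big[0] + big[BIG_N - 1];
}
```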
Corrections made on 2016/3/29
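With gcc 4.9.2, the options “-ftree-vectorize” and “-ftree-vectorizer-verbose=n” had no effect. Auto-vectorization is enabled with “-O3” or “-Ofast”, and the option that prints information from the vectorization pass (here, what was not vectorized) is “-fopt-info-vec-missed”.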
The code (ex1.c):
int a[256], b[256], c[256];
void foo (void) {
  int i;

  for (i=0; i<256; i++){
    a[i] = b[i] + c[i];
  }
}
Makefile rule:
ex1.o: ex1.c
	gcc -O3 -c -fopt-info-vec-missed ex1.c
Compiler output:
gcc -O3 -c -fopt-info-vec-missed ex1.c
ex1.c:5:3: note: misalign = 0 bytes of ref b[i_11]
ex1.c:5:3: note: misalign = 0 bytes of ref c[i_11]
ex1.c:5:3: note: misalign = 0 bytes of ref a[i_11]
ex1.c:5:3: note: virtual phi. skip.
ex1.c:5:3: note: num. args = 4 (not unary/binary/ternary op).
ex1.c:5:3: note: not ssa-name.
ex1.c:5:3: note: use not simple.
ex1.c:5:3: note: num. args = 4 (not unary/binary/ternary op).
ex1.c:5:3: note: not ssa-name.
ex1.c:5:3: note: use not simple.
ex1.c:2:1: note: not vectorized: not enough data-refs in basic block.
ex1.c:6:13: note: not vectorized: no vectype for stmt: vect__4.5_1 = MEM[(int *)vectp_b.3_9]; scalar_type: vector(4) int
ex1.c:6:13: note: not vectorized: not enough data-refs in basic block.
ex1.c:2:1: note: not vectorized: not enough data-refs in basic block.
ex1.c:8:1: note: not vectorized: not enough data-refs in basic block.
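In the log for ex1.c, most “not vectorized” notes appear to come from gcc's basic-block (SLP) vectorizer rather than from the loop vectorizer, and the `vect_`-prefixed temporary in the 6:13 note suggests the loop itself had already been turned into vector code. Adding `-fopt-info-vec-optimized` (or `-fopt-info-vec-all`) should also report the loops that were vectorized successfully. For contrast, the sketch below (the function name `prefix_sum` is hypothetical, not from the original) contains a loop-carried dependence (each `a[i]` needs the `a[i-1]` computed in the previous iteration), which typically prevents loop vectorization, so compiling it with `gcc -O3 -c -fopt-info-vec-missed` should produce a “not vectorized” note for the loop. The exact wording of the notes varies between gcc versions.

```c
#include <stddef.h>

/* Running (prefix) sum: a[i] = b[0] + ... + b[i].
   Each iteration reads the value written by the previous one
   (dependence distance 1), so iterations cannot simply be
   executed side by side in vector lanes. */
void prefix_sum(int *a, const int *b, size_t n)
{
    if (n == 0)
        return;
    a[0] = b[0];
    for (size_t i = 1; i < n; i++)
        a[i] = a[i - 1] + b[i];
}
```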