[arm] AArch64 assembly optimization demo, and the differences between ARM32 and AArch64 instruction formats

2018.7.29, weekend


1. GNU asm instruction format on the ARM32 architecture:
V{<mod>}<op>{<shape>}{<cond>}{.<dt>} {<dest>}, src1, src2

Where:

<mod> - modifiers

Q: The instruction uses saturating arithmetic, so that the result is saturated within the range of the specified data type, such as VQABS, VQSHL etc.
H: The instruction will halve the result. It does this by shifting right by one place (effectively a divide by two with truncation), such as VHADD, VHSUB.
D: The instruction doubles the result, such as VQDMULL, VQDMLAL, VQDMLSL and VQ{R}DMULH.
R: The instruction will perform rounding on the result, equivalent to adding 0.5 to the result before truncating, such as VRHADD, VRSHR.
<op> - the operation (for example, ADD, SUB, MUL).
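
To make the Q, H, D and R modifiers above concrete, here is a minimal scalar sketch in portable C of what each one does to a single lane. The function names are hypothetical and chosen only for illustration; these are not NEON intrinsics, and the real instructions apply the same arithmetic to every lane of a vector at once.

```c
#include <stdint.h>

/* Q: saturating add on one signed 8-bit lane (models VQADD.S8). */
int8_t sat_add_s8(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;              /* compute in a wider type */
    if (sum > INT8_MAX) return INT8_MAX;    /* clamp instead of wrapping */
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* H: halving add (models VHADD.U8): (a + b) >> 1, truncated. */
uint8_t halving_add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* R with H: rounding halving add (models VRHADD.U8): (a + b + 1) >> 1. */
uint8_t rounding_halving_add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Q with D: saturating doubling long multiply (models VQDMULL.S16). */
int32_t sat_doubling_mul_long_s16(int16_t a, int16_t b) {
    int64_t p = 2 * (int64_t)a * (int64_t)b;    /* doubled product */
    if (p > INT32_MAX) return INT32_MAX;
    if (p < INT32_MIN) return INT32_MIN;
    return (int32_t)p;
}
```

For example, sat_add_s8(100, 100) clamps to 127 rather than wrapping, and halving_add_u8(3, 4) gives 3 while rounding_halving_add_u8(3, 4) gives 4.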

<shape> - Shape.

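Neon data processing instructions are typically available in Normal, Long, Wide and Narrow variants. Normal instructions produce result vectors of the same size, and usually the same type, as the operand vectors.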

Long (L): instructions operate on double-word vector operands and produce a quad-word vector result. The result elements are twice the width of the operands, and of the same type. Lengthening instructions are specified using an L appended to the instruction.
[Figure: Neon data processing instruction, Long variant]

Wide (W): instructions operate on a double-word vector operand and a quad-word vector operand, producing a quad-word vector result. The result elements and the first operand are twice the width of the second operand elements. Widening instructions have a W appended to the instruction.
[Figure: Neon data processing instruction, Wide variant]

Narrow (N): instructions operate on quad-word vector operands, and produce a double-word vector result. The result elements are half the width of the operand elements. Narrowing instructions are specified using an N appended to the instruction.
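
The three shapes can be sketched as scalar loops in portable C. This is only a model of the lane arithmetic under the assumption of unsigned 8-bit and 16-bit elements; the function names are hypothetical, and the instructions named in the comments are the ARM32 forms they are meant to mimic.

```c
#include <stdint.h>

/* Long (models VADDL.U8): 8 x u8 + 8 x u8 -> 8 x u16, so no sum can overflow. */
void add_long_u8(uint16_t dst[8], const uint8_t a[8], const uint8_t b[8]) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)(a[i] + b[i]);
}

/* Wide (models VADDW.U8): 8 x u16 + 8 x u8 -> 8 x u16. */
void add_wide_u8(uint16_t dst[8], const uint16_t a[8], const uint8_t b[8]) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint16_t)(a[i] + b[i]);   /* wraps modulo 2^16 */
}

/* Narrow (models VMOVN.I16): 8 x u16 -> 8 x u8, keeping the low byte. */
void move_narrow_u16(uint8_t dst[8], const uint16_t a[8]) {
    for (int i = 0; i < 8; i++)
        dst[i] = (uint8_t)(a[i] & 0xFF);
}
```

Note how the Long form takes two 64-bit inputs and fills a 128-bit result, while the Narrow form goes the other way.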

2. GNU asm instruction format on the AArch64 architecture:

In the AArch64 execution state, the syntax of NEON instructions has changed. It can be described as follows:

{<prefix>}<op>{<suffix>}  Vd.<T>, Vn.<T>, Vm.<T>

Where:

<prefix> - prefix, such as S/U/F/P for signed/unsigned/floating-point/polynomial data types.

<op> - operation, such as ADD, AND etc.

<suffix> - suffix

P: “pairwise” operations, such as ADDP.
V: the new reduction (across-all-lanes) operations, such as FMAXV.
2: new widening/narrowing “second part” instructions, such as ADDHN2, SADDL2.
ADDHN2: adds two 128-bit vectors, keeps the high half of each sum, and stores the resulting 64-bit vector in the upper 64 bits of the destination NEON register.

SADDL2: adds the upper 64-bit halves of two NEON registers and produces a 128-bit vector result.
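
The P, V and 2 suffixes described above can be modelled with plain C arrays standing in for NEON registers. This is a rough sketch under stated assumptions: the function names are hypothetical, the lane types are fixed to the arrangements named in the comments, and the FMAXV model does not attempt to reproduce NaN handling.

```c
#include <stdint.h>

/* Models ADDP Vd.4S, Vn.4S, Vm.4S: adds adjacent pairs of lanes. */
void addp_model(uint32_t vd[4], const uint32_t vn[4], const uint32_t vm[4]) {
    uint32_t tmp[4];                    /* lets vd alias an input */
    tmp[0] = vn[0] + vn[1];
    tmp[1] = vn[2] + vn[3];
    tmp[2] = vm[0] + vm[1];
    tmp[3] = vm[2] + vm[3];
    for (int i = 0; i < 4; i++)
        vd[i] = tmp[i];
}

/* Models FMAXV Sd, Vn.4S: maximum across all four lanes. */
float fmaxv_model(const float vn[4]) {
    float m = vn[0];
    for (int i = 1; i < 4; i++)
        if (vn[i] > m)
            m = vn[i];
    return m;
}

/* Models SADDL2 Vd.4S, Vn.8H, Vm.8H: widens and adds the upper four lanes. */
void saddl2_model(int32_t vd[4], const int16_t vn[8], const int16_t vm[8]) {
    for (int i = 0; i < 4; i++)
        vd[i] = (int32_t)vn[4 + i] + (int32_t)vm[4 + i];
}

/* Models ADDHN2 Vd.8H, Vn.4S, Vm.4S: adds, keeps the high 16 bits of each
 * sum, and writes them to the upper four lanes of vd. The lower four lanes
 * of vd are left untouched. */
void addhn2_model(uint16_t vd[8], const uint32_t vn[4], const uint32_t vm[4]) {
    for (int i = 0; i < 4; i++)
        vd[4 + i] = (uint16_t)((vn[i] + vm[i]) >> 16);
}
```

The non-"2" forms (SADDL, ADDHN) would use lanes 0 to 3 instead of lanes 4 to 7, which is why the pair of instructions together covers a full 128-bit register.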

<T> - data type (arrangement), one of 8B/16B/4H/8H/2S/4S/2D. The number is the lane count and the letter is the lane size: B represents byte (8-bit), H represents half-word (16-bit), S represents word (32-bit), D represents double-word (64-bit).
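
As a quick check on the arrangement specifiers, lane count times lane size always comes out to 64 or 128 bits. The small helper below is a hypothetical illustration of that rule in portable C, not part of any assembler or toolchain API.

```c
#include <stdlib.h>

/* Returns the total vector width in bits for an arrangement string such as
 * "4S", or -1 if the string is malformed or is not 64 or 128 bits wide. */
int arrangement_bits(const char *t) {
    char *end;
    long lanes = strtol(t, &end, 10);   /* leading lane count */
    int lane_bits;

    switch (*end) {                     /* lane-size letter */
    case 'B': lane_bits = 8;  break;
    case 'H': lane_bits = 16; break;
    case 'S': lane_bits = 32; break;
    case 'D': lane_bits = 64; break;
    default:  return -1;
    }
    if (end == t || end[1] != '\0' || lanes <= 0 || lanes > 16)
        return -1;

    int total = (int)lanes * lane_bits;
    return (total == 64 || total == 128) ? total : -1;
}
```

So 8B, 4H and 2S describe the 64-bit (lower half) view of a register, while 16B, 8H, 4S and 2D describe the full 128-bit view.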

3. AArch64 NEON assembly demo
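    .text
    .align 4
    .global add_float_neon2
    .type add_float_neon2, %function

// x0 = dst, x1 = src1, x2 = src2, x3 = number of floats.
// The count is assumed to be a positive multiple of 4;
// the loop body always runs at least once.
add_float_neon2:
.L_loop:
    ld1     {v0.4s}, [x1], #16      // load 4 floats from src1, advance 16 bytes
    ld1     {v1.4s}, [x2], #16      // load 4 floats from src2, advance 16 bytes
    fadd    v0.4s, v0.4s, v1.4s     // lane-wise add
    subs    x3, x3, #4              // 4 floats done; sets flags for b.gt
    st1     {v0.4s}, [x0], #16      // store 4 results to dst (flags unchanged)
    b.gt    .L_loop                 // repeat while the remaining count > 0
    ret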
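
For comparison, here is a portable C reference that mirrors the loop above, four floats per iteration. The prototype in the comment is an assumption inferred from the register usage (x0 to x3 hold the first four arguments under the standard AArch64 calling convention); the count is declared as a 64-bit `long` because the routine decrements the full x3 register. The name `add_float_ref` is hypothetical.

```c
/* Assumed C prototype for the assembly routine:
 *   void add_float_neon2(float *dst, const float *src1,
 *                        const float *src2, long count);
 */

/* Scalar reference with the same loop shape: like the assembly, it always
 * processes at least one group of 4 floats, so count must be a positive
 * multiple of 4. */
void add_float_ref(float *dst, const float *src1, const float *src2, long count) {
    do {
        for (int lane = 0; lane < 4; lane++)    /* one fadd v0.4s worth of work */
            dst[lane] = src1[lane] + src2[lane];
        dst += 4;                               /* post-increment by 16 bytes */
        src1 += 4;
        src2 += 4;
        count -= 4;                             /* subs x3, x3, #4 */
    } while (count > 0);                        /* b.gt .L_loop */
}
```

A reference like this is handy for checking the assembly output on a few test arrays before measuring any speedup.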

References:

https://community.arm.com/android-community/b/android/posts/arm-neon-programming-quick-reference

https://community.arm.com/android-community/b/android/posts/arm-neon-optimization



Reposted from: https://www.cnblogs.com/SoaringLee/p/10532405.html
