2018.7.29, weekend
1. GNU asm instruction format on the ARM32 architecture:
V{<mod>}<op>{<shape>}{<cond>}{.<dt>}{<dest>}, src1, src2
Where:
<mod> - modifiers
Q: The instruction uses saturating arithmetic, so that the result is saturated within the range of the specified data type, such as VQABS, VQSHL etc.
H: The instruction will halve the result. It does this by shifting right by one place (effectively a divide by two with truncation), such as VHADD, VHSUB.
D: The instruction doubles the result, such as VQDMULL, VQDMLAL, VQDMLSL and VQ{R}DMULH.
R: The instruction will perform rounding on the result, equivalent to adding 0.5 to the result before truncating, such as VRHADD, VRSHR.
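The four modifiers can be sketched in plain C on a single lane. This is only an illustrative scalar model, not NEON code; the function names are hypothetical, and the negative-value shift in the last function assumes the compiler uses an arithmetic right shift (true of mainstream compilers).

```c
#include <stdint.h>

/* Q: saturating add on one signed 8-bit lane (models VQADD.S8). */
int8_t sat_add_s8(int8_t a, int8_t b) {
    int sum = (int)a + (int)b;            /* widen so the sum cannot overflow */
    if (sum > INT8_MAX) return INT8_MAX;  /* clamp instead of wrapping */
    if (sum < INT8_MIN) return INT8_MIN;
    return (int8_t)sum;
}

/* H: halving add on one unsigned 8-bit lane (models VHADD.U8). */
uint8_t halving_add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b) >> 1);
}

/* R + H: rounding halving add (models VRHADD.U8).
   Adding 1 before the shift is the "add 0.5, then truncate" step. */
uint8_t rounding_halving_add_u8(uint8_t a, uint8_t b) {
    return (uint8_t)(((unsigned)a + (unsigned)b + 1u) >> 1);
}

/* Q + D: saturating doubling multiply, high half (models VQDMULH.S16). */
int16_t sat_doubling_mulh_s16(int16_t a, int16_t b) {
    int64_t p = 2 * (int64_t)a * (int64_t)b;  /* doubled product */
    if (p > INT32_MAX) p = INT32_MAX;         /* only -32768 * -32768 gets here */
    return (int16_t)(p >> 16);                /* keep the high 16 bits */
}
```

For example, 255 and 254 halve to 254 with H alone but to 255 with R added, because the discarded bit is rounded up instead of dropped.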
<op> - the operation (for example, ADD, SUB, MUL).
<shape> - Shape.
Neon data processing instructions are typically available in Normal, Long, Wide and Narrow variants.
Long (L): instructions operate on double-word vector operands and produce a quad-word vector result. The result elements are twice the width of the operands, and of the same type. Lengthening instructions are specified using an L appended to the instruction.
[Figure: Neon data processing instruction, long variant]
Wide (W): instructions operate on a double-word vector operand and a quad-word vector operand, producing a quad-word vector result. The result elements and the first operand are twice the width of the second operand elements. Widening instructions have a W appended to the instruction.
[Figure: Neon data processing instruction, wide variant]
Narrow (N): instructions operate on quad-word vector operands, and produce a double-word vector result. The result elements are half the width of the operand elements. Narrowing instructions are specified using an N appended to the instruction.
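The element widths of the three shapes show up directly in C function signatures. The sketch below models a single lane of each variant; the function names are hypothetical and the instructions named in the comments are just representative examples of each shape.

```c
#include <stdint.h>

/* Long (one lane of VADDL.S16): two 16-bit operands, 32-bit result. */
int32_t add_long_lane(int16_t a, int16_t b) {
    return (int32_t)a + (int32_t)b;   /* cannot overflow after widening */
}

/* Wide (one lane of VADDW.S16): 32-bit first operand, 16-bit second
   operand, 32-bit result. Unsigned arithmetic models hardware wraparound. */
int32_t add_wide_lane(int32_t a, int16_t b) {
    return (int32_t)((uint32_t)a + (uint32_t)(int32_t)b);
}

/* Narrow (one lane of VMOVN.I32): 32-bit operand, low 16 bits kept. */
int16_t narrow_lane(int32_t a) {
    return (int16_t)(uint16_t)((uint32_t)a & 0xFFFFu);
}

/* Narrow (one lane of VADDHN.I32): add two 32-bit operands and keep
   the HIGH 16 bits of the sum. */
int16_t add_high_narrow_lane(int32_t a, int32_t b) {
    uint32_t sum = (uint32_t)a + (uint32_t)b;
    return (int16_t)(uint16_t)(sum >> 16);
}
```

Adding 30000 + 30000 overflows a 16-bit lane, but the long form returns 60000 because the result lane is 32 bits wide.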
2. GNU asm instruction format on the AArch64 architecture:
In the AArch64 execution state, the syntax of NEON instructions has changed. It can be described as follows:
{<prefix>}<op>{<suffix>} Vd.<T>, Vn.<T>, Vm.<T>
Where:
<prefix> - prefix, such as S/U/F/P to represent the signed/unsigned/floating-point/polynomial data types.
<op> - operation, such as ADD, AND etc.
<suffix> - suffix
P: “pairwise” operations, such as ADDP.
V: the new reduction (across-all-lanes) operations, such as FMAXV.
2: new widening/narrowing “second part” instructions, such as ADDHN2, SADDL2.
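ADDHN2: adds two 128-bit vectors, keeps the high half of each sum, and stores the resulting 64-bit vector in the upper 64 bits of the destination NEON register (the lower 64 bits are left unchanged).
SADDL2: adds the upper 64-bit halves of two NEON registers and produces a 128-bit vector result whose elements are twice as wide.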
<T> - data type, 8B/16B/4H/8H/2S/4S/2D. B represents byte (8-bit). H represents half-word (16-bit). S represents word (32-bit). D represents a double-word (64-bit).
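To make the P, V and 2 suffixes concrete, here is a rough scalar C model of the four instructions named above. The function names are hypothetical, arrays stand in for vector registers with index 0 as the lowest lane, and the NaN behavior of the real FMAXV is not modeled.

```c
#include <stdint.h>

/* P (models ADDP Vd.4S, Vn.4S, Vm.4S): add adjacent pairs; the pairs
   from n fill the low lanes, the pairs from m fill the high lanes. */
void addp_4s(int32_t d[4], const int32_t n[4], const int32_t m[4]) {
    int32_t r[4] = { n[0] + n[1], n[2] + n[3], m[0] + m[1], m[2] + m[3] };
    for (int i = 0; i < 4; i++) d[i] = r[i];   /* temp copy allows d == n */
}

/* V (models FMAXV Sd, Vn.4S): reduce all four lanes to one maximum. */
float fmaxv_4s(const float n[4]) {
    float best = n[0];
    for (int i = 1; i < 4; i++)
        if (n[i] > best) best = n[i];
    return best;
}

/* 2 (models SADDL2 Vd.4S, Vn.8H, Vm.8H): widen and add the UPPER
   four 16-bit lanes of each source. */
void saddl2_4s(int32_t d[4], const int16_t n[8], const int16_t m[8]) {
    for (int i = 0; i < 4; i++)
        d[i] = (int32_t)n[i + 4] + (int32_t)m[i + 4];
}

/* 2 (models ADDHN2 Vd.8H, Vn.4S, Vm.4S): add, keep the high half of
   each sum, write the UPPER four lanes of d; lanes 0..3 are untouched. */
void addhn2_8h(int16_t d[8], const int32_t n[4], const int32_t m[4]) {
    for (int i = 0; i < 4; i++) {
        uint32_t sum = (uint32_t)n[i] + (uint32_t)m[i];  /* wraps like hardware */
        d[i + 4] = (int16_t)(uint16_t)(sum >> 16);
    }
}
```

The "2" forms exist so that a pair of instructions (for example ADDHN followed by ADDHN2) can fill both halves of one 128-bit register.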
3. AArch64 NEON assembly demo
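    .text
    .align 4
    .global add_float_neon2
    .type add_float_neon2, %function
// Register roles inferred from the code: x0 = dst, x1 = src1, x2 = src2,
// x3 = element count (assumed to be a positive multiple of 4).
add_float_neon2:
.L_loop:
    ld1 {v0.4s}, [x1], #16      // load 4 floats from src1, then x1 += 16 bytes
    ld1 {v1.4s}, [x2], #16      // load 4 floats from src2, then x2 += 16 bytes
    fadd v0.4s, v0.4s, v1.4s    // v0 = v0 + v1, lane by lane
    subs x3, x3, #4             // count -= 4 and set the condition flags
    st1 {v0.4s}, [x0], #16      // store 4 results to dst (does not change the flags)
    b.gt .L_loop                // loop while count > 0
    ret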
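A scalar C reference is handy for checking the routine's output. The sketch below assumes, from the register usage only, that the routine takes a destination pointer, two source pointers and an element count; both the prototype in the comment and the name add_float_ref are hypothetical.

```c
#include <stdint.h>

/* Assumed prototype of the assembly routine (not stated in the notes):
   void add_float_neon2(float *dst, const float *src1,
                        const float *src2, int64_t count); */

/* Scalar reference: dst[i] = src1[i] + src2[i] for i in [0, count). */
void add_float_ref(float *dst, const float *src1,
                   const float *src2, int64_t count) {
    for (int64_t i = 0; i < count; i++)
        dst[i] = src1[i] + src2[i];
}
```

The two should agree only when count is a positive multiple of 4: the assembly loop always processes four floats per iteration and runs at least once, so any other count would read and write past the end of the buffers.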
References:
https://community.arm.com/android-community/b/android/posts/arm-neon-programming-quick-reference
https://community.arm.com/android-community/b/android/posts/arm-neon-optimization
THE END!