Neon

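Concepts

(1) SIMD (single instruction, multiple data): one instruction processes multiple data elements. On ARM this is the Advanced SIMD extension, better known as Neon, which is implemented in ARM Cortex-A series processors.

Neon and VFP: when both are present, Neon and VFP share the same set of registers.

Neon instructions cover memory accesses, data copying between NEON and general purpose registers, data type conversion, and data processing.

Supported data types: 8-bit, 16-bit, 32-bit and 64-bit signed and unsigned integers; 32-bit single precision floating point elements; 8-bit or 16-bit polynomials.

Neon registers: Q0~Q15, 128 bits each, and D0~D31, 64 bits each. These are two views of the same register file: Qn overlaps D(2n) and D(2n+1), so Q0 is D0 plus D1.

All Neon instructions begin with V, for example VADD.I16 q0, q1, q2.

Development approaches:

(1) Write assembly directly.

(2) Use Neon C intrinsics: include the header with #include <arm_neon.h> and add the following option to the compile command: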

-mfpu=neon

(3) Automatic vectorization

(4) Using Neon optimized libraries

OpenMAX
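A minimal sketch of approach (2), assuming a GCC or Clang toolchain. The helper name `add_i16` is hypothetical; `vld1q_s16`, `vaddq_s16` and `vst1q_s16` are standard Neon intrinsics from `arm_neon.h`. `vaddq_s16` adds eight 16-bit lanes and corresponds to `VADD.I16 q0, q1, q2` on 32-bit ARM. The `#else` branch is a plain loop that also illustrates approach (3): with optimization enabled (for example `-O3`, plus `-mfpu=neon` on 32-bit ARM), the compiler may auto-vectorize it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for 16-bit integers.
 * For simplicity this sketch requires n to be a multiple of 8
 * (eight int16_t lanes fill one 128-bit Q register). */
void add_i16(int16_t *dst, const int16_t *a, const int16_t *b, size_t n)
{
    assert(n % 8 == 0);
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    for (size_t i = 0; i < n; i += 8) {
        int16x8_t va = vld1q_s16(a + i);      /* load 8 lanes */
        int16x8_t vb = vld1q_s16(b + i);
        vst1q_s16(dst + i, vaddq_s16(va, vb)); /* lane-wise add, then store */
    }
#else
    /* Portable fallback: a simple loop the compiler may auto-vectorize. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = (int16_t)(a[i] + b[i]);
#endif
}
```

For example, `add_i16(d, a, b, 16)` adds two 16-element arrays in two vector steps on a Neon target. Arrays whose length is not a multiple of the vector width need one of the leftover strategies described next.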

A Neon instruction processes a fixed number of elements at once, but in real applications the data length is often not a multiple of the vector width. Common ways to handle the leftover elements are:

(1) Larger Arrays

Pad the arrays to a multiple of the vector width. Take care to initialize the padding so that it does not affect the result.

(2) Overlapping

The final vector overlaps the previous one, so some elements are processed twice.

(3) Single element processing: after the vector loop, handle the leftover elements one at a time.
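The three strategies can be sketched on a float addition with 4-lane vectors. This is an illustration under stated assumptions, not a reference implementation: the function names (`add4`, `add_padded`, `add_overlap`, `add_tail`, `round_up4`) are hypothetical, and `add4` uses the Neon intrinsics `vld1q_f32`, `vaddq_f32` and `vst1q_f32` when Neon is available, otherwise a scalar stand-in.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* One 4-lane vector step: dst[0..3] = a[0..3] + b[0..3]. */
static void add4(float *dst, const float *a, const float *b)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1q_f32(dst, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    for (int k = 0; k < 4; ++k)
        dst[k] = a[k] + b[k];
#endif
}

static size_t round_up4(size_t n) { return (n + 3) & ~(size_t)3; }

/* (1) Larger arrays: dst, a and b must each hold round_up4(n) elements,
 * with the padding of a and b initialized (e.g. zeroed). The results
 * written into dst's padding are simply ignored. */
void add_padded(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < round_up4(n); i += 4)
        add4(dst + i, a + i, b + i);
}

/* (2) Overlapping: the last vector ends exactly at n and recomputes up to
 * three elements. Requires n >= 4 and dst not aliasing a or b. */
void add_overlap(float *dst, const float *a, const float *b, size_t n)
{
    assert(n >= 4);
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        add4(dst + i, a + i, b + i);
    if (i < n)
        add4(dst + n - 4, a + n - 4, b + n - 4);
}

/* (3) Single element processing: vector loop, then a scalar loop for the
 * 0 to 3 leftover elements. */
void add_tail(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        add4(dst + i, a + i, b + i);
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Strategy (2) is only safe because each output element is recomputed from unchanged inputs. For an in-place update such as `a[i] += b[i]`, the overlapped elements would be added twice, so strategy (1) or (3) fits better there.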

Alignment in Neon

Load and Store addresses must be aligned to cache lines to allow more efficient memory access.

On Cortex-A8, a cache line is 16 words (64 bytes).
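One way to get cache-line-aligned buffers from C is C11 `aligned_alloc`. A minimal sketch, assuming a C11 hosted environment and the 64-byte Cortex-A8 line size; the helper `alloc_aligned_floats` and the macro `CACHE_LINE` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed cache-line size: 64 bytes (16 words), as on Cortex-A8. */
#define CACHE_LINE 64

/* Hypothetical helper: allocate n floats starting on a cache-line boundary.
 * C11 aligned_alloc expects the size to be a multiple of the alignment,
 * so the byte count is rounded up. Returns NULL on failure; release with free(). */
float *alloc_aligned_floats(size_t n)
{
    if (n > (SIZE_MAX - CACHE_LINE) / sizeof(float))
        return NULL;                        /* size computation would overflow */
    size_t bytes = n * sizeof(float);
    bytes = (bytes + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (bytes == 0)
        bytes = CACHE_LINE;
    float *p = aligned_alloc(CACHE_LINE, bytes);
    assert(p == NULL || (uintptr_t)p % CACHE_LINE == 0);
    return p;
}
```

Rounding the size up also means the last vector load of a buffer never crosses into a cache line that belongs to another allocation.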

Avoid writing to the same area of memory, specifically the same cache line, from both ARM and NEON code.
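A data layout can help follow this rule. The sketch below assumes a C11 compiler and a 64-byte cache line; the struct `shared_state` and its fields are hypothetical. `_Alignas` places the field written by ARM (scalar) code and the field written by Neon code on separate cache lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size, as on Cortex-A8 */

/* Hypothetical layout: ARM code stores only into `counters`, Neon code
 * stores only into `samples`. Each group starts on its own cache line.
 * Heap instances must come from an aligned allocator such as
 * aligned_alloc(CACHE_LINE, sizeof(struct shared_state)), because plain
 * malloc does not guarantee 64-byte alignment. */
struct shared_state {
    _Alignas(CACHE_LINE) int32_t counters[4];   /* written by ARM code */
    _Alignas(CACHE_LINE) float   samples[16];   /* written by Neon code */
};

/* Compile-time checks that the two groups cannot share a line. */
_Static_assert(offsetof(struct shared_state, samples) % CACHE_LINE == 0,
               "samples must start on a cache-line boundary");
_Static_assert(offsetof(struct shared_state, samples) >= CACHE_LINE,
               "counters and samples must be on different lines");
```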

To obtain best performance from hand-written NEON code, it is necessary to be aware of some underlying hardware features. In particular, the programmer should be aware of pipelining and scheduling issues, memory access behavior and scheduling hazards.
