



void vecsum4(short *restrict sum, short *restrict in1, short *restrict in2, unsigned int N)


   int i;

   #pragma MUST_ITERATE(10)





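The directive #pragma MUST_ITERATE(10) tells the compiler that the loop following it executes at least 10 times. This information is critical for software pipelining.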
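As a rough illustration of where the pragma sits relative to a loop, here is a minimal sketch. The function name vecsum_min10 and the macro LOOP_MIN_TRIPS_10 are hypothetical and not part of the original listing. The sketch assumes that the TI compiler defines __TI_COMPILER_VERSION__ and accepts the _Pragma operator; on other compilers the macro expands to nothing, so the code still builds.

```c
/* Hypothetical wrapper: emits the TI pragma only when a TI compiler is detected. */
#if defined(__TI_COMPILER_VERSION__)
#define LOOP_MIN_TRIPS_10 _Pragma("MUST_ITERATE(10)")
#else
#define LOOP_MIN_TRIPS_10 /* no-op on non-TI compilers (assumption for portability) */
#endif

/* Element-wise sum of two short vectors; the caller promises n >= 10. */
void vecsum_min10(short *restrict sum, const short *restrict in1,
                  const short *restrict in2, unsigned int n)
{
    unsigned int i;

    /* The promise applies to the loop that immediately follows. */
    LOOP_MIN_TRIPS_10
    for (i = 0; i < n; i++)
        sum[i] = (short)(in1[i] + in2[i]);
}
```

The promise is only as good as the caller: if n can ever be smaller than 10, the pragma should not be used, because the compiler is free to generate code that assumes the minimum trip count.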





void vecsum5(short *restrict sum, const short *restrict in1, short *restrict in2, unsigned int N)


  int i;

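  /* Test whether sum, in2, and in1 are all aligned to a word boundary. */
  /* Short pointers are already halfword aligned, so checking bit 1 is */
  /* enough; a nonzero result means at least one is not word aligned. */

  if (((int)sum | (int)in2 | (int)in1) & 0x02)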


    #pragma MUST_ITERATE(20)






    #pragma MUST_ITERATE(10)







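The run-time alignment test used by vecsum5 can be sketched in portable C as follows. This is only an illustration of the dispatch idea, not the original loop bodies: the names all_word_aligned and vecsum_dispatch are hypothetical, the check uses uintptr_t and tests both low address bits, and the "aligned" path simply processes two elements per iteration as a stand-in for a word-wide packed add.

```c
#include <stdint.h>

/* Returns nonzero when all three addresses fall on a 4-byte (word) boundary. */
int all_word_aligned(const void *a, const void *b, const void *c)
{
    return (((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) & 0x3u) == 0;
}

/* Element-wise sum that picks a loop shape based on pointer alignment. */
void vecsum_dispatch(short *restrict sum, const short *restrict in1,
                     const short *restrict in2, unsigned int n)
{
    unsigned int i;

    if (!all_word_aligned(sum, in1, in2)) {
        /* At least one pointer is not word aligned: one element per trip. */
        for (i = 0; i < n; i++)
            sum[i] = (short)(in1[i] + in2[i]);
    } else {
        /* All pointers word aligned: two elements per trip
           (stand-in for a packed 2 x 16-bit add). */
        for (i = 0; i + 1 < n; i += 2) {
            sum[i]     = (short)(in1[i]     + in2[i]);
            sum[i + 1] = (short)(in1[i + 1] + in2[i + 1]);
        }
        /* Handle the last element when n is odd. */
        if (i < n)
            sum[i] = (short)(in1[i] + in2[i]);
    }
}
```

Because the two paths run a different number of iterations for the same N, each loop can carry its own minimum trip count, which is consistent with the two different MUST_ITERATE values in the listing above.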
The following example shows a kernel that can benefit from the packed compare and expand intrinsics. The Clear Below Threshold kernel scans an image of 8-bit unsigned pixels and sets every pixel that is at or below a given threshold to 0.
Clear Below Threshold Kernel

void clear_below_thresh(unsigned char *restrict image, int count, unsigned char threshold)
{
    int i;

    for (i = 0; i < count; i++)
    {
        if (image[i] <= threshold) image[i] = 0;
    }
}

Vectorization techniques are applied to the code (as described in Packed-Data Processing on the C64x), giving the result shown in the following example. The _cmpgtu4() intrinsic compares four pixels at a time against the replicated threshold, and the _xpnd4() intrinsic expands the comparison results into a byte mask that sets the failing pixels to 0. Note that the new code has two restrictions: the input image must be double-word aligned, and it must contain a multiple of 8 pixels. These restrictions are reasonable because common image sizes are a multiple of 8 pixels.

Clear Below Threshold Kernel, Using _cmpgtu4 and _xpnd4 Intrinsics

void clear_below_thresh(unsigned char *restrict image, int count, unsigned char threshold)
{
    int i;
    unsigned t3_t2_t1_t0;              /* Threshold (replicated) */
    unsigned p7_p6_p5_p4, p3_p2_p1_p0; /* Pixels */
    unsigned c7_c6_c5_c4, c3_c2_c1_c0; /* Comparison results */
    unsigned x7_x6_x5_x4, x3_x2_x1_x0; /* Expanded masks */

    /* Replicate the threshold value four times in a single word. */
    unsigned temp = _pack2(threshold, threshold);

    t3_t2_t1_t0 = _packl4(temp, temp);

    for (i = 0; i < count; i += 8)
    {
        /* Load 8 pixels from input image (one double-word). */
        p7_p6_p5_p4 = _hi(_amemd8(&image[i]));
        p3_p2_p1_p0 = _lo(_amemd8(&image[i]));

        /* Compare each of the pixels to the threshold. */
        c7_c6_c5_c4 = _cmpgtu4(p7_p6_p5_p4, t3_t2_t1_t0);
        c3_c2_c1_c0 = _cmpgtu4(p3_p2_p1_p0, t3_t2_t1_t0);

        /* Expand the comparison results to generate a bitmask. */
        x7_x6_x5_x4 = _xpnd4(c7_c6_c5_c4);
        x3_x2_x1_x0 = _xpnd4(c3_c2_c1_c0);

        /* Apply mask to the pixels. Pixels that were less than or */
        /* equal to the threshold will be forced to 0 because the  */
        /* corresponding mask bits will be all 0s. The pixels that */
        /* were greater will not be modified, because their mask   */
        /* bits will be all 1s.                                    */
        p7_p6_p5_p4 = p7_p6_p5_p4 & x7_x6_x5_x4;
        p3_p2_p1_p0 = p3_p2_p1_p0 & x3_x2_x1_x0;

        /* Store the thresholded pixels back to the image. */
        _amemd8(&image[i]) = _itod(p7_p6_p5_p4, p3_p2_p1_p0);
    }
}

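To make the compare-and-expand step easier to follow without C64x hardware, the sketch below models the two intrinsics in portable C. The function names model_cmpgtu4, model_xpnd4, and model_clear_below4 are hypothetical stand-ins written for this illustration. They assume the behavior described above: one result bit per byte lane for the unsigned greater-than compare, and each of those bits expanded to a full byte of 0s or 1s.

```c
#include <stdint.h>

/* Model of _cmpgtu4: bit k of the result is set when unsigned
   byte k of a is greater than unsigned byte k of b. */
uint32_t model_cmpgtu4(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    int k;

    for (k = 0; k < 4; k++) {
        uint32_t byte_a = (a >> (8 * k)) & 0xFFu;
        uint32_t byte_b = (b >> (8 * k)) & 0xFFu;
        if (byte_a > byte_b)
            result |= 1u << k;
    }
    return result;
}

/* Model of _xpnd4: bit k of the low nibble becomes 0xFF (bit set)
   or 0x00 (bit clear) in byte k of the result. */
uint32_t model_xpnd4(uint32_t bits)
{
    uint32_t mask = 0;
    int k;

    for (k = 0; k < 4; k++) {
        if (bits & (1u << k))
            mask |= 0xFFu << (8 * k);
    }
    return mask;
}

/* Zero every byte of pixels that is <= the matching byte of thresh. */
uint32_t model_clear_below4(uint32_t pixels, uint32_t thresh)
{
    return pixels & model_xpnd4(model_cmpgtu4(pixels, thresh));
}
```

For example, with pixels 0x10203040 and a replicated threshold of 0x20202020, only the two low bytes (0x30 and 0x40) exceed the threshold, so the compare yields 0x3, the expanded mask is 0x0000FFFF, and the result is 0x00003040.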