Software Prefetch Scheduling Distance

Partial translation of the Intel optimization manual, by G-Spider, 2010-12-14. Corrections are welcome.

http://blog.csdn.net/G_Spider

 


Determining the ideal prefetch placement in the code depends on many architectural parameters, including the amount of memory to be prefetched, cache lookup latency, system memory latency, and an estimate of the computation cycles. The ideal prefetch distance is processor- and platform-dependent. If the distance is too short, the prefetch will not hide the latency of the fetch behind computation. If the prefetch is too far ahead, the prefetched data may be evicted from the cache before it is needed.
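To make the two failure modes concrete, here is a minimal sketch under a deliberately simplified model (an assumption, not the manual's model): a distance of `dist_iters` iterations hides the latency only if the iterations executed in between take at least as many cycles as the memory latency, and it is too far ahead if the data prefetched but not yet consumed exceeds a cache budget. The names `classify_distance` and `distance_verdict`, and all parameters, are hypothetical.

```c
#include <stddef.h>

/* Hypothetical verdicts for a candidate prefetch distance. */
typedef enum { DIST_TOO_SHORT, DIST_OK, DIST_TOO_FAR } distance_verdict;

/* Simplified model (assumption):
 * - too short: the dist_iters iterations between the prefetch and the use
 *   take fewer cycles than the memory latency, so the load still stalls;
 * - too far:   dist_iters iterations' worth of prefetched data exceeds the
 *   cache budget, so early prefetches risk eviction before use. */
distance_verdict classify_distance(unsigned dist_iters,
                                   unsigned cycles_per_iter,
                                   unsigned mem_latency_cycles,
                                   size_t bytes_per_iter,
                                   size_t cache_budget_bytes)
{
    /* Widen before multiplying to avoid overflow. */
    if ((unsigned long long)dist_iters * cycles_per_iter < mem_latency_cycles)
        return DIST_TOO_SHORT;
    if ((unsigned long long)dist_iters * bytes_per_iter > cache_budget_bytes)
        return DIST_TOO_FAR;
    return DIST_OK;
}
```

In practice the cache budget must also leave room for everything else the loop touches, so the "too far" test only gives an upper bound on a safe distance.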

Since prefetch distance is not a well-defined metric, for this discussion we define a new term, prefetch scheduling distance (PSD), expressed as a number of loop iterations. For large loops, PSD can be set to 1 (that is, prefetch instructions are scheduled one iteration ahead). For small loop bodies (that is, loop iterations with little computation), PSD must be more than one iteration.
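Why small loop bodies need a larger PSD can be seen with a back-of-the-envelope estimate. The sketch below assumes a rule of thumb that is not the simplified equation from Appendix E: if one iteration takes about `cycles_per_iter` cycles and a memory fetch takes about `mem_latency_cycles`, the prefetch must be issued at least ceil(latency / iteration time) iterations ahead. The function name `psd_estimate` and the cycle figures in the usage note are hypothetical.

```c
/* Hypothetical rule-of-thumb PSD estimate (not Intel's Appendix E equation):
 * enough iterations must run between the prefetch and the use to cover
 * the memory latency. */
unsigned psd_estimate(unsigned mem_latency_cycles, unsigned cycles_per_iter)
{
    if (cycles_per_iter == 0)
        cycles_per_iter = 1;  /* guard against division by zero */

    /* Integer ceiling of mem_latency_cycles / cycles_per_iter. */
    unsigned psd = (mem_latency_cycles + cycles_per_iter - 1) / cycles_per_iter;

    /* Never schedule less than one iteration ahead. */
    return psd < 1 ? 1 : psd;
}
```

With an assumed 300-cycle latency, a large loop body of 400 cycles per iteration gives `psd_estimate(300, 400) == 1`, while a small body of 40 cycles gives `psd_estimate(300, 40) == 8`, matching the qualitative rule in the paragraph above.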

 

A simplified equation for computing PSD is derived from the mathematical model. For the simplified equation, the complete mathematical model, and the methodology for determining prefetch distance, see Appendix E, “Summary of Rules and Suggestions.”

Example 7-3 illustrates the use of a prefetch within the loop body. The prefetch scheduling distance is set to 3: ESI advances by 128 bytes per iteration, so the offset 128*3 in the prefetch addresses points three iterations ahead. ESI is effectively the pointer to a line, EDX is the address of the data being referenced, and XMM1-XMM4 hold the data used in the computation. The example uses two independent cache lines of data per iteration (one addressed through [edx + esi], the other through [edx*4 + esi]). The PSD would need to be increased or decreased if more or fewer than two cache lines are used per iteration.

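Example 7-3. Prefetch Scheduling Distance
top_loop:
 prefetchnta [edx + esi + 128*3]     ; prefetch stream 1, 3 iterations ahead (PSD = 3)
 prefetchnta [edx*4 + esi + 128*3]   ; prefetch stream 2, same distance
 ......                              ; (other loop work, elided in the original)
 ......
 movaps xmm1, [edx + esi]            ; load current-iteration data, stream 1
 movaps xmm2, [edx*4 + esi]          ; load current-iteration data, stream 2
 movaps xmm3, [edx + esi + 16]
 movaps xmm4, [edx*4 + esi + 16]
 ......                              ; (computation on xmm1-xmm4, elided in the original)
 ......
 add esi, 128                        ; advance to the next 128-byte block
 cmp esi, ecx                        ; ecx holds the loop bound
jl top_loop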
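For readers working in C rather than assembly, the loop can be sketched with the GCC/Clang builtin `__builtin_prefetch(addr, rw, locality)`; with `locality = 0` (no temporal locality) GCC typically emits `prefetchnta` on x86. This is an illustration, not the manual's code: the dot product stands in for the elided computation, and `dot_prefetched`, `BLOCK_BYTES`, and `FLOATS_PER_BLOCK` are hypothetical names. As in Example 7-3, each iteration consumes a 128-byte block from each of two independent streams and prefetches the block PSD = 3 iterations ahead.

```c
#include <stddef.h>

#define BLOCK_BYTES 128                                /* bytes per stream per iteration, as in Example 7-3 */
#define FLOATS_PER_BLOCK (BLOCK_BYTES / sizeof(float))
#define PSD 3                                          /* prefetch scheduling distance, in iterations */

/* Hypothetical kernel: dot product over two independent streams a and b. */
float dot_prefetched(const float *a, const float *b, size_t n)
{
    float sum = 0.0f;

    for (size_t i = 0; i < n; i += FLOATS_PER_BLOCK) {
        /* Prefetch each stream PSD blocks ahead; the bound check keeps the
         * address computation inside the arrays. */
        size_t ahead = i + PSD * FLOATS_PER_BLOCK;
        if (ahead < n) {
            __builtin_prefetch(&a[ahead], 0, 0);       /* read, non-temporal hint */
            __builtin_prefetch(&b[ahead], 0, 0);
        }

        /* Consume the current block (the last block may be partial). */
        size_t end = i + FLOATS_PER_BLOCK < n ? i + FLOATS_PER_BLOCK : n;
        for (size_t j = i; j < end; j++)
            sum += a[j] * b[j];
    }
    return sum;
}
```

Example 7-3 issues one prefetch per 128-byte block. On processors with 64-byte cache lines a single prefetch covers only half of such a block, so a real kernel might issue one prefetch per line instead.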

 

 

 

 

 

 

 

 

 

 

 
