HDU4920 Matrix multiplication (how the CPU cache affects program performance)

Problem Description
Given two matrices A and B of size n×n, find the product of them.

bobo hates big integers. So you are only asked to find the result modulo 3.
 

Input
The input consists of several tests. For each test:

The first line contains n (1≤n≤800). Each of the following n lines contains n integers -- the description of the matrix A. The j-th integer in the i-th line equals A_ij. The next n lines describe the matrix B in similar format (0≤A_ij,B_ij≤10^9).
 

Output
For each test:

Print n lines. Each of them contains n integers -- the matrix A×B in similar format.
 

Sample Input
1
0
1
2
0 1
2 3
4 5
6 7
 

Sample Output
0
0 1
2 1


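In the classic matrix multiplication, the third (innermost) loop runs over k, so b[k][j] is accessed column by column. A two-dimensional array is stored in memory row by row, so every step down a column jumps ahead by one full row: 1000*4 bytes for an int array with 1000 columns (810*4 = 3240 bytes for the maxn = 810 arrays in the code below). The CPU cache fetches memory one cache line at a time, typically 64 bytes, so almost every one of these accesses lands on a different line, and with the cache's replacement policy this leads to a large number of cache misses.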

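For this reason, square2.cpp (from the example linked below) swaps the j loop and the k loop, which ensures that the statement

c[i][j] += a[i][k] * b[k][j];

accesses memory contiguously: with j as the innermost loop, a[i][k] stays fixed while c[i][j] and b[k][j] are both walked along a row. This raises the cache hit rate and greatly speeds up the program.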

See this example for details: http://blog.csdn.net/a775700879/article/details/11750703

The code is as follows:

#include <iostream>
#include <cstdio>
#include <cstring>
using namespace std;

const int maxn = 810;

int a[maxn][maxn],b[maxn][maxn],c[maxn][maxn];

int n;

int main()
{
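    // Read test cases until scanf fails to read n (end of input).
    while(scanf("%d",&n)==1){
        int i,j,k;
        // Read A, reduce each entry mod 3, and clear C for this test.
        for(i=0;i<n;i++){
            for(j=0;j<n;j++){
                scanf("%d",&a[i][j]);
                a[i][j]%=3;
                c[i][j]=0;
            }
        }
        // Read B and reduce each entry mod 3.
        for(i=0;i<n;i++)
            for(j=0;j<n;j++){
                scanf("%d",&b[i][j]);
                b[i][j]%=3;
            }
        // i-k-j order: the innermost loop walks along rows of b and c.
        // Entries are at most 2, so c[i][j] is at most 4*n and cannot overflow.
        for(i=0;i<n;i++)
            for(k=0;k<n;k++)
                for(j=0;j<n;j++)
                    c[i][j]=c[i][j]+a[i][k]*b[k][j];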
        for(i=0;i<n;i++){
            for(j=0;j<n-1;j++)
                printf("%d ",c[i][j]%3);
            printf("%d\n",c[i][n-1]%3);
        }
    }
    return 0;
}

