CacheLab- Cache Simulator - Part I

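Cache simulator: the approach

Start by reading the docs. valgrind can record the memory accesses of a program, which is the input a cache simulator needs. The command: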

linux> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l

The memory trace looks like this:


I 0400d7d4,8
 M 0421c7f0,4
 L 04f6b868,8
 S 7ff0005c8,8

[space]operation address,size

I -> instruction load, L -> data load, S -> data store, M -> data modify (a data load followed by a data store). An I is never preceded by a space; M, L, and S are always preceded by one. address is a 64-bit hex memory address, and size is the number of bytes the operation accesses.
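
A line in this format can be parsed with sscanf. The following is only a sketch: trace_op and parse_trace_line are hypothetical names that do not come from the lab handout, and unsigned long is assumed to be 64 bits wide (as on 64-bit Linux).

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical record for one trace line; not part of the lab handout. */
typedef struct {
    char op;                /* 'I', 'L', 'S' or 'M' */
    unsigned long address;  /* hex address from the trace */
    int size;               /* number of bytes accessed */
} trace_op;

/* Parse one line such as " L 04f6b868,8" or "I 0400d7d4,8".
 * Returns 1 on success, 0 if the line does not match the format. */
int parse_trace_line(const char *line, trace_op *out)
{
    assert(line != NULL && out != NULL);
    /* The leading space in the format skips the optional blank before L/S/M. */
    return sscanf(line, " %c %lx,%d", &out->op, &out->address, &out->size) == 3;
}
```

Comparing the return value of sscanf with 3 rejects lines where only part of the pattern matched.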

We work from trace files stored in the traces folder. Take a look at one, for example yi.trace:

 L 10,1
 M 20,1
 L 22,1
 S 18,1
 L 110,1
 L 210,1
 M 12,1

Another file:

I  00400531,2
I  00400581,4
 L 7ff000384,4
I  00400585,2
I  00400533,7
 S 7ff000388,4

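There really are both instruction loads and data loads, so that needs attention. The digit f also shows up, which confirms that the addresses are hex.

Now look at the given example: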

linux> ./csim-ref -s 4 -E 1 -b 4 -t traces/yi.trace
hits:4 misses:5 evictions:3

Verbose mode:

linux> ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace 
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss eviction
L 210,1 miss eviction
M 12,1 miss eviction hit 
hits:4 misses:5 evictions:3

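First, analyze the result: s = 4 -> S = 2^4 = 16 sets, E = 1, b = 4 -> B = 2^4 = 16 bytes per block.

Walking through the structure:

  • L 10,1 miss: unavoidable, since the cache is empty. 0x10 - 0x1f is now loaded into set 1
  • M 20,1 miss hit: M is load + store. The load misses and brings 0x20 - 0x2f into set 2, then the store hits
  • L 22,1 hit: 0x22 lies in the block now held in set 2
  • S 18,1 hit: 0x18 lies in the block now held in set 1
  • L 110,1 miss eviction: 0x110 % (16 * 16) = 0x10 and 0x10 / 16 = 1, so this address also maps to set 1, but with a different tag, hence miss eviction
  • L 210,1 miss eviction: 0x210 % (16 * 16) = 0x10, so it maps to set 1 as well, again miss eviction
  • M 12,1 miss eviction hit: M = load + store. 0x12 maps to set 1, so the load is a miss eviction and the store then hits
  • Counting up: hits:4 misses:5 evictions:3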
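
The walkthrough above can be checked mechanically. The following is a minimal sketch rather than the lab solution: it hard-codes the seven yi.trace accesses, assumes a direct-mapped cache (E = 1) with s = 4 and b = 4, and uses hypothetical names (dm_result, simulate_yi_direct_mapped).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { unsigned hits, misses, evictions; } dm_result;

/* Replay the yi.trace accesses on a direct-mapped cache with s = 4, b = 4. */
dm_result simulate_yi_direct_mapped(void)
{
    enum { S_BITS = 4, B_BITS = 4, NSETS = 1 << S_BITS };
    static const struct { char op; unsigned long addr; } trace[] = {
        {'L', 0x10}, {'M', 0x20}, {'L', 0x22}, {'S', 0x18},
        {'L', 0x110}, {'L', 0x210}, {'M', 0x12},
    };
    bool valid[NSETS] = { false };
    unsigned long tags[NSETS] = { 0 };
    dm_result r = { 0, 0, 0 };

    for (size_t i = 0; i < sizeof trace / sizeof trace[0]; i++) {
        unsigned long set = (trace[i].addr >> B_BITS) & (NSETS - 1);
        unsigned long tag = trace[i].addr >> (B_BITS + S_BITS);
        assert(set < NSETS);

        if (valid[set] && tags[set] == tag) {
            r.hits++;
        } else {
            r.misses++;
            if (valid[set])
                r.evictions++;   /* a different block occupied this set */
            valid[set] = true;
            tags[set] = tag;
        }
        if (trace[i].op == 'M')
            r.hits++;            /* the store half of M always hits */
    }
    return r;
}
```

The returned counts are 4 hits, 5 misses, and 3 evictions, matching the csim-ref summary line.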

Another example file, trans.trace:

 S 00600aa0,1
I  004005b6,5
I  004005bb,5
I  004005c0,5
 S 7ff000398,8
I  0040051e,1
 S 7ff000390,8
I  0040051f,3
I  00400522,4
 S 7ff000378,8
I  00400526,4
 S 7ff000370,8
I  0040052a,7
 S 7ff000384,4
I  00400531,2
I  00400581,4
 L 7ff000384,4
I  00400585,2
I  00400533,7
 S 7ff000388,4
 ....

Running it gives:

S 600aa0,1 miss 
S 7ff000398,8 miss 
S 7ff000390,8 hit 
S 7ff000378,8 miss 
S 7ff000370,8 hit 
S 7ff000384,4 miss 
L 7ff000384,4 hit 
S 7ff000388,4 hit 
...

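This shows that the I lines (instruction loads) have no effect on the result: they only fetch instructions, and the simulator skips them. Data accesses such as S do count.

Another example: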

./csim-ref -v -s 4 -E 2 -b 4 -t traces/yi.trace
L 10,1 miss 
M 20,1 miss hit 
L 22,1 hit 
S 18,1 hit 
L 110,1 miss 
L 210,1 miss eviction 
M 12,1 miss eviction hit 
hits:4 misses:5 evictions:2

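This time S = 16, E = 2, B = 16, so the capacity is C = S * E * B = 512 bytes.

The general idea is clear, so it is time to try coding.

  • L 0x10,1 is loaded into set 0x10 / 16 = 1
  • M 0x20,1 is loaded into set 0x20 / 16 = 2
  • L 0x22,1 shares a block with 0x20, so it hits: 0x22 / 16 = 2
  • S 0x18,1 shares a block with 0x10, so it hits: 0x18 / 16 = 1
  • L 0x110,1 goes into the free line of set 1: 0x110 / 16 = 17, 17 % 16 = 1
  • L 0x210,1 has to evict one of the lines in set 1, hence eviction: 0x210 / 16 = 33, 33 % 16 = 1
  • M 0x12,1 also belongs in set 1. Its block (0x10 - 0x1f) was the least recently used one and was evicted by L 0x210, so the load is a miss eviction and the store then hits

What I need here is the correct formula for computing the set index.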
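
One way to write that formula is with shifts and masks, which is equivalent to (address / B) % S when S and B are powers of two. This is a sketch under that assumption; set_index and tag_bits are hypothetical helper names, not functions from the lab.

```c
#include <assert.h>

/* Set index: drop the b block-offset bits, then keep the low s bits. */
unsigned long set_index(unsigned long address, unsigned s, unsigned b)
{
    assert(s + b < 8 * sizeof address);
    return (address >> b) & ((1UL << s) - 1);
}

/* Tag: everything above the set-index and block-offset bits. */
unsigned long tag_bits(unsigned long address, unsigned s, unsigned b)
{
    assert(s + b < 8 * sizeof address);
    return address >> (s + b);
}
```

With s = 4 and b = 4, the addresses 0x10, 0x110, and 0x210 all give set index 1 but tags 0, 1, and 2, which is why they compete for the same set. Storing the tag in each line is an alternative to storing the start and end addresses of the block.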

Coding

getopt

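Following the code in the slides, write the getopt part first. Pay attention to the format of optstring:

char *optstring = "ab:c::";
a      a single character: option a takes no argument.                  Format: -a, with nothing after it
b:     a character plus one colon: option b requires an argument.       Format: -b 100 or -b100; -b=100 does not work
c::    a character plus two colons: the argument of option c is optional (a GNU extension).  Format: -c200; any other form is wrong

optarg: a pointer to the argument of the current option, if there is one.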

See also the article: "Command-line option parsing functions in C: getopt() and getopt_long()"

while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
  ...
}

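Start with the simplest case, case 'h': usage(). The usage string is copied verbatim. One thing worth noting: a long string can be split across source lines with the line-continuation character \

// A trailing '\' joins this line with the next one (for when one line is not enough)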
"a looooooooooong \
 string" 
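
One detail of the backslash form: the indentation of the continued line becomes part of the string. The sketch below compares it with adjacent string literals, which the compiler concatenates; the names continued, concatenated, and same_text are made up for this illustration.

```c
#include <string.h>

/* Backslash-newline: the two spaces of indentation end up inside the string. */
static const char continued[] = "a long \
  string";

/* Adjacent string literals are concatenated by the compiler, so the
 * indentation stays outside the string. */
static const char concatenated[] =
    "a long "
    "string";

/* Returns 1 when the two spellings produce identical text, 0 otherwise. */
int same_text(void)
{
    return strcmp(continued, concatenated) == 0;
}
```

Here same_text() returns 0, because continued holds two extra spaces. For a usage message this only affects how the help text is indented.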

Cache

Look at the hint:

A cache is just 2D array of cache lines:

  • struct cache_line cache[S][E];
  • S = 2^s, is the number of sets
  • E is associativity

Each cache_line has:

  • Valid bit
  • Tag
  • LRU counter (only if you are not using a queue)

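Start with cache_line. The simulator differs from a real cache, so the fields that seem necessary are valid_bit, start_address, and end_address. LRU only matters in one situation: E > 1 and every line in the set is full, so something has to be evicted. The line to evict is the one that has gone unused the longest. The most direct idea is a timestamp: record one on every access, then evict the line with the oldest timestamp.

A look at how C handles time: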

Unix and POSIX-compliant systems implement time_t as an integer or real-floating type (typically a 32- or 64-bit integer) which represents the number of seconds since the start of the Unix epoch: midnight UTC of January 1, 1970 (not counting leap seconds). Some systems correctly handle negative time values, while others do not. Systems using a signed 32-bit time_t type are susceptible to the Year 2038 problem.

See: C Programming/C Reference/time.h/time_t

#include <stdio.h>
#include <time.h>

int main(){
  time_t t;
  t = time(NULL);
  printf("%ld\n", (long)t);  // time_t is not guaranteed to be long, so cast it

  return 0;
}

Two runs printed: 1543130656 , 1543130661

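This can still go wrong: time() returns whole seconds, so operations that happen close together get the same timestamp and cannot be told apart (to be verified).

There are plenty of questions about getting the time in milliseconds, for example: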

#include <sys/time.h>

long long current_timestamp() {
  struct timeval te;
  gettimeofday(&te, NULL); // get current time
  long long milliseconds = te.tv_sec*1000LL + te.tv_usec/1000; // calculate milliseconds
  return milliseconds;
}

If timestamps turn out to be indistinguishable, milliseconds may be needed. Use seconds for now.

So the provisional cache_line is:

typedef struct {
  bool valid_bit;
  long start_address;
  long end_address;
  long lru_counter;
} cache_line;

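Next, the 2D array struct cache_line cache[S][E] has to be allocated dynamically. From the C FAQ:

The traditional solution is to allocate an array of pointers, and then initialize each pointer to a dynamically allocated "row". Here is a two-dimensional example:

#include <stdlib.h>

int **array1 = malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
    array1[i] = malloc(ncolumns * sizeof(int));

In real code, of course, every malloc return value must be checked. You can also use sizeof(*array1) and sizeof(**array1) instead of sizeof(int *) and sizeof(int).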
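
As a sketch of what the checked version could look like for the cache table, the following allocates S rows of E lines and cleans up after a partial failure. demo_line, alloc_cache, and free_cache are hypothetical names, and the tag field is an assumption that differs from the start/end fields used in this post.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    char valid;
    unsigned long tag;
    unsigned long lru;
} demo_line;

/* Allocate an S-by-E table of zeroed lines; returns NULL on failure. */
demo_line **alloc_cache(size_t S, size_t E)
{
    assert(S > 0 && E > 0);
    demo_line **cache = malloc(S * sizeof *cache);
    if (cache == NULL)
        return NULL;
    for (size_t i = 0; i < S; i++) {
        cache[i] = calloc(E, sizeof **cache);  /* calloc zeroes the valid bits */
        if (cache[i] == NULL) {
            while (i > 0)                      /* undo the rows allocated so far */
                free(cache[--i]);
            free(cache);
            return NULL;
        }
    }
    return cache;
}

/* Release every row, then the array of row pointers. */
void free_cache(demo_line **cache, size_t S)
{
    if (cache == NULL)
        return;
    for (size_t i = 0; i < S; i++)
        free(cache[i]);
    free(cache);
}
```

Using calloc for the rows means no separate loop is needed to clear the valid bits after allocation.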

It turns out the bundled csapp.h and csapp.c already provide wrapped versions, Malloc and Free, so we do not need to do the error checking ourselves:

// csapp.c
void *Malloc(size_t size) 
{
    void *p;

    if ((p = malloc(size)) == NULL)
        unix_error("Malloc error");
    return p;
}

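Following a tip found online, copy csapp.h and csapp.c into /usr/include, then add an #include line just before the #endif in csapp.h. Compiling then failed, apparently for the reason given in this remark: "csapp.c contains thread-related code, so gcc must be given -lpthread when compiling, otherwise it fails."

Modify the Makefile:

csim: csim.c cachelab.c cachelab.h
	$(CC) $(CFLAGS) -o csim csim.c cachelab.c -lm -lpthread

After that, make succeeds.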

Reading the file

With make working, the next step is to try reading the trace file.

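// open the trace file to read
FILE *trace_file;
trace_file = fopen(file_name, "r");
if (trace_file == NULL) {
  perror(file_name);
  exit(1);
}

char identifier;
unsigned long address;  // trace addresses such as 7ff000384 do not fit in 32 bits
int size;
// the leading space in the format skips the optional blank before L/S/M
while(fscanf(trace_file, " %c %lx,%d", &identifier, &address, &size) == 3)
{
  printf(" %c %lx,%d\n",identifier, address, size);
}
printf("\n");
fclose(trace_file);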

Reading works and the lines print correctly. Time to start on the real implementation and give the code some structure.

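Finding the cache_block

Find the cache_block an address maps to, using the calculation worked out earlier. It turned out to be easier, or even necessary, to make some things global variables; after all, nothing forbids global variables.

Corrections

With an LRU counter based on wall-clock time, whether in seconds or in milliseconds, two operations can be too close together to tell apart. So I simply use a global variable as the counter instead.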

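Finally, ./test-csim gives full marks.

The code is below. Running just ./csim -h gives the following result:

After the help text is printed, this appears:
Segmentation fault (core dumped)

The likely cause: the 'h' case only breaks out of the switch, so main carries on with s, b, and file_name never set and ends up reading from a trace file that was never opened. I expect there is plenty in this code that could be optimized.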

#include "cachelab.h"
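#include <csapp.h>   /* Malloc, Free */
#include <getopt.h>  /* getopt, optarg */
#include <math.h>    /* pow */
#include <stdio.h>   /* printf, fopen, fscanf */
#include <stdlib.h>  /* atoi */
#include <unistd.h>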

#define bool char
#define true 1
#define false 0

typedef struct {
  bool valid;
  long start;
  long end;
  long lru;
} cache_line;

void usage();
int find_s_index(unsigned long address);
int find_e_index(unsigned long address, int size);
int find_unsed_e_index(unsigned long address, int size);
void modify_cache_block(int s_index, int e_index, unsigned long address);
char* save_data(unsigned long address, int size);
char* load_data(unsigned long address, int size);
void cache_instruction(char identifier, unsigned long address, int size);

unsigned S,E,B; // S = 2^s, B = 2^b
bool verbose;
unsigned hits,misses,evictions; 
char* file_name; // trace file name
cache_line** cache;
long lru_timer;

int main(int argc, char** argv)
{
  int opt;
  unsigned s,b;

  // init to omit errors
  lru_timer = 0;
  verbose = false;
  hits = misses = evictions = 0;

  // get opt 
  while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
    switch(opt){
      case 'h':
        usage();
        break;  // only leaves the switch: main continues with s, b and file_name unset
      case 'v':
        verbose = true;
        break;
      case 's':
        s = atoi(optarg);
        break;
      case 'E':
        E = atoi(optarg);
        break;
      case 'b':
        b = atoi(optarg);
        break;
      case 't':
        file_name = optarg;
        break;
      default:
        printf("wrong argument\n");
        break;
    }
  }
  
  S = pow(2.0, (float)s);
  B = pow(2.0, (float)b);

  // allocate cache
  cache = Malloc(S * sizeof(cache_line *));
  for(int i = 0; i < S; i++)
    cache[i] = Malloc( E * sizeof(cache_line));

  for(int i = 0; i < S ; i++)
    for(int j = 0; j < E; j++)
      cache[i][j].valid = false;

  // open the trace file to read
  FILE *trace_file;
  trace_file = fopen(file_name, "r");
  
  char identifier;
  unsigned long address;
  int size;
  while(fscanf(trace_file, " %c %lx,%d", &identifier, &address, &size) >0)
  {
    cache_instruction(identifier, address, size);
  }

  for (int i = 0; i < S; ++i)
  {
    Free(cache[i]);
  }
  Free(cache);  // also release the array of row pointers
  
  fclose(trace_file);
  printSummary(hits, misses, evictions);

  return 0;
}

/*
 * find the set of the address
 */
int find_s_index(unsigned long address)
{
  int s_index = address / B;
  if (s_index >= S)
    s_index = s_index % S;
 
  return s_index;
}

void cache_instruction(char identifier, unsigned long address, int size)
{
  switch(identifier){
    case 'S':
    {
      // a store allocates the block on a miss, so it is counted exactly like a load
      char* description = load_data(address, size);
      if (verbose == true)
      {
        printf("%c %lx,%d ",identifier, address, size);
        printf("%s \n", description);
      }     
      break;
    }
    case 'M':
    {
      char* load = load_data(address, size);
      char* save = save_data(address, size);
      if (verbose == true)
      {
        printf("%c %lx,%d ",identifier, address, size);
        printf("%s ", load);
        printf("%s \n",save);
      }
      break;
    }
    case 'L':
    {
      char* description = load_data(address, size);
      if (verbose == true)
      {
        printf("%c %lx,%d ",identifier, address, size);
        printf("%s \n", description);
      }
      break;
    }
  }
  lru_timer++;
}

char* save_data(unsigned long address, int size)
{
  char* description;
  int s_index = find_s_index(address);
  int e_index = find_e_index(address, size);

  if(e_index == -1)
  {
    misses += 1;
    description = "miss";
  } else {
    hits += 1;

    modify_cache_block(s_index, e_index, address);
    description = "hit";
  }

  return description;
}

/*
 * modify cache block
 */
void modify_cache_block(int s_index, int e_index, unsigned long address)
{
  cache[s_index][e_index].valid = true;
  cache[s_index][e_index].start = (address / B) * B;
  cache[s_index][e_index].end = cache[s_index][e_index].start + (B - 1);
  cache[s_index][e_index].lru = lru_timer;
}



// here I made the assumption that the size will be valid?
char* load_data(unsigned long address, int size)
{
  char* description;

  int s_index = find_s_index(address);
  int e_index = find_e_index(address, size);
  
  // we have a hit
  if (e_index != -1){
    description = "hit";
    hits +=1;
    

    modify_cache_block(s_index, e_index, address);
    return description;
  }
 
  // a miss, but the set still has an unused line
  e_index = find_unsed_e_index(address, size);
  if (e_index != -1)
  {
    description = "miss";
    misses += 1;


    modify_cache_block(s_index,e_index,address);
    return description;
  }

  cache_line* located_set = cache[s_index];
  e_index = 0;
  // every line is in use, so pick the least recently used one
  for(int i = 0; i < E ; i++)
  {
    cache_line block = located_set[i];
    if(located_set[e_index].lru >  block.lru)
       e_index = i;
  }

  description = "miss eviction";
  misses += 1;
  evictions += 1;

  modify_cache_block(s_index,e_index,address);
  return description;
}

/*
 * find e_index of cache block, -1 stands for not found
 */
int find_e_index(unsigned long address, int size)
{
  int s_index = find_s_index(address);
  cache_line* located_set = cache[s_index];

  for(int i = 0; i < E; i++)
  {
    cache_line block = located_set[i];

    if(block.valid == true && address >= block.start && address <= block.end)
       return i;
   }

  return -1; 
}

int find_unsed_e_index(unsigned long address, int size)
{
  int s_index = find_s_index(address);
  cache_line* located_set = cache[s_index];
  
  for (int i = 0; i < E; i++)
  {
    if (located_set[i].valid == false)
      return i;
  }
  return -1;
}


void usage()
{
  char *usage  =
  "Usage: ./csim [-hv] -s <num> -E <num> -b <num> -t <file>\n\
   Options:\n\
    -h         Print this help message.\n\
    -v         Optional verbose flag.\n\
    -s <num>   Number of set index bits.\n\
    -E <num>   Number of lines per set.\n\
    -b <num>   Number of block offset bits.\n\
    -t <file>  Trace file.\n\
    \n\
  Examples:\n\
    linux>  ./csim -s 4 -E 1 -b 4 -t traces/yi.trace \n\
    linux>  ./csim -v -s 8 -E 2 -b 4 -t traces/yi.trace";
    printf("%s\n",usage);
}
