Cache simulator: approach
Read the docs first. valgrind can record a program's memory accesses, which is the input a cache simulator needs. The command:
linux> valgrind --log-fd=1 --tool=lackey -v --trace-mem=yes ls -l
A memory trace looks like this:
I 0400d7d4,8
M 0421c7f0,4
L 04f6b868,8
S 7ff0005c8,8
[space]operation address,size
I -> instruction load, L -> data load, S -> data store, M -> data modify (a data load followed by a data store). An I is never preceded by a space; M, L and S always are. address is a 64-bit hexadecimal memory address, and size is the number of bytes the operation accesses.
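The record format above maps directly onto a scanf pattern. A minimal sketch, assuming a hypothetical helper named parse_trace_line (not part of the lab's code) and using unsigned long long so that addresses wider than 32 bits fit:

```c
#include <stdio.h>

/* Hypothetical helper: parse one trace record such as " L 7ff000384,4".
 * Returns 1 on success, 0 if the line does not match the format. */
int parse_trace_line(const char *line, char *op,
                     unsigned long long *addr, int *size)
{
    /* The leading space in the format skips optional whitespace, so the
     * same pattern accepts "I ..." (no space) and " L ..." (one space). */
    return sscanf(line, " %c %llx,%d", op, addr, size) == 3;
}
```

Comparing the return value with 3 rejects lines where only some of the fields could be converted.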
We work from trace files, stored in the traces directory. Take yi.trace as an example:
L 10,1
M 20,1
L 22,1
S 18,1
L 110,1
L 210,1
M 12,1
Here is another file:
I 00400531,2
I 00400581,4
L 7ff000384,4
I 00400585,2
I 00400533,7
S 7ff000388,4
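It does contain both instruction loads and data loads, so both need handling. The digit f also shows up in the addresses, which confirms they are hexadecimal.
Now for the examples given with the lab: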
linux> ./csim-ref -s 4 -E 1 -b 4 -t traces/yi.trace
hits:4 misses:5 evictions:3
Verbose mode:
linux> ./csim-ref -v -s 4 -E 1 -b 4 -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss eviction
L 210,1 miss eviction
M 12,1 miss eviction hit
hits:4 misses:5 evictions:3
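First, analyze the result: s = 4 -> S = 2^4 = 16 sets, E = 1 line per set, b = 4 -> B = 2^4 = 16 bytes per block.
Going through the trace line by line:
- L 10,1 miss: unavoidable, since the cache starts empty. The block 0x10 - 0x1f is loaded into set 1.
- M 20,1 miss hit: M is a load followed by a store. The load misses and brings 0x20 - 0x2f into set 2; the store then hits.
- L 22,1 hit: 0x22 lies in the block now held in set 2.
- S 18,1 hit: 0x18 lies in the block now held in set 1.
- L 110,1 miss eviction: 0x110 % (16 * 16) = 16, and 16 / 16 = 1, so this address also maps to set 1. With E = 1 the resident block has to go: miss, then eviction.
- L 210,1 miss eviction: 0x210 % (16 * 16) = 16, so it maps to set 1 as well: another miss and eviction.
- M 12,1 miss eviction hit: M = load + store. 0x12 maps to set 1, which now holds 0x210 - 0x21f, so the load is a miss with an eviction and the store then hits.
- Counting them up: hits:4 misses:5 evictions:3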
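The mapping used in this walkthrough can be written with shifts and masks instead of division and modulo. A minimal sketch, assuming hypothetical helper names set_index and tag_bits, with s and b being the values passed via -s and -b:

```c
#include <stdint.h>

/* Set index: drop the b block-offset bits, keep the next s bits. */
unsigned set_index(uint64_t addr, unsigned s, unsigned b)
{
    return (unsigned)((addr >> b) & ((1ULL << s) - 1));
}

/* Tag: everything above the offset and set-index bits. */
uint64_t tag_bits(uint64_t addr, unsigned s, unsigned b)
{
    return addr >> (s + b);
}
```

With s = 4 and b = 4, the addresses 0x10, 0x110 and 0x210 all yield set 1 but have tags 0, 1 and 2, which is why they keep evicting each other when E = 1.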
Another trace file, trans.trace:
S 00600aa0,1
I 004005b6,5
I 004005bb,5
I 004005c0,5
S 7ff000398,8
I 0040051e,1
S 7ff000390,8
I 0040051f,3
I 00400522,4
S 7ff000378,8
I 00400526,4
S 7ff000370,8
I 0040052a,7
S 7ff000384,4
I 00400531,2
I 00400581,4
L 7ff000384,4
I 00400585,2
I 00400533,7
S 7ff000388,4
....
Running the simulator on it gives:
S 600aa0,1 miss
S 7ff000398,8 miss
S 7ff000390,8 hit
S 7ff000378,8 miss
S 7ff000370,8 hit
S 7ff000384,4 miss
L 7ff000384,4 hit
S 7ff000388,4 hit
...
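This output shows that the I records (instruction loads) are skipped entirely: they only fetch instructions and do not affect the data cache, whereas S records do.
Another example: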
./csim-ref -v -s 4 -E 2 -b 4 -t traces/yi.trace
L 10,1 miss
M 20,1 miss hit
L 22,1 hit
S 18,1 hit
L 110,1 miss
L 210,1 miss eviction
M 12,1 miss eviction hit
hits:4 misses:5 evictions:2
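This time S = 16, E = 2, B = 16, so the total capacity is C = S * E * B = 512 bytes.
The overall idea is clear enough to start coding.
- L 0x10,1: loaded into set 0x10 / 16 = 1
- M 0x20,1: loaded into set 0x20 / 16 = 2
- L 0x22,1: same block as 0x20, so a hit. 0x22 / 16 = 2 -> set 2
- S 0x18,1: same block as 0x10, so a hit. 0x18 / 16 = 1 -> set 1
- L 0x110,1: goes into the free line of set 1. 0x110 / 16 = 17, 17 % 16 = 1
- L 0x210,1: set 1 is full, so one of its lines must be evicted. 0x210 / 16 = 33, 33 % 16 = 1
- M 0x12,1: also belongs in set 1
What I need here is the correct formula for this mapping.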
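Both walkthroughs can be checked mechanically by replaying the trace through a small model. The following is only a sketch under assumptions: the types access_t, stats_t and line_t and the functions touch and replay are hypothetical names, lines store a tag rather than an address range, and LRU order comes from a simple counter.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct { char op; uint64_t addr; } access_t;      /* hypothetical */
typedef struct { int hits, misses, evictions; } stats_t;  /* hypothetical */
typedef struct { int valid; uint64_t tag; unsigned long last_used; } line_t;

/* One cache access: hit, fill of a free line, or LRU eviction. */
static void touch(line_t *set, int E, uint64_t tag,
                  unsigned long now, stats_t *st)
{
    int victim = 0;
    for (int i = 0; i < E; i++) {
        if (set[i].valid && set[i].tag == tag) {   /* hit */
            set[i].last_used = now;
            st->hits++;
            return;
        }
    }
    st->misses++;
    for (int i = 0; i < E; i++) {
        if (!set[i].valid) { victim = i; break; }  /* a free line wins */
        if (set[i].last_used < set[victim].last_used)
            victim = i;                            /* oldest seen so far */
    }
    if (set[victim].valid)
        st->evictions++;
    set[victim].valid = 1;
    set[victim].tag = tag;
    set[victim].last_used = now;
}

/* Replay n accesses through a cache with 2^s sets, E lines per set and
 * 2^b-byte blocks. Returns zeroed stats if allocation fails. */
stats_t replay(const access_t *trace, int n, int s, int E, int b)
{
    stats_t st = {0, 0, 0};
    size_t S = (size_t)1 << s;
    line_t *lines = calloc(S * (size_t)E, sizeof *lines);
    unsigned long now = 0;
    if (lines == NULL)
        return st;
    for (int k = 0; k < n; k++) {
        if (trace[k].op == 'I')
            continue;                      /* instruction loads are ignored */
        uint64_t block = trace[k].addr >> b;
        line_t *set = lines + (block & (S - 1)) * (size_t)E;
        uint64_t tag = block >> s;
        touch(set, E, tag, ++now, &st);
        if (trace[k].op == 'M')            /* modify = load, then store */
            touch(set, E, tag, ++now, &st);
    }
    free(lines);
    return st;
}
```

Feeding it the seven records of yi.trace with s = 4, b = 4 reproduces the reference counts for both E = 1 and E = 2.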
Coding
getopt
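Following the code in the slides, write the getopt part first. Note the format of the option string:
char *optstring = "ab:c::";
A single character (a): option a takes no argument. Format: -a, with nothing after it.
A character followed by one colon (b:): option b requires an argument. Format: -b 100 or -b100; -b=100 is wrong.
A character followed by two colons (c::): the argument of option c is optional. Format: -c200; any other form is wrong.
optarg: a pointer to the current option's argument, if it has one.
See the article: 命令行选项解析函数(C语言):getopt()和getopt_long() (Command-line option parsing functions in C: getopt() and getopt_long())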
while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
...
}
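Start with the simplest case, 'h': usage(). The text is copied straight from the reference simulator. One thing worth noting: a long string literal can be continued on the next line with a backslash \
// a trailing '\' joins the next line onto this one (for when one line is not enough)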
"a looooooooooong \
string"
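With a backslash continuation inside a string literal, any indentation on the following line becomes part of the string. An alternative worth knowing is that C merges adjacent string literals at compile time. A small sketch, with long_text as a hypothetical function name:

```c
/* Adjacent string literals are merged by the compiler, so each piece can
 * be indented freely without the indentation leaking into the text. */
const char *long_text(void)
{
    return "a looooooooooong "
           "string";
}
```

The result is the same single string, and the source can be laid out however reads best.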
Cache
From the hints:
A cache is just 2D array of cache lines:
- struct cache_line cache[S][E];
- S = 2^s, is the number of sets
- E is associativity
Each cache_line has:
- Valid bit
- Tag
- LRU counter (only if you are not using a queue)
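Start with cache_line. The simulated line differs from a real one, so valid_bit, start_address and end_address all seem necessary. LRU only matters in one situation: E > 1 and every line in the set is occupied. Then something has to be evicted, and we pick the block that has gone unused the longest. The most direct idea is to keep a timestamp on each line and, at eviction time, evict the line with the oldest timestamp.
A look at how time works in C: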
Unix and POSIX-compliant systems implement time_t as an integer or real-floating type (typically a 32- or 64-bit integer) which represents the number of seconds since the start of the Unix epoch: midnight UTC of January 1, 1970 (not counting leap seconds). Some systems correctly handle negative time values, while others do not. Systems using a signed 32-bit time_t type are susceptible to the Year 2038 problem.
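See: C Programming/C Reference/time.h/time_t
#include <stdio.h>
#include <time.h>
int main(){
    time_t t;
    t = time(NULL);
    // time_t's exact type is implementation-defined, so cast before printing
    printf("%ld\n", (long)t);
    return 0;
}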
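Two runs printed: 1543130656, 1543130661
This can still cause problems: the value is in seconds, so two operations that happen very close together may be indistinguishable (to be verified).
Many people ask the related question of how to get the time in milliseconds:
#include <sys/time.h>
long long current_timestamp() {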
struct timeval te;
gettimeofday(&te, NULL); // get current time
long long milliseconds = te.tv_sec*1000LL + te.tv_usec/1000; // calculate milliseconds
return milliseconds;
}
If the timestamps turn out to be indistinguishable, milliseconds may be needed. Use seconds for now.
So the provisional cache_line is:
typedef struct {
bool valid_bit;
long start_address;
long end_address;
long lru_counter;
} cache_line;
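Next, the two-dimensional array struct cache_line cache[S][E] has to be allocated dynamically. From the C FAQ (C语言常见问题集):
The traditional solution is to allocate an array of pointers and then initialize each pointer to a dynamically allocated "row". Here is a two-dimensional example:
#include <stdlib.h>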
int **array1 = malloc(nrows * sizeof(int *));
for(i = 0; i < nrows; i++)
array1[i] = malloc(ncolumns * sizeof(int));
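Of course, in real code every return value of malloc must be checked. You can also use sizeof(*array1) and sizeof(**array1) in place of sizeof(int *) and sizeof(int).
The bundled csapp.h and csapp.c already provide wrapped versions, Malloc and Free, so we do not have to do the error checking ourselves: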
// csapp.c
void *Malloc(size_t size)
{
void *p;
if ((p = malloc(size)) == NULL)
unix_error("Malloc error");
return p;
}
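For comparison, here is a sketch of the same two-level allocation without the csapp wrappers, checking each return value and undoing partial work on failure. The names demo_line, alloc_cache and free_cache are hypothetical, and demo_line is only a stand-in for cache_line.

```c
#include <stdlib.h>

typedef struct {
    int valid;
    unsigned long tag;
    unsigned long lru;
} demo_line;   /* hypothetical stand-in for cache_line */

/* Allocate S sets of E lines each, all zeroed (valid == 0).
 * Returns NULL if any allocation fails, after releasing what was obtained. */
demo_line **alloc_cache(size_t S, size_t E)
{
    demo_line **sets = malloc(S * sizeof *sets);
    if (sets == NULL)
        return NULL;
    for (size_t i = 0; i < S; i++) {
        sets[i] = calloc(E, sizeof **sets);
        if (sets[i] == NULL) {
            while (i > 0)
                free(sets[--i]);
            free(sets);
            return NULL;
        }
    }
    return sets;
}

/* Release the rows first, then the array of row pointers. */
void free_cache(demo_line **sets, size_t S)
{
    if (sets == NULL)
        return;
    for (size_t i = 0; i < S; i++)
        free(sets[i]);
    free(sets);
}
```

Using calloc for the rows means every valid flag starts at 0 without a separate initialization loop.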
Following advice found online, copy csapp.h and csapp.c into /usr/include. Then add the extra #include to csapp.h, just before its #endif.
Modify the Makefile:
csim: csim.c cachelab.c cachelab.h
	$(CC) $(CFLAGS) -o csim csim.c cachelab.c -lm -lpthread
After that, make succeeds.
Reading the file
With make working, the next step is to try reading the trace file.
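// open the trace file to read
FILE *trace_file = fopen(file_name, "r");
if (trace_file == NULL) {
    perror(file_name);
    return 1;
}
char identifier;
unsigned long address; // trace addresses can be wider than 32 bits
int size;
// fscanf returns the number of fields converted; 3 means a complete record
while (fscanf(trace_file, " %c %lx,%d", &identifier, &address, &size) == 3)
{
    printf(" %c %lx,%d\n", identifier, address, size);
}
printf("\n");
fclose(trace_file);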
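Reading works and every record prints correctly. Time to start for real and give the code some structure.
Finding the cache block
Find the cache block an address belongs to, using the mapping worked out earlier. Along the way it became clear that some things are best made (or have to be made) global variables; after all, nothing forbids using them.
Corrections
With a wall-clock LRU counter, whether the timestamp is in seconds or milliseconds, two accesses can be too close together to tell apart. So I simply use a global variable as the counter instead.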
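The idea of replacing wall-clock time with a counter can be sketched in isolation. The names lru_line, logical_clock, mark_used and lru_victim below are hypothetical; the point is only that a counter incremented on every access can never produce two equal stamps.

```c
typedef struct {
    int valid;
    unsigned long lru;   /* value of the logical clock at the last access */
} lru_line;              /* hypothetical, reduced to the fields used here */

static unsigned long logical_clock = 0;

/* Stamp a line with a fresh, strictly increasing tick. */
void mark_used(lru_line *line)
{
    line->valid = 1;
    line->lru = ++logical_clock;
}

/* Index of the least recently used line among E valid lines. */
int lru_victim(const lru_line *set, int E)
{
    int victim = 0;
    for (int i = 1; i < E; i++)
        if (set[i].lru < set[victim].lru)
            victim = i;
    return victim;
}
```

Because only the relative order of the stamps matters, the real time at which an access happened is irrelevant.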
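Finally, ./test-csim gives full marks.
The code is below. Running just ./csim -h gives the following result:
after the help text is printed, this appears:
Segmentation fault (core dumped)
I suspect there is plenty in this code that could be improved.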
#include "cachelab.h"
#include <getopt.h>
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <csapp.h>
#define bool char
#define true 1
#define false 0
typedef struct {
bool valid;
long start;
long end;
long lru;
} cache_line;
void usage();
int find_s_index(unsigned long address);
int find_e_index(unsigned long address, int size);
int find_unsed_e_index(unsigned long address, int size);
void modify_cache_block(int s_index, int e_index, unsigned long address);
char* save_data(unsigned long address, int size);
char* load_data(unsigned long address, int size);
void cache_instruction(char identifier, unsigned long address, int size);
unsigned S,E,B; // S = 2^s, B = 2^b
bool verbose;
unsigned hits,misses,evictions;
char* file_name; // trace file name
cache_line** cache;
long lru_timer;
int main(int argc, char** argv)
{
int opt;
unsigned s,b;
// init to omit errors
lru_timer = 0;
verbose = false;
hits = misses = evictions = 0;
// get opt
while(-1 != (opt = getopt(argc, argv, "hvs:E:b:t:"))){
switch(opt){
case 'h':
usage();
break;
case 'v':
verbose = true;
break;
case 's':
s = atoi(optarg);
break;
case 'E':
E = atoi(optarg);
break;
case 'b':
b = atoi(optarg);
break;
case 't':
file_name = optarg;
break;
default:
printf("wrong argument\n");
break;
}
}
S = pow(2.0, (float)s);
B = pow(2.0, (float)b);
// allocate cache
cache = Malloc(S * sizeof(cache_line *));
for(int i = 0; i < S; i++)
cache[i] = Malloc( E * sizeof(cache_line));
for(int i = 0; i < S ; i++)
for(int j = 0; j < E; j++)
cache[i][j].valid = false;
// open the trace file to read
FILE *trace_file;
trace_file = fopen(file_name, "r");
char identifier;
unsigned long address;
int size;
while(fscanf(trace_file, " %c %lx,%d", &identifier, &address, &size) >0)
{
cache_instruction(identifier, address, size);
}
for (int i = 0; i < S; ++i)
{
Free(cache[i]);
}
Free(cache); // also release the array of row pointers
fclose(trace_file);
printSummary(hits, misses, evictions);
return 0;
}
/*
* find the set of the address
*/
int find_s_index(unsigned long address)
{
int s_index = address / B;
if (s_index >= S)
s_index = s_index % S;
return s_index;
}
void cache_instruction(char identifier, unsigned long address, int size)
{
switch(identifier){
case 'S':
{
char* description = load_data(address, size);
if (verbose == true)
{
printf("%c %lx,%d ",identifier, address, size);
printf("%s \n", description);
}
break;
}
case 'M':
{
char* load = load_data(address, size);
char* save = save_data(address, size);
if (verbose == true)
{
printf("%c %lx,%d ",identifier, address, size);
printf("%s ", load);
printf("%s \n",save);
}
break;
}
case 'L':
{
char* description = load_data(address, size);
if (verbose == true)
{
printf("%c %lx,%d ",identifier, address, size);
printf("%s \n", description);
}
break;
}
}
lru_timer++;
}
char* save_data(unsigned long address, int size)
{
char* description;
int s_index = find_s_index(address);
int e_index = find_e_index(address, size);
if(e_index == -1)
{
misses += 1;
description = "miss";
} else {
hits += 1;
modify_cache_block(s_index, e_index, address);
description = "hit";
}
return description;
}
/*
* modify cache block
*/
void modify_cache_block(int s_index, int e_index, unsigned long address)
{
cache[s_index][e_index].valid = true;
cache[s_index][e_index].start = (address / B) * B;
cache[s_index][e_index].end = cache[s_index][e_index].start + (B - 1);
cache[s_index][e_index].lru = lru_timer;
}
// assumption: size is valid, i.e. an access never crosses a block boundary
char* load_data(unsigned long address, int size)
{
char* description;
int s_index = find_s_index(address);
int e_index = find_e_index(address, size);
// we have a hit
if (e_index != -1){
description = "hit";
hits +=1;
modify_cache_block(s_index, e_index, address);
return description;
}
// a miss, but the set still has an unused line
e_index = find_unsed_e_index(address, size);
if (e_index != -1)
{
description = "miss";
misses += 1;
modify_cache_block(s_index,e_index,address);
return description;
}
cache_line* located_set = cache[s_index];
e_index = 0;
// every block is used and we have to pick out lru
for(int i = 0; i < E ; i++)
{
cache_line block = located_set[i];
if(located_set[e_index].lru > block.lru)
e_index = i;
}
description = "miss eviction";
misses += 1;
evictions += 1;
modify_cache_block(s_index,e_index,address);
return description;
}
/*
* find e_index of cache block, -1 stands for not found
*/
int find_e_index(unsigned long address, int size)
{
int s_index = find_s_index(address);
cache_line* located_set = cache[s_index];
for(int i = 0; i < E; i++)
{
cache_line block = located_set[i];
if(block.valid == true && address >= block.start && address <= block.end)
return i;
}
return -1;
}
int find_unsed_e_index(unsigned long address, int size)
{
int s_index = find_s_index(address);
cache_line* located_set = cache[s_index];
for (int i = 0; i < E; i++)
{
if (located_set[i].valid == false)
return i;
}
return -1;
}
void usage()
{
char *usage =
"Usage: ./csim [-hv] -s <num> -E <num> -b <num> -t <file>\n\
Options:\n\
-h Print this help message.\n\
-v Optional verbose flag.\n\
-s <num> Number of set index bits.\n\
-E <num> Number of lines per set.\n\
-b <num> Number of block offset bits.\n\
-t <file> Trace file.\n\
\n\
Examples:\n\
linux> ./csim -s 4 -E 1 -b 4 -t traces/yi.trace \n\
linux> ./csim -v -s 8 -E 2 -b 4 -t traces/yi.trace";
printf("%s\n",usage);
}
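The segmentation fault after -h mentioned before the listing most likely happens because main keeps running once usage() returns: no -t was given, so file_name is still NULL, fopen fails, and fscanf is handed a NULL stream. One possible guard, sketched with a hypothetical helper args_ok that main could call right after the getopt loop (and returning from main directly after usage() for -h):

```c
#include <stddef.h>

/* Hypothetical guard: 1 if the parsed options describe a usable cache. */
int args_ok(int s, int E, int b, const char *trace_path)
{
    if (trace_path == NULL)       /* -t was never given */
        return 0;
    if (E <= 0)                   /* a set needs at least one line */
        return 0;
    if (s < 0 || b < 0)
        return 0;
    if (s + b >= 64)              /* index + offset bits must leave room for a tag */
        return 0;
    return 1;
}
```

When it returns 0, printing the usage text and exiting with a non-zero status avoids ever allocating the cache or touching the trace file.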