nginx source code analysis: the hash structure ngx_hash_t (v1.0.4)

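This blog (http://blog.csdn.net/livelylittlefish) collects the notes the author (阿波) has written while researching and studying these topics. Corrections from readers are welcome!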

Content

0.

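1. Hash structures

1.1 The ngx_hash_t structure

1.2 The ngx_hash_init_t structure

1.3 The ngx_hash_key_t structure

1.4 Logical structure of the hash

2. Hash operations

2.1 NGX_HASH_ELT_SIZE

2.2 Hash functions

2.3 Hash initialization

2.4 Hash lookup

3. An example

3.1 Code

3.2 How to compile

3.3 Run results

3.3.1 bucket_size = 64 bytes

3.3.2 bucket_size = 256 bytes

4. Summary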

 

0.

 

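This article continues the series on nginx data structures with the hash structure.

Hash implementation files: ./src/core/ngx_hash.h and ./src/core/ngx_hash.c, where . denotes the nginx-1.0.4 source directory (/usr/src/nginx-1.0.4 in this article).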

 

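1. Hash structures

The nginx hash structure is somewhat more complex than its list, array and queue structures. The hash-related data structures are introduced one by one below; a diagram of them appears in section 1.4.

1.1 The ngx_hash_t structure

The nginx hash structure is ngx_hash_t and its hash element structure is ngx_hash_elt_t. They are defined as follows.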

typedef struct {               // hash element
    void             *value;   // the value of a <key,value> pair
    u_short           len;     // length of name
    u_char            name[1]; // the data being hashed (a string in nginx), i.e. the key of <key,value>
} ngx_hash_elt_t;

typedef struct {               // hash table
    ngx_hash_elt_t  **buckets; // hash buckets (there are size of them)
    ngx_uint_t        size;    // number of hash buckets
} ngx_hash_t;

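Here, sizeof(ngx_hash_t) = 8 and sizeof(ngx_hash_elt_t) = 8 on a 32-bit platform. In fact, the name field of ngx_hash_elt_t holds the key from the ngx_hash_key_t structure, as can be seen in ngx_hash_init(); see the analysis later in this article. This structure is used frequently during module configuration parsing.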

 

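1.2 The ngx_hash_init_t structure

The nginx hash initialization structure is ngx_hash_init_t. It bundles the relevant data so that it can be passed as a parameter to ngx_hash_init() and ngx_hash_wildcard_init(). These two functions are used mainly in the http modules, for example in ngx_http_server_names() (optimizing http server names), ngx_http_merge_types() (merging http types) and ngx_http_fastcgi_merge_loc_conf() (merging the FastCGI location configuration), together with the parameters and local objects/variables those functions use. These will be covered in later articles.

The ngx_hash_init_t structure is shown below; sizeof(ngx_hash_init_t) = 28 on a 32-bit platform.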

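typedef struct {                    // hash initialization structure
    ngx_hash_t       *hash;         // the hash structure to initialize
    ngx_hash_key_pt   key;          // hash function pointer

    ngx_uint_t        max_size;     // maximum number of buckets
    ngx_uint_t        bucket_size;  // space available to each bucket

    char             *name;         // name of this hash (used only in error logs)
    ngx_pool_t       *pool;         // the hash is allocated from this memory pool
    ngx_pool_t       *temp_pool;    // memory pool for temporary data
} ngx_hash_init_t;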

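1.3 The ngx_hash_key_t structure

This structure holds the data to be hashed, that is, a <key,value> pair. In practice, several pairs are usually stored in an array of ngx_hash_key_t structures, which is passed as a parameter to ngx_hash_init() or ngx_hash_wildcard_init(). It is defined as follows.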

typedef struct {                    // hash key structure
    ngx_str_t         key;          // the key, an nginx string
    ngx_uint_t        key_hash;     // hash value computed from key (by a hash function such as ngx_hash_key_lc())
    void             *value;        // the value for this key, forming the pair <key,value>
} ngx_hash_key_t;

typedef struct {                    // string structure
    size_t      len;                // string length
    u_char     *data;               // string content
} ngx_str_t;

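Here, sizeof(ngx_hash_key_t) = 16 on a 32-bit platform. In practice the value pointer may point to static data (for example a global array or a string constant) or to the heap (for example a dynamically allocated area holding the value). See the example later in this article.

The ngx_table_elt_t and ngx_hash_keys_arrays_t structures matter little to the hash structure itself. They are data structures designed mainly for module configuration, referer validation and similar tasks, used for example by the http core module, the map module, the referer module and the SSI filter module. They are not covered here and will be introduced in later articles.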

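1.4 Logical structure of the hash

The ngx_hash_init_t structure refers to the ngx_pool_t structure, so the logical diagram of the related structures below builds on the article "nginx-1.0.4 source code analysis: memory pool structure ngx_pool_t and memory management". Note: the diagram is drawn in UML style.

[Figure 1: logical structure of the hash-related data structures (UML diagram)]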

2. Hash operations

 

2.1 NGX_HASH_ELT_SIZE

 

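The NGX_HASH_ELT_SIZE macro computes the size of the ngx_hash_elt_t structure described above. It is defined as follows.

// name is a pointer to an ngx_hash_key_t; the second term is rounded up to a multiple of sizeof(void *)
#define NGX_HASH_ELT_SIZE(name)                                               \
    (sizeof(void *) + ngx_align((name)->key.len + 2, sizeof(void *)))

On a 32-bit platform sizeof(void *) = 4. (name)->key.len is the length of the content stored in the name array of ngx_hash_elt_t, and the "+2" accounts for the size of that structure's len field (a u_short).

Therefore NGX_HASH_ELT_SIZE(name) = 4 + ngx_align((name)->key.len + 2, 4), where the second term is (name)->key.len + 2 rounded up to a multiple of 4 bytes.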
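As a quick check of this arithmetic, the following standalone sketch repeats the computation outside nginx. The names PTR_SIZE, align_up and elt_size are illustrative and are not part of nginx; PTR_SIZE is fixed at 4 as an assumption, so that the 32-bit numbers used in this article are reproduced whatever the host platform.

```c
#include <stddef.h>

/* Assumption: model a 32-bit platform, where sizeof(void *) == 4. */
#define PTR_SIZE 4u

/* Round d up to a multiple of a (a must be a power of two), as ngx_align does. */
static size_t align_up(size_t d, size_t a)
{
    return (d + (a - 1)) & ~(a - 1);
}

/* Same arithmetic as NGX_HASH_ELT_SIZE, taking the key length directly:
 * the value pointer, plus the key bytes and the 2-byte len field padded
 * to a multiple of PTR_SIZE. */
static size_t elt_size(size_t key_len)
{
    return PTR_SIZE + align_up(key_len + 2, PTR_SIZE);
}
```

For a 13-byte key such as "www.baidu.com", elt_size(13) is 4 + 16 = 20; for a 15-byte key such as "www.sina.com.cn", elt_size(15) is 4 + 20 = 24.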

 

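2.2 Hash functions

nginx-1.0.4 provides the following hash functions.

#define ngx_hash(key, c)   ((ngx_uint_t) key * 31 + c)  // hash macro
ngx_uint_t ngx_hash_key(u_char *data, size_t len);
ngx_uint_t ngx_hash_key_lc(u_char *data, size_t len);   // lc = lower case: the string is converted to lower case before hashing
ngx_uint_t ngx_hash_strlow(u_char *dst, u_char *src, size_t n);

The hash functions are all simple. The three functions above all use the ngx_hash macro, which yields an unsigned integer (ngx_uint_t). The first function is shown here.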

ngx_uint_t
ngx_hash_key(u_char *data, size_t len)
{
    ngx_uint_t  i, key;

    key = 0;

    for (i = 0; i < len; i++) {
        key = ngx_hash(key, data[i]);
    }

    return key;
}

The computation performed by ngx_hash_key can therefore be written as follows (the arithmetic wraps around at the width of ngx_uint_t).

Key[0] = data[0]
Key[1] = data[0]*31 + data[1]
Key[2] = (data[0]*31 + data[1])*31 + data[2]
...
Key[i] = Key[i-1]*31 + data[i]
...
Key[len-1] = Key[len-2]*31 + data[len-1]

Key[len-1] is the hash value of the data argument.
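The recurrence is easy to try outside nginx. The sketch below is a standalone re-implementation, not the nginx code itself: hash_key and hash_key_lc are illustrative names, uint32_t is assumed to stand in for ngx_uint_t on a 32-bit platform, and the lower-casing is assumed to be ASCII-only.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: uint32_t models ngx_uint_t on a 32-bit platform. */
static uint32_t hash_key(const unsigned char *data, size_t len)
{
    uint32_t key = 0;
    size_t   i;

    for (i = 0; i < len; i++) {
        key = key * 31 + data[i];      /* the same step as the ngx_hash macro */
    }

    return key;
}

/* ASCII-only lower-casing of a single byte. */
static unsigned char lower_ascii(unsigned char c)
{
    return (c >= 'A' && c <= 'Z') ? (unsigned char) (c | 0x20) : c;
}

/* Lower-case variant in the spirit of ngx_hash_key_lc(): each byte is
 * lower-cased before it is mixed into the hash. */
static uint32_t hash_key_lc(const unsigned char *data, size_t len)
{
    uint32_t key = 0;
    size_t   i;

    for (i = 0; i < len; i++) {
        key = key * 31 + lower_ascii(data[i]);
    }

    return key;
}
```

For the two-byte input "ab", the recurrence gives 97*31 + 98 = 3105, and the lower-case variant gives the same value for "AB".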

 

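2.3 Hash initialization

Hash initialization is performed by ngx_hash_init(). Its names parameter is an array of ngx_hash_key_t structures, that is, an array of <key,value> pairs, and nelts is the number of elements in that array. The ngx_hash_key_t array must therefore be prepared before the function is called. One way to build it is with nginx's ngx_array_t structure; see the example later in this article.

The result of initialization is that the <key,value> pairs held in the names array are hashed into one or more hash buckets (buckets in the code). The hash function used is typically ngx_hash_key_lc or a similar function. Each hash bucket holds a pointer to an ngx_hash_elt_t (a hash element pointer), which points to a mostly contiguous data area. That area holds the hashed pairs <key',value'>, i.e. the <name,value> fields of ngx_hash_elt_t. Each such area may hold one or more <key',value'> pairs.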

 

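Several points need explanation.

Question 1: Why "mostly contiguous"?

Answer: When NGX_HASH_ELT_SIZE computes the total length of a hash element, it includes padding up to a multiple of sizeof(void *). So when the key of a <key,value> pair in names is copied into the name[1] array of ngx_hash_elt_t, the space reserved for that element is not always used completely, which is why the data area is only mostly contiguous. See also the layout diagram later in this section and the example at the end of the article.

Question 2: Where are these mostly contiguous data areas allocated?

Answer: From the memory pool pointed to by the pool field of ngx_hash_init_t, the function's first parameter.

Question 3: How does <key',value'> differ from <key,value>?

Answer: key holds only a pointer, whereas key' is the result of copying key into name[1] (the copy is made by ngx_strlow(), so it is lower-cased). value and value' are both pointers. As noted in section 1.3, the value pointer may point to static data (for example a global array or a string constant) or to the heap (for example a dynamically allocated area holding the value). See the example later in this article.

Question 4: How is the hash bucket for a given <key,value> pair determined?

Answer: By the statement key = names[n].key_hash % size; in the code. The result key is the number of the hash bucket (from 0 to size-1) in which the pair is placed.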
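The bucket choice in question 4 can be illustrated with a small standalone sketch. The function name bucket_index is illustrative, and uint32_t is assumed to model a 32-bit ngx_uint_t. The hash values quoted in the note after the code are the key_hash values printed by the example run in section 3.3.1 of this article; some appear there as negative numbers only because they were printed as signed values.

```c
#include <stdint.h>

/* Bucket number for a key: key_hash % size, always in the range 0 .. size-1. */
static uint32_t bucket_index(uint32_t key_hash, uint32_t size)
{
    return key_hash % size;
}
```

With size = 3, bucket_index(270263191u, 3) is 1 for www.baidu.com and bucket_index(1528635686u, 3) is 2 for www.sina.com.cn. The value printed as -702889725 for www.google.com corresponds to (uint32_t) -702889725 and lands in bucket 1, in agreement with the hash dump in section 3.3.1.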

 

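The code of the function follows. Explanations of the less obvious points are given in the comments the author added after //; the hash layout diagram in this section may also help.

// nelts is the (actual) number of elements in the names array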
ngx_int_t
ngx_hash_init(ngx_hash_init_t *hinit, ngx_hash_key_t *names, ngx_uint_t nelts)
{
    u_char          *elts;
    size_t           len;
    u_short         *test;
    ngx_uint_t       i, n, key, size, start, bucket_size;
    ngx_hash_elt_t  *elt, **buckets;

    for (n = 0; n < nelts; n++) {  // check every element of names: is a bucket large enough to hold it?
        if (hinit->bucket_size < NGX_HASH_ELT_SIZE(&names[n]) + sizeof(void *))
        {   // if a bucket cannot hold even this one element, give up
            ngx_log_error(NGX_LOG_EMERG, hinit->pool->log, 0,
                          "could not build the %s, you should "
                          "increase %s_bucket_size: %i",
                          hinit->name, hinit->name, hinit->bucket_size);
            return NGX_ERROR;
        }
    }

    // allocate max_size * sizeof(u_short) bytes of scratch space (not from the nginx memory pool, because test is only temporary)
    test = ngx_alloc(hinit->max_size * sizeof(u_short), hinit->pool->log);
    if (test == NULL) {
        return NGX_ERROR;
    }

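    bucket_size = hinit->bucket_size - sizeof(void *); // sizeof(void *) is 4 on a 32-bit platform

    start = nelts / (bucket_size / (2 * sizeof(void *))); // lower bound on the number of buckets (2 * sizeof(void *) is the smallest element size)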
    start = start ? start : 1;

    if (hinit->max_size > 10000 && hinit->max_size / nelts < 100) {
        start = hinit->max_size - 1000;
    }

    for (size = start; size < hinit->max_size; size++) {

        ngx_memzero(test, size * sizeof(u_short));

        // mark 1: check whether size buckets of this capacity can hold all the hash data
        for (n = 0; n < nelts; n++) {
            if (names[n].key.data == NULL) {
                continue;
            }

            // compute the bucket index and accumulate the element sizes of that bucket in test[key]
            key = names[n].key_hash % size; // if size == 1, key is always 0
            test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));

            if (test[key] > (u_short) bucket_size) { // a bucket overflowed: try the next size
                goto next;
            }
        }

        goto found;

    next:

        continue;
    }

    // no suitable number of buckets was found: give up
    ngx_log_error(NGX_LOG_EMERG, hinit->pool->log, 0,
                  "could not build the %s, you should increase "
                  "either %s_max_size: %i or %s_bucket_size: %i",
                  hinit->name, hinit->name, hinit->max_size,
                  hinit->name, hinit->bucket_size);

    ngx_free(test);

    return NGX_ERROR;

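found:  // a suitable number of buckets (size) was found

    for (i = 0; i < size; i++) {  // initialize the first size entries of test to sizeof(void *) (4 on 32-bit)
        test[i] = sizeof(void *);
    }

    /** mark 2: almost the same as mark 1, but this block recomputes the total length of the hash data
        in each bucket (the check at mark 1 has already passed). test[i] now starts at sizeof(void *),
        so each total includes room for one extra void pointer.
     */
    for (n = 0; n < nelts; n++) {
        if (names[n].key.data == NULL) {
            continue;
        }

        // compute the bucket index and accumulate the element sizes of that bucket in test[key]
        key = names[n].key_hash % size; // if size == 1, key is always 0
        test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));
    }

    // compute the total length of the hash data
    len = 0;

    for (i = 0; i < size; i++) {
        if (test[i] == sizeof(void *)) { // test[i] still has its initial value: the bucket is empty, skip it
            continue;
        }

        // round test[i] up to a multiple of ngx_cacheline_size (32 in this article's example)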
        test[i] = (u_short) (ngx_align(test[i], ngx_cacheline_size));

        len += test[i];
    }

    if (hinit->hash == NULL) { // allocate the hash header and the buckets array (size ngx_hash_elt_t * entries) from the pool
        hinit->hash = ngx_pcalloc(hinit->pool, sizeof(ngx_hash_wildcard_t)
            + size * sizeof(ngx_hash_elt_t *));
        if (hinit->hash == NULL) {
            ngx_free(test);
            return NGX_ERROR;
        }

        // compute the start of buckets (right after the ngx_hash_wildcard_t structure)
        buckets = (ngx_hash_elt_t **)
            ((u_char *) hinit->hash + sizeof(ngx_hash_wildcard_t));

    } else {  // allocate only the buckets array (size ngx_hash_elt_t * entries) from the pool
        buckets = ngx_pcalloc(hinit->pool, size * sizeof(ngx_hash_elt_t *));
        if (buckets == NULL) {
            ngx_free(test);
            return NGX_ERROR;
        }
    }

    // then allocate elts with len + ngx_cacheline_size bytes; the extra cache line leaves room for the alignment below
    elts = ngx_palloc(hinit->pool, len + ngx_cacheline_size);
    if (elts == NULL) {
        ngx_free(test);
        return NGX_ERROR;
    }

    // align the elts address to ngx_cacheline_size (32 here)
    elts = ngx_align_ptr(elts, ngx_cacheline_size);

    for (i = 0; i < size; i++) {  // point each non-empty bucket at its region of elts
        if (test[i] == sizeof(void *)) {
            continue;
        }

        buckets[i] = (ngx_hash_elt_t *) elts;
        elts += test[i];

    }

    for (i = 0; i < size; i++) {  // reset the test array to 0
        test[i] = 0;
    }

    for (n = 0; n < nelts; n++) { // store every hash item passed in into the hash table
        if (names[n].key.data == NULL) {
            continue;
        }

        // compute key, the bucket this item belongs to, and its position within that bucket's elts
        key = names[n].key_hash % size;
        elt = (ngx_hash_elt_t *) ((u_char *) buckets[key] + test[key]);

        // fill in the ngx_hash_elt_t structure
        elt->value = names[n].value;
        elt->len = (u_short) names[n].key.len;

        ngx_strlow(elt->name, names[n].key.data, names[n].key.len);

        // advance the offset to where the next item of this bucket will go
        test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&names[n]));
    }

    for (i = 0; i < size; i++) {
        if (buckets[i] == NULL) {
            continue;
        }

        // test[i] is now the total length of the hash data in bucket i; a NULL value is written there to end the bucket
        elt = (ngx_hash_elt_t *) ((u_char *) buckets[i] + test[i]);

        elt->value = NULL;
    }

    ngx_free(test);  // release the temporary space

    hinit->hash->buckets = buckets;
    hinit->hash->size = size;

    return NGX_OK;
}

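The "hash data length" mentioned above is the length of an ngx_hash_elt_t after it has been filled in. The nelts elements stored in the names array end up, once this function has initialized the hash, in the ngx_hash_elt_t data areas pointed to by the size hash buckets; together these areas hold nelts hash elements. In other words, each hash bucket (an entry of buckets) stores the start address of an ngx_hash_elt_t data area, and that area holds the hash elements assigned to the bucket. Each hash element ends with a string starting at name[0], which is the key of some element of names, i.e. the key of a <key,value> pair, followed by a few bytes of alignment padding where needed.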
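The search for a suitable number of buckets (the loop around mark 1) can be simulated outside nginx. The sketch below is a simplified model rather than the nginx code: find_size, sample_hash and sample_elt are illustrative names, a 32-bit platform is assumed (PTR_SIZE = 4), and the max_size > 10000 shortcut and the skipping of NULL keys are left out. The sample data are the seven key_hash values from the example run in section 3.3.1 and the element sizes given by NGX_HASH_ELT_SIZE.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PTR_SIZE  4u      /* assumption: 32-bit platform */
#define MAX_SIZE  1024u   /* capacity of the scratch array in this sketch */

/* key_hash values of the seven URLs, as printed in section 3.3.1 */
static const uint32_t sample_hash[7] = {
    270263191u, 1528635686u, (uint32_t) -702889725, 203430122u,
    (uint32_t) -640386838, 1313636595u, 1884209457u
};

/* NGX_HASH_ELT_SIZE of the same seven keys on a 32-bit platform */
static const uint16_t sample_elt[7] = { 20, 24, 20, 16, 20, 20, 20 };

/* Returns the smallest number of buckets for which no bucket overflows,
 * or 0 if no size below max_size works. */
static uint32_t find_size(const uint32_t *hash, const uint16_t *elt,
                          uint32_t nelts, uint32_t bucket_bytes,
                          uint32_t max_size)
{
    uint16_t test[MAX_SIZE];
    uint32_t usable, start, size, n, key;

    if (nelts == 0 || max_size > MAX_SIZE || bucket_bytes < 3 * PTR_SIZE) {
        return 0;
    }

    usable = bucket_bytes - PTR_SIZE;            /* one pointer is reserved */
    start = nelts / (usable / (2 * PTR_SIZE));   /* fewest buckets possible */
    if (start == 0) {
        start = 1;
    }

    for (size = start; size < max_size; size++) {
        memset(test, 0, size * sizeof(uint16_t));

        for (n = 0; n < nelts; n++) {
            key = hash[n] % size;
            test[key] = (uint16_t) (test[key] + elt[n]);
            if (test[key] > usable) {
                break;                           /* overflow: try next size */
            }
        }

        if (n == nelts) {
            return size;                         /* every element fitted */
        }
    }

    return 0;
}
```

With the sample data, find_size(sample_hash, sample_elt, 7, 64, 1024) returns 3 and find_size(sample_hash, sample_elt, 7, 256, 1024) returns 1, which are the bucket counts reported by the two runs in section 3.3.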

 

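The physical layout of a typical hash after initialization is shown below. See the example later in this article for details.

[Figure 2: physical layout of a typical hash after initialization]

2.4 Hash lookup

Hash lookup is performed by ngx_hash_find(); the code follows. The comments after // were added by the author.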

// look up the value for key in the hash table pointed to by hash, using key, name and len
void *
ngx_hash_find(ngx_hash_t *hash, ngx_uint_t key, u_char *name, size_t len)
{
    ngx_uint_t       i;
    ngx_hash_elt_t  *elt;

    elt = hash->buckets[key % hash->size]; // key selects the bucket, which stores the address of its elts

    if (elt == NULL) {
        return NULL;
    }

    while (elt->value) {
        if (len != (size_t) elt->len) {  // compare the lengths first
            goto next;
        }

        for (i = 0; i < len; i++) {
            if (name[i] != elt->name[i]) {  // then compare name byte by byte
                goto next;
            }
        }

        return elt->value;  // match: return the value field of this ngx_hash_elt_t

    next:
        // step from the address of elt->name[0] past len bytes, then align to sizeof(void *) (4 on 32-bit)
        elt = (ngx_hash_elt_t *) ngx_align_ptr(&elt->name[0] + elt->len,
                                               sizeof(void *));
        continue;
    }

    return NULL;
}

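Lookup is quite simple: the bucket is computed directly from key, and the bucket holds the start address of its ngx_hash_elt_t data area. Each element in that area is compared first by length and then by name content; on a match, the value field of that ngx_hash_elt_t is the result.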
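To make the walk through a bucket concrete, the following standalone sketch builds a tiny bucket in heap memory and searches it in the same way. It is an illustration under stated assumptions, not nginx code: elt_t, align_ptr, put, find_in_bucket and make_bucket are hypothetical names, the element type uses a C99 flexible array member where nginx declares name[1], and alignment uses the host's sizeof(void *), so the offsets match the article's 32-bit numbers only on a 32-bit build.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    void           *value;
    unsigned short  len;
    unsigned char   name[];   /* key bytes; nginx declares this as name[1] */
} elt_t;

/* Round a pointer up to a multiple of sizeof(void *), like ngx_align_ptr. */
static unsigned char *align_ptr(unsigned char *p)
{
    uintptr_t a = sizeof(void *);

    return (unsigned char *) (((uintptr_t) p + (a - 1)) & ~(a - 1));
}

/* Write one element at elt and return the position of the next one. */
static elt_t *put(elt_t *elt, const char *key, void *value)
{
    elt->value = value;
    elt->len = (unsigned short) strlen(key);
    memcpy(elt->name, key, elt->len);

    return (elt_t *) align_ptr(&elt->name[0] + elt->len);
}

/* Walk one bucket as ngx_hash_find() does; a NULL value ends the bucket. */
static void *find_in_bucket(elt_t *elt, const char *name, size_t len)
{
    while (elt->value) {
        if (len == elt->len && memcmp(name, elt->name, len) == 0) {
            return elt->value;
        }

        elt = (elt_t *) align_ptr(&elt->name[0] + elt->len);
    }

    return NULL;
}

/* Build a two-element bucket in zeroed heap memory; the caller frees it. */
static elt_t *make_bucket(void)
{
    elt_t *bucket = calloc(1, 128);
    elt_t *next;

    if (bucket == NULL) {
        return NULL;
    }

    next = put(bucket, "www.163.com", "123.103.14.237");
    next = put(next, "www.sohu.com", "219.234.82.50");
    next->value = NULL;       /* terminator, as written by ngx_hash_init() */

    return bucket;
}
```

Given b = make_bucket(), the call find_in_bucket(b, "www.sohu.com", 12) skips the first element and returns the second element's value, while a name that is not stored in the bucket yields NULL. As in nginx, the name passed to the lookup has to be in the same case as the stored (lower-cased) key.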

 

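3. An example

This section gives a simple example that creates a memory pool, allocates the hash structure, hash buckets and hash elements from it, and adds <key,value> pairs to the hash.

The example implements the following application: given several URLs and their IPs as pairs <url,ip>, it treats them as <key,value> pairs, hashes them at initialization, and then looks up the IP address for a given URL, printing a message if none is found. This shows how the nginx hash is used.

3.1 Code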

/**
 * ngx_hash_t test
 * in this example, it will first save URLs into the memory pool, and IPs saved in static memory.
 * then, give some examples to find IP according to a URL.
 */

#include <stdio.h>
#include "ngx_config.h"
#include "ngx_conf_file.h"
#include "nginx.h"
#include "ngx_core.h"
#include "ngx_string.h"
#include "ngx_palloc.h"
#include "ngx_array.h"
#include "ngx_hash.h"

#define Max_Num 7
#define Max_Size 1024
#define Bucket_Size 64  //256, 64

#define NGX_HASH_ELT_SIZE(name)               \
    (sizeof(void *) + ngx_align((name)->key.len + 2, sizeof(void *)))

/* for hash test */
static ngx_str_t urls[Max_Num] = {
    ngx_string("www.baidu.com"),  //220.181.111.147
    ngx_string("www.sina.com.cn"),  //58.63.236.35
    ngx_string("www.google.com"),  //74.125.71.105
    ngx_string("www.qq.com"),  //60.28.14.190
    ngx_string("www.163.com"),  //123.103.14.237
    ngx_string("www.sohu.com"),  //219.234.82.50
    ngx_string("www.abo321.org")  //117.40.196.26
};

static char* values[Max_Num] = {
    "220.181.111.147",
    "58.63.236.35",
    "74.125.71.105",
    "60.28.14.190",
    "123.103.14.237",
    "219.234.82.50",
    "117.40.196.26"
};

#define Max_Url_Len 15
#define Max_Ip_Len 15

#define Max_Num2 2

/* for finding test */
static ngx_str_t urls2[Max_Num2] = {
    ngx_string("www.china.com"),  //60.217.58.79
    ngx_string("www.csdn.net")  //117.79.157.242
};

ngx_hash_t* init_hash(ngx_pool_t *pool, ngx_array_t *array);
void dump_pool(ngx_pool_t* pool);
void dump_hash_array(ngx_array_t* a);
void dump_hash(ngx_hash_t *hash, ngx_array_t *array);
ngx_array_t* add_urls_to_array(ngx_pool_t *pool);
void find_test(ngx_hash_t *hash, ngx_str_t addr[], int num);

/* for passing compiling */
volatile ngx_cycle_t  *ngx_cycle;
void ngx_log_error_core(ngx_uint_t level, ngx_log_t *log, ngx_err_t err, const char *fmt, ...)
{
}

int main(/* int argc, char **argv */)
{
    ngx_pool_t *pool = NULL;
    ngx_array_t *array = NULL;
    ngx_hash_t *hash;

    printf("--------------------------------\n");
    printf("create a new pool:\n");
    printf("--------------------------------\n");
    pool = ngx_create_pool(1024, NULL);

    dump_pool(pool);

    printf("--------------------------------\n");
    printf("create and add urls to it:\n");
    printf("--------------------------------\n");
    array = add_urls_to_array(pool);  //in fact, here should validate array
    dump_hash_array(array);

    printf("--------------------------------\n");
    printf("the pool:\n");
    printf("--------------------------------\n");
    dump_pool(pool);

    hash = init_hash(pool, array);
    if (hash == NULL)
    {
        printf("Failed to initialize hash!\n");
        return -1;
    }

    printf("--------------------------------\n");
    printf("the hash:\n");
    printf("--------------------------------\n");
    dump_hash(hash, array);
    printf("\n");

    printf("--------------------------------\n");
    printf("the pool:\n");
    printf("--------------------------------\n");
    dump_pool(pool);

    //find test
    printf("--------------------------------\n");
    printf("find test:\n");
    printf("--------------------------------\n");
    find_test(hash, urls, Max_Num);
    printf("\n");

    find_test(hash, urls2, Max_Num2);

    //release
    ngx_array_destroy(array);
    ngx_destroy_pool(pool);

    return 0;
}

ngx_hash_t* init_hash(ngx_pool_t *pool, ngx_array_t *array)
{
    ngx_int_t result;
    ngx_hash_init_t hinit;

    ngx_cacheline_size = 32;  //here this variable for nginx must be defined
    hinit.hash = NULL;  //if hinit.hash is NULL, it will alloc memory for it in ngx_hash_init
    hinit.key = &ngx_hash_key_lc;  //hash function
    hinit.max_size = Max_Size;
    hinit.bucket_size = Bucket_Size;
    hinit.name = "my_hash_sample";
    hinit.pool = pool;  //the hash table exists in the memory pool
    hinit.temp_pool = NULL;

    result = ngx_hash_init(&hinit, (ngx_hash_key_t*)array->elts, array->nelts);
    if (result != NGX_OK)
        return NULL;

    return hinit.hash;
}

void dump_pool(ngx_pool_t* pool)
{
    while (pool)
    {
        printf("pool = 0x%x\n", pool);
        printf("  .d\n");
        printf("    .last = 0x%x\n", pool->d.last);
        printf("    .end = 0x%x\n", pool->d.end);
        printf("    .next = 0x%x\n", pool->d.next);
        printf("    .failed = %d\n", pool->d.failed);
        printf("  .max = %d\n", pool->max);
        printf("  .current = 0x%x\n", pool->current);
        printf("  .chain = 0x%x\n", pool->chain);
        printf("  .large = 0x%x\n", pool->large);
        printf("  .cleanup = 0x%x\n", pool->cleanup);
        printf("  .log = 0x%x\n", pool->log);
        printf("available pool memory = %d\n\n", pool->d.end - pool->d.last);
        pool = pool->d.next;
    }
}

void dump_hash_array(ngx_array_t* a)
{
    char prefix[] = "          ";

    if (a == NULL)
        return;

    printf("array = 0x%x\n", a);
    printf("  .elts = 0x%x\n", a->elts);
    printf("  .nelts = %d\n", a->nelts);
    printf("  .size = %d\n", a->size);
    printf("  .nalloc = %d\n", a->nalloc);
    printf("  .pool = 0x%x\n", a->pool);

    printf("  elements:\n");
    ngx_hash_key_t *ptr = (ngx_hash_key_t*)(a->elts);
    for (; ptr < (ngx_hash_key_t*)(a->elts + a->nalloc * a->size); ptr++)
    {
        printf("    0x%x: {key = (\"%s\"%.*s, %d), key_hash = %-10ld, value = \"%s\"%.*s}\n", 
            ptr, ptr->key.data, Max_Url_Len - ptr->key.len, prefix, ptr->key.len, 
            ptr->key_hash, ptr->value, Max_Ip_Len - strlen(ptr->value), prefix);
    }
    printf("\n");
}

/**
 * pass array pointer to read elts[i].key_hash, then for getting the position - key
 */
void dump_hash(ngx_hash_t *hash, ngx_array_t *array)
{
    int loop;
    char prefix[] = "          ";
    u_short test[Max_Num] = {0};
    ngx_uint_t key;
    ngx_hash_key_t* elts;
    int nelts;

    if (hash == NULL)
        return;

    printf("hash = 0x%x: **buckets = 0x%x, size = %d\n", hash, hash->buckets, hash->size);

    for (loop = 0; loop < hash->size; loop++)
    {
        ngx_hash_elt_t *elt = hash->buckets[loop];
        printf("  0x%x: buckets[%d] = 0x%x\n", &(hash->buckets[loop]), loop, elt);
    }
    printf("\n");

    elts = (ngx_hash_key_t*)array->elts;
    nelts = array->nelts;
    for (loop = 0; loop < nelts; loop++)
    {
        char url[Max_Url_Len + 1] = {0};

        key = elts[loop].key_hash % hash->size;
        ngx_hash_elt_t *elt = (ngx_hash_elt_t *) ((u_char *) hash->buckets[key] + test[key]);

        ngx_strlow(url, elt->name, elt->len);
        printf("  buckets %d: 0x%x: {value = \"%s\"%.*s, len = %d, name = \"%s\"%.*s}\n", 
            key, elt, (char*)elt->value, Max_Ip_Len - strlen((char*)elt->value), prefix, 
            elt->len, url, Max_Url_Len - elt->len, prefix); //replace elt->name with url

        test[key] = (u_short) (test[key] + NGX_HASH_ELT_SIZE(&elts[loop]));
    }
}

ngx_array_t* add_urls_to_array(ngx_pool_t *pool)
{
    int loop;
    char prefix[] = "          ";
    ngx_array_t *a = ngx_array_create(pool, Max_Num, sizeof(ngx_hash_key_t));

    for (loop = 0; loop < Max_Num; loop++)
    {
        ngx_hash_key_t *hashkey = (ngx_hash_key_t*)ngx_array_push(a);
        hashkey->key = urls[loop];
        hashkey->key_hash = ngx_hash_key_lc(urls[loop].data, urls[loop].len);
        hashkey->value = (void*)values[loop];
        /** for debug
        printf("{key = (\"%s\"%.*s, %d), key_hash = %-10ld, value = \"%s\"%.*s}, added to array\n",
            hashkey->key.data, Max_Url_Len - hashkey->key.len, prefix, hashkey->key.len,
            hashkey->key_hash, hashkey->value, Max_Ip_Len - strlen(hashkey->value), prefix);
        */
    }

    return a;    
}

void find_test(ngx_hash_t *hash, ngx_str_t addr[], int num)
{
    ngx_uint_t key;
    int loop;
    char prefix[] = "          ";

    for (loop = 0; loop < num; loop++)
    {
        key = ngx_hash_key_lc(addr[loop].data, addr[loop].len);
        void *value = ngx_hash_find(hash, key, addr[loop].data, addr[loop].len);
        if (value)
        {
            printf("(url = \"%s\"%.*s, key = %-10ld) found, (ip = \"%s\")\n", 
                addr[loop].data, Max_Url_Len - addr[loop].len, prefix, key, (char*)value);
        }
        else
        {
            printf("(url = \"%s\"%.*s, key = %-10d) not found!\n", 
                addr[loop].data, Max_Url_Len - addr[loop].len, prefix, key);
        }
    }
}

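3.2 How to compile

See the article "nginx-1.0.4 source code analysis: memory pool structure ngx_pool_t and memory management". The makefile written for this article is as follows.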

CXX = gcc
CXXFLAGS += -g -Wall -Wextra

NGX_ROOT = /usr/src/nginx-1.0.4

TARGETS = ngx_hash_t_test
TARGETS_C_FILE = $(TARGETS).c

CLEANUP = rm -f $(TARGETS) *.o

all: $(TARGETS)

clean:
	$(CLEANUP)

CORE_INCS = -I. \
	-I$(NGX_ROOT)/src/core \
	-I$(NGX_ROOT)/src/event \
	-I$(NGX_ROOT)/src/event/modules \
	-I$(NGX_ROOT)/src/os/unix \
	-I$(NGX_ROOT)/objs \

NGX_PALLOC = $(NGX_ROOT)/objs/src/core/ngx_palloc.o
NGX_STRING = $(NGX_ROOT)/objs/src/core/ngx_string.o
NGX_ALLOC = $(NGX_ROOT)/objs/src/os/unix/ngx_alloc.o
NGX_ARRAY = $(NGX_ROOT)/objs/src/core/ngx_array.o
NGX_HASH = $(NGX_ROOT)/objs/src/core/ngx_hash.o

$(TARGETS): $(TARGETS_C_FILE)
	$(CXX) $(CXXFLAGS) $(CORE_INCS) $(NGX_PALLOC) $(NGX_STRING) $(NGX_ALLOC) $(NGX_ARRAY) $(NGX_HASH) $^ -o $@

 

3.3 Run results

3.3.1 bucket_size = 64 bytes

With bucket_size = 64 bytes, the output is as follows.

# ./ngx_hash_t_test
--------------------------------
create a new pool:
--------------------------------
pool = 0x8870020
  .d
    .last = 0x8870048
    .end = 0x8870420
    .next = 0x0
    .failed = 0
  .max = 984
  .current = 0x8870020
  .chain = 0x0
  .large = 0x0
  .cleanup = 0x0
  .log = 0x0
available pool memory = 984

--------------------------------
create and add urls to it:
--------------------------------
array = 0x8870048
  .elts = 0x887005c
  .nelts = 7
  .size = 16
  .nalloc = 7
  .pool = 0x8870020
  elements:
    0x887005c: {key = ("www.baidu.com"  , 13), key_hash = 270263191 , value = "220.181.111.147"}
    0x887006c: {key = ("www.sina.com.cn", 15), key_hash = 1528635686, value = "58.63.236.35"   }
    0x887007c: {key = ("www.google.com" , 14), key_hash = -702889725, value = "74.125.71.105"  }
    0x887008c: {key = ("www.qq.com"     , 10), key_hash = 203430122 , value = "60.28.14.190"   }
    0x887009c: {key = ("www.163.com"    , 11), key_hash = -640386838, value = "123.103.14.237" }
    0x88700ac: {key = ("www.sohu.com"   , 12), key_hash = 1313636595, value = "219.234.82.50"  }
    0x88700bc: {key = ("www.abo321.org" , 14), key_hash = 1884209457, value = "117.40.196.26"  }

--------------------------------
the pool:
--------------------------------
pool = 0x8870020
  .d
    .last = 0x88700cc
    .end = 0x8870420
    .next = 0x0
    .failed = 0
  .max = 984
  .current = 0x8870020
  .chain = 0x0
  .large = 0x0
  .cleanup = 0x0
  .log = 0x0
available pool memory = 852

--------------------------------
the hash:
--------------------------------
hash = 0x88700cc: **buckets = 0x88700d8, size = 3
  0x88700d8: buckets[0] = 0x8870100
  0x88700dc: buckets[1] = 0x8870140
  0x88700e0: buckets[2] = 0x8870180

  buckets 1: 0x8870140: {value = "220.181.111.147", len = 13, name = "www.baidu.com"  }
  buckets 2: 0x8870180: {value = "58.63.236.35"   , len = 15, name = "www.sina.com.cn"}
  buckets 1: 0x8870154: {value = "74.125.71.105"  , len = 14, name = "www.google.com" }
  buckets 2: 0x8870198: {value = "60.28.14.190"   , len = 10, name = "www.qq.com"     }
  buckets 0: 0x8870100: {value = "123.103.14.237" , len = 11, name = "www.163.com"    }
  buckets 0: 0x8870114: {value = "219.234.82.50"  , len = 12, name = "www.sohu.com"   }
  buckets 0: 0x8870128: {value = "117.40.196.26"  , len = 14, name = "www.abo321.org" }

--------------------------------
the pool:
--------------------------------
pool = 0x8870020
  .d
    .last = 0x88701c4
    .end = 0x8870420
    .next = 0x0
    .failed = 0
  .max = 984
  .current = 0x8870020
  .chain = 0x0
  .large = 0x0
  .cleanup = 0x0
  .log = 0x0
available pool memory = 604

--------------------------------
find test:
--------------------------------
(url = "www.baidu.com"  , key = 270263191 ) found, (ip = "220.181.111.147")
(url = "www.sina.com.cn", key = 1528635686) found, (ip = "58.63.236.35")
(url = "www.google.com" , key = -702889725) found, (ip = "74.125.71.105")
(url = "www.qq.com"     , key = 203430122 ) found, (ip = "60.28.14.190")
(url = "www.163.com"    , key = -640386838) found, (ip = "123.103.14.237")
(url = "www.sohu.com"   , key = 1313636595) found, (ip = "219.234.82.50")
(url = "www.abo321.org" , key = 1884209457) found, (ip = "117.40.196.26")

(url = "www.china.com"  , key = -1954599725) not found!
(url = "www.csdn.net"   , key = -1667448544) not found!

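The output above is for bucket_size = 64 bytes. It shows that the program distributed the 7 given URLs over 3 buckets; see the output for details. The physical layout of the hash in this example is shown below.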
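The bucket spacing and pool usage in this output can be reproduced with a little arithmetic. The sketch below is a standalone model, not nginx code: bucket_bytes, total_bytes, sample_hash and sample_elt are illustrative names, a 32-bit platform is assumed (PTR_SIZE = 4), and CACHELINE = 32 mirrors the value assigned to ngx_cacheline_size in init_hash().

```c
#include <stdint.h>

#define PTR_SIZE   4u    /* assumption: 32-bit platform */
#define CACHELINE  32u   /* ngx_cacheline_size as set in init_hash() */

/* key_hash values of the seven URLs, as printed in the output above */
static const uint32_t sample_hash[7] = {
    270263191u, 1528635686u, (uint32_t) -702889725, 203430122u,
    (uint32_t) -640386838, 1313636595u, 1884209457u
};

/* NGX_HASH_ELT_SIZE of the same seven keys on a 32-bit platform */
static const uint32_t sample_elt[7] = { 20, 24, 20, 16, 20, 20, 20 };

/* Bytes reserved for bucket b when the table has `size` buckets: one
 * pointer for the terminating NULL value plus every element hashed to b,
 * rounded up to a cache line. An empty bucket takes no space. */
static uint32_t bucket_bytes(uint32_t b, uint32_t size)
{
    uint32_t used = PTR_SIZE;
    uint32_t n;

    for (n = 0; n < 7; n++) {
        if (sample_hash[n] % size == b) {
            used += sample_elt[n];
        }
    }

    if (used == PTR_SIZE) {
        return 0;
    }

    return (used + (CACHELINE - 1)) & ~(CACHELINE - 1);
}

/* Total element storage, i.e. the value of len in ngx_hash_init(). */
static uint32_t total_bytes(uint32_t size)
{
    uint32_t b, len = 0;

    for (b = 0; b < size; b++) {
        len += bucket_bytes(b, size);
    }

    return len;
}
```

For size = 3 each bucket rounds up to 64 bytes, which matches the 0x40 spacing between buckets[0], buckets[1] and buckets[2] in the dump, and total_bytes(3) is 192. ngx_hash_init() then requests len + ngx_cacheline_size = 224 bytes for the elements; together with the 24 bytes for the hash header and the three bucket pointers, this accounts for the drop in available pool memory from 852 to 604 bytes.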

[Figure 3: physical layout of the hash in this example, bucket_size = 64 bytes]

3.3.2 bucket_size = 256 bytes

With bucket_size = 256 bytes, the output is as follows.
# ./ngx_hash_t_test
--------------------------------
create a new pool:
--------------------------------
pool = 0x8b74020
  .d
    .last = 0x8b74048
    .end = 0x8b74420
    .next = 0x0
    .failed = 0
  .max = 984
  .current = 0x8b74020
  .chain = 0x0
  .large = 0x0
  .cleanup = 0x0
  .log = 0x0
available pool memory = 984

--------------------------------
create and add urls to it:
--------------------------------
array = 0x8b74048
  .elts = 0x8b7405c
  .nelts = 7
  .size = 16
  .nalloc = 7
  .pool = 0x8b74020
  elements:
    0x8b7405c: {key = ("www.baidu.com"  , 13), key_hash = 270263191 , value = "220.181.111.147"}
    0x8b7406c: {key = ("www.sina.com.cn", 15), key_hash = 1528635686, value = "58.63.236.35"   }
    0x8b7407c: {key = ("www.google.com" , 14), key_hash = -702889725, value = "74.125.71.105"  }
    0x8b7408c: {key = ("www.qq.com"     , 10), key_hash = 203430122 , value = "60.28.14.190"   }
    0x8b7409c: {key = ("www.163.com"    , 11), key_hash = -640386838, value = "123.103.14.237" }
    0x8b740ac: {key = ("www.sohu.com"   , 12), key_hash = 1313636595, value = "219.234.82.50"  }
    0x8b740bc: {key = ("www.abo321.org" , 14), key_hash = 1884209457, value = "117.40.196.26"  }

--------------------------------
the pool:
--------------------------------
pool = 0x8b74020
  .d
    .last = 0x8b740cc
    .end = 0x8b74420
    .next = 0x0
    .failed = 0
  .max = 984
  .current = 0x8b74020
  .chain = 0x0
  .large = 0x0
  .cleanup = 0x0
  .log = 0x0
available pool memory = 852

--------------------------------
the hash:
--------------------------------
hash = 0x8b740cc: **buckets = 0x8b740d8, size = 1
  0x8b740d8: buckets[0] = 0x8b740e0

  buckets 0: {value = "220.181.111.147", len = 13, name = "www.baidu.com"  }
  buckets 0: {value = "58.63.236.35"   , len = 15, name = "www.sina.com.cn"}
  buckets 0: {value = "74.125.71.105"  , len = 14, name = "www.google.com" }
  buckets 0: {value = "60.28.14.190"   , len = 10, name = "www.qq.com"     }
  buckets 0: {value = "123.103.14.237" , len = 11, name = "www.163.com"    }
  buckets 0: {value = "219.234.82.50"  , len = 12, name = "www.sohu.com"   }
  buckets 0: {value = "117.40.196.26"  , len = 14, name = "www.abo321.org" }

--------------------------------
the pool:
--------------------------------
pool = 0x8b74020
  .d
    .last = 0x8b7419c
    .end = 0x8b74420
    .next = 0x0
    .failed = 0
  .max = 984
  .current = 0x8b74020
  .chain = 0x0
  .large = 0x0
  .cleanup = 0x0
  .log = 0x0
available pool memory = 644

--------------------------------
find test:
--------------------------------
(url = "www.baidu.com"  , key = 270263191 ) found, (ip = "220.181.111.147")
(url = "www.sina.com.cn", key = 1528635686) found, (ip = "58.63.236.35")
(url = "www.google.com" , key = -702889725) found, (ip = "74.125.71.105")
(url = "www.qq.com"     , key = 203430122 ) found, (ip = "60.28.14.190")
(url = "www.163.com"    , key = -640386838) found, (ip = "123.103.14.237")
(url = "www.sohu.com"   , key = 1313636595) found, (ip = "219.234.82.50")
(url = "www.abo321.org" , key = 1884209457) found, (ip = "117.40.196.26")

(url = "www.china.com"  , key = -1954599725) not found!
(url = "www.csdn.net"   , key = -1667448544) not found!

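The output above is for bucket_size = 256 bytes. It shows that the program placed all 7 given URLs in a single bucket, i.e. size = 1 in ngx_hash_init(): the 7 hash elements total only 140 bytes, so one bucket, buckets[0], is enough.

The table below lists some of the values computed inside ngx_hash_init(). The physical layout diagram is omitted; refer to the figure above.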

 

url                computed length            value of test[0]
www.baidu.com      4+ngx_align(13+2,4)=20     20
www.sina.com.cn    4+ngx_align(15+2,4)=24     44
www.google.com     4+ngx_align(14+2,4)=20     64
www.qq.com         4+ngx_align(10+2,4)=16     80
www.163.com        4+ngx_align(11+2,4)=20     100
www.sohu.com       4+ngx_align(12+2,4)=20     120
www.abo321.org     4+ngx_align(14+2,4)=20     140

 

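4. Summary

This article gave a fairly thorough analysis of the hash structures in nginx-1.0.4, including the hash structure, the hash element structure and the hash initialization structure; the hash operations covered are initialization and lookup. Finally, a simple example showed how to use the nginx hash, with detailed run output and a diagram of the physical layout, to illustrate the design and principles of the hash and, along the way, how to compile and test nginx code.

Stay tuned for further analysis. Thank you!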

 

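Reference

Nginx code research plan (RainX1982)

nginx-1.0.4 source code analysis: modules and their initialization (阿波)

nginx-1.0.4 source code analysis: memory pool structure ngx_pool_t and memory management (阿波)

nginx-1.0.4 source code analysis: array structure ngx_array_t (阿波)

nginx-1.0.4 source code analysis: list structure ngx_list_t (阿波)

nginx-1.0.4 source code analysis: queue structure ngx_queue_t (阿波)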
