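The default character set of a MySQL database is latin1 (before MySQL 8.0, which switched the default to utf8mb4). With that default, the httpd module we compiled reads such databases correctly and the text comes back clean. But if the database uses a different character set, such as UTF8, the data it reads back is garbled, and nothing we put in the connection parameters makes any difference. (If you repost this, please credit breaksoftware's CSDN blog.)
Take a utf8 database as an example. The following statement shows its character set settings: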
SHOW VARIABLES LIKE 'character_set_%';
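Let's try adding "charset=utf8" to the database connection parameters:
status = apr_dbd_open(driver, pool_db, "host=localhost;user=user_name;pass=password;dbname=database_name;charset=utf8", &handle);
The call succeeds, but the data it reads is still garbled! That makes no sense, so I looked through apr's database functions and found no interface at all for setting the character set. apr-util is presumably just a wrapper around the more complex MySQL client interface from libmysql++-dev, which raises a possibility: apr-util's wrapper is incomplete. Let's read the MySQL driver code that apr_dbd_open hands the parameter string to: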
struct {
    const char *field;
    const char *value;
} fields[] = {
    {"host", NULL},
    {"user", NULL},
    {"pass", NULL},
    {"dbname", NULL},
    {"port", NULL},
    {"sock", NULL},
    {"flags", NULL},
    {"fldsz", NULL},
    {"group", NULL},
    {"reconnect", NULL},
    {NULL, NULL}
};
unsigned int port = 0;
apr_dbd_t *sql = apr_pcalloc(pool, sizeof(apr_dbd_t));
sql->fldsz = FIELDSIZE;
sql->conn = mysql_init(sql->conn);
if ( sql->conn == NULL ) {
    return NULL;
}
for (ptr = strchr(params, '='); ptr; ptr = strchr(ptr, '=')) {
    /* don't dereference memory that may not belong to us */
    if (ptr == params) {
        ++ptr;
        continue;
    }
    for (key = ptr-1; apr_isspace(*key); --key);
    klen = 0;
    while (apr_isalpha(*key)) {
        /* don't parse backwards off the start of the string */
        if (key == params) {
            --key;
            ++klen;
            break;
        }
        --key;
        ++klen;
    }
    ++key;
    for (value = ptr+1; apr_isspace(*value); ++value);
    vlen = strcspn(value, delims);
    for (i = 0; fields[i].field != NULL; i++) {
        if (!strncasecmp(fields[i].field, key, klen)) {
            fields[i].value = apr_pstrndup(pool, value, vlen);
            break;
        }
    }
    ptr = value+vlen;
}
if (fields[4].value != NULL) {
    port = atoi(fields[4].value);
}
if (fields[6].value != NULL &&
    !strcmp(fields[6].value, "CLIENT_FOUND_ROWS")) {
    flags |= CLIENT_FOUND_ROWS; /* only option we know */
}
if (fields[7].value != NULL) {
    sql->fldsz = atol(fields[7].value);
}
if (fields[8].value != NULL) {
    mysql_options(sql->conn, MYSQL_READ_DEFAULT_GROUP, fields[8].value);
}
#if MYSQL_VERSION_ID >= 50013
if (fields[9].value != NULL) {
    do_reconnect = atoi(fields[9].value) ? 1 : 0;
}
#endif
#if MYSQL_VERSION_ID >= 50013
/* the MySQL manual says this should be BEFORE mysql_real_connect */
mysql_options(sql->conn, MYSQL_OPT_RECONNECT, &do_reconnect);
#endif
real_conn = mysql_real_connect(sql->conn, fields[0].value, fields[1].value,
                               fields[2].value, fields[3].value, port,
                               fields[5].value, flags);
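To see concretely why charset=utf8 has no effect, here is a small standalone re-creation of the scan above, written for illustration only. It is not the driver's code: it swaps apr_isspace/apr_isalpha for <ctype.h>, avoids stepping a pointer before the start of the string, and assumes the delimiter set " \r\n\t;|," because the excerpt uses delims without showing its definition. The names scan_params and known_keys are hypothetical. It reports how many key=value pairs matched no table entry.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Keys known to the unpatched driver, in fields[] order (index 0..9). */
static const char *const known_keys[] = {
    "host", "user", "pass", "dbname", "port", "sock",
    "flags", "fldsz", "group", "reconnect", NULL
};

/* Assumption: the driver's delims set; the excerpt does not show it. */
static const char *const delims = " \r\n\t;|,";

/*
 * Hypothetical re-creation of the driver's key=value scan.
 * values[i]/vlens[i] receive the value for known_keys[i] (a pointer into
 * params, not NUL-terminated), or NULL/0 if that key was not given.
 * Returns how many key=value pairs matched no known key and were dropped.
 */
static int scan_params(const char *params, const char *values[], size_t vlens[])
{
    int dropped = 0;
    for (size_t i = 0; known_keys[i] != NULL; i++) {
        values[i] = NULL;
        vlens[i] = 0;
    }
    for (const char *ptr = strchr(params, '='); ptr; ptr = strchr(ptr, '=')) {
        if (ptr == params) {          /* '=' at the very start: no key */
            ++ptr;
            continue;
        }
        /* Walk back over spaces, then over the alphabetic key itself. */
        const char *end = ptr;
        while (end > params && isspace((unsigned char)end[-1]))
            --end;
        const char *key = end;
        while (key > params && isalpha((unsigned char)key[-1]))
            --key;
        size_t klen = (size_t)(end - key);

        /* Skip leading spaces in the value; it ends at the first delimiter. */
        const char *value = ptr + 1;
        while (isspace((unsigned char)*value))
            ++value;
        size_t vlen = strcspn(value, delims);

        /* Same test as the driver: case-insensitive, over the key's length.
         * (klen > 0 is an extra guard not present in the original.) */
        int matched = 0;
        for (size_t i = 0; klen > 0 && known_keys[i] != NULL; i++) {
            if (strncasecmp(known_keys[i], key, klen) == 0) {
                values[i] = value;
                vlens[i] = vlen;
                matched = 1;
                break;
            }
        }
        if (!matched)
            ++dropped;                /* e.g. charset=utf8 ends up here */
        ptr = value + vlen;
    }
    return dropped;
}
```

Run on the connection string passed to apr_dbd_open earlier, it returns 1: host, user, pass and dbname land in their slots, while charset=utf8 is counted as dropped, and nothing reports an error. Because the comparison only covers the key's own length, a key that is a prefix of a table entry is also accepted (db=... would be read as dbname).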
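A quick read of this code supports a few conclusions: the only keys it recognizes are those in the fields table (host, user, pass, dbname, port, sock, flags, fldsz, group, reconnect); a key that matches none of them, such as charset, is parsed and then silently discarded, which is why apr_dbd_open still succeeds; and the only mysql_options calls made before mysql_real_connect set MYSQL_READ_DEFAULT_GROUP and MYSQL_OPT_RECONNECT, so nothing sets the connection character set. The MySQL C API documentation states: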
The user and passwd parameters use whatever character set has been configured for the MYSQL object. By default, this is latin1, but can be changed by calling mysql_options(mysql, MYSQL_SET_CHARSET_NAME, "charset_name") prior to connecting.
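So we need to call mysql_options with MYSQL_SET_CHARSET_NAME to set the character set. But searching the entire apr-util source shows that mysql_options is called only in the open function shown above, and MYSQL_SET_CHARSET_NAME does not appear anywhere. We can conclude that apr-util (at least this version) does not implement character set selection, so we have to modify the code ourselves (/usr/src/apr-util-1.5.4/dbd/apr_dbd_mysql.c). First, add a charset entry to the fields table: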
struct {
    const char *field;
    const char *value;
} fields[] = {
    {"host", NULL},
    {"user", NULL},
    {"pass", NULL},
    {"dbname", NULL},
    {"port", NULL},
    {"sock", NULL},
    {"flags", NULL},
    {"fldsz", NULL},
    {"group", NULL},
    {"reconnect", NULL},
    {"charset", NULL},
    {NULL, NULL}
};
With the new key registered in the table (it becomes fields[10], just before the sentinel), insert the following before the call to mysql_real_connect:
if (fields[10].value != NULL) {
    mysql_options(sql->conn, MYSQL_SET_CHARSET_NAME, fields[10].value);
}
Then rebuild apr-util and httpd, and our module can select the database character set by passing charset=... in the connection string.
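After the patch, the same connection string now resolves charset to table index 10. The sketch below is a hypothetical standalone illustration, not the driver's code: patched_lookup and FIELD_CHARSET are made-up names, <ctype.h> stands in for the apr_is* macros, and the delimiter set " \r\n\t;|," is an assumption, since the excerpt never shows the driver's delims. It returns the value that the patched driver would store in fields[10].value.

```c
#define _POSIX_C_SOURCE 200809L
#include <ctype.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Field table after the patch: "charset" sits at index 10, before the sentinel. */
static const char *const patched_fields[] = {
    "host", "user", "pass", "dbname", "port", "sock",
    "flags", "fldsz", "group", "reconnect", "charset", NULL
};
#define FIELD_CHARSET 10   /* hypothetical name for the index read as fields[10] */

/* Assumption: the driver's delimiter set (not shown in the excerpt). */
static const char *const param_delims = " \r\n\t;|,";

/*
 * Return 1 and copy the value of the key that resolves to table entry `want`
 * into out (truncated to outsize-1 bytes); return 0 if no such key was given.
 * Matching follows the driver: case-insensitive, over the key's own length,
 * first matching table entry wins.
 */
static int patched_lookup(const char *params, int want, char *out, size_t outsize)
{
    int found = 0;
    for (const char *ptr = strchr(params, '='); ptr; ptr = strchr(ptr, '=')) {
        const char *end = ptr, *key, *value;
        size_t klen, vlen;
        if (ptr == params) {          /* '=' at the very start: no key */
            ++ptr;
            continue;
        }
        while (end > params && isspace((unsigned char)end[-1]))
            --end;                    /* skip spaces before '=' */
        for (key = end; key > params && isalpha((unsigned char)key[-1]); --key)
            ;                         /* back up over the alphabetic key */
        klen = (size_t)(end - key);
        for (value = ptr + 1; isspace((unsigned char)*value); ++value)
            ;                         /* skip spaces after '=' */
        vlen = strcspn(value, param_delims);
        for (int i = 0; klen > 0 && patched_fields[i] != NULL; i++) {
            if (strncasecmp(patched_fields[i], key, klen) == 0) {
                if (i == want && outsize > 0) {
                    size_t n = vlen < outsize - 1 ? vlen : outsize - 1;
                    memcpy(out, value, n);
                    out[n] = '\0';
                    found = 1;
                }
                break;
            }
        }
        ptr = value + vlen;
    }
    return found;
}
```

A return value of 1 corresponds to fields[10].value != NULL in the patched driver, which then hands that string to mysql_options(sql->conn, MYSQL_SET_CHARSET_NAME, ...) before mysql_real_connect, the order the MySQL documentation requires. Without a charset key the lookup returns 0 and the driver keeps its previous behavior.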