A byte can represent 256 values, from 0 to 255. In ASCII, the English letters A-Z and a-z are each encoded in consecutive order:
'A' = 65 = 0b01000001 = 0x41
'B' = 66 = 0b01000010 = 0x42
...
'Z' = 90 = 0b01011010 = 0x5a
'a' = 97 = 0b01100001 = 0x61
'b' = 98 = 0b01100010 = 0x62
...
'z' = 122 = 0b01111010 = 0x7a
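The rows above can be reproduced with a small helper. This is only a sketch: `to_binary8` and `print_row` are hypothetical names, and it assumes an ASCII execution character set.

```c
#include <stdio.h>

/* Hypothetical helper: write the 8-bit binary form of c into out
   (out must hold at least 9 chars, including the terminating '\0'). */
void to_binary8(unsigned char c, char out[9])
{
    int bit;
    for (bit = 7; bit >= 0; bit--)
        out[7 - bit] = ((c >> bit) & 1) ? '1' : '0';
    out[8] = '\0';
}

/* Print one row in the format used above, e.g. 'A' = 65 = 0b01000001 = 0x41 */
void print_row(unsigned char c)
{
    char bits[9];
    to_binary8(c, bits);
    printf("'%c' = %d = 0b%s = 0x%02x\n", c, c, bits, c);
}
```

Calling print_row for each of 'A', 'B', 'Z', 'a', 'b', 'z' prints the corresponding rows of the listing.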
The traditional approach checks the two ranges directly:
#define judgeletter_classic(ch) ((((ch) >= 'A') && ((ch) <= 'Z')) || (((ch) >= 'a') && ((ch) <= 'z')))
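A function version of the same range check, sketched below under the hypothetical name `is_letter_classic`, evaluates its argument only once. A call such as `is_letter_classic(*p++)` is therefore safe, whereas the macro would evaluate `*p++` up to four times.

```c
#include <stdbool.h>

/* Hypothetical function form of judgeletter_classic: c is evaluated once,
   and no extra parentheses are needed at the call site. */
static inline bool is_letter_classic(int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}
```

With optimization enabled, a compiler will normally inline such a function, so the macro does not buy any speed in that setting.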
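Looking closely at the binary forms, however, reveals the following properties:
(1) The top two bits of every letter are 01.
(2) The third bit from the top (bit 5, value 0x20) is 0 for uppercase letters and 1 for lowercase letters.
(3) The low 5 bits run from 00001 to 11010, i.e. 26 values, which encode A-Z (and likewise a-z) in order.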
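The three properties can be checked mechanically for all 52 letters. The function name `letter_bit_properties_hold` is hypothetical, and the sketch assumes ASCII.

```c
#include <stdbool.h>

/* Check, for every letter:
   (1) the top two bits are 01,
   (2) bit 5 (0x20) is 0 for 'A'-'Z' and 1 for 'a'-'z',
   (3) the low 5 bits run from 1 ('A'/'a') to 26 ('Z'/'z'). */
bool letter_bit_properties_hold(void)
{
    int c;
    for (c = 'A'; c <= 'Z'; c++) {
        int lower = c + ('a' - 'A');
        int index = c - 'A' + 1;                        /* 1..26 */
        if ((c >> 6) != 1 || (lower >> 6) != 1)         /* (1) */
            return false;
        if ((c & 0x20) != 0 || (lower & 0x20) == 0)     /* (2) */
            return false;
        if ((c & 31) != index || (lower & 31) != index) /* (3) */
            return false;
    }
    return true;
}
```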
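This leads to a method that tests the bits instead:
#define judgeletter_bit(ch) ((((ch) >> 6) == 1) && ((((ch) - 1) & 31) < 26))
For a value in 0..255, ((ch) >> 6) == 1 checks property (1). Subtracting 1 maps the valid low-5-bit values 1..26 to 0..25, while a low-5-bit value of 0 becomes 31, so a single comparison with 26 checks property (3). Property (2) needs no test, since both uppercase and lowercase count as letters.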
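Because a bit trick like this is easy to get subtly wrong, it is worth comparing it exhaustively against the plain range check over all 256 byte values. `is_letter_bit`, `is_letter_range`, and `count_mismatches` are hypothetical names for this sketch, and the argument is assumed to be in 0..255.

```c
#include <stdbool.h>

/* Function form of the judgeletter_bit test (hypothetical name). */
static bool is_letter_bit(unsigned int c)
{
    /* && short-circuits, so c - 1 is only computed for c in 64..127 */
    return (c >> 6) == 1 && ((c - 1) & 31) < 26;
}

/* Reference: plain range comparison. */
static bool is_letter_range(unsigned int c)
{
    return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/* Number of byte values on which the two tests disagree. */
int count_mismatches(void)
{
    unsigned int c;
    int mismatches = 0;
    for (c = 0; c < 256; c++)
        if (is_letter_bit(c) != is_letter_range(c))
            mismatches++;
    return mismatches;
}
```

The edge cases are the neighbours of the letter ranges: '@' (64) and '`' (96) have low 5 bits 0, and '[' (91) and '{' (123) have low 5 bits 27, and all four are rejected.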
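Another approach is table lookup: build a table in which every entry whose index is a letter is set to 1 and every other entry to 0, so that checking a character is a single read of the entry at its index (the macro assumes an array named table is in scope):
#define judgeletter_table(ch) (table[(ch)])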
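Building the table:
unsigned char table[256];
int i;

memset(table, 0, sizeof table);   /* memset is declared in <string.h> */
for (i = 'A'; i <= 'Z'; i++) {
    table[i] = 1;                 /* uppercase letter */
    table[i + 'a' - 'A'] = 1;     /* matching lowercase letter */
}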
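A sketch of the same table with two small changes: the table is a static array filled once by a hypothetical `init_letter_table`, and the index is cast to unsigned char, so a plain char holding a byte of 0x80 or above (negative where char is signed) cannot index outside the array. The names are illustrative only.

```c
#include <stdbool.h>

/* Hypothetical lookup table: letter_table[c] is 1 for 'A'-'Z' and 'a'-'z'.
   Static storage starts zeroed, so only the letters need to be set. */
static unsigned char letter_table[256];

/* Call once before the first lookup. */
void init_letter_table(void)
{
    int i;
    for (i = 'A'; i <= 'Z'; i++) {
        letter_table[i] = 1;
        letter_table[i + ('a' - 'A')] = 1;
    }
}

/* The cast keeps the index in 0..255 even for a negative plain char. */
static inline bool is_letter_table(char c)
{
    return letter_table[(unsigned char)c] != 0;
}
```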
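Finally, the C standard library provides isalpha, declared in ctype.h, to test whether a character is a letter. The standard specifies it as a function, but glibc's ctype.h also defines it as a macro:
# define isalpha(c) __isctype((c), _ISalpha)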
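isalpha expects an int whose value is EOF or representable as unsigned char, so a plain char should be cast before the call. In the "C" locale, which is in effect until the program calls setlocale, the C standard makes isalpha true only for the uppercase and lowercase letters, so on an ASCII system it should accept exactly the 52 letters among 0..255. The wrapper names below are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>

/* Cast first: passing a negative char (other than EOF) to isalpha is
   undefined behavior. */
static bool is_alpha_ctype(char c)
{
    return isalpha((unsigned char)c) != 0;
}

/* Count the byte values isalpha accepts; assumes setlocale has not been
   called, so the "C" locale is still active. */
int count_isalpha_c_locale(void)
{
    int c, n = 0;
    for (c = 0; c < 256; c++)
        if (isalpha(c))
            n++;
    return n;
}
```

Unlike the three hand-written checks, isalpha's answer depends on the current locale, which is part of why it does more work per call.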
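Now let's measure the speed of the four methods. Each method classifies every value from 0 to 255, this is repeated 10,000,000 times, and the program prints the elapsed time for each method. The program: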
#include <stdio.h>
#include <string.h>     /* memset */
#include <ctype.h>      /* isalpha */
#include <sys/time.h>   /* gettimeofday */

#define TEST_TIMES 10000000

#define DEFINE_TIME     struct timeval time1, time2
#define START_RECORDING gettimeofday(&time1, NULL)
#define STOP_RECORDING  gettimeofday(&time2, NULL)
/* Print elapsed time as seconds:microseconds, borrowing a second when the
   microsecond difference is negative instead of printing it as unsigned. */
#define PRINT_TIME do { \
        long sec_ = (long)(time2.tv_sec - time1.tv_sec); \
        long usec_ = (long)(time2.tv_usec - time1.tv_usec); \
        if (usec_ < 0) { sec_ -= 1; usec_ += 1000000L; } \
        printf("%ld:%06ld\n", sec_, usec_); \
    } while (0)

#define judgeletter_bit(ch)     ((((ch) >> 6) == 1) && ((((ch) - 1) & 31) < 26))
#define judgeletter_classic(ch) ((((ch) >= 'A') && ((ch) <= 'Z')) || (((ch) >= 'a') && ((ch) <= 'z')))
#define judgeletter_table(ch)   (table[(ch)])

int main(void)
{
    unsigned int ch;
    int i;
    volatile int result;   /* volatile: keep the loops even if optimization is enabled */
    unsigned char table[256];
    DEFINE_TIME;

    /* Lookup table: 1 for 'A'-'Z' and 'a'-'z', 0 for everything else */
    memset(table, 0, sizeof table);
    for (i = 'A'; i <= 'Z'; i++) {
        table[i] = 1;
        table[i + 'a' - 'A'] = 1;
    }

    /* Classic range comparison */
    printf("classic:");
    START_RECORDING;
    for (i = 0; i < TEST_TIMES; i++)
        for (ch = 0; ch < 256; ch++)
            result = judgeletter_classic(ch);
    STOP_RECORDING;
    PRINT_TIME;

    /* Bit test */
    printf("bit:");
    START_RECORDING;
    for (i = 0; i < TEST_TIMES; i++)
        for (ch = 0; ch < 256; ch++)
            result = judgeletter_bit(ch);
    STOP_RECORDING;
    PRINT_TIME;

    /* Table lookup */
    printf("table:");
    START_RECORDING;
    for (i = 0; i < TEST_TIMES; i++)
        for (ch = 0; ch < 256; ch++)
            result = judgeletter_table(ch);
    STOP_RECORDING;
    PRINT_TIME;

    /* Standard library isalpha */
    printf("isalpha:");
    START_RECORDING;
    for (i = 0; i < TEST_TIMES; i++)
        for (ch = 0; ch < 256; ch++)
            result = isalpha(ch);
    STOP_RECORDING;
    PRINT_TIME;

    (void)result;
    return 0;
}
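A subtle point in any gettimeofday-based timer is the elapsed-time arithmetic: subtracting the tv_usec fields on their own can give a negative number, which becomes a huge value if it is printed with %lu. A hypothetical helper `elapsed_us` sketches the usual fix of converting both readings to one signed microsecond count.

```c
#include <sys/time.h>

/* Hypothetical helper: signed number of microseconds from start to end.
   Seconds and microseconds are combined before anything is printed, so a
   negative tv_usec difference is absorbed correctly. */
long long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}
```

For benchmarks, clock_gettime with CLOCK_MONOTONIC is generally preferable to gettimeofday, since the wall clock may be adjusted while the test runs.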
On my machine, compiled with gcc 4.4.5 without any optimization flags, the program printed the following (seconds:microseconds):
classic:15:701542
bit:11:172520
table:16:4294363296
isalpha:43:442001
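The bit method was the fastest, taking about 29% less time than the classic range check (11.17 s vs 15.70 s). The table line needs care: the original timing code subtracted the tv_usec fields separately and printed the result with %lu, so a negative difference wrapped around. Since 4294363296 = 2^32 - 604000, the microsecond part was really -604000, and the table method took about 16 - 0.604 = 15.396 s. That is only marginally faster than the classic check and clearly slower than the bit method: because the computation here is so simple, the memory (or cache) access of a table lookup apparently costs about as much as the comparisons it replaces. isalpha was by far the slowest (43.44 s, roughly 2.8 times the classic check). In glibc it is itself a table lookup, but through a locale-aware table fetched by a function call on every test in an unoptimized build, which likely accounts for much of the extra cost. These figures come from one unoptimized build on one machine; with optimization enabled the ranking may differ.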