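Bit fields specify the width in bits of each member of a struct, union, or (in C++) class, instead of storing the member at the default size of its type. Because a bit-field member does not have its type's default size, its address cannot be taken, so it cannot be accessed through a pointer to the member.
Because the storage details and alignment of bit fields depend on the compiler, the discussion below uses gcc (version 4.2.4), and all test output was produced on a little-endian system (such as x86).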
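1. Bit fields and endianness:
In general the bit width does not exceed the default size of the member's type, so when interpreting the data in memory, the atomic unit is the smaller of the bit-field member's width and 1 byte. For example, consider the following union u: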
union {
struct {
unsigned char a:2;
unsigned char b:3;
unsigned char c:3;
} x;
unsigned char d;
} u;
If we set u.d=150, whose binary representation is 10010110, then on a little-endian system c sits at the MSB end of struct x and a at the LSB end, so x.c=100=0x4, x.b=101=0x5, x.a=10=0x2.
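The same split can be reproduced without bit fields by shifting and masking, which makes the assumed layout explicit. This is a minimal sketch, assuming the allocation order described above (first member in the lowest bits); extract_field is a name invented for this illustration.

```c
#include <stdint.h>

/* Illustrative helper (not part of the original program): return the
 * `width`-bit field that starts `offset` bits above the LSB of `byte`.
 * Valid for width >= 1 and offset + width <= 8. */
static unsigned extract_field(uint8_t byte, unsigned offset, unsigned width)
{
    unsigned mask = (1u << width) - 1u;   /* `width` low bits set */
    return (byte >> offset) & mask;
}
```

With byte = 150, extract_field(150, 0, 2), extract_field(150, 2, 3) and extract_field(150, 5, 3) correspond to a, b and c respectively.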
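If a member has a bit width of 1 and is signed, its range is that of a 1-bit two's-complement number, i.e. -1 to 0. So if the bit at that position is 1 the value is -1, and if the bit is 0 the value is 0. (Whether a plain int bit field is signed is implementation-defined in C; gcc treats it as signed.) For example,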
union {
struct {
int a:1; //for signed one-bit field type, it can only be 0 or -1 in two's complement!
int b:2;
int c:3;
} x;
int d;
} v;
Here all members of struct x are signed integers. If we set v.d=1, on a little-endian system the bit of x.a is set to 1 (because x.a is the LSB), so x.a=-1, not 1!
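How a narrow signed field turns its raw bits into a negative value can be shown with a small sign-extension routine. This is a sketch of two's-complement sign extension in general, not a description of what gcc emits; sign_extend is a hypothetical helper and assumes 1 <= width <= 31.

```c
/* Interpret the low `width` bits of `raw` as a two's-complement number.
 * Hypothetical helper for illustration; requires 1 <= width <= 31. */
static int sign_extend(unsigned raw, unsigned width)
{
    unsigned sign_bit = 1u << (width - 1);          /* top bit of the field */
    unsigned field = raw & ((sign_bit << 1) - 1u);  /* keep only `width` bits */

    /* Flipping the sign bit and then subtracting it maps patterns whose
     * top bit is set onto the negative half of the range. */
    return (int)(field ^ sign_bit) - (int)sign_bit;
}
```

For v.d=1 the raw bit of x.a is 1, and sign_extend(1, 1) yields -1. A 2-bit signed field such as b covers -2 to 1 by the same rule.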
For the case where the total width of the struct's members spans several bytes, consider the following example:
union {
unsigned val;
unsigned char byte[4];
struct {
unsigned short a : 4;
unsigned short b : 4;
unsigned short c : 4;
unsigned short d : 4;
unsigned e : 8;
unsigned f : 2;
unsigned g : 2;
unsigned h : 2;
unsigned i : 2;
} st;
} w;
Here the total size of struct st is 4 bytes. Suppose the members are set as follows:
w.st.a=0xA;
w.st.b=0xB;
w.st.c=0xC;
w.st.d=0xD;
w.st.e=0x56;
w.st.f=0x0;
w.st.g=0x1;
w.st.h=0x2;
w.st.i=0x3;
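On a little-endian system the most significant byte is at the highest address, so listing the members from most significant to least significant gives: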
0x3 0x2 0x1 0x0 0x56 0xD 0xC 0xB 0xA
Converted to binary (note each member's bit width), this is:
11-10-01-00-01010110-1101-1100-1011-1010
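Here the atomic unit of e is min(1,1)=1 byte: 1 byte is the system's default atomic unit, and the actual width of e is also 1 byte. So 0x56 is a single unit and cannot be split.
Read as one integer, the binary string above already runs from MSB to LSB, so grouping adjacent bits in fours gives the following integer (=w.val):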
1110.0100.0101.0110.1101.1100.1011.1010 = 0xE4 56 DC BA
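However, a little-endian system stores the LSB at the lowest address, so w.val is laid out in memory as follows (lower addresses on the left, higher addresses on the right):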
0xBA DC 56 E4
How, then, is w.byte determined?
w.byte is an array of single bytes, so it simply follows the byte order in memory, i.e.:
w.byte[0]=0xBA
w.byte[1]=0xDC
w.byte[2]=0x56
w.byte[3]=0xE4
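The value of w.val can also be derived by packing the members by hand, starting from the least significant bit. This is a minimal sketch, assuming the little-endian gcc layout used in this article (members allocated from the LSB upward with no gaps); pack_fields and store_le32 are illustrative names, not part of the test program.

```c
#include <stddef.h>
#include <stdint.h>

/* Pack n fields into a 32-bit word, first field in the lowest bits.
 * Each width must be below 32 and the widths must sum to at most 32. */
static uint32_t pack_fields(const uint32_t *values, const unsigned *widths,
                            size_t n)
{
    uint32_t word = 0;
    unsigned shift = 0;

    for (size_t k = 0; k < n; k++) {
        uint32_t mask = (UINT32_C(1) << widths[k]) - 1u;
        word |= (values[k] & mask) << shift;   /* place field k */
        shift += widths[k];                    /* next field starts above it */
    }
    return word;
}

/* Write word to out[0..3], least significant byte first (little-endian). */
static void store_le32(uint32_t word, unsigned char out[4])
{
    for (int k = 0; k < 4; k++)
        out[k] = (unsigned char)(word >> (8 * k));
}
```

Feeding in the values 0xA, 0xB, 0xC, 0xD, 0x56, 0x0, 0x1, 0x2, 0x3 with widths 4, 4, 4, 4, 8, 2, 2, 2, 2 reproduces the word and the byte sequence worked out above.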
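If the bit widths in the struct are less regular than in the previous example (where every width was 2^k), the procedure is much the same. Consider the following example: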
union {
unsigned short val;
unsigned char byte[2];
struct {
unsigned short a : 1;
unsigned short b : 2;
unsigned short c : 3;
unsigned short d : 4;
unsigned short e : 5;
} st;
} z;
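Here the effective width of st is 15 bits, so 1 bit of padding is needed (the alignment rules for bit fields are covered later). So sizeof(z.st) is 2 bytes.
Suppose the following values are set: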
z.st.a=0x1;
z.st.b=0x2;
z.st.c=0x3;
z.st.d=0x4;
z.st.e=0x5;
Again arranging from the most significant bit to the least significant bit gives (the leftmost bit is the padding bit):
0x5 0x4 0x3 0x2 0x1 = 0-00101-0100-011-10-1
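Grouping into 4-bit units (padding with 0 where fewer than 4 bits remain):
0-001.01-01.00-01.1-10-1 = 0x15 1D
So z.val=0x151D, and on a little-endian system it is laid out in memory as (lower addresses on the left, higher addresses on the right):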
0x1D 15
Therefore:
z.byte[0]=0x1D
z.byte[1]=0x15
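Going the other way, the members can be recovered from the two bytes in memory by assembling them in little-endian order and peeling fields off from the LSB. This is a minimal sketch under the same layout assumption; unpack_le16 is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* Split the 16-bit little-endian value stored in bytes[0..1] into n fields,
 * first field taken from the lowest bits. Each width must be below 16 and
 * the widths must sum to at most 16. */
static void unpack_le16(const unsigned char bytes[2], const unsigned *widths,
                        size_t n, unsigned *fields)
{
    /* bytes[0] is the low byte, bytes[1] the high byte. */
    uint16_t word = (uint16_t)(bytes[0] | (bytes[1] << 8));

    for (size_t k = 0; k < n; k++) {
        fields[k] = word & ((1u << widths[k]) - 1u);  /* lowest remaining bits */
        word = (uint16_t)(word >> widths[k]);         /* drop them */
    }
}
```

Called with the bytes 0x1D, 0x15 and the widths 1, 2, 3, 4, 5, it returns the values that were assigned to a through e; the leftover top bit is the padding bit.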
If a C pointer cast is used instead of a union, the same approach applies. For example,
typedef struct{
unsigned int a:4;
unsigned int b:4;
unsigned int c:8;
unsigned int d:16;
} S;
char data[]={0x12,0x34,0x56,0x78};
S *s = (S*)data;
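First determine the size of S: the effective width is 4 bytes, with no padding.
Clearly, the memory that s points to contains (lower addresses on the left, higher addresses on the right):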
0x12 34 56 78
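This order is independent of endianness (because the system's default atomic unit is assumed to be 1 byte).
d takes the 2 bytes at the highest addresses, but on a little-endian system 0x56 is the LSB and 0x78 is the MSB, so s->d=0x7856; c is a single byte, so s->c=0x34;
How are the values of a and b determined?
The atomic unit is now 4 bits (rather than the system default of 1 byte). Since 0x12=0001-0010 and 0x1 is the more significant half, it is stored in the higher position, giving s->b=0x01, s->a=0x02.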
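Casting a char array to S* relies on the buffer being suitably aligned for S and on the compiler tolerating the aliasing. A more cautious variant copies the bytes into a real S object with memcpy. The sketch below assumes sizeof(S) is 4 and the gcc little-endian layout discussed here; load_S is an illustrative name.

```c
#include <string.h>

typedef struct {
    unsigned int a:4;
    unsigned int b:4;
    unsigned int c:8;
    unsigned int d:16;
} S;

/* Illustrative helper: build an S from raw bytes without a pointer cast.
 * Assumes the buffer holds at least sizeof(S) bytes. */
static S load_S(const unsigned char *bytes)
{
    S s;
    memcpy(&s, bytes, sizeof s);   /* byte-wise copy, no alignment demand */
    return s;
}
```

The members of the returned struct are then read as usual (s.a, s.b, and so on); which bits each one receives still depends on the compiler and the byte order.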
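2. Bit fields and byte alignment:
In a bit field, if a member's specified width is smaller than its default width, that member has no alignment of its own (i.e. it may straddle the boundary of the member's default size). The struct itself still needs an alignment value, however, so that an array of the struct can align its elements. The struct's alignment follows the rules introduced in the previous part (i.e. use the largest alignment among the members, and if a #pragma pack(n) directive is present, take the smaller of the two). Examples illustrate this best.
Example 1: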
struct {
unsigned short a;
int :0;
char b;
} t;
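Here int :0; means that the following member b must be aligned to the alignment of int, so 2 bytes of padding are needed after a. The alignment of t is still that of short, 2 (not that of int). So 1 byte of padding is needed after b, and sizeof(t)=2+2+1+1=6.
Example 2: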
struct {
unsigned char a:2;
int b;
} t;
b uses its default size and must be aligned, so sizeof(t)=4+4=8. a has 2 effective bits, and the remaining 4*8-2=30 bits of the first 4 bytes are padding.
Example 3:
struct {
unsigned char a:2;
int b:2;
} t;
a and b together have 4 effective bits, and the struct's alignment is that of int (4 bytes). So sizeof(t)=4, with 4*8-4=28 padding bits.
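The sizes claimed in Examples 1 to 3 can be checked directly with sizeof and offsetof. This is a minimal sketch; the tag names Ex1 to Ex3 and the two helper functions are invented here, and the values in the comments are the ones this article reports for gcc on x86, so other compilers or targets may differ.

```c
#include <stddef.h>

struct Ex1 { unsigned short a; int :0; char b; };   /* article: sizeof 6 */
struct Ex2 { unsigned char a:2; int b; };           /* article: sizeof 8 */
struct Ex3 { unsigned char a:2; int b:2; };         /* article: sizeof 4 */

/* offsetof works on ordinary members only, never on bit fields, so it can
 * show where b lands in Examples 1 and 2 but not in Example 3. */
static size_t ex1_offset_of_b(void) { return offsetof(struct Ex1, b); }
static size_t ex2_offset_of_b(void) { return offsetof(struct Ex2, b); }
```

Printing sizeof(struct Ex1), sizeof(struct Ex2) and sizeof(struct Ex3) together with the two offsets shows where the padding went in each case.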
Example 4:
#pragma pack(1)
struct {
unsigned short a;
int :0;
char b;
} t;
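First, because of int :0, 2 bytes of padding are needed after a so that b is aligned as an int. But because of #pragma pack(1), no padding bytes are needed after b. So sizeof(t)=5.
Example 5: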
#pragma pack(2)
struct {
unsigned char a:2;
int b;
} t;
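The default alignment of b is 4, but #pragma pack(2) specifies 2, so the actual alignment of b is min(2,4)=2. Likewise the alignment of struct t is min(2,4)=2. a has an effective width of 2 bits and needs 2*8-2=14 bits of padding. So
sizeof(t)=2+4=6 bytes.
Example 6: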
#pragma pack(2)
struct {
unsigned short a:15;
unsigned int b:2;
} st;
Here a and b have 17 effective bits, but the alignment of st is min(2,4)=2, so sizeof(st)=4, with 4*8-17=15 padding bits.
Example 7:
#pragma pack(4)
struct T{
unsigned char a :3;
unsigned char b :1;
unsigned char c :4;
unsigned char d :5;
unsigned char e :3;
unsigned int f :24;
} t;
Here a through f have an effective width of 5 bytes, but the alignment of t is min(4,4)=4, so 3 bytes of padding are added after f, giving sizeof(t)=8.
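The packed examples can be checked with sizeof as well, using #pragma pack(push, n) and #pragma pack(pop) so that each setting applies only to the structs it encloses. This sketch assumes a compiler that supports these pragmas (gcc and clang do); the tag names and packed_sizes are invented for illustration, and the commented sizes are the ones this article gives for gcc on x86.

```c
#include <stddef.h>

#pragma pack(push, 1)
struct Ex4 { unsigned short a; int :0; char b; };          /* article: 5 */
#pragma pack(pop)

#pragma pack(push, 2)
struct Ex5 { unsigned char a:2; int b; };                  /* article: 6 */
struct Ex6 { unsigned short a:15; unsigned int b:2; };     /* article: 4 */
#pragma pack(pop)

#pragma pack(push, 4)
struct Ex7 {
    unsigned char a:3;
    unsigned char b:1;
    unsigned char c:4;
    unsigned char d:5;
    unsigned char e:3;
    unsigned int  f:24;
};                                                         /* article: 8 */
#pragma pack(pop)

/* Collect the four sizes so they can be printed or compared in one place. */
static void packed_sizes(size_t out[4])
{
    out[0] = sizeof(struct Ex4);
    out[1] = sizeof(struct Ex5);
    out[2] = sizeof(struct Ex6);
    out[3] = sizeof(struct Ex7);
}
```

Using push and pop rather than a bare #pragma pack(n) keeps one example's packing from leaking into the declarations that follow it.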
The program below covers some of the examples above:
/* copyrighted 2011 ljsspace; bitfields.c */
#include <stdio.h>
#include <string.h> /* memset */
const int i = 1; //big-endian: 00-00-00-01; little-endian: 01-00-00-00
#define bigendian() ( (*(char*)&i) == 0 )
#pragma pack(2)
struct {
unsigned short a:15;
unsigned int b:2;
} st;
#pragma pack(4)
struct T{
unsigned char a :3;
unsigned char b :1;
unsigned char c :4;
unsigned char d :5;
unsigned char e :3;
unsigned int f :24;
} t;
typedef struct{
unsigned int a:4;
unsigned int b:4;
unsigned int c:8;
unsigned int d:16;
} S;
union {
struct {
unsigned char a:2;
unsigned char b:3;
unsigned char c:3;
} x;
unsigned char d;
} u;
union {
struct {
int a:1; //for signed one-bit field type, it can only be 0 or -1 in two's complement!
int b:2;
int c:3;
} x;
int d;
} v;
union {
unsigned val;
unsigned char byte[4];
struct {
unsigned short a : 4;
unsigned short b : 4;
unsigned short c : 4;
unsigned short d : 4;
unsigned e : 8;
unsigned f : 2;
unsigned g : 2;
unsigned h : 2;
unsigned i : 2;
} st;
} w;
union {
unsigned short val;
unsigned char byte[2];
struct {
unsigned short a : 1;
unsigned short b : 2;
unsigned short c : 3;
unsigned short d : 4;
unsigned short e : 5;
} st;
} z;
int main() {
//test st:
int i;
printf ("Size of st is %zu\n",
sizeof (st));
printf ("Size of T is %zu\n",
sizeof (struct T));
unsigned char *ptr = (unsigned char *) &st; // byte pointer
memset(ptr,0,sizeof(st));
st.a=0x1234;
st.b=0x3;
printf("\nst -- ");
if(bigendian())
printf("By BIG-ENDIAN, ");
else
printf("By LITTLE-ENDIAN, ");
printf("st in bytes(hex): ");
for (i=0; i < sizeof(st); i++)
printf("%02X ", ptr[i]);
printf("\n");
//test T:
ptr = (unsigned char *) &t; // byte pointer
memset(ptr,0,sizeof(t));
t.a=0x5; //101
t.b=0x1; //1
t.c=0x9; //1001
t.d=0x10; //10000
t.e=0x3; //011
t.f=0x123456;
printf("\nt -- ");
if(bigendian())
printf("By BIG-ENDIAN, ");
else
printf("By LITTLE-ENDIAN, ");
printf("t in bytes(hex): ");
for (i=0; i < sizeof(t); i++)
printf("%02X ", ptr[i]);
printf("\n");
//test S:
printf ("\nS -- Size of S is %zu\n",
sizeof (S));
if(bigendian())
printf("By BIG-ENDIAN: \n");
else
printf("By LITTLE-ENDIAN: \n");
char data[]={0x12,0x34,0x56,0x78};
S *s = (S*)data;
printf("a=0x%02x\n" ,s->a);
printf("b=0x%02x\n" ,s->b);
printf("c=0x%02x\n" ,s->c);
printf("d=0x%02x\n" ,s->d);
//test u:
printf("\nu -- ");
if(bigendian())
printf("By BIG-ENDIAN: \n");
else
printf("By LITTLE-ENDIAN: \n");
u.d=150; //1001 0110
printf("a=0x%02x\n" ,u.x.a);
printf("b=0x%02x\n" ,u.x.b);
printf("c=0x%02x\n" ,u.x.c);
//test v:
printf("\nv -- ");
if(bigendian())
printf("By BIG-ENDIAN: \n");
else
printf("By LITTLE-ENDIAN: \n");
v.d=1;
printf("a=%d\n" ,v.x.a);
printf("b=%d\n" ,v.x.b);
printf("c=%d\n" ,v.x.c);
//test w:
printf ("\nw -- Size of w is %zu\n",
sizeof (w));
w.st.a=0xA;
w.st.b=0xB;
w.st.c=0xC;
w.st.d=0xD;
w.st.e=0x56;
w.st.f=0x0;
w.st.g=0x1;
w.st.h=0x2;
w.st.i=0x3;
//error: cannot take address of bit-field 'c'
//unsigned short* p = &w.st.c;
printf("w.val=0x%X\n", w.val);
if(bigendian())
printf("By BIG-ENDIAN, ");
else
printf("By LITTLE-ENDIAN, ");
printf("w in bytes(hex): ");
for (i=0; i < sizeof(w); i++)
printf("%02X ", w.byte[i]);
printf("\n");
//test z:
printf ("\nz -- Size of z is %zu\n",
sizeof (z));
printf ("Size of z.st is %zu\n\n",
sizeof (z.st));
z.st.a=0x1;
z.st.b=0x2;
z.st.c=0x3;
z.st.d=0x4;
z.st.e=0x5;
printf("z.val=0x%X\n", z.val);
if(bigendian())
printf("By BIG-ENDIAN, ");
else
printf("By LITTLE-ENDIAN, ");
printf("z in bytes(hex): ");
for (i=0; i < sizeof(z); i++)
printf("%02X ", z.byte[i]);
printf("\n");
}
Test output:
Size of st is 4
Size of T is 8
st -- By LITTLE-ENDIAN, st in bytes(hex): 34 92 01 00
t -- By LITTLE-ENDIAN, t in bytes(hex): 9D 70 56 34 12 00 00 00
S -- Size of S is 4
By LITTLE-ENDIAN:
a=0x02
b=0x01
c=0x34
d=0x7856
u -- By LITTLE-ENDIAN:
a=0x02
b=0x05
c=0x04
v -- By LITTLE-ENDIAN:
a=-1
b=0
c=0
w -- Size of w is 4
w.val=0xE456DCBA
By LITTLE-ENDIAN, w in bytes(hex): BA DC 56 E4
z -- Size of z is 2
Size of z.st is 2
z.val=0x151D
By LITTLE-ENDIAN, z in bytes(hex): 1D 15