Big-Endian, Little-Endian and Byte Alignment (Part 2)

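Bit fields specify the width in bits of each member of a struct, union, or class (C++), instead of storing the member at the default size of its type. Because a bit-field member does not occupy its type's default size (and need not start on a byte boundary), you cannot take its address or access it through a pointer.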

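Because the storage details and alignment of bit fields depend on the compiler, the discussion below uses gcc (version 4.2.4), and all test output comes from a little-endian system (such as x86).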

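1. Bit fields and endianness:
Since a member's bit width normally does not exceed its type's default size, the atomic unit used when interpreting the data in memory is the smaller of the bit-field member's width and 1 byte. For example, consider the following union u: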

union {
    struct {
      unsigned char a:2;
      unsigned char b:3;
      unsigned char c:3;
    } x;
    unsigned char d;
} u;
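If we set u.d=150, its binary representation is 10010110. On a little-endian system, c occupies the most significant bits of struct x and a the least significant bits, so x.c=100=0x4, x.b=101=0x5, and x.a=10=0x2.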

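If a member is 1 bit wide and signed, its range is that of a 1-bit two's-complement number, namely -1 to 0. So if that bit is 1, the member's value is -1; if the bit is 0, the value is 0. (gcc treats a plain int bit field as signed; the C standard leaves this choice to the implementation.) For example,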
union {
    struct {
      int a:1; //for signed one-bit field type, it can only be 0 or -1 in two's complement!
      int b:2;
      int c:3;
    } x;
    int d;
} v;
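Here all members of struct x are signed integers. If we set v.d=1, then on a little-endian system the bit belonging to x.a is set (x.a occupies the least significant bit), so x.a=-1, not 1!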

Now consider a struct whose members together span several bytes:
union {
        unsigned val;
        unsigned char byte[4];
        struct {
            unsigned short a : 4;
            unsigned short b : 4;
            unsigned short c : 4;
            unsigned short d : 4;
            unsigned e : 8;
            unsigned f : 2;
            unsigned g : 2;
            unsigned h : 2;
            unsigned i : 2;
        } st;
} w;
Here struct st is 4 bytes long in total (4*4 + 8 + 4*2 = 32 bits). Set its members as follows:
   w.st.a=0xA;
   w.st.b=0xB;
   w.st.c=0xC;
   w.st.d=0xD;
   w.st.e=0x56;
   w.st.f=0x0;
   w.st.g=0x1;
   w.st.h=0x2;
   w.st.i=0x3;
On a little-endian system the most significant byte is at the highest address. Listing the fields from most significant to least significant gives:
0x3 0x2 0x1 0x0 0x56 0xD 0xC 0xB 0xA
In binary (note each member's bit width) this is:
11-10-01-00-01010110-1101-1100-1011-1010
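Here e's atomic unit is min(8 bits, 1 byte) = 1 byte: 1 byte is the system's default atomic unit, and e is itself exactly 1 byte wide. So 0x56 is treated as a whole and is not split.
Read as a single integer, the binary string above runs exactly from the most significant bit to the least significant bit, so grouping every 4 adjacent bits gives the following integer (= w.val):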
1110.0100.0101.0110.1101.1100.1011.1010 = 0xE4 56 DC BA
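On a little-endian system, however, the least significant byte is stored at the lowest address, so w.val is laid out in memory as follows (low address on the left, high address on the right):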
0xBA DC 56 E4
How, then, are the values of w.byte determined?
w.byte is an array of single bytes, so it simply reflects the byte order in memory:
w.byte[0]=0xBA
w.byte[1]=0xDC
w.byte[2]=0x56
w.byte[3]=0xE4


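If the bit widths in the struct are less regular than in the previous example (where every width was a power of 2), the procedure is the same. Consider the following example: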
union {
    unsigned short val;
    unsigned char byte[2];
    struct {
        unsigned short a   : 1;
        unsigned short b   : 2;
        unsigned short c   : 3;
        unsigned short d   : 4;
        unsigned short e   : 5;
    } st;
} z;
Here the fields of st occupy 15 bits, so 1 bit of padding is needed (the alignment rules for bit fields are described below), and sizeof(z.st) is 2 bytes.
Set the following values:
   z.st.a=0x1;
   z.st.b=0x2;
   z.st.c=0x3;
   z.st.d=0x4;
   z.st.e=0x5;
Again arranging from the most significant bit to the least significant bit gives (the leftmost bit is padding):
0x5 0x4 0x3 0x2 0x1 = 0-00101-0100-011-10-1
Grouping every 4 bits (the padding bit completes the leftmost group):
0001.01-01.00-01.1-10-1 = 0x15 1D
So z.val=0x151D, and on a little-endian system it is laid out in memory as (low address on the left, high address on the right):
0x1D 15
So:
z.byte[0]=0x1D
z.byte[1]=0x15


The same approach applies when you reinterpret memory through a C pointer cast instead of a union. For example,
typedef struct{
    unsigned int a:4;
    unsigned int b:4;
    unsigned int c:8;
    unsigned int d:16;
} S;
char data[]={0x12,0x34,0x56,0x78};
S *s = (S*)data;
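First determine the size of S: the fields take 4+4+8+16 = 32 bits, i.e. 4 bytes, with no padding.
The memory s points to clearly contains (low address on the left, high address on the right):
0x12 34 56 78
This order does not depend on endianness, because data is a char array and the system's default atomic unit is assumed to be 1 byte.
d is made up of the two bytes at the highest addresses; on a little-endian system 0x56 is its least significant byte and 0x78 its most significant byte, so s->d=0x7856. c is a single byte, so s->c=0x34.
How are the values of a and b determined?
Here the atomic unit is 4 bits (rather than the default 1 byte). Since 0x12=0001-0010 and 0x1 is the more significant nibble, it belongs to the field in the higher-order position, giving s->b=0x01 and s->a=0x02.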


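2. Bit fields and byte alignment:
In a bit field, a member whose specified width is smaller than its type's default width has no alignment of its own: it is packed directly after the preceding bits instead of being moved to its type's alignment boundary. (With gcc on x86, a bit field still does not straddle a storage unit of its declared type unless #pragma pack is in effect; Example 7 below shows such a straddle.) The struct itself, however, still needs an alignment value, so that the elements of an array of the struct are properly aligned. The struct's alignment follows the rules introduced in Part 1: it is the largest alignment among its members, and if a #pragma pack(n) directive is present, the smaller of that value and n. Examples illustrate this best.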

Example 1:
struct {
  unsigned short a;
  int :0;  
  char b;
} t;
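Here int :0; means that the next member, b, must be aligned on an int boundary, so 2 bytes of padding follow a. The alignment of t is still that of short, 2 (the unnamed bit field does not give the struct int's alignment), so 1 byte of padding follows b, and sizeof(t)=2+2+1+1=6.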


Example 2:
struct {
  unsigned char a:2;
  int b;
} t;
b uses its type's default size and therefore must be aligned, so sizeof(t)=4+4=8. Of the first 4 bytes, a uses 2 bits; the remaining 4*8-2=30 bits are padding.

Example 3:
struct {
  unsigned char a:2;
  int b:2;
} t;
a and b together use 4 bits, and the struct's alignment is that of int (4 bytes), so sizeof(t)=4 and there are 4*8-4=28 padding bits.

Example 4:
#pragma pack(1)
struct {
  unsigned short a;
  int :0;
  char b;
} t;
Because of int :0, 2 bytes of padding follow a so that b is int-aligned. With #pragma pack(1), however, no padding is needed after b, so sizeof(t)=5.


Example 5:
#pragma pack(2)
struct {
  unsigned char a:2;
  int b;
} t;
b's default alignment is 4, but #pragma pack(2) caps it at 2, so b's actual alignment is min(2,4)=2. Likewise, the alignment of t is min(2,4)=2. a uses 2 bits, so 2*8-2=14 bits of padding follow it, and sizeof(t)=2+4=6 bytes.

Example 6:
#pragma pack(2)
struct {
    unsigned short a:15;
    unsigned int b:2;
} st;
Here a and b together use 17 bits, and st's alignment is min(2,4)=2, so sizeof(st)=4 and there are 4*8-17=15 padding bits.


Example 7:
#pragma pack(4)
struct T{
unsigned char a :3;
unsigned char b :1;
unsigned char c :4;
unsigned char d :5;
unsigned char e :3;
unsigned int f :24;
} t;
Here fields a through f use 3+1+4+5+3+24 = 40 bits, i.e. 5 bytes, and t's alignment is min(4,4)=4, so 3 bytes of padding follow f and sizeof(t)=8.


Below is the program code for some of the examples above:

/* copyrighted 2011 ljsspace;  bitfields.c */

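#include <stdio.h>
#include <string.h>  /* memset, memcpy (standard header instead of <memory.h>) */

/* big-endian: 00-00-00-01; little-endian: 01-00-00-00.
   A distinct name is needed: main() declares a local loop counter i,
   which would otherwise shadow this variable inside bigendian(). */
const int endian_probe = 1;
#define bigendian() ( (*(const char *)&endian_probe) == 0 )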


#pragma pack(push, 2)
struct {
    unsigned short a:15;
    unsigned int b:2;
} st;
#pragma pack(pop)

#pragma pack(push, 4)
struct T{
    unsigned char a :3;
    unsigned char b :1;
    unsigned char c :4;
    unsigned char d :5;
    unsigned char e :3;
    unsigned int f :24;
} t;
#pragma pack(pop)   // restore default packing for the declarations below

typedef struct{
    unsigned int a:4;
    unsigned int b:4;
    unsigned int c:8;
    unsigned int d:16;
} S;

union {
    struct {
      unsigned char a:2;
      unsigned char b:3;
      unsigned char c:3;
    } x;
    unsigned char d;
} u;

union {
    struct {
      int a:1; //for signed one-bit field type, it can only be 0 or -1 in two's complement!
      int b:2;
      int c:3;
    } x;
    int d;
} v;

union {
        unsigned val;
        unsigned char byte[4];
        struct {
            unsigned short a : 4;
            unsigned short b : 4;
            unsigned short c : 4;
            unsigned short d : 4;
            unsigned e : 8;
            unsigned f : 2;
            unsigned g : 2;
            unsigned h : 2;
            unsigned i : 2;
        } st;
} w;

union {
    unsigned short val;
    unsigned char byte[2];
    struct {
        unsigned short a   : 1;
        unsigned short b   : 2;
        unsigned short c   : 3;
        unsigned short d   : 4;
        unsigned short e   : 5;
    } st;
} z;

int main(void) {
   //test st:
   int i;
   printf ("Size of st is %zu\n",
          sizeof (st));
   printf ("Size of T is %zu\n",
          sizeof (struct T));

   unsigned char *ptr = (unsigned char *) &st; // byte pointer
   memset(ptr,0,sizeof(st));
 
   st.a=0x1234;
   st.b=0x3;

   printf("\nst -- ");
   if(bigendian())
     printf("By BIG-ENDIAN, ");
   else
     printf("By LITTLE-ENDIAN, ");
    
   
   printf("st in bytes(hex): ");
   for (i=0; i < sizeof(st); i++)
        printf("%02X ", ptr[i]);
   printf("\n");


   //test T:
   ptr = (unsigned char *) &t; // byte pointer
   memset(ptr,0,sizeof(t));
 
   t.a=0x5; //101
   t.b=0x1;  //1
   t.c=0x9; //1001
   t.d=0x10; //10000
   t.e=0x3; //011
   t.f=0x123456;
   
   
   printf("\nt -- ");
   if(bigendian())
     printf("By BIG-ENDIAN, ");
   else
     printf("By LITTLE-ENDIAN, ");
    
   
   printf("t in bytes(hex): ");
   for (i=0; i < sizeof(t); i++)
        printf("%02X ", ptr[i]);
   printf("\n");

   //test S:
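   printf ("\nS -- Size of S is %zu\n",
          sizeof (S));
   
   if(bigendian())
     printf("By BIG-ENDIAN: \n");
   else
     printf("By LITTLE-ENDIAN: \n");
   char data[]={0x12,0x34,0x56,0x78};
   // Copy the bytes into a real S instead of casting data to S*:
   // the cast may misalign S and violates strict aliasing.
   S buf;
   memcpy(&buf, data, sizeof buf);
   S *s = &buf;
   printf("a=0x%02x\n", (unsigned)s->a);
   printf("b=0x%02x\n", (unsigned)s->b);
   printf("c=0x%02x\n", (unsigned)s->c);
   printf("d=0x%02x\n", (unsigned)s->d);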
   
   //test u:
   printf("\nu -- ");
   if(bigendian())
     printf("By BIG-ENDIAN: \n");
   else
     printf("By LITTLE-ENDIAN: \n");
   u.d=150;  //1001 0110
   printf("a=0x%02x\n" ,u.x.a);
   printf("b=0x%02x\n" ,u.x.b);
   printf("c=0x%02x\n" ,u.x.c);

   //test v:
   printf("\nv -- ");
   if(bigendian())
     printf("By BIG-ENDIAN: \n");
   else
     printf("By LITTLE-ENDIAN: \n");
   v.d=1;
   printf("a=%d\n" ,v.x.a);  
   printf("b=%d\n" ,v.x.b);
   printf("c=%d\n" ,v.x.c);

   //test w:
   printf ("\nw -- Size of w is %zu\n",
          sizeof (w));
      
   w.st.a=0xA;
   w.st.b=0xB;
   w.st.c=0xC;
   w.st.d=0xD;
   w.st.e=0x56;
   w.st.f=0x0;
   w.st.g=0x1;
   w.st.h=0x2;
   w.st.i=0x3;
   //taking the address of a bit field does not compile:
   //unsigned short *p = &w.st.c;   // error: cannot take address of bit-field 'c'

   printf("w.val=0x%X\n", w.val);


   if(bigendian())
     printf("By BIG-ENDIAN, ");
   else
     printf("By LITTLE-ENDIAN, ");
     
   printf("w in bytes(hex): ");
   for (i=0; i < sizeof(w); i++)
        printf("%02X ", w.byte[i]);
   printf("\n");
 
   //test z:
   printf ("\nz -- Size of z is %zu\n",
          sizeof (z));
   printf ("Size of z.st is %zu\n\n",
          sizeof (z.st));


   z.st.a=0x1;
   z.st.b=0x2;
   z.st.c=0x3;
   z.st.d=0x4;
   z.st.e=0x5; 

   printf("z.val=0x%X\n", z.val);


   if(bigendian())
     printf("By BIG-ENDIAN, ");
   else
     printf("By LITTLE-ENDIAN, ");
     
   printf("z in bytes(hex): ");
   for (i=0; i < sizeof(z); i++)
        printf("%02X ", z.byte[i]);
   printf("\n");


}
   



Test output:
Size of st is 4
Size of T is 8

st -- By LITTLE-ENDIAN, st in bytes(hex): 34 92 01 00

t -- By LITTLE-ENDIAN, t in bytes(hex): 9D 70 56 34 12 00 00 00

S -- Size of S is 4
By LITTLE-ENDIAN:
a=0x02
b=0x01
c=0x34
d=0x7856

u -- By LITTLE-ENDIAN:
a=0x02
b=0x05
c=0x04

v -- By LITTLE-ENDIAN:
a=-1
b=0
c=0

w -- Size of w is 4
w.val=0xE456DCBA
By LITTLE-ENDIAN, w in bytes(hex): BA DC 56 E4

z -- Size of z is 2
Size of z.st is 2

z.val=0x151D
By LITTLE-ENDIAN, z in bytes(hex): 1D 15


