Byte order

endian

"Endian" simply refers to byte order.


big-endian

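When we write the number 198 on paper in Arabic numerals, we start with the most significant digit, 1, and finish with the least significant digit, 8. If a computer stores data in memory in the same order, placing the most significant digit 1 at the lowest (starting) address, then 9, and finally 8, that order is big-endian.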


little-endian

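The opposite order stores the least significant digit at the lowest address and the most significant digit at the highest address. This layout, the one that lines up most naturally with memory addresses, is little-endian.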


If my explanation was not clear, see the table below.

Endian | First byte (lowest address) | Middle bytes | Last byte (highest address) | Notes
big    | most significant            | ...          | least significant           | Similar to a number written on paper (in Arabic numerals as used in most Western scripts)
little | least significant           | ...          | most significant            | Arithmetic calculation order (see carry propagation)
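
To make the table concrete, here is a minimal sketch that lists the bytes of a 32-bit value in increasing address order. The helper name BytesInAddressOrder is hypothetical and not part of the original article. For 0x12345678, a little-endian host would be expected to list 78 first and a big-endian host 12 first.

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <string>

// Hypothetical helper: returns the bytes of `value` as hex, lowest address first.
std::string BytesInAddressOrder(std::uint32_t value) {
  unsigned char bytes[sizeof value];
  std::memcpy(bytes, &value, sizeof value);  // copy the object representation
  std::string out;
  for (std::size_t i = 0; i < sizeof value; ++i) {
    char buf[4];
    std::snprintf(buf, sizeof buf, "%02x", bytes[i]);
    if (i != 0) out += ' ';
    out += buf;
  }
  return out;
}
```

Printing BytesInAddressOrder(0x12345678u) on the machine at hand shows which row of the table applies to it.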

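The function below helps determine whether the host is big-endian or little-endian. It is simple: read the byte at the lowest address and check whether it is ff, the least significant byte of the value. If it is, the host is little-endian.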

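bool IsLittleEndian() {
  short int x = 0x00ff;
  // Inspect the byte at the lowest address through unsigned char, so the
  // result does not depend on whether plain char is signed on this platform.
  unsigned char* p = (unsigned char*)&x;
  return p[0] == 0xff;
}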


Also, as chumine pointed out, the following implementation from the Linux kernel is more efficient:

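#include <stdint.h>

static union {
    char c[4];
    uint32_t l;  // must cover all four bytes of c; an unsigned char would always alias c[0]
} endian_test = {{'l','?','?','b'}};

// The low-order byte of l is c[0] ('l') on a little-endian host
// and c[3] ('b') on a big-endian host.
#define IsLittleEndian2() \
((char)endian_test.l == 'l')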


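The first method could also make x a static variable to improve performance, but it would still pay for the extra pointer p. The second method takes advantage of what a union offers, members sharing the same memory, and so does without the pointer variable p.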

Byte order is determined by the CPU architecture or by other hardware. The following description also comes from Wikipedia:

Well-known processors that use the big-endian format include Motorola 6800 and 68k, Xilinx Microblaze, IBM POWER, and System/360 and its successors such as System/370, ESA/390, and z/Architecture. The PDP-10 also used big-endian addressing for byte-oriented instructions. SPARC historically used big-endian until version 9, which is bi-endian; similarly, the ARM architecture was little-endian before version 3 when it became bi-endian, and the PowerPC and Power Architecture descendants of IBM POWER are also bi-endian (see below).

Serial protocols may also be regarded as either little or big-endian at the bit- and/or byte-levels (which may differ). Many serial interfaces, such as the ubiquitous USB, are little-endian at the bit-level. Physical standards like RS-232, RS-422 and RS-485 are also typically used with UARTs that send the least significant bit first, such as in industrial instrumentation applications, lighting protocols (DMX512), and so on. The same could be said for digital current loop signaling systems such as MIDI. There are also several serial formats where the most significant bit is normally sent first, such as I²C and the related SMBus. However, the bit order may often be reversed (or is "transparent") in the interface between the UART or communication controller and the host CPU or DMA controller (and/or system memory), especially in more complex systems and personal computers. These interfaces may be of any type and are often configurable.


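On some CPUs the endianness is configurable. The only constant, then, is uncertainty: checking the endianness at run time with the functions above is the more reliable approach, rather than inferring it from the CPU architecture alone.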
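
As one possible form of such a run-time check, the sketch below copies the first byte of a known 16-bit value with memcpy, which avoids both pointer casts and unions. The name HostIsLittleEndian is hypothetical and not from the original article.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: true when the host stores the least significant byte
// at the lowest address.
bool HostIsLittleEndian() {
  const std::uint16_t probe = 0x0102;
  unsigned char first = 0;
  std::memcpy(&first, &probe, 1);  // read the byte at the lowest address
  return first == 0x02;            // 0x02 is the least significant byte of probe
}
```

Because the check runs each time it is called, it reflects the byte order the program is actually running under, not one assumed at build time.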

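Floating-point numbers are a more complicated matter: their byte order is not necessarily the same as the integer byte order on the same system, and on some systems it is exactly the opposite. Some systems even combine half little-endian and half big-endian layouts. I will cover this in a separate article.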


The function below converts an integer to big-endian, if necessary.

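int16_t ToBigEndian(int16_t value) {
    if (IsLittleEndian2()) {
        uint8_t* p = reinterpret_cast<uint8_t*>(&value);
        uint16_t v1 = p[0];  // low-order byte on a little-endian host
        uint16_t v2 = p[1];  // high-order byte on a little-endian host
        // Swap the two bytes so the high-order byte comes first in memory.
        return static_cast<int16_t>((v1 << 8) | v2);
    } else {
        // Already big-endian: return the value unchanged.
        return value;
    }
}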


Test code:

    int16_t f = 7;
    string h1 = PrintIntAsBinaryString<int16_t>(f);
    cout << "h1:" << h1 << endl;
    
    int16_t f2 = ToBigEndian(f);
    string h2 = PrintIntAsBinaryString<int16_t>(f2);
    cout << "h2:" << h2 << endl;

Output:

h1:0000000000000111
h2:0000011100000000
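
An alternative sketch, not taken from the original article, writes the bytes with shifts and so needs no endianness check at all: shifts operate on the value, not on its memory layout. The name StoreBigEndian16 is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: writes `value` into out[0..1], most significant byte first.
void StoreBigEndian16(std::uint16_t value, std::uint8_t out[2]) {
  out[0] = static_cast<std::uint8_t>(value >> 8);    // high-order byte at the lower address
  out[1] = static_cast<std::uint8_t>(value & 0xff);  // low-order byte at the higher address
}
```

For the value 7 used in the test above, this writes the bytes 00 07 on any host.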

Below is my collection of byte-order utility functions:

#ifndef UTIL_ENDIAN_H_
#define UTIL_ENDIAN_H_

#include <boost/cstdint.hpp>
#include <cstring>  // memcpy
#include <sstream>
#include <string>
#include <vector>

using namespace std;

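// Get the bit value specified by the index
// index starts with 0
template<class T>
int Bit_Value(T value, uint8_t index) {
  // Shift the value rather than the literal 1, so that indexes beyond the
  // width of int (for example in a 64-bit T) still work.
  return static_cast<int>((value >> index) & 1);
}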

// T must be an integer type
template<class T>
string PrintIntAsBinaryString(T v) {
  stringstream stream;
  int i = sizeof(T) * 8 - 1;
  while (i >= 0) {
    stream << Bit_Value(v, i);
    --i;
  }
    
  return stream.str();
}

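// inline, because this definition lives in a header
inline bool IsLittleEndian() {
  short int x = 0x00ff;
  // unsigned char keeps the test independent of the signedness of plain char
  unsigned char* p = (unsigned char*)&x;
  return p[0] == 0xff;
}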

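static union {
  char c[4];
  uint32_t l;  // must cover all four bytes of c; an unsigned char would always alias c[0]
} endian_test = {{'l','?','?','b'}};

// The low-order byte of l is c[0] ('l') on little-endian, c[3] ('b') on big-endian
#define IsLittleEndian2() ((char)endian_test.l == 'l')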

// Convert the following integer values to big-endian if necessary
template<class T>
T Int16ToBigEndian(T value) {
  if (IsLittleEndian2()) {
    uint8_t* p = reinterpret_cast<uint8_t*> (&value);
    T v1 = static_cast<T> (p[0]);
    T v2 = static_cast<T> (p[1]);
    return (v1 << 8) | v2;
  } else {
    return value;
  }
}

template<class T>
T Int32ToBigEndian(T value) {
  if (IsLittleEndian2()) {
    uint8_t* p = reinterpret_cast<uint8_t*> (&value);
    T v1 = static_cast<T> (p[0]);
    T v2 = static_cast<T> (p[1]);
    T v3 = static_cast<T> (p[2]);
    T v4 = static_cast<T> (p[3]);
    return (v1 << 24) | (v2 << 16) | (v3 << 8) | v4;
  } else {
    return value;
  }
}


// The following functions convert big-endian byte arrays
// into integers on the local platform

template<class T>
T BigEndianBytesToInt16(vector<uint8_t> const& value) {
  if (IsLittleEndian2()) {
    T h = static_cast<T> (value[0]);
    T l = static_cast<T> (value[1]);
    return (h << 8) | l;
  } else {
    T tmp = 0;
    memcpy(&tmp, &value[0], 2);
    return tmp;
  }
}

template<class T>
T BigEndianBytesToInt32(uint8_t value[4]) {
  if (IsLittleEndian2()) {
    T a = static_cast<T> (value[0]);
    T b = static_cast<T> (value[1]);
    T c = static_cast<T> (value[2]);
    T d = static_cast<T> (value[3]);
    return (a << 24) | (b << 16) | (c << 8) | d;
  } else {
    T tmp = 0;
    memcpy(&tmp, &value[0], 4);
    return tmp;
  }
}

#endif

The code above that converts to big-endian has a bug; for correct code, see:

http://stackoverflow.com/questions/2182002/convert-big-endian-to-little-endian-in-c-without-using-provided-func

There is also an online conversion tool here, which helps you check whether your own algorithm is correct.

http://www.darkfader.net/toolbox/convert/

When I have time, I will track down the cause and fix the bug.
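
Besides the online converter mentioned above, a conversion routine can be checked locally against a known input/output pair and a round trip. The sketch below is only an illustration under that assumption: the names ByteSwap32 and ByteSwap32SelfTest are hypothetical, and the swap shown is a plain shift-and-mask version, not the code from the linked answer.

```cpp
#include <cstdint>

// Hypothetical helper: reverses the four bytes of a 32-bit value using shifts only.
std::uint32_t ByteSwap32(std::uint32_t v) {
  return ((v & 0x000000ffu) << 24) |
         ((v & 0x0000ff00u) << 8) |
         ((v & 0x00ff0000u) >> 8) |
         ((v & 0xff000000u) >> 24);
}

// Returns true when ByteSwap32 passes two basic checks.
bool ByteSwap32SelfTest() {
  const std::uint32_t sample = 0x12345678u;
  return ByteSwap32(sample) == 0x78563412u &&       // known input/output pair
         ByteSwap32(ByteSwap32(sample)) == sample;  // swapping twice restores the input
}
```

Using an unsigned type here sidesteps the sign-related pitfalls that left shifts on signed integers can run into.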


