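IEEE defines a standard for floating-point values, IEEE 754. The standard specifies how single-precision (32-bit) and double-precision (64-bit) floating-point values are represented and how operations on them are performed. This article briefly describes how to convert a floating-point number from its binary representation to a decimal value.
According to IEEE 754, the binary storage format of a floating-point number is divided into three parts: the sign bit, the exponent bits, and the fraction bits. Single and double precision differ in their total length and in how that length is divided among these three parts.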
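• Single precision
A single-precision floating-point number consists of 32 bits. From left to right, the first bit is the sign bit, the next 8 bits are the exponent bits, and the last 23 bits are the fraction bits, as follows:
S EEEEEEEE FFFFFFFFFFFFFFFFFFFFFFF
1 8 23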
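The layout above can be inspected on a real value. The following is a minimal sketch, assuming Python and its standard struct module; the helper name float32_fields is made up for illustration and is not part of the standard.

```python
import struct

def float32_fields(x):
    """Return the (S, E, F) bit fields of x stored as a 32-bit IEEE 754 float."""
    # Pack x as a big-endian single-precision float, then reinterpret
    # the same 4 bytes as an unsigned 32-bit integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 sign bit
    exponent = (bits >> 23) & 0xFF   # 8 exponent bits
    fraction = bits & 0x7FFFFF       # 23 fraction bits
    return sign, exponent, fraction

s, e, f = float32_fields(-6.5)
print(s, format(e, "08b"), format(f, "023b"))
# prints: 1 10000001 10100000000000000000000
```

The fields are returned as plain integers; formatting them with fixed widths reproduces the S / E / F picture shown above.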
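The value V represented by the bit pattern is determined as follows:
if (E = 255 and F != 0) then V = NaN ("Not a Number")
if (E = 255 and F = 0 and S = 1) then V = -Infinity
if (E = 255 and F = 0 and S = 0) then V = Infinity
if (0 < E < 255) then V = (-1)**S * 2**(E-127) * (1.F)
if (E = 0 and F != 0) then V = (-1)**S * 2**(-126) * (0.F)
if (E = 0 and F = 0 and S = 1) then V = -0
if (E = 0 and F = 0 and S = 0) then V = 0
Here 1.F and 0.F denote binary numbers: a leading 1 or 0 followed by the fraction bits after the binary point.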
For example (the factors such as 1.101 and 0.1 on the right-hand side are written in binary):
0 00000000 00000000000000000000000 = 0
1 00000000 00000000000000000000000 = -0
0 11111111 00000000000000000000000 = Infinity
1 11111111 00000000000000000000000 = -Infinity
0 11111111 00000100000000000000000 = NaN
1 11111111 00100010001001010101010 = NaN
0 10000000 00000000000000000000000 = +1 * 2**(128-127) * 1.0 = 2
0 10000001 10100000000000000000000 = +1 * 2**(129-127) * 1.101 = 6.5
1 10000001 10100000000000000000000 = -1 * 2**(129-127) * 1.101 = -6.5
0 00000001 00000000000000000000000 = +1 * 2**(1-127) * 1.0 = 2**(-126)
0 00000000 10000000000000000000000 = +1 * 2**(-126) * 0.1 = 2**(-127)
0 00000000 00000000000000000000001 = +1 * 2**(-126) * 0.00000000000000000000001 = 2**(-149) (smallest positive value)
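The single-precision rules can be checked mechanically against the examples. Below is a minimal sketch in Python, assuming the bit pattern is given as a string in the "S E F" form used above; decode_single is a hypothetical helper name, not something defined by the standard.

```python
import math

def decode_single(pattern):
    """Decode a string such as '0 10000001 1010...0' as a single-precision value."""
    bits = pattern.replace(" ", "")
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected 32 binary digits")
    s = int(bits[0])
    e = int(bits[1:9], 2)
    f = int(bits[9:], 2) / 2**23      # fraction F as a value in [0, 1)
    sign = -1.0 if s else 1.0
    if e == 255:                      # all-ones exponent: NaN or infinity
        return math.nan if f != 0 else sign * math.inf
    if e == 0:                        # denormalized values, including +0 and -0
        return sign * 2.0**-126 * f
    return sign * 2.0**(e - 127) * (1 + f)   # normalized: implicit leading 1

print(decode_single("0 10000001 10100000000000000000000"))   # prints: 6.5
print(decode_single("0 00000000 00000000000000000000001") == 2.0**-149)   # prints: True
```

Treating E = 0 as a single branch is a design shortcut: with F = 0 the product is a zero that keeps the sign, so the two zero rules do not need separate cases.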
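• Double precision
A double-precision floating-point number is 64 bits long: 1 sign bit, 11 exponent bits, and 52 fraction bits. The format is as follows: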
S EEEEEEEEEEE FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
1 11 52
The decimal value V of a double-precision number is computed as follows:
if (E = 2047 and F != 0) then V = NaN ("Not a Number")
if (E = 2047 and F = 0 and S = 1) then V = -Infinity
if (E = 2047 and F = 0 and S = 0) then V = Infinity
if (0 < E < 2047) then V = (-1)**S * 2**(E-1023) * (1.F)
if (E = 0 and F != 0) then V = (-1)**S * 2**(-1022) * (0.F)
if (E = 0 and F = 0 and S = 1) then V = -0
if (E = 0 and F = 0 and S = 0) then V = 0
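The double-precision rules can be sketched the same way. The example below assumes Python, whose float type is an IEEE 754 double on common platforms, and takes the 64-bit pattern as an integer. The normalized case uses the double-precision exponent bias of 1023. The names decode_double and native_bits are hypothetical helpers for illustration.

```python
import math
import struct

def decode_double(bits):
    """Decode a 64-bit integer bit pattern as a double-precision value."""
    s = bits >> 63                          # 1 sign bit
    e = (bits >> 52) & 0x7FF                # 11 exponent bits
    f = (bits & ((1 << 52) - 1)) / 2**52    # 52 fraction bits as a value in [0, 1)
    sign = -1.0 if s else 1.0
    if e == 2047:                           # all-ones exponent: NaN or infinity
        return math.nan if f != 0 else sign * math.inf
    if e == 0:                              # denormalized values and signed zero
        return sign * 2.0**-1022 * f
    return sign * 2.0**(e - 1023) * (1 + f) # normalized: implicit leading 1

def native_bits(x):
    """Return the 64-bit pattern Python itself stores for the float x."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

# Cross-check the hand-written rules against the platform's own encoding.
for x in (6.5, -0.1, 5e-324, 1.7976931348623157e308):
    assert decode_double(native_bits(x)) == x
```

The loop covers a normalized value, a negative value, the smallest denormalized value, and the largest finite value.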
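Note: in the rules above, S, E, and F stand for the following values taken from the corresponding format:
S = the value of the sign bit S
E = the unsigned integer represented by the bits EEE...E
F = the fraction f1/2 + f2/4 + f3/8 + ..., where f1, f2, f3, ... are the fraction bits from left to right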
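For example, for the single-precision pattern
1 10101010 11110100000000000000000
we have
S = 1
E = 128 + 32 + 8 + 2 = 170
F = 1/2 + 1/4 + 1/8 + 1/16 + 1/64 = 61/64
so V = (-1)**1 * 2**(170-127) * (1 + 61/64) = -2**43 * 1.953125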
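The arithmetic in this example can be reproduced exactly. The following is a small sketch, assuming Python and its standard fractions module. Exact rational arithmetic is used here only to avoid rounding; it is an illustration choice, not something the standard prescribes.

```python
from fractions import Fraction

pattern = "1 10101010 11110100000000000000000"
s_bits, e_bits, f_bits = pattern.split()

S = int(s_bits)
E = int(e_bits, 2)
# F = f1/2 + f2/4 + f3/8 + ..., where f_i is the i-th fraction bit from the left.
F = sum(Fraction(int(b), 2**i) for i, b in enumerate(f_bits, start=1))
# E is neither 0 nor 255, so the normalized rule applies.
V = (-1)**S * Fraction(2)**(E - 127) * (1 + F)

print(S, E, F, V)
# prints: 1 170 61/64 -17179869184000
```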
• References
1. ANSI/IEEE Standard 754-1985, Standard for Binary Floating Point Arithmetic