--------------------------------------------------------------------------------------------------
Author: YuManZI
2014/06/23 1.1-3.5
2014/06/24 3.6-3.8
2014/06/27 4.1
2014/06/28 4.2-4.5
--------------------------------------------------------------------------------------------------
1. Information Storage
1.1 Virtual Memory: a machine-level program views memory as a very large array of bytes.
1.2 Data Sizes (bytes)
32-bit: char 1; short [int] 2; int 4; long [int] 4; long long [int] 8; char * (any pointer) 4; float 4; double 8;
64-bit: char 1; short [int] 2; int 4; long [int] 8; long long [int] 8; char * (any pointer) 8; float 4; double 8;
Contents within square brackets [] are optional. These are typical sizes; the C standard only fixes minimum ranges. The two main differences between 32-bit and 64-bit machines are: a) the size of the data type long; b) the size of pointers.
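A minimal sketch for checking which row of the table a given build matches. The function names data_model and demo_sizes are made up for this illustration, and the third outcome (long is 4 bytes, pointers are 8) is the convention used by 64-bit Windows, which the table above does not cover.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if the build matches the 64-bit row of the table,
   0 if it matches the 32-bit row, and -1 for any other combination. */
int data_model(void)
{
    if (sizeof(long) == 8 && sizeof(void *) == 8) return 1;
    if (sizeof(long) == 4 && sizeof(void *) == 4) return 0;
    return -1;  /* e.g. long is 4 bytes but pointers are 8 */
}

/* Relations that hold on every mainstream platform. */
void demo_sizes(void)
{
    assert(sizeof(char) == 1);              /* fixed by the C standard */
    assert(sizeof(short) <= sizeof(int));
    assert(sizeof(int) <= sizeof(long));
    assert(sizeof(long) <= sizeof(long long));
}
```

Calling data_model() at program start is one way to see which data model a compiler targets.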
1.3 Byte Ordering: big endian & little endian. (Taking 0x01234567 as an example.)
Big endian: the most significant byte comes first (at the lowest address); the bytes from low address to high address are 01 23 45 67;
Little endian: the least significant byte comes first (at the lowest address); the bytes from low address to high address are 67 45 23 01.
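The byte order of the running machine can be observed by copying an integer into a byte array. This is a small sketch; bytes_of and is_little_endian are names chosen for the illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copies the four bytes of x into out[0..3] in increasing address order. */
void bytes_of(uint32_t x, unsigned char out[4])
{
    memcpy(out, &x, sizeof x);
}

/* Returns 1 if the least significant byte is stored at the lowest address. */
int is_little_endian(void)
{
    unsigned char b[4];
    bytes_of(0x01234567u, b);
    return b[0] == 0x67;
}
```

On x86 and most ARM configurations is_little_endian() returns 1, and bytes_of(0x01234567u, b) fills b with 67 45 23 01.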
1.4 Shift Operations in C
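Left shift << k: drop the k most significant bits and fill the right end with k zeros;
Logical right shift >> k: drop the k least significant bits and fill the left end with k zeros;
Arithmetic right shift >> k: drop the k least significant bits and fill the left end with k copies of the most significant bit.
Arithmetic right shift fills with the most significant bit because of the two's-complement representation of negative integers: copying the sign bit keeps a negative number negative. In C, right shift on unsigned values is logical; on negative signed values it is implementation-defined, and almost all compilers use an arithmetic shift. In some languages (e.g. Java), the shift amount is reduced modulo the bit width of the data type, so the effective shift is always smaller than the width; in C, shifting by an amount greater than or equal to the width is undefined.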
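The two kinds of right shift can be compared on the same 32-bit pattern. In this sketch the arithmetic shift is built from unsigned operations only, so it does not depend on how a particular compiler treats >> on negative values; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Logical right shift: zero fill. Requires 0 <= k < 32. */
uint32_t shr_logical(uint32_t x, int k)
{
    return x >> k;
}

/* Arithmetic right shift of a 32-bit pattern: fill with copies of bit 31.
   Requires 0 <= k < 32. */
uint32_t shr_arith(uint32_t x, int k)
{
    uint32_t shifted = x >> k;
    uint32_t sign = x >> 31;                      /* 1 if the top bit is set */
    uint32_t fill = (sign && k > 0) ? ~(UINT32_MAX >> k) : 0;
    return shifted | fill;
}

void demo_shifts(void)
{
    assert(shr_logical(0x87654321u, 4) == 0x08765432u);
    assert(shr_arith(0x87654321u, 4) == 0xF8765432u);
}
```

For a pattern whose top bit is 0, both functions return the same result.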
2. Integral Data Types
2.1 Unsigned Encodings (w bits, x=[x_(w-1), x_(w-2),...,x_0])
B2U(x)=sigma{i=[0,w-1]}(x_i*2^i)
B2U means Bits to Unsigned
It can represent integers between [0..2^w-1]
2.2 Two's-Complement Encodings (same setting as 2.1)
B2T(x)=-x_(w-1)*2^(w-1) + sigma{i=[0,w-2]}(x_i*2^i)
B2T means Bits to Two's complement
It is a signed encoding that can represent integers in [-2^(w-1), 2^(w-1)-1]. The only difference from the unsigned encoding is the weight of the most significant bit: positive (2^(w-1)) for unsigned and negative (-2^(w-1)) for two's complement.
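The two formulas can be transcribed almost literally. This is a sketch for bit patterns of width w between 1 and 32, held in the low bits of a uint32_t; b2u and b2t are names that mirror the notation above.

```c
#include <assert.h>
#include <stdint.h>

/* B2U: sum of x_i * 2^i for i in [0, w-1]. Requires 1 <= w <= 32. */
uint64_t b2u(uint32_t bits, int w)
{
    uint64_t sum = 0;
    for (int i = 0; i < w; i++)
        sum += (uint64_t)((bits >> i) & 1u) << i;
    return sum;
}

/* B2T: same sum for bits 0..w-2, but bit w-1 has weight -2^(w-1). */
int64_t b2t(uint32_t bits, int w)
{
    int64_t sum = 0;
    for (int i = 0; i < w - 1; i++)
        sum += (int64_t)((bits >> i) & 1u) << i;
    sum -= (int64_t)((bits >> (w - 1)) & 1u) << (w - 1);
    return sum;
}

void demo_encodings(void)
{
    /* The 4-bit pattern 1011 reads as 11 unsigned and as -5 signed. */
    assert(b2u(0xBu, 4) == 11);
    assert(b2t(0xBu, 4) == -5);
}
```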
2.3 Conversions
Rule 1. Signed <--> Unsigned with identical size: the bit representation stays the same; only its interpretation changes;
Rule 2. Large size -> Small size with the same signedness: truncate directly (keep the low-order bits);
Rule 3. Small size -> Large size with the same signedness: fill the left end with 0 for unsigned (zero extension) or with the most significant bit for signed (sign extension);
Rule 4. Large size -> Small size with different signedness: first change the size according to rule 2, then convert according to rule 1;
Rule 5. Small size -> Large size with different signedness: first change the size according to rule 3, then convert according to rule 1.
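A short sketch of these conversions with fixed-width types. Only conversions to unsigned types are used, because C defines those exactly (modulo 2^w); the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Same size, signed -> unsigned: the 16 bits are kept, the reading changes. */
uint16_t as_unsigned16(int16_t sx)
{
    return (uint16_t)sx;
}

/* Large -> small: keep the low 16 bits. */
uint16_t truncate_to_16(uint32_t x)
{
    return (uint16_t)x;
}

/* Small signed -> large unsigned: sign-extend to 32 bits first,
   then reinterpret the 32 bits as unsigned. */
uint32_t short_to_unsigned(int16_t sx)
{
    return (uint32_t)(int32_t)sx;
}

void demo_conversions(void)
{
    assert(as_unsigned16(-12345) == 53191);            /* 65536 - 12345 */
    assert(truncate_to_16(0x12345678u) == 0x5678);
    assert(short_to_unsigned(-12345) == 4294954951u);  /* 2^32 - 12345 */
}
```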
2.4 Expanding the Bit Representation of a Number, two points:
a) Integer constants are regarded as signed integers by default (in C, the suffix U makes a constant unsigned);
b) if an expression involves both types (i.e. signed and unsigned), the signed operands are converted to unsigned first, and the expression is then evaluated as unsigned.
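Point b) is easy to trip over in comparisons. In the sketch below (function names are illustrative), the first function compares an int with an unsigned, so the int is converted to unsigned before the comparison; a compiler may warn about this, which is the intended demonstration.

```c
#include <assert.h>

/* Mixed comparison: a is converted to unsigned, so a negative a
   becomes a very large value before the comparison takes place. */
int less_mixed(int a, unsigned b)
{
    return a < b;
}

/* Both operands signed, for contrast. */
int less_signed(int a, int b)
{
    return a < b;
}

void demo_mixed(void)
{
    assert(less_signed(-1, 0) == 1);   /* -1 < 0 as expected */
    assert(less_mixed(-1, 0u) == 0);   /* -1 becomes UINT_MAX, not < 0 */
}
```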
2.5 Advice on Signed and Unsigned
Mixing signed and unsigned data may cause subtle errors. Always using signed data is a good habit. Indeed, some languages (e.g. Java) do not support unsigned data types, as their designers consider the benefits offered by unsigned data types to be smaller than the dangers they may introduce.
3. Integer Arithmetic
3.1 Unsigned Addition (2 w-bits unsigned int x & y)
x + y = B2U(U2B(x) + U2B(y))
Overflow: if x + y >= 2^w, then sum = x + y - 2^w.
The carry into the (w+1)th bit is discarded. Overflow test: sum < x (equivalently, sum < y).
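A sketch of unsigned addition and the overflow test for w = 32; uadd and uadd_overflows are illustrative names.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit unsigned addition: the result wraps modulo 2^32. */
uint32_t uadd(uint32_t x, uint32_t y)
{
    return (uint32_t)(x + y);
}

/* Overflow happened exactly when the wrapped sum is smaller than x. */
int uadd_overflows(uint32_t x, uint32_t y)
{
    return uadd(x, y) < x;
}

void demo_uadd(void)
{
    assert(uadd(0xFFFFFFFFu, 1u) == 0u);
    assert(uadd_overflows(0xFFFFFFFFu, 1u) == 1);
    assert(uadd_overflows(1u, 2u) == 0);
}
```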
3.2 Two's-Complement Addition (2 w-bits signed int x & y)
Principle: add x and y as two bit vectors, and interpret the truncated result as a signed int.
x + y = B2T(T2B(x) + T2B(y)) = U2T(T2U(x) + T2U(y))
Three cases:
Negative overflow: -2^w <= x + y < -2^(w-1), then sum = x + y + 2^w;
Normal: -2^(w-1) <= x + y < 2^(w-1), then sum = x + y;
Positive overflow: 2^(w-1) <= x + y <= 2^w - 2, then sum = x + y - 2^w.
Discussion: at the bit level, addition is identical for unsigned and two's-complement operands; only the interpretation of the result differs.
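The three cases can be detected without ever performing an overflowing signed addition, which would be undefined behavior in C. The sketch below (w = 32, illustrative names) classifies the case first and computes the wrapped sum through unsigned arithmetic, following U2T(T2U(x) + T2U(y)).

```c
#include <assert.h>
#include <stdint.h>

/* Returns +1 for positive overflow, -1 for negative overflow, 0 otherwise. */
int tadd_overflow(int32_t x, int32_t y)
{
    if (y > 0 && x > INT32_MAX - y) return 1;
    if (y < 0 && x < INT32_MIN - y) return -1;
    return 0;
}

/* Wrapped two's-complement sum: add the bit patterns as unsigned values,
   then read the 32-bit result as signed. */
int32_t tadd_wrap(int32_t x, int32_t y)
{
    uint32_t u = (uint32_t)x + (uint32_t)y;
    if (u <= (uint32_t)INT32_MAX)
        return (int32_t)u;
    /* Top bit set: the value is u - 2^32, computed without overflow. */
    return -(int32_t)(uint32_t)~u - 1;
}

void demo_tadd(void)
{
    assert(tadd_overflow(INT32_MAX, 1) == 1);
    assert(tadd_wrap(INT32_MAX, 1) == INT32_MIN);   /* x + y - 2^32 */
    assert(tadd_overflow(INT32_MIN, -1) == -1);
    assert(tadd_wrap(INT32_MIN, -1) == INT32_MAX);  /* x + y + 2^32 */
}
```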
3.3 Two's-Complement Negation
For a w-bit signed type, representable integers lie in [-2^(w-1), 2^(w-1)-1]. When x = -2^(w-1), the value 2^(w-1) is not representable, so the negation wraps around: -x = -2^(w-1) = x (nonintuitive);
for every other x in [-2^(w-1) + 1, 2^(w-1) - 1], the result is the ordinary mathematical negation of x.
The asymmetric bounds of signed data types cause other vulnerabilities as well.
Bit-level representation of two's-complement negation: complement the bits and then increment the result.
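The complement-and-increment rule can be checked on 32-bit patterns. The sketch works on uint32_t so that the wraparound for the most negative value is well defined; negate_bits is an illustrative name.

```c
#include <assert.h>
#include <stdint.h>

/* Two's-complement negation of a 32-bit pattern: complement, then add 1. */
uint32_t negate_bits(uint32_t x)
{
    return (uint32_t)(~x + 1u);
}

void demo_negate(void)
{
    assert(negate_bits(5u) == 0xFFFFFFFBu);          /* pattern of -5 */
    assert(negate_bits(0x80000000u) == 0x80000000u); /* -2^31 maps to itself */
}
```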
3.4 Unsigned Multiplication (2 w-bits unsigned int x & y)
Overflow: x * y >= 2^w, then mul = (x * y) % 2^w.
3.5 Two's-Complement Multiplication
Principle: multiply x and y as two bit vectors, and interpret the truncated result as a signed int.
x * y = B2T((T2B(x) * T2B(y)) % 2^w) = U2T((T2U(x) * T2U(y)) % 2^w)
Discussion: at the bit level, the truncated product is identical for unsigned and two's-complement multiplication; only the interpretation of the result differs.
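With w = 8 the claim is small enough to check by hand. In this sketch (illustrative names) both functions return the low 8 bits of the product as a bit pattern: -3 has the pattern 0xFD, and 0xFD read as unsigned is 253.

```c
#include <assert.h>
#include <stdint.h>

/* Low 8 bits of an unsigned 8-bit product. */
uint8_t umul8_bits(uint8_t x, uint8_t y)
{
    return (uint8_t)((unsigned)x * (unsigned)y);
}

/* Low 8 bits of a signed 8-bit product, returned as a bit pattern. */
uint8_t tmul8_bits(int8_t x, int8_t y)
{
    return (uint8_t)((int)x * (int)y);
}

void demo_mul(void)
{
    /* -3 * 5 = -15 and 253 * 5 = 1265 share the low byte 0xF1. */
    assert(tmul8_bits(-3, 5) == 0xF1);
    assert(umul8_bits(0xFD, 5) == 0xF1);
}
```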
3.6 Division & Modulo on Negative Integers
x = 7, y = 2, div = 3, mod = 1; // real quotient 3.5, rounded down
x = 7, y = -2, div = -3, mod = 1; // real quotient -3.5, rounded up
x = -7, y = 2, div = -3, mod = -1; // real quotient -3.5, rounded up; with 4-bit values, -7 >> 1 = 1001 >> 1 = 1100 = -4, so shifting directly is incorrect
x = -7, y = -2, div = 3, mod = -1. // real quotient 3.5, rounded down
Conclusion: integer division rounds toward zero. When the real quotient is negative, the result is rounded up; when it is zero or positive, the result is rounded down. The remainder takes the sign of the dividend x. (Trick: the absolute value of the quotient is the same for every sign combination, 3 in the case above.)
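The four rows can be reproduced directly in C, where C99 and later define / as truncating toward zero. The struct and function names below are made up for the example.

```c
#include <assert.h>

typedef struct {
    int div;
    int mod;
} div_result;

/* Quotient and remainder of x by y; y must be nonzero.
   In C99 and later, (x / y) * y + x % y == x always holds. */
div_result divmod(int x, int y)
{
    div_result r = { x / y, x % y };
    return r;
}

void demo_divmod(void)
{
    div_result r = divmod(-7, 2);
    assert(r.div == -3 && r.mod == -1);
    assert(r.div * 2 + r.mod == -7);
}
```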
3.7 Multiplication & Division and Shift
A multiplication operation requires 10 or more clock cycles, while addition and shift require only 1 clock cycle. This suggests replacing multiplication by a constant with a combination of shifts and additions (or subtractions). Indeed, many compilers perform this replacement as an optimization.
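As an illustration, 14 = 8 + 4 + 2 = 16 - 2, so multiplying by 14 can be written with shifts in two ways. The sketch uses unsigned operands so that wraparound is well defined; the function names are illustrative, and whether a compiler picks either form is up to the compiler.

```c
#include <assert.h>
#include <stdint.h>

/* x * 14 as a sum of shifts: 14 = 2^3 + 2^2 + 2^1. */
uint32_t times14_add(uint32_t x)
{
    return (uint32_t)((x << 3) + (x << 2) + (x << 1));
}

/* x * 14 as a difference of shifts: 14 = 2^4 - 2^1. */
uint32_t times14_sub(uint32_t x)
{
    return (uint32_t)((x << 4) - (x << 1));
}

void demo_times14(void)
{
    assert(times14_add(3u) == 42u);
    assert(times14_sub(3u) == 42u);
}
```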
A division operation is even slower than multiplication, requiring 30 or more clock cycles. Division by a power of two can be replaced by a right shift: a logical shift for unsigned values, and an arithmetic shift with a bias for signed values.
(x < 0 ? x + (1 << k) - 1 : x) >> k; the term (1 << k) - 1 acts as a bias that keeps the final result correct (rounded toward zero) when x is negative.
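A sketch of signed division by 2^k using the bias (1 << k) - 1. It assumes that >> on a negative int is an arithmetic shift, which C leaves implementation-defined but which GCC and Clang provide; div_pow2 is an illustrative name.

```c
#include <assert.h>

/* Computes x / 2^k rounded toward zero, for 0 <= k < 31.
   Assumption: right shift of a negative int is arithmetic. */
int div_pow2(int x, int k)
{
    int bias = (x < 0) ? (1 << k) - 1 : 0;
    return (x + bias) >> k;
}

void demo_div_pow2(void)
{
    assert(div_pow2(7, 1) == 3);
    assert(div_pow2(-7, 1) == -3);   /* a plain -7 >> 1 would give -4 */
}
```

The bias only changes the result when x is negative and not a multiple of 2^k; in that case it turns the rounding down of the shift into rounding up.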
3.8 Conclusion on Integer Arithmetic
a) regard subtraction operation as a combination of negation and addition;
b) addition, subtraction and multiplication on unsigned operands have exactly the same effect at the bit level as the corresponding operations on two's-complement operands. Simply compute at the bit level and interpret the truncated result according to the specific encoding.
c) integer arithmetic is a form of modular arithmetic, due to the finite word size used to represent integers.
4. Floating Point
4.1 IEEE Floating-Point Representation
IEEE floating-point standard: V = (-1)^s * M * 2^E, where s is the sign bit, M is the significand, a fractional binary number whose value lies in [1, 2) or in [0, 1), and E is the exponent, which weights the value by a power of 2.
float (32 bits) = s(1) + E(8) + M(23); this order is also the order of the fields in the bit representation, i.e. s e_(k-1) ... e_0 f_(n-1) ... f_0, where k and n are the numbers of bits in the exponent and fraction fields, respectively;
double(64) = s(1) + E(11) + M(52).
A floating-point value falls into one of two kinds of cases, depending on the exponent field e = e_(k-1)...e_0 (k = 8 for float, 11 for double):
Normalized Cases: the exponent field is neither all zeros nor all ones. Then E = e - Bias, where e is the unsigned value of the exponent field and Bias = 2^(k-1) - 1 (127 for float, 1023 for double). This yields exponents from -126 to 127 for float and from -1022 to 1023 for double. M = 1.f_(n-1)...f_0, with an implied leading 1.
Special Cases: a) the exponent field is all zeros (denormalized): the value is 0 or close to 0, with E = 1 - Bias and M = 0.f_(n-1)...f_0 (no implied leading 1, unlike the normalized case); b) the exponent field is all ones and the fraction field is all zeros: the value is infinity, positive or negative according to s; c) the exponent field is all ones and the fraction field is nonzero: the value is NaN (Not a Number).
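The three fields of a single-precision value can be pulled apart with shifts and masks. This sketch assumes that float is the 32-bit IEEE format (true on mainstream platforms); the struct and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint32_t sign;       /* 1 bit */
    uint32_t exp_field;  /* 8 bits, the unsigned value e */
    uint32_t frac;       /* 23 bits */
} float_fields;

/* Splits the bit pattern of v into sign, exponent field and fraction. */
float_fields split_float(float v)
{
    uint32_t bits;
    float_fields f;
    memcpy(&bits, &v, sizeof bits);
    f.sign = bits >> 31;
    f.exp_field = (bits >> 23) & 0xFFu;
    f.frac = bits & 0x7FFFFFu;
    return f;
}

/* Exponent E for normalized and denormalized values
   (not meaningful when exp_field is all ones). */
int exponent_value(float_fields f)
{
    const int bias = 127;
    return (f.exp_field == 0) ? 1 - bias : (int)f.exp_field - bias;
}

void demo_fields(void)
{
    float_fields one = split_float(1.0f);   /* 1.0 = 1.0 * 2^0 */
    assert(one.sign == 0 && one.exp_field == 127 && one.frac == 0);
    assert(exponent_value(one) == 0);
}
```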
An interesting property: the IEEE format was designed so that floating-point numbers can be sorted using an integer sorting routine (negative values need some extra handling because of the sign bit). Detailed analysis is omitted.
4.2 Rounding
Four rounding modes. Task: round floating-point number x to x'.
Round-to-even (round-to-nearest, the default mode): find the representable value x' that minimizes |x' - x|; when x is exactly halfway between two representable values, choose the one whose least significant digit is even;
Round-toward-zero: find the closest representable value x' s.t. |x'| <= |x|;
Round-down: find the closest representable value x' = x- s.t. x- <= x;
Round-up: find the closest representable value x' = x+ s.t. x+ >= x.
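Round-to-even can be observed when an int that needs more than 24 significant bits is converted to float. The sketch assumes IEEE single precision and the default rounding mode; to_float is an illustrative name. Between 2^24 and 2^25 only even integers are representable, so every odd integer in that range is an exact halfway case.

```c
#include <assert.h>
#include <stdint.h>

/* Converts n to float; inexact values are rounded by the hardware
   (assumption: round-to-nearest-even, the IEEE default). */
float to_float(int32_t n)
{
    volatile float f = (float)n;   /* volatile: store as a true float */
    return f;
}

void demo_round_even(void)
{
    /* 16777217 = 2^24 + 1 lies halfway between 16777216 and 16777218;
       16777216 has the even significand, so it is chosen. */
    assert(to_float(16777217) == 16777216.0f);
    /* 16777219 lies halfway between 16777218 and 16777220. */
    assert(to_float(16777219) == 16777220.0f);
}
```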
4.3 Floating-point Arithmetic
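x +f y = Round(x + y)
x *f y = Round(x * y)
Here +f and *f denote the floating-point operations, while x + y and x * y on the right-hand side are the exact real-number results.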
4.4 Properties of Floating-point and Integer Arithmetic
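Integer arithmetic (unsigned or two's-complement): addition forms an abelian group; addition and multiplication are commutative and associative, and multiplication distributes over addition;
Floating-point arithmetic: addition and multiplication are commutative but neither associative nor distributive, so floating-point addition does not form an abelian group. E.g. in single precision, (3.14 + 1e10) - 1e10 = 0, while 3.14 + (1e10 - 1e10) = 3.14. In addition, floating-point addition is monotonic (if a >= b then x + a >= x + b, for values other than NaN), a property that integer arithmetic lacks because of overflow wraparound.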
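The lack of associativity shows up with single-precision operands. In the sketch below (illustrative names) each intermediate result is stored in a volatile float so that it is really rounded to single precision; near 1e10 adjacent floats are 1024 apart, so adding 3.14 is lost in the rounding.

```c
#include <assert.h>

/* (a + b) - c, with the intermediate sum rounded to float. */
float add_then_sub(float a, float b, float c)
{
    volatile float t = a + b;
    volatile float r = t - c;
    return r;
}

/* a + (b - c), with the intermediate difference rounded to float. */
float sub_then_add(float a, float b, float c)
{
    volatile float t = b - c;
    volatile float r = a + t;
    return r;
}

void demo_assoc(void)
{
    assert(add_then_sub(3.14f, 1e10f, 1e10f) == 0.0f);
    assert(sub_then_add(3.14f, 1e10f, 1e10f) == 3.14f);
}
```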
4.5 Casting
Only converting int or float to double is safe (the value is preserved exactly); other casts may overflow, round, or both.
In C, an unsuffixed floating-point constant such as 3.14 has type double; the suffix f (3.14f) gives type float.
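The casting rules in 4.5 can be checked with a few small helpers. This sketch assumes a 32-bit int and IEEE float and double; the function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* double -> int discards the fractional part (rounds toward zero).
   The caller must keep d within the range of int. */
int trunc_to_int(double d)
{
    return (int)d;
}

/* Returns 1 if n is unchanged by a trip through float. */
int survives_float(int32_t n)
{
    volatile float f = (float)n;
    return (double)f == (double)n;
}

/* Returns 1 if n is unchanged by a trip through double. */
int survives_double(int32_t n)
{
    volatile double d = (double)n;
    return d == (double)n && (int32_t)d == n;
}

void demo_casts(void)
{
    assert(trunc_to_int(3.99) == 3);
    assert(trunc_to_int(-3.99) == -3);
    assert(survives_float(16777217) == 0);   /* needs 25 significant bits */
    assert(survives_double(16777217) == 1);
}
```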