Chapter 2.1-2.2
Chapter 2 mainly introduces the built-in types and the mechanisms for defining classes. Sections 2.1 and 2.2 focus on built-in types and variables. The characteristics of the built-in types are closely tied to their representation on the machine’s hardware. Once we understand how the hardware works, the seemingly complicated rules for using built-in types become much easier to follow.
The arithmetic types are divided into two categories: integral types (which include the character and boolean types) and floating-point types.
Note that the character and boolean types are themselves integral types.
The size of (that is, the number of bits in) the arithmetic types varies across machines.
Although the sizes differ from machine to machine, there are still a few guarantees that every implementation must follow:
A char is the same size as a single machine byte.
The language guarantees that an int is at least as large as a short, a long at least as large as an int, and a long long at least as large as a long.
The size of the types is determined by the compiler. According to C90/C99 Note 2,
A byte is composed of a contiguous sequence of bits, the number of which is implementation defined.
The language provides a set of fundamental built-in types such as int and char, which are closely tied to their representation on the machine’s hardware.
Most compilers for 32-bit and 64-bit machines use 8 bits as a byte, which is also the size of a char, and 4 bytes as the size of an int.
type | minimum size | common size | feature |
---|---|---|---|
bool | NA | 1 byte | |
char | 1 byte | 1 byte | |
wchar_t | 2 bytes | 2 or 4 bytes | wide character |
char16_t | 2 bytes | 2 bytes | Unicode character |
char32_t | 4 bytes | 4 bytes | Unicode character |
short | 2 bytes | 2 bytes | |
int | 2 bytes | 4 bytes | |
long | 4 bytes | 4 or 8 bytes | |
long long | 8 bytes | 8 bytes | |
float | 6 significant digits | 4 bytes | single-precision |
double | 10 significant digits | 8 bytes | double-precision |
long double | 10 significant digits | 12 or 16 bytes | extended-precision |
Most computers associate a number (called an “address”) with each byte in memory.
Addresses are assigned to bytes, not to bits, so the byte is the smallest unit of memory the computer can address. This is why no variable can be smaller than a byte and why we cannot take the address of (name) a single bit. Individual bits can still be manipulated, but only indirectly: through the bitwise operators or bit-fields applied to an object that occupies at least one byte.
It follows that the size of every type, including class types, is a whole number of bytes.
To give meaning to the memory at a given address, we must know the type of the value stored there. The type determines how many bits are used and how to interpret those bits.
That is, when we define a variable of a given type, the compiler allocates enough memory (one or more bytes) for a value of that type.
The float and double types typically yield about 7 and 16 significant digits, respectively.
A float is represented by two numbers, m (the mantissa) and e (the exponent), as follows:
$$a = m \times 2^{e}$$
The base is always 2 in binary floating-point representation, and $m$ lies in the range $[1.0, 2.0)$. A float occupies 32 bits (4 bytes): the highest bit is the sign bit, the 8 bits in the middle represent $e$, and the lowest 23 bits represent $m$. (Strictly, those 23 bits store the fractional part of $m$; the leading 1 is implicit.) The largest number 23 bits can represent is $2^{23} - 1 = 8388607$, which has 7 decimal digits, so a float can hold at most about 7 significant digits. Because not every 7-digit number can be represented exactly, the guaranteed precision of a float is 6 significant digits.
The same reasoning applies to double, which uses 64 bits: the 11 bits in the middle represent $e$ and the lowest 52 bits represent $m$. The largest number 52 bits can represent is $2^{52} - 1 = 4503599627370495$, a 16-digit number, which is why a double yields about 15 to 16 significant digits.
Use double for floating-point computations; float usually does not have enough precision, and the cost of double-precision calculations versus single-precision is negligible. In fact, on some machines, double-precision operations are faster than single. The precision offered by long double usually is unnecessary and often entails considerable run-time cost.
One likely reason is that on some hardware (for example, the older x87 floating-point unit) arithmetic is carried out internally in double or wider precision, so a float must first be converted to the wider format before the operation and back afterward, which takes extra time.
Except for bool and the extended character types, the integral types may be signed or unsigned.
For int types, it is very simple to distinguish signed and unsigned.
types | feature |
---|---|
int, short, long, long long | signed |
unsigned, unsigned short, unsigned long, unsigned long long | unsigned |
For the character types, distinguishing signed from unsigned is not so straightforward.
Although there are three character types, there are only two representations: signed and unsigned. The (plain) char type uses one of these representations; which one depends on the compiler.
types | feature |
---|---|
char | signed or unsigned |
signed char | signed |
unsigned char | unsigned |
Caution:
Do not use plain char or bool in arithmetic expressions. Computations using char are especially problematic because char is signed on some machines and unsigned on others. If you need a tiny integer, explicitly specify either signed char or unsigned char.
Type conversions happen automatically when we use an object of one type where an object of another type is expected.
from | to | result |
---|---|---|
nonbool arithmetic type | bool | false if the value is 0; true otherwise |
bool | nonbool arithmetic type | 1 if the value is true; 0 if it is false |
floating-point | integral | truncated: the fractional part is discarded |
integral | floating-point | the fractional part is zero; precision may be lost if the integer has more bits than the floating-point object can accommodate |
out-of-range value | unsigned type | the remainder of the value modulo the number of values the target type can hold, e.g., -1 becomes 255 in an 8-bit unsigned char |
out-of-range value | signed type | undefined |
We should be careful to make sure the automatic type conversion is what we expect. Avoid assigning an out-of-range value to a signed or unsigned type.
Caution:
If we use both unsigned and int values in an arithmetic expression, the int value ordinarily is converted to unsigned.
It is essential to remember that signed values are automatically converted to unsigned.
When a negative value is converted to an unsigned type (one case of an out-of-range value being converted to an unsigned type), the bit pattern in memory does not change on a two's-complement machine; only its interpretation changes. For example,
```cpp
unsigned char u = 10;
signed char i = -42;
std::cout << i + i << std::endl;  // -84
std::cout << i + u << std::endl;  // -32: both operands are promoted to int first
unsigned char c = i;              // here -42 is converted to unsigned char
std::cout << static_cast<int>(c) << std::endl;  // 214
```
Note that i + u does not convert i to unsigned char: operands of types smaller than int are promoted to int before the addition, so the result is the ordinary int -32. (With unsigned and int operands, as in the Caution above, the int really is converted to unsigned.) The conversion to unsigned char happens when c is initialized. The bit pattern of i as a signed char is 11010110 (two's complement), where the highest bit is the sign bit. Once the same pattern is treated as an unsigned char, the highest bit is no longer a sign bit, so its value is 128 + 64 + 16 + 4 + 2 = 214. This is exactly the remainder of the value modulo the number of values the target type can hold, as mentioned above: $-42 \bmod 2^8 = -42 + 256 = 214$.
The fact that an unsigned value cannot be less than zero also affects how we write loops.
For example, the following loop correctly prints the numbers from 10 down to 0:
```cpp
for (int i = 10; i >= 0; --i)
    std::cout << i << std::endl;
```
If we use unsigned instead of int, the loop never ends: an unsigned value can never be less than 0, so the condition i >= 0 is always true. When i is 0, --i wraps around to the largest unsigned value instead of becoming -1.
```cpp
for (unsigned i = 10; i >= 0; --i)   // wrong: i >= 0 is always true
    std::cout << i << std::endl;
```
We can write an integer literal using octal, decimal, or hexadecimal notation.
begins with | notation | example |
---|---|---|
0 | octal | 024 |
a nonzero digit | decimal | 20 |
0x, 0X | hexadecimal | 0x14 |
By default, decimal literals are signed whereas octal and hexadecimal literals can be either signed or unsigned types.
Besides, we can specify the type of a literal using a prefix or suffix.
For integer literals,
prefix or suffix | type |
---|---|
-u, -U | unsigned |
-l, -L | long |
-ll, -LL | long long |
Caution:
When you write a long literal, use the uppercase L; the lowercase letter l is too easily mistaken for the digit 1.
For floating-point literals,
prefix or suffix | type |
---|---|
-f, -F | float |
-l, -L | long double |
For char or string literals,
prefix or suffix | type |
---|---|
u- | char16_t |
U- | char32_t |
L- | wchar_t |
u8- | char (UTF-8, string literals only) |
Some characters, such as backspace or control characters, have no visible image. Such characters are nonprintable. Other characters (single and double quotation marks, question mark, and backslash) have special meaning in the language. Our programs cannot use any of these characters directly. Instead, we use an escape sequence to represent such characters.
character | escape sequence |
---|---|
new line | \n |
horizontal tab | \t |
vertical tab | \v |
backspace | \b |
double quote | \" |
single quote | \' |
backslash | \\ |
question mark | \? |
carriage return | \r |
character in hexadecimal form | \x followed by one or more hexadecimal digits |
character in octal form | \ followed by at most three octal digits |
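There are four ways to define an int variable named units_sold and initialize it to 0 (they are alternatives; only one of them can appear in a given scope):
```cpp
int units_sold = 0;
int units_sold = {0};  // list initialization
int units_sold{0};     // list initialization
int units_sold(0);
```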
List initialization is a new feature of C++11.
Initialization and assignment are different operations in C++.
When we use list initialization with variables of built-in type, the compiler rejects any initializer that might lead to the loss of information.
That is, a narrowing conversion (such as long double to int) is an error under list initialization, whereas ordinary initialization or assignment silently performs the conversion and truncates the value. Conversions that cannot lose information, such as int to long, are still allowed.
Variables of built-in type defined outside any function body are initialized to zero.
Variables of built-in type defined inside a function are uninitialized. The value of an uninitialized variable of built-in type is undefined, and it is an error to copy or otherwise try to access that value.
A declaration makes a name known to the program.
A variable declaration specifies the type and name of a variable.
More generally, a declaration is a base type followed by a list of declarators. Each declarator names a variable and gives the variable a type that is related to the base type.
A definition creates the associated entity.
In addition to specifying the name and type, a definition also allocates storage and may provide the variable with an initial value.
A variable definition is a declaration.
$S_{def}$ is the set of all variable definitions.
$S_{dec}$ is the set of all variable declarations.
$$S_{def} \subset S_{dec}$$
To obtain a declaration that is not also a definition, we add the extern keyword and may not provide an explicit initializer.
If we want a declaration $x$ such that $x \in S_{dec}$ and $x \notin S_{def}$, we add extern and do not initialize it.
The global scope has no name. Hence, when the scope operator has an empty left-hand side, it is a request to fetch the name on the right-hand side from the global scope.
integral types 整型
literals 字面值
octal 八进制
hexadecimal 十六进制
escape sequence 转义字符
identifier 标识符