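Consider the Differences Between the 32-bit and 64-bit Data Models
The biggest difference between the 32-bit and the 64-bit compilation environments is the change in data-type models. The C data-type model for 32-bit applications is the ILP32 model, so named because the int, long, and pointer types are all 32 bits. The data-type model for 64-bit applications is the LP64 model, so named because the long and pointer types grow to 64 bits.
It is not unusual for current 32-bit applications to assume that the int type, the long type, and pointers are the same size.
Use the lint Utility to Detect Problems with 64-bit long and Pointer Types
Use lint with the -errchk=longptr64 option to find assignments of pointer and long expressions to plain integers. A typical case is casting a pointer to an int for address arithmetic, which discards the upper 32 bits of the pointer in the LP64 model:
char *p;
p = (char *) ((int)p & PAGEOFFSET);
% cc ..
warning: conversion of pointer loses bits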
The following version will function correctly when compiled to both 32-bit and 64-bit targets:
#include <inttypes.h> /* for uintptr_t */
char *p;
p = (char *) ((uintptr_t)p & PAGEOFFSET);
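The same idea can be written as a complete function. This is a minimal sketch that assumes 4 KB pages. It uses a hypothetical EXAMPLE_PAGEOFFSET mask in place of the system's PAGEOFFSET. All arithmetic is done in uintptr_t, so both the offset and the page-base computations keep every address bit in either data model.

```c
#include <assert.h>
#include <stdint.h>   /* uintptr_t (also available through <inttypes.h>) */

/* Hypothetical stand-in for the system PAGEOFFSET mask, assuming 4 KB pages. */
#define EXAMPLE_PAGEOFFSET ((uintptr_t)0xFFF)

/* Offset of p within its page. uintptr_t can hold every pointer value in
   both ILP32 and LP64, so no address bits are lost before masking. */
static uintptr_t page_offset(const void *p)
{
    return (uintptr_t)p & EXAMPLE_PAGEOFFSET;
}

/* Start of the page containing p: clear the offset bits instead of keeping
   them. ~EXAMPLE_PAGEOFFSET is computed in uintptr_t, so the mask is as
   wide as a pointer. */
static const char *page_base(const void *p)
{
    return (const char *)((uintptr_t)p & ~EXAMPLE_PAGEOFFSET);
}
```

With an int cast in place of uintptr_t, page_base would return a pointer whose upper 32 bits were lost in LP64.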
Because integers and longs are never really distinguished in the ILP32 data-type model, your existing code probably uses them indiscriminately. Modify any code that uses integers and longs interchangeably so that it conforms to the requirements of both the ILP32 and LP64 data-type models. While an integer and a long are both 32 bits in the ILP32 data-type model, a long is 64 bits in the LP64 data-type model.
Consider the following example:
int waiting;
long w_io;
long w_swap;
...
waiting = w_io + w_swap;
% cc
warning: assignment of 64-bit integer to 32-bit integer
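The warning above can be resolved in one of two ways. Both are sketched below, assuming the sum itself fits in a long:

- Widen the destination to long.
- Keep an int, and narrow explicitly after a range check.

The function names are hypothetical. Clamping is only one possible policy for out-of-range values.

```c
#include <assert.h>
#include <limits.h>

/* Fix 1: keep the sum in a long, so nothing is truncated in LP64.
   (Assumes w_io + w_swap itself does not overflow a long.) */
static long total_waiting(long w_io, long w_swap)
{
    return w_io + w_swap;
}

/* Fix 2: if the destination must stay an int, narrow explicitly and decide
   what happens out of range. Clamping is an illustrative choice; returning
   an error would work as well. */
static int total_waiting_int(long w_io, long w_swap)
{
    long sum = w_io + w_swap;

    if (sum > INT_MAX)
        return INT_MAX;
    if (sum < INT_MIN)
        return INT_MIN;
    return (int)sum;   /* explicit cast: the value is known to fit */
}
```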
Sign extension is a common problem when you convert to the 64-bit compilation environment because the type conversion and promotion rules are somewhat obscure. To prevent sign-extension problems, use explicit casting to achieve the intended results.
To understand why sign extension occurs, it helps to understand the conversion rules for ISO C. The conversion rules that seem to cause the most sign-extension problems between the 32-bit and the 64-bit compilation environments come into effect during the following operations:
Integral promotion: You can use a char, short, enumerated type, or bit-field, whether signed or unsigned, in any expression that calls for an integer. If an int can hold all possible values of the original type, the value is converted to an int; otherwise, the value is converted to an unsigned int.
Conversion between signed and unsigned integers: When an integer with a negative sign is promoted to an unsigned integer of the same or larger type, it is first promoted to the signed equivalent of the larger type, then converted to the unsigned value.
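Both rules can be observed directly in a short C11 sketch. IS_INT is a hypothetical helper built on _Generic that reports whether an expression has type int. to_unsigned_long shows that converting a negative int to unsigned long yields the sign-extended value.

```c
#include <assert.h>
#include <limits.h>

/* Evaluates to 1 if expression e has type int, 0 otherwise (C11 _Generic). */
#define IS_INT(e) _Generic((e), int: 1, default: 0)

/* Hypothetical struct mirroring the bit-field example in this section. */
struct bits {
    unsigned int base:19;
};

/* Integral promotion: every value of an unsigned 19-bit bit-field fits in an
   int, so in an expression the bit-field is promoted to int, not unsigned int. */
static int promoted_to_int(void)
{
    struct bits b = { 1 };
    return IS_INT(b.base + 0);
}

/* Signed-to-unsigned conversion: a negative int v converted to unsigned long
   yields ULONG_MAX + 1 + v, which is the sign-extended bit pattern. */
static unsigned long to_unsigned_long(int v)
{
    return (unsigned long)v;
}
```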
When the following example is compiled as a 64-bit program, the addr variable becomes sign-extended, even though both addr and a.base are unsigned types.
% cat test.c
#include <stdio.h>
struct foo {
unsigned int base:19, rehash:13;
};
int main(int argc, char *argv[])
{
struct foo a;
unsigned long addr;
a.base = 0x40000;
addr = a.base << 13; /* Sign extension here! */
printf("addr 0x%lx\n", addr);
addr = (unsigned int)(a.base << 13); /* No sign extension here! */
printf("addr 0x%lx\n", addr);
return 0;
}
This sign extension occurs because the conversion rules are applied as follows:
The structure member a.base is converted from an unsigned int bit-field to an int because of the integral promotion rule. In other words, because the unsigned 19-bit field fits within a 32-bit integer, the bit-field is promoted to an int rather than an unsigned int. Thus, the expression a.base << 13 is of type int. If the result were assigned to an unsigned int, this would not matter, because no sign extension has yet occurred.
The expression a.base << 13 is of type int, but it is converted to a long and then to an unsigned long before being assigned to addr, because of the signed and unsigned integer conversion rules. The sign extension occurs when performing the int to long conversion.
Thus, when compiled as a 64-bit program, the result is as follows:
% cc -o test64 -xarch=v9 test.c
% ./test64
addr 0xffffffff80000000
addr 0x80000000
%
When compiled as a 32-bit program, the size of an unsigned long is the same as the size of an int, so there is no sign extension.
% cc -o test test.c
% ./test
addr 0x80000000
addr 0x80000000
%
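An alternative fix is sketched below as a hypothetical helper. It converts the bit-field to unsigned long before the shift, instead of casting the shifted result. This avoids sign extension.

It also avoids a second problem in test.c. There, a.base << 13 shifts a 1 into bit 31 of an int. Current ISO C standards leave that overflow undefined, so the first output line reflects what that particular compiler happened to do.

```c
#include <assert.h>

struct foo {
    unsigned int base:19, rehash:13;
};

/* Convert the bit-field to unsigned long before shifting, so the shift is
   done in unsigned long arithmetic: no promotion to int, no sign extension,
   and no signed overflow when a 1 reaches bit 31. */
static unsigned long base_to_addr(const struct foo *f)
{
    return (unsigned long)f->base << 13;
}
```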
Check the internal data structures in an application for holes, that is, extra padding between fields in the structure to meet alignment requirements. This extra padding is allocated when long or pointer fields grow to 64 bits for the LP64 data-type model and appear after an int that remains 32 bits in size. Because long and pointer types are 64-bit aligned in the LP64 data-type model, padding appears between the int and the long or pointer type. In the following example, member p is 64-bit aligned, so padding appears between member k and member p.
struct bar {
int i;
long j;
int k;
char *p;
}; /* sizeof (struct bar) = 32 bytes */
Padding also appears between member i and member j, because j is a long and is likewise 64-bit aligned. In addition, a structure is aligned to its most strictly aligned member, so its size is rounded up to a multiple of that alignment (8 bytes here).
When you repack a structure, follow the simple rule of moving the long and pointer fields to the beginning of the structure. Consider the following repacked definition of the same structure:
struct bar {
char *p;
long j;
int i;
int k;
}; /* sizeof (struct bar) = 24 bytes */
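The layouts can be checked in code on a particular compiler. This sketch renames the two versions of struct bar, using the hypothetical names bar_original and bar_repacked. It computes each one's padding as its size minus the sum of its member sizes. On a typical LP64 ABI, that gives 8 bytes for the original and 0 bytes for the repacked version.

```c
#include <assert.h>
#include <stddef.h>

/* The two layouts of struct bar from the text, renamed so both can coexist. */
struct bar_original {
    int i;
    long j;
    int k;
    char *p;
};

struct bar_repacked {
    char *p;
    long j;
    int i;
    int k;
};

/* Sum of the member sizes; both structures have the same members. */
static size_t member_bytes(void)
{
    return 2 * sizeof(int) + sizeof(long) + sizeof(char *);
}

/* Padding inserted by the compiler = total size minus member sizes. */
static size_t padding_original(void)
{
    return sizeof(struct bar_original) - member_bytes();
}

static size_t padding_repacked(void)
{
    return sizeof(struct bar_repacked) - member_bytes();
}
```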
Be sure to check the members of unions, because their fields can change size between the ILP32 and the LP64 data-type models, making the sizes of the members different. In the following union, member _d and array member _l are the same size in the ILP32 model but different in the LP64 model, because long types grow to 64 bits in the LP64 model but double types do not.
typedef union {
double _d;
long _l[2];
} llx_;
The size of the members can be rebalanced by changing the type of the _l array member from type long to type int.
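The sketch below shows the union before and after the change, using the hypothetical type names llx_before and llx_after. It assumes a 32-bit int and a 64-bit double, as in both data models discussed here.

```c
#include <assert.h>

/* Before: _l is 8 bytes in ILP32 but 16 bytes in LP64, so the members
   no longer match in size once long grows. */
typedef union {
    double _d;
    long _l[2];
} llx_before;

/* After: with a 32-bit int, _l is 8 bytes in both models, matching _d. */
typedef union {
    double _d;
    int _l[2];
} llx_after;

/* Nonzero when the two members of llx_after occupy the same number of bytes. */
static int llx_after_balanced(void)
{
    llx_after u;
    return sizeof u._d == sizeof u._l;
}
```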
A lack of precision can cause the loss of data in some constant expressions. Be explicit when you specify the data types in your constant expression. Specify the type of each integer constant by adding some combination of {u,U,l,L}. You can also use casts to specify the type of a constant expression. Consider the following example:
int i = 32;
long j = 1 << i; /* not 0x100000000: 1 is an int, so the shift is done in 32 bits, and shifting by the full width of int is undefined */
The above code can be made to work as intended by appending the type to the constant 1, as follows:
int i = 32;
long j = 1L << i; /* in the LP64 model, j now gets 0x100000000, as intended */
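The same fix can use a cast instead of a suffix, as this sketch shows. bit_at is a hypothetical helper. Note that 1L is 64 bits only in LP64; in ILP32 a long is still 32 bits, so 1L << 32 remains undefined there. Casting to the fixed-width uint64_t gives a 64-bit shift in both models.

```c
#include <assert.h>
#include <stdint.h>

/* Give the constant a type that is wide enough in every model: (uint64_t)1
   is 64 bits in ILP32 as well as LP64. */
static uint64_t bit_at(unsigned int n)
{
    assert(n < 64);    /* a shift count >= the width of the type is undefined */
    return (uint64_t)1 << n;
}
```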
Make sure the format strings for printf(3S), sprintf(3S), scanf(3S), and sscanf(3S) can accommodate long or pointer arguments. For pointer arguments, the conversion operation given in the format string should be %p to work in both the 32-bit and 64-bit compilation environments. For long arguments, the long size specification, l, should be prepended to the conversion operation character in the format string.
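The following sketch shows these conversions:

- l for long and unsigned long, in both the printf and scanf families.
- %p for pointers.

The helper names are hypothetical. snprintf is used in place of sprintf so that the output cannot overrun the buffer.

```c
#include <assert.h>
#include <stdio.h>

/* 'l' length modifier for long and unsigned long arguments. */
static int format_counts(char *buf, size_t size, long count, unsigned long mask)
{
    return snprintf(buf, size, "count=%ld mask=%lx", count, mask);
}

/* The same modifier applies when scanning into a long. */
static int parse_long(const char *s, long *out)
{
    return sscanf(s, "%ld", out) == 1 ? 0 : -1;
}

/* %p for pointers; the argument must be a void *. The exact text that %p
   produces is implementation-defined. */
static int format_ptr(char *buf, size_t size, void *p)
{
    return snprintf(buf, size, "%p", p);
}
```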
Also, check that buffers passed as the first argument to sprintf contain enough storage to accommodate the expanded number of digits used to convey long and pointer values. For example, a pointer is expressed by 8 hex digits in the ILP32 data model but expands to 16 in the LP64 data model.
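One way to size such buffers from the target itself, rather than hard-coding 8 or 16 digits, is sketched below. PTR_HEX_DIGITS, PTR_BUF_SIZE, and format_ptr_hex are hypothetical names. The pointer is printed through uintptr_t with %jx, so the width is predictable. The return value of snprintf is checked, so a buffer that is too small is reported instead of being silently truncated.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Two hex digits per 8-bit byte: 8 digits in ILP32, 16 in LP64. */
#define PTR_HEX_DIGITS (sizeof(void *) * CHAR_BIT / 4)

/* "0x" prefix + digits + terminating NUL. */
#define PTR_BUF_SIZE (2 + PTR_HEX_DIGITS + 1)

/* Format p as zero-padded hex. Returns 0 on success, or -1 if buf is too
   small (snprintf returns the length it wanted to write). */
static int format_ptr_hex(char *buf, size_t size, const void *p)
{
    int n = snprintf(buf, size, "0x%0*jx", (int)PTR_HEX_DIGITS,
                     (uintmax_t)(uintptr_t)p);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```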
sizeof() Operator Is an unsigned long
In the LP64 data-type model, sizeof() yields a value of type size_t, which is effectively an unsigned long. If the result of sizeof() is passed to a function expecting an argument of type int, or is assigned or cast to an int, the truncation could cause a loss of data. This is only likely to be problematic in large database programs containing extremely long arrays.
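A minimal sketch of the defensive pattern follows. It assumes an older interface that still takes an int count. The value stays in size_t and is narrowed only after a range check. size_to_int is a hypothetical helper.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Keep sizes in size_t, the type sizeof yields, and narrow to int only after
   checking that the value fits. Returns 0 on success, -1 if it would truncate. */
static int size_to_int(size_t n, int *out)
{
    if (n > (size_t)INT_MAX)
        return -1;
    *out = (int)n;
    return 0;
}
```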
For data structures that are shared between 32-bit and 64-bit versions of an application, stick with data types that have a common size between ILP32 and LP64 programs. Avoid using long data types and pointers. Also, avoid using derived data types that change in size between 32-bit and 64-bit applications. For example, the following types defined in <sys/types.h> change in size between the ILP32 and LP64 data models:
clock_t, which represents the system time in clock ticks
dev_t, which is used for device numbers
off_t, which is used for file sizes and offsets
ptrdiff_t, which is the signed integral type for the result of subtracting two pointers
size_t, which reflects the size, in bytes, of objects in memory
ssize_t, which is used by functions that return a count of bytes or an error indication
time_t, which counts time in seconds
Using the derived data types in <sys/types.h> is a good idea for internal data, because it helps to insulate the code from data-model changes. However, precisely because the sizes of these types are prone to change with the data model, using them is not recommended in data that is shared between 32-bit and 64-bit applications, or in other situations where the data size must remain fixed. Nevertheless, as with the sizeof() operator discussed above, before making any changes to the code, consider whether the loss of precision will actually have any practical impact on the program.
For binary interface data, consider using the fixed-width integer types in <inttypes.h>. These types are good for explicit binary representations of the following:
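As an illustration, a record meant to be read by both 32-bit and 64-bit programs might use only fixed-width fields, with the widest fields first. The structure and its field names are hypothetical. The comments note which size-varying type each field replaces. <inttypes.h> also provides matching format macros, such as PRId64.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical record shared between 32-bit and 64-bit programs. Every field
   has the same size in ILP32 and LP64, and the widest fields come first so
   that common ABIs insert no holes. */
struct shared_record {
    int64_t  offset;   /* rather than off_t */
    uint64_t length;   /* rather than size_t */
    int32_t  id;       /* rather than long */
    uint32_t flags;
};

/* PRId64 and PRIu64 expand to the correct conversions for these types. */
static int format_record(char *buf, size_t size, const struct shared_record *r)
{
    return snprintf(buf, size, "offset=%" PRId64 " length=%" PRIu64,
                    r->offset, r->length);
}
```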
Be aware that a type change in one area can result in an unexpected 64-bit conversion in another area. For example, check all the callers of a function that previously returned an int and now returns an ssize_t.
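A sketch of such a change follows, using hypothetical functions. copy_bytes now returns ssize_t, and its caller, copy_twice, stores the result in ssize_t too. If the caller still stored the result in an int, a count larger than INT_MAX would be truncated in LP64.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>   /* ssize_t (POSIX) */

/* Hypothetical function whose return type changed from int to ssize_t:
   it returns the number of bytes copied, or -1 if dst is too small. */
static ssize_t copy_bytes(char *dst, size_t dst_size, const char *src, size_t n)
{
    if (n > dst_size)
        return -1;
    memcpy(dst, src, n);
    return (ssize_t)n;
}

/* A caller updated to match: results are kept in ssize_t rather than int,
   so large counts are not truncated in LP64 and -1 is still detected. */
static ssize_t copy_twice(char *dst, size_t dst_size, const char *src, size_t n)
{
    ssize_t first = copy_bytes(dst, dst_size, src, n);
    ssize_t second;

    if (first < 0)
        return -1;
    second = copy_bytes(dst + first, dst_size - (size_t)first, src, n);
    if (second < 0)
        return -1;
    return first + second;
}
```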
long Arrays on Performance
Large arrays of long or unsigned long types can cause serious performance degradation in the LP64 data-type model compared to arrays of int or unsigned int types. Large arrays of long types cause significantly more cache misses and consume more memory. Therefore, if int works just as well as long for the application's purposes, it is better to use int rather than long. This is also an argument for using arrays of int types instead of arrays of pointers. Some C applications suffer from serious performance degradation after conversion to the LP64 data-type model because they rely on many large arrays of pointers.
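The main problem when converting 32-bit applications into 64-bit applications is the change in size of the int type with respect to the long and pointer types. When a 32-bit program is converted into a 64-bit program, only the long and pointer types grow from 32 bits to 64 bits; the int type remains 32 bits. As a result, data can be truncated when a pointer or long value is assigned to an int. In addition, sign extension can occur when an expression using types smaller than int is assigned to an unsigned long or to a pointer. This article discusses how to avoid or eliminate these problems.
Consider the Differences Between the 32-bit and 64-bit Data Models
The biggest difference between the 32-bit and the 64-bit compilation environments is the change in data-type models. The C data-type model for 32-bit applications is the ILP32 model, so named because the int, long, and pointer types are all 32-bit data types. The data-type model for 64-bit applications is the LP64 data model, so named because the long and pointer types grow to 64 bits. The remaining C integer types and the floating-point types are the same in both data-type models.
It is not unusual for current 32-bit applications to assume that the int type, the long type, and pointers are the same size. Because the sizes of long and pointer types change in the LP64 data model, this change alone is the principal cause of ILP32-to-LP64 conversion problems.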
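The sizes that distinguish the two models can be checked with sizeof. The sketch below uses data_model, a hypothetical helper, to name the model of the current build. Anything other than the two models discussed here is reported as "other".

```c
#include <assert.h>

/* Name the data model of the current compilation from the sizes of int,
   long, and pointers. Only the two models described here are recognized. */
static const char *data_model(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";    /* e.g. 64-bit Windows, where long stays 32 bits (LLP64) */
}
```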
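Use the lint Utility to Detect Problems with 64-bit long and Pointer Types
Use lint to check code that is written for both the 32-bit and the 64-bit compilation environments. Specify the -errchk=longptr64 option to generate LP64 warnings. You can also use the -errchk=longptr64 flag to check portability to an environment in which long integers and pointers are 64 bits and plain integers are 32 bits. The -errchk=longptr64 flag checks assignments of pointer expressions and long integer expressions to plain integers, even when explicit casts are used.
Use the -errchk=longptr64,signext option to find code where the normal ISO C value-preserving rules allow the sign of a signed integral value to be extended in an expression of unsigned integral type. To check code that is intended to run only in the 64-bit SPARC compilation environment, use the lint option -xarch=v9. To check code that is intended to run in the x86 64-bit environment, use -xarch=amd64.
When lint generates warnings, it prints the line number of the offending code, a message that describes the problem, and whether a pointer is involved. The warning message also indicates the sizes of the data types involved. Knowing that a pointer is involved and knowing the sizes of the data types helps you find specific 64-bit problems and avoid the pre-existing problems between 32-bit and smaller types.
You can suppress the warning for a given line of code by placing a comment of the form "NOTE(LINTED())" on the previous line. This is useful when you want lint to ignore certain lines of code, such as casts and assignments. Exercise extreme care when you use the "NOTE(LINTED())" comment, because it can mask real problems. When you use NOTE, also #include <note.h>. Refer to the lint man page for more information.
Check for Changes of Pointer Size With Respect to the Size of Plain Integers
Since plain integers and pointers are the same size in the ILP32 compilation environment, 32-bit code commonly relies on this assumption. Pointers are often cast to int or unsigned int for address arithmetic. Pointers can also be cast to unsigned long, because long and pointer types are the same size in both the ILP32 and LP64 data-type models. However, rather than explicitly using unsigned long, use uintptr_t, because it expresses your intent more precisely and makes the code more portable, insulating it from future changes. To use uintptr_t and intptr_t, you must #include <inttypes.h>.
Consider the following example: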
char *p;
p = (char *) ((int)p & PAGEOFFSET);
% cc ..
warning: conversion of pointer loses bits
The following version will function correctly when compiled to both 32-bit and 64-bit targets:
char *p;
p = (char *) ((uintptr_t)p & PAGEOFFSET);
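Check for Changes in Size of long Integers With Respect to the Size of Plain Integers
Because integers and longs are never really distinguished in the ILP32 data-type model, your existing code probably uses them indiscriminately. Modify any code that uses integers and longs interchangeably so that it conforms to the requirements of both the ILP32 and LP64 data-type models. While an integer and a long are both 32 bits in the ILP32 data-type model, a long is 64 bits in the LP64 data-type model.
Consider the following example: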
int waiting;
long w_io;
long w_swap;
...
waiting = w_io + w_swap;
% cc
warning: assignment of 64-bit integer to 32-bit integer
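Check for Sign Extension
Sign extension is a common problem when you convert to the 64-bit compilation environment, because the type conversion and promotion rules are somewhat obscure. To prevent sign-extension problems, use explicit casting to achieve the intended results.
To understand why sign extension occurs, it helps to understand the conversion rules for ISO C. The conversion rules that seem to cause the most sign-extension problems between the 32-bit and the 64-bit compilation environments come into effect during the following operations: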