Preparing to Port 32-bit Programs to a 64-bit Platform

Converting 32-bit Applications Into 64-bit Applications: Things to Consider

 
By The Sun Studio Team, January 2005  
The principal cause of problems when converting 32-bit applications to 64-bit applications is the change in size of the int type with respect to the long and pointer types. When converting 32-bit programs to 64-bit programs, only long types and pointer types change in size from 32 bits to 64 bits; integers of type int stay at 32 bits in size. This can cause trouble with data truncation when assigning pointer or long types to int types. Also, problems with sign extension can occur when assigning expressions using types shorter than the size of an int to an unsigned long or a pointer. This article discusses how to avoid or eliminate these problems.
 

Consider the Differences Between the 32-bit and 64-bit Data Models

The biggest difference between the 32-bit and the 64-bit compilation environments is the change in data-type models. The C data-type model for 32-bit applications is the ILP32 model, so named because the int and long types, and pointers, are 32-bit data types. The data-type model for 64-bit applications is the LP64 data model, so named because long and pointer types grow to 64 bits. The remaining C integer types and the floating-point types are the same in both data-type models.

It is not unusual for current 32-bit applications to assume that the int type, long type, and pointers are the same size. Because the sizes of long and pointer types change in the LP64 data model, this change alone is the principal cause of ILP32-to-LP64 conversion problems.
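As a quick check of which data model a given build uses, the type sizes can be inspected directly. The following is a minimal sketch rather than part of the original article; the function names data_model_name and print_type_sizes are hypothetical, and "other" covers any model the article does not discuss.

```c
#include <stdio.h>

/* Returns a label for the data model this file was compiled under.
 * Only the two models named in the article are recognized. */
const char *data_model_name(void)
{
    if (sizeof(int) == 4 && sizeof(long) == 4 && sizeof(void *) == 4)
        return "ILP32";
    if (sizeof(int) == 4 && sizeof(long) == 8 && sizeof(void *) == 8)
        return "LP64";
    return "other";
}

/* Prints the sizes, in bytes, of the three types the article focuses on. */
void print_type_sizes(void)
{
    printf("int=%zu long=%zu pointer=%zu (%s)\n",
           sizeof(int), sizeof(long), sizeof(void *), data_model_name());
}
```

Calling print_type_sizes() from a 32-bit build and a 64-bit build of the same source shows which assumptions about type sizes no longer hold.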

Use the lint Utility to Detect Problems with 64-bit long and Pointer Types

Use lint to check code that is written for both the 32-bit and the 64-bit compilation environments. Specify the -errchk=longptr64 option to generate LP64 warnings. This option checks portability to an environment in which long integers and pointers are 64 bits and plain integers are 32 bits. It flags assignments of pointer expressions and long integer expressions to plain integers, even when explicit casts are used.

Use the -errchk=longptr64,signext option to find code where the normal ISO C value-preserving rules allow the extension of the sign of a signed-integral value in an expression of unsigned-integral type. Use the -xarch=v9 option of lint when you want to check code that you intend to run in the Solaris 64-bit SPARC compilation environment only. Use -xarch=amd64 when you want to check code you intend to run in the x86 64-bit environment.

When lint generates warnings, it prints the line number of the offending code, a message that describes the problem, and whether or not a pointer is involved. The warning message also indicates the sizes of the involved data types. When you know a pointer is involved and you know the size of the data types, you can find specific 64-bit problems and avoid the pre-existing problems between 32-bit and smaller types.

You can suppress the warning for a given line of code by placing a comment of the form "NOTE(LINTED(<optional message>))" on the previous line. This is useful when you want lint to ignore certain lines of code such as casts and assignments. Exercise extreme care when you use the "NOTE(LINTED(<optional message>))" comment because it can mask real problems. When you use NOTE, also add #include <note.h>. Refer to the lint man page for more information.

Check for Changes of Pointer Size With Respect to the Size of Plain Integers

Since plain integers and pointers are the same size in the ILP32 compilation environment, 32-bit code commonly relies on this assumption. Pointers are often cast to int or unsigned int for address arithmetic. You can cast your pointers to unsigned long instead, because long and pointer types are the same size in both the ILP32 and LP64 data-type models. However, rather than explicitly using unsigned long, use uintptr_t because it expresses your intent more closely and makes the code more portable, insulating it against future changes. To use uintptr_t and intptr_t, you need to #include <inttypes.h>.

Consider the following example:

char *p;
p = (char *) ((int)p & PAGEOFFSET);
% cc ..
warning: conversion of pointer loses bits

The following version will function correctly when compiled to both 32-bit and 64-bit targets:

char *p;
p = (char *) ((uintptr_t)p & PAGEOFFSET);
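The same idea can be wrapped in small helpers that keep all address arithmetic in uintptr_t. This is only a sketch: the PAGEOFFSET value below (an 8 KB page mask) and the helper names page_offset and page_base are assumptions made for illustration, not definitions taken from any system header.

```c
#include <inttypes.h>   /* uintptr_t */

/* Hypothetical page mask for illustration: the low 13 bits (8 KB pages). */
#define PAGEOFFSET ((uintptr_t)0x1fff)

/* Offset of p within its page. No bits are lost because uintptr_t
 * is wide enough to hold any object pointer. */
uintptr_t page_offset(const void *p)
{
    return (uintptr_t)p & PAGEOFFSET;
}

/* Address of the first byte of the page that contains p. */
uintptr_t page_base(const void *p)
{
    return (uintptr_t)p & ~PAGEOFFSET;
}
```

Because the mask itself has type uintptr_t, its complement is also full width, so page_base keeps the upper 32 bits of a 64-bit address intact.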

Check for Changes in Size of Long Integers With Respect to the Size of Plain Integers

Because integers and longs are never really distinguished in the ILP32 data-type model, your existing code probably uses them indiscriminately. Modify any code that uses integers and longs interchangeably so it conforms to the requirements of both the ILP32 and LP64 data-type models. While an integer and a long are both 32 bits in the ILP32 data-type model, a long is 64 bits in the LP64 data-type model.

Consider the following example:

int waiting;
long w_io;
long w_swap;
...
waiting = w_io + w_swap;

% cc
warning: assignment of 64-bit integer to 32-bit integer
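One way to resolve this kind of warning is to keep the sum in a long, and to narrow to int only after a range check where an int is really required. The sketch below assumes this approach; the function names total_waiting and fits_in_int are hypothetical.

```c
#include <limits.h>

/* Keep the whole computation in long so nothing is truncated. */
long total_waiting(long w_io, long w_swap)
{
    return w_io + w_swap;
}

/* Returns 1 if value can be stored in an int without losing bits,
 * 0 otherwise. In ILP32 every long fits; in LP64 only some do. */
int fits_in_int(long value)
{
    return value >= INT_MIN && value <= INT_MAX;
}
```

A caller that must store the result in an int can test fits_in_int(total) first and handle the out-of-range case explicitly instead of truncating silently.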

Check for Sign Extensions

Sign extension is a common problem when you convert to the 64-bit compilation environment because the type conversion and promotion rules are somewhat obscure. To prevent sign-extension problems, use explicit casting to achieve the intended results.

To understand why sign extension occurs, it helps to understand the conversion rules for ISO C. The conversion rules that seem to cause the most sign extension problems between the 32-bit and the 64-bit compilation environment come into effect during the following operations:

  • Integral promotion

    You can use a char, short, enumerated type, or bit-field, whether signed or unsigned, in any expression that calls for an integer. If an integer can hold all possible values of the original type, the value is converted to an integer; otherwise, the value is converted to an unsigned integer.

  • Conversion between signed and unsigned integers

    When an integer with a negative sign is promoted to an unsigned integer of the same or larger type, it is first promoted to the signed equivalent of the larger type, then converted to the unsigned value.

When the following example is compiled as a 64-bit program, the addr variable becomes sign-extended, even though both addr and a.base are unsigned types.

% cat test.c
#include <stdio.h>

struct foo {
  unsigned int base:19, rehash:13;
};

int main(int argc, char *argv[])
{
  struct foo a;
  unsigned long addr;

  a.base = 0x40000;
  addr = a.base << 13; /* Sign extension here! */
  printf("addr 0x%lx\n", addr);
  addr = (unsigned int)(a.base << 13); /* No sign extension here! */
  printf("addr 0x%lx\n", addr);
  return 0;
}

This sign extension occurs because the conversion rules are applied as follows:

  • The structure member a.base is converted from an unsigned int bit field to an int because of the integral promotion rule. In other words, because the unsigned 19-bit field fits within a 32-bit integer, the bit field is promoted to an integer rather than an unsigned integer. Thus, the expression a.base << 13 is of type int. If the result were assigned to an unsigned int, this would not matter because no sign extension has yet occurred.
  • The expression a.base << 13 is of type int, but it is converted to a long and then to an unsigned long before being assigned to addr, because of signed and unsigned integer promotion rules. The sign extension occurs when performing the int to long conversion.

Thus, when compiled as a 64-bit program, the result is as follows:

% cc -o test64 -xarch=v9 test.c
% ./test64
addr 0xffffffff80000000
addr 0x80000000
%

When compiled as a 32-bit program, the size of an unsigned long is the same as the size of an int, so there is no sign extension.

% cc -o test test.c
% ./test
addr 0x80000000
addr 0x80000000
%
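The effect can be isolated from bit-fields and shifts by widening a negative int in two ways. This is a minimal sketch with hypothetical function names; it shows only the conversion step, not the full example above.

```c
#include <limits.h>

/* Implicit conversion from int to unsigned long. A negative value is
 * converted modulo 2^N, which amounts to sign extension when
 * unsigned long is wider than int. */
unsigned long widen_signed(int v)
{
    return v;
}

/* Casting to unsigned int first makes the widening step zero-extend,
 * so the upper bits of the result stay clear. */
unsigned long widen_unsigned(int v)
{
    return (unsigned int)v;
}
```

With an argument of -1, widen_signed returns a value with every bit set, while widen_unsigned sets only the bits that fit in an unsigned int. In ILP32 the two results coincide, which is why the problem stays hidden until the code is built as LP64.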

Check Structure Packing

Check the internal data structures in an application for holes; that is, extra padding appearing between fields in the structure to meet alignment requirements. This extra padding is allocated when a long or pointer field grows to 64 bits for the LP64 data-type model and appears after an int that remains at 32 bits in size. Since long and pointer types are 64-bit aligned in the LP64 data-type model, padding appears between the int and the long or pointer type. In the following example, member p is 64-bit aligned, and so padding appears between member k and member p.

struct bar {
  int i;
  long j;
  int k;
  char *p;
}; /* sizeof (struct bar) = 32 bytes */

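Also, structures are aligned to the strictest alignment requirement among their members, so struct bar itself starts on a 64-bit boundary. Because member j must also be 64-bit aligned, padding appears between member i and member j in the above structure.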

When you repack a structure, follow the simple rule of moving the long and pointer fields to the beginning of the structure. Consider the following structure definition:

struct bar {
  char *p;
  long j;
  int i;
  int k;
}; /* sizeof (struct bar) = 24 bytes */
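The padding in each layout can be measured instead of counted by hand. The sketch below renames the two versions of the structure to bar_padded and bar_repacked so both can coexist in one file; those names and the helper functions are hypothetical.

```c
#include <stddef.h>

/* Original member order from the article. */
struct bar_padded {
    int   i;
    long  j;
    int   k;
    char *p;
};

/* Repacked order: long and pointer members first. */
struct bar_repacked {
    char *p;
    long  j;
    int   i;
    int   k;
};

/* Padding is the total size minus the sum of the member sizes. */
size_t padding_in_padded(void)
{
    return sizeof(struct bar_padded)
         - (2 * sizeof(int) + sizeof(long) + sizeof(char *));
}

size_t padding_in_repacked(void)
{
    return sizeof(struct bar_repacked)
         - (2 * sizeof(int) + sizeof(long) + sizeof(char *));
}
```

In an LP64 build the first function would be expected to report 8 bytes of padding and the second none, matching the 32-byte and 24-byte sizes quoted above. offsetof can be used in the same way to locate each individual hole.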

Check for Unbalanced Size of Union Members

Be sure to check the members of unions because their fields can change size between the ILP32 and the LP64 data-type models, making the size of the members different. In the following union, member _d and member array _l are the same size in the ILP32 model, but different in the LP64 model because long types grow to 64 bits in the LP64 model, but double types do not.

typedef union {
  double _d;
  long _l[2];
} llx_;

The size of the members can be rebalanced by changing the type of the _l array member from type long to type int.
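A rebalanced version of the union might look like the following sketch. The type name llx_balanced and the checking function are hypothetical, and the check assumes the usual sizes of 4 bytes for int and 8 bytes for double.

```c
/* Rebalanced union: the array member uses int instead of long,
 * so its size does not change between ILP32 and LP64. */
typedef union {
    double _d;
    int    _l[2];   /* was: long _l[2] */
} llx_balanced;

/* Returns 1 if both members occupy the same number of bytes.
 * sizeof does not evaluate its operand, so the null pointer
 * is never dereferenced. */
int llx_members_balanced(void)
{
    return sizeof(((llx_balanced *)0)->_d)
        == sizeof(((llx_balanced *)0)->_l);
}
```

A check like this can be run in both the 32-bit and the 64-bit build to confirm that the two views of the union still overlay each other exactly.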

Make Sure Constant Types are Used in Constant Expressions

A lack of precision can cause the loss of data in some constant expressions. Be explicit when you specify the data types in your constant expression. Specify the type of each integer constant by adding some combination of {u,U,l,L}. You can also use casts to specify the type of a constant expression. Consider the following example:

int i = 32;
long j = 1 << i; /* RHS is a 32-bit int expression, so bit 32 is lost (the shift is undefined in ISO C) */

The above code can be made to work as intended, by appending the type to the constant, 1, as follows:

int i = 32;
long j = 1L << i; /* now j will get 0x100000000, as intended */
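When the shift count is not a compile-time constant, it also helps to guard against counts that are too large for the type, since 1L << 32 is still out of range in an ILP32 build. The following sketch uses unsigned long to avoid signed overflow; the function name ulong_bit is hypothetical.

```c
#include <limits.h>

/* Returns a value with only bit n set, or 0 if n is not a valid
 * bit position for unsigned long on this platform. */
unsigned long ulong_bit(unsigned int n)
{
    if (n >= sizeof(unsigned long) * CHAR_BIT)
        return 0UL;
    return 1UL << n;   /* the UL suffix makes this an unsigned long shift */
}
```

In an LP64 build ulong_bit(32) yields 0x100000000, while in an ILP32 build the same call returns 0 rather than invoking an undefined shift.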

Check Format String Conversions

Make sure the format strings for printf(3S), sprintf(3S), scanf(3S), and sscanf(3S) can accommodate long or pointer arguments. For pointer arguments, the conversion operation given in the format string should be %p to work in both the 32-bit and 64-bit compilation environments. For long arguments, the long size specification, l, should be prepended to the conversion operation character in the format string.

Also, check to be sure that buffers passed to the first argument in sprintf contain enough storage to accommodate the expanded number of digits used to convey long and pointer values. For example, a pointer is expressed by 8 hex digits in the ILP32 data model but expands to 16 in the LP64 data model.
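Both points can be illustrated with snprintf, which takes the buffer size and therefore cannot overrun the buffer when values grow wider. This is a sketch that swaps snprintf in for the sprintf mentioned above; the function names format_long and format_pointer are hypothetical.

```c
#include <stdio.h>

/* Formats a long with the l size specification. Returns the number
 * of characters the full result needs, excluding the terminator. */
int format_long(char *buf, size_t size, long value)
{
    return snprintf(buf, size, "%ld", value);
}

/* Formats a pointer with %p, which expects a void * argument.
 * The exact text produced by %p is implementation-defined. */
int format_pointer(char *buf, size_t size, const void *p)
{
    return snprintf(buf, size, "%p", (void *)p);
}
```

Calling either function with a null buffer and a size of 0 returns the length that would be required, which gives one way to size a buffer correctly in both data models.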

Type Returned by sizeof() Operator is an unsigned long

In the LP64 data-type model, sizeof() has the effective type of an unsigned long. If sizeof() is passed to a function expecting an argument of type int, or assigned or cast to an int, the truncation could cause a loss of data. This is only likely to be problematic in large database programs containing extremely long arrays.
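Where a size has to be passed to an interface that takes an int, a range check makes the truncation explicit. The sketch below assumes that approach; the function name size_fits_in_int is hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Returns 1 if n can be stored in an int without truncation,
 * 0 otherwise. */
int size_fits_in_int(size_t n)
{
    return n <= (size_t)INT_MAX;
}
```

For ordinary objects the check always succeeds, which matches the observation above that only very large arrays are likely to be affected.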

Use Portable Data Types or Fixed Integer Types for Binary Interface Data

For data structures that are shared between 32-bit and 64-bit versions of an application, stick with data types that have a common size between ILP32 and LP64 programs. Avoid using long data types and pointers. Also, avoid using derived data types that change in size between 32-bit and 64-bit applications. For example, the following types defined in <sys/types.h> change in size between the ILP32 and LP64 data models:

  • clock_t, which represents the system time in clock ticks
  • dev_t, which is used for device numbers
  • off_t, which is used for file sizes and offsets
  • ptrdiff_t, which is the signed integral type for the result of subtracting two pointers
  • size_t, which reflects the size, in bytes, of objects in memory
  • ssize_t, which is used by functions that return a count of bytes or an error indication
  • time_t, which counts time in seconds

Using the derived data types in <sys/types.h> is a good idea for internal data, because it helps to insulate the code from data-model changes. However, precisely because the size of these types is prone to change with the data model, using them is not recommended in data that is shared between 32-bit and 64-bit applications, or in other situations where the data size must remain fixed. Nevertheless, as with the sizeof() operator discussed above, before making any changes to the code, consider whether the loss of precision will actually have any practical impact on the program.

For binary interface data, consider using the fixed-width integer types in <inttypes.h>. These types are good for explicit binary representations of the following:

  • Binary interface specifications
  • On-disk data
  • Data sent over the wire
  • Hardware registers
  • Binary data structures
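As an illustration, a record shared between 32-bit and 64-bit programs could be declared with fixed-width types only. The structure below is a hypothetical example, not one from the article, and it addresses only member sizes; byte order would still need to be agreed on separately.

```c
#include <inttypes.h>   /* uint32_t, int64_t */
#include <stddef.h>

/* Hypothetical on-disk record. Every member has the same size in
 * ILP32 and LP64, and the 64-bit member sits at an offset that is
 * a multiple of 8, so no padding is inserted. */
struct disk_record {
    uint32_t id;
    uint32_t flags;
    int64_t  offset;   /* fixed 64 bits, unlike off_t or long */
};

/* Size of the record in bytes, for use when reading or writing it. */
size_t disk_record_size(void)
{
    return sizeof(struct disk_record);
}
```

Ordering the members so that each one is naturally aligned keeps the layout identical across data models without relying on compiler-specific packing directives.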

Check for Side Effects

Be aware that a type change in one area can result in an unexpected 64-bit conversion in another area. For example, check all the callers of a function that previously returned an int and now returns an ssize_t.

Consider the Effect of long Arrays on Performance

Large arrays of long or unsigned long types can cause serious performance degradation in the LP64 data-type model compared to arrays of int or unsigned int types. Large arrays of long types cause significantly more cache misses and consume more memory. Therefore, if int works just as well as long for the application's purposes, it is better to use int rather than long. This is also an argument for using arrays of int types instead of arrays of pointers. Some C applications suffer serious performance degradation after conversion to the LP64 data-type model because they rely on many large arrays of pointers.


