Effective C#, Item 17: Minimizing Boxing and Unboxing
Item 17: Minimize Boxing and Unboxing
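Value types are containers for data; they are not polymorphic. The .NET Framework, on the other hand, was designed with a single reference type, System.Object, at the root of the entire inheritance hierarchy. These two designs pull in opposite directions, and the .NET Framework uses boxing and unboxing to bridge them. Boxing places a value type in an untyped reference object so that the value can be used wherever a reference type is required. Unboxing copies the value back out of the box. Together they let you use value types where System.Object is expected. But boxing and unboxing rob performance, and the temporary copies they sometimes create can introduce subtle bugs into your program. Avoid boxing and unboxing whenever you can.
Boxing converts a value type into a reference type. A new reference object, the box, is created on the heap, and a copy of the value is stored inside it. Figure 2.3 shows how a boxed object is stored and accessed. The box holds a copy of the value type object and also implements the interfaces of the boxed value type. When you retrieve anything from the box, a copy of the value is created and returned. That is the key concept of boxing and unboxing: a copy of the object goes into the box, and another copy is created every time you access the box's contents.
Figure 2.3. A value type in a box. Converting a value type into a System.Object reference creates an unnamed reference type. The value is stored inside that unnamed reference object, and every method that accesses the value goes through the box to reach it.
The insidious part is that boxing and unboxing often happen automatically. Whenever you use a value type where System.Object is expected, the compiler generates the boxing and unboxing instructions. Boxing and unboxing also occur when you access a value type through an interface reference. You get no warning when boxing happens, even in the simplest statement, such as this one: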
Console.WriteLine("A few numbers:{0}, {1}, {2}",
25, 32, 50);
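This overload of Console.WriteLine takes an array of System.Object references. Integers are value types, so they must be boxed before they can be passed to it. Boxing is the only way to coerce the three integers into System.Object. In addition, inside WriteLine, code reaches into the box by calling ToString() on the boxed object. In a sense, you have generated this construct:
int i = 25;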
object o = i; // box
Console.WriteLine(o.ToString());
Inside WriteLine, the following code executes:
object o;
int i = ( int )o; // unbox
string output = i.ToString( );
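You would probably never write this code yourself, but by letting the compiler convert a specific type to System.Object automatically, you did cause it. The compiler is only trying to help: it wants your call to succeed, so it happily generates whatever boxing and unboxing statements are needed to convert a value type into an instance of System.Object. To avoid this particular penalty, convert your values to strings yourself before passing them to WriteLine: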
Console.WriteLine("A few numbers:{0}, {1}, {2}",
25.ToString(), 32.ToString(), 50.ToString());
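(Translator's note: calling ToString yourself still creates a reference-type instance on the heap, but no unboxing is needed afterward because the object is already a reference type.)
This code uses the known integer type, so the values are never implicitly converted to System.Object. This common example illustrates the first rule for avoiding boxing: watch for implicit conversions to System.Object. If you can avoid it, do not substitute a value type for System.Object.
Another common case arises with the .NET 1.x collections, where you can inadvertently convert a value type to System.Object. Every time you add a value type to a collection, you create a box. Every time you remove an object from a collection, you get a copy of what was in the box. Taking an object out of a box always makes a copy. That can introduce subtle bugs into your application, and the compiler will not help you find them. Boxing is to blame. Start with a simple structure that lets you modify one of its fields, and put some instances of it in a collection: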
public struct Person
{
    private string _Name;
    public string Name
    {
        get
        {
            return _Name;
        }
        set
        {
            _Name = value;
        }
    }
    public override string ToString( )
    {
        return _Name;
    }
}
// Using the Person in a collection
// (needs using System; and using System.Collections;):
ArrayList attendees = new ArrayList( );
// Person declares no constructor, so set the name through the property.
Person p = new Person( );
p.Name = "Old Name";
attendees.Add( p );
// Try to change the name:
// Would work if Person was a reference type.
Person p2 = (( Person )attendees[ 0 ] );
p2.Name = "New Name";
// Writes "Old Name":
Console.WriteLine(
    attendees[ 0 ].ToString( ));
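Person is a value type, so it is boxed before it is stored in the ArrayList. That makes one copy. Retrieving the Person to change its Name property makes another copy, and the change applies only to that copy. In fact, a third copy is made when ToString() is called through attendees[0].
For this and other reasons, you should create immutable value types (see Item 7). If you must keep a mutable value type in a collection, use the System.Array class, which is type safe.
If an array is not the right collection, you can fix this error in C# 1.x by using interfaces. Code to an interface rather than to the type's public methods, and you can reach inside the box to change the data: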
public interface IPersonName
{
    string Name
    {
        get; set;
    }
}
struct Person : IPersonName
{
    private string _Name;
    public string Name
    {
        get
        {
            return _Name;
        }
        set
        {
            _Name = value;
        }
    }
    public override string ToString( )
    {
        return _Name;
    }
}
// Using the Person in a collection
// (needs using System; and using System.Collections;):
ArrayList attendees = new ArrayList( );
// Person declares no constructor, so set the name through the property.
Person p = new Person( );
p.Name = "Old Name";
attendees.Add( p ); // box
// Try to change the name:
// Use the interface, not the type.
// No Unbox needed
(( IPersonName )attendees[ 0 ] ).Name = "New Name";
// Writes "New Name":
Console.WriteLine(
    attendees[ 0 ].ToString( )); // unbox
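The boxed reference type implements every interface implemented by the original type. That means no copy is made: calling IPersonName.Name on the box forwards the request to the value stored inside it. Creating interfaces on your value types lets you reach inside the boxes in a collection and change the stored values directly. Implementing an interface does not really make a value type polymorphic, which would reintroduce the boxing penalty (see Item 20).
Many of these limitations change with the introduction of generics in C# 2.0 (see Item 49). Generic interfaces and generic collections address both the collection and the interface situations. Until then, keep avoiding boxing. Yes, value types can be converted to System.Object or to any interface reference. The conversions are implicit, which makes them tedious to find. Those are the rules of the environment and the language: boxing and unboxing copy objects where you might not expect, and that causes bugs. Treating value types polymorphically also costs performance. Watch for anything that converts a value type to System.Object or to an interface type: placing values in collections, calling methods whose parameters are typed as System.Object, and casting to System.Object. Avoid these whenever you can.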
===============================
Item 17: Minimize Boxing and Unboxing
Value types are containers for data. They are not polymorphic types. On the other hand, the .NET Framework was designed with a single reference type, System.Object, at the root of the entire object hierarchy. These two goals are at odds. The .NET Framework uses boxing and unboxing to bridge the gap between these two goals. Boxing places a value type in an untyped reference object to allow the value type to be used where a reference type is expected. Unboxing extracts a copy of that value type from the box. Boxing and unboxing are necessary for you to use value types where the System.Object type is expected. But boxing and unboxing are always performance-robbing operations. Sometimes, when boxing and unboxing also create temporary copies of objects, it can lead to subtle bugs in your programs. Avoid boxing and unboxing when possible.
Boxing converts a value type to a reference type. A new reference object, the box, is allocated on the heap, and a copy of the value type is stored inside that reference object. See Figure 2.3 for an illustration of how the boxed object is stored and accessed. The box contains the copy of the value type object and duplicates the interfaces implemented by the boxed value type. When you need to retrieve anything from the box, a copy of the value type gets created and returned. That's the key concept of boxing and unboxing: A copy of the object goes in the box, and another gets created whenever you access what's in the box.
Figure 2.3. Value type in a box. To convert a value type into a System.Object reference, an unnamed reference type is created. The value type is stored inline inside the unnamed reference type. All methods that access the value type are passed through the box to the stored value type.
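The copy semantics described above can be seen in a few lines. This is only a minimal sketch; the class and method names (`BoxCopyDemo`, `UnboxAfterOriginalChanges`) are made up for illustration and do not come from the original text.

```csharp
using System;

public static class BoxCopyDemo
{
    // Boxes a value, changes the original variable, then unboxes.
    // (Hypothetical helper, named here only for the example.)
    public static int UnboxAfterOriginalChanges()
    {
        int i = 25;
        object o = i;   // box: the value 25 is copied into a new heap object
        i = 30;         // changes only the local variable, not the box
        return (int)o;  // unbox: copies the stored value back out of the box
    }

    public static void Main()
    {
        Console.WriteLine(UnboxAfterOriginalChanges()); // prints 25
    }
}
```

The box keeps the value it was given at the moment of boxing, so later changes to the original variable are not visible through it.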
The insidious problem with boxing and unboxing is that it happens automatically. The compiler generates the boxing and unboxing statements whenever you use a value type where a reference type, such as System.Object, is expected. In addition, the boxing and unboxing operations occur when you use a value type through an interface reference. You get no warnings; boxing just happens. Even a simple statement such as this performs boxing:
Console.WriteLine("A few numbers:{0}, {1}, {2}",
25, 32, 50);
The referenced overload of Console.WriteLine takes an array of System.Object references. Ints are value types and must be boxed so that they can be passed to this overload of the WriteLine method. The only way to coerce the three integer arguments into System.Object is to box them. In addition, inside WriteLine, code reaches inside the box to call the ToString() method of the object in the box. In a sense, you have generated this construct:
int i = 25;
object o = i; // box
Console.WriteLine(o.ToString());
Inside WriteLine, the following code executes:
object o;
int i = ( int )o; // unbox
string output = i.ToString( );
You would never write this code yourself. However, by letting the compiler automatically convert from a specific value type to System.Object, you did let it happen. The compiler was just trying to help you. It wants you to succeed. It happily generates the boxing and unboxing statements necessary to convert any value type into an instance of System.Object. To avoid this particular penalty, you should convert your types to string instances yourself before you send them to WriteLine:
Console.WriteLine("A few numbers:{0}, {1}, {2}",
25.ToString(), 32.ToString(), 50.ToString());
This code uses the known type of integer, and value types (integers) are never implicitly converted to System.Object. This common example illustrates the first rule to avoid boxing: Watch for implicit conversions to System.Object. Value types should not be substituted for System.Object if you can avoid it.
Another common case in which you might inadvertently substitute a value type for System.Object is when you place value types in .NET 1.x collections. This incarnation of the .NET Framework collections stores references to System.Object instances. Anytime you add a value type to a collection, it goes in a box. Anytime you remove an object from a collection, it gets copied from the box. Taking an object out of the box always makes a copy. That introduces some subtle bugs in your application. The compiler does not help you find these bugs. It's all because of boxing. Start with a simple structure that lets you modify one of its fields, and put some of those objects in a collection:
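One way to observe the implicit conversion is to pass the same value twice to a method that takes System.Object parameters. The sketch below assumes a hypothetical helper, `SameBox`, which is not part of the original text; it only reports whether its two arguments are the same heap object.

```csharp
using System;

public static class ImplicitBoxingDemo
{
    // Hypothetical helper: its object parameters force any
    // value-type argument to be boxed at the call site.
    public static bool SameBox(object a, object b)
    {
        return object.ReferenceEquals(a, b);
    }

    public static void Main()
    {
        int i = 25;
        // Each implicit conversion to object allocates its own box.
        Console.WriteLine(SameBox(i, i)); // prints False

        // A string is already a reference type, so nothing is boxed.
        string s = i.ToString();
        Console.WriteLine(SameBox(s, s)); // prints True
    }
}
```

Nothing in the call `SameBox(i, i)` looks like an allocation, which is exactly why these conversions are easy to miss.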
public struct Person
{
    private string _Name;
    public string Name
    {
        get
        {
            return _Name;
        }
        set
        {
            _Name = value;
        }
    }
    public override string ToString( )
    {
        return _Name;
    }
}
// Using the Person in a collection
// (needs using System; and using System.Collections;):
ArrayList attendees = new ArrayList( );
// Person declares no constructor, so set the name through the property.
Person p = new Person( );
p.Name = "Old Name";
attendees.Add( p );
// Try to change the name:
// Would work if Person was a reference type.
Person p2 = (( Person )attendees[ 0 ] );
p2.Name = "New Name";
// Writes "Old Name":
Console.WriteLine(
    attendees[ 0 ].ToString( ));
Person is a value type; it gets placed in a box before being stored in the ArrayList. That makes a copy. Then another copy gets made when you remove the Person object to access the Name property to change. All you did was change the copy. In fact, a third copy was made to call the ToString() function through the attendees[0] object.
For this and many other reasons, you should create immutable value types (see Item 7). If you must have a mutable value type in a collection, use the System.Array class, which is type safe.
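The array suggestion can be sketched as follows. The `Person` struct here is a trimmed stand-in for the one in the text, and `ArrayDemo` and `RenameInArray` are names invented for the example.

```csharp
using System;

// Trimmed stand-in for the Person struct used in the text.
public struct Person
{
    private string _Name;
    public string Name
    {
        get { return _Name; }
        set { _Name = value; }
    }
}

public static class ArrayDemo
{
    // Hypothetical helper showing in-place modification of an array element.
    public static string RenameInArray()
    {
        // A typed array stores the structs themselves, so nothing is boxed.
        Person[] attendees = new Person[1];
        attendees[0].Name = "Old Name";

        // An array element is a variable, so this changes the stored
        // struct directly instead of a temporary copy.
        attendees[0].Name = "New Name";
        return attendees[0].Name;
    }

    public static void Main()
    {
        Console.WriteLine(RenameInArray()); // prints New Name
    }
}
```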
If an array is not the proper collection, you can fix this error in C# 1.x by using interfaces. By coding to interfaces rather than the type's public methods, you can reach inside the box to make the change to the values:
public interface IPersonName
{
    string Name
    {
        get; set;
    }
}
struct Person : IPersonName
{
    private string _Name;
    public string Name
    {
        get
        {
            return _Name;
        }
        set
        {
            _Name = value;
        }
    }
    public override string ToString( )
    {
        return _Name;
    }
}
// Using the Person in a collection
// (needs using System; and using System.Collections;):
ArrayList attendees = new ArrayList( );
// Person declares no constructor, so set the name through the property.
Person p = new Person( );
p.Name = "Old Name";
attendees.Add( p ); // box
// Try to change the name:
// Use the interface, not the type.
// No Unbox needed
(( IPersonName )attendees[ 0 ] ).Name = "New Name";
// Writes "New Name":
Console.WriteLine(
    attendees[ 0 ].ToString( )); // unbox
The box reference type implements all the interfaces implemented by the original object. That means no copy is made, but you call the IPersonName.Name method on the box, which forwards the request to the boxed value type. Creating interfaces on your value types enables you to reach inside the box to change the value stored in the collection. Implementing an interface is not really treating a value type polymorphically, which would reintroduce the boxing penalty (see Item 20).
Many of these limitations change with the introduction of generics in C# 2.0 (see Item 49). Generic interfaces and generic collections will address both the collection and the interface situations. Until then, though, avoid boxing. Yes, value types can be converted to System.Object or any interface reference. That conversion happens implicitly, which makes such conversions hard to find. Those are the rules of the environment and the language. The boxing and unboxing operations make copies where you might not expect. That causes bugs. There is also a performance cost to treating value types polymorphically. Be on the lookout for any constructs that convert value types to either System.Object or interface types: placing values in collections, calling methods defined in System.Object, and casts to System.Object. Avoid these whenever you can.
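To make the distinction between the box and the original variable concrete, here is a small sketch without any collection. The types mirror the ones in the text in shortened form; `BoxedInterfaceDemo` and `Rename` are hypothetical names used only here.

```csharp
using System;

public interface IPersonName
{
    string Name { get; set; }
}

// Trimmed stand-in for the Person struct used in the text.
public struct Person : IPersonName
{
    private string _Name;
    public string Name
    {
        get { return _Name; }
        set { _Name = value; }
    }
}

public static class BoxedInterfaceDemo
{
    // Returns { name of the original variable, name stored in the box }.
    public static string[] Rename()
    {
        Person p = new Person();
        p.Name = "Old Name";

        object box = p;                        // box: copies p onto the heap
        ((IPersonName)box).Name = "New Name";  // reference conversion: changes the box itself

        Person copy = (Person)box;             // unbox: copies the value back out
        return new string[] { p.Name, copy.Name };
    }

    public static void Main()
    {
        string[] names = Rename();
        Console.WriteLine(names[0]); // prints Old Name
        Console.WriteLine(names[1]); // prints New Name
    }
}
```

The interface call changes the value inside the box, while the local variable that was boxed in the first place keeps its old name, because boxing copied it.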
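As a rough sketch of what the generic collections change, the example below assumes C# 2.0 or later and uses `System.Collections.Generic.List<T>`. The `Person` struct is again a trimmed stand-in, and `GenericListDemo` and `Rename` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;

// Trimmed stand-in for the Person struct used in the text.
public struct Person
{
    private string _Name;
    public string Name
    {
        get { return _Name; }
        set { _Name = value; }
    }
}

public static class GenericListDemo
{
    public static string Rename()
    {
        List<Person> attendees = new List<Person>();
        Person p = new Person();
        p.Name = "Old Name";
        attendees.Add(p);             // stored as a Person: no box is created

        // The indexer still returns a copy, so modify the copy
        // and write it back explicitly.
        Person first = attendees[0];
        first.Name = "New Name";
        attendees[0] = first;

        return attendees[0].Name;
    }

    public static void Main()
    {
        Console.WriteLine(Rename()); // prints New Name
    }
}
```

The generic list removes the boxing, but not the value semantics: the compiler rejects `attendees[0].Name = "New Name";` for a struct element because the indexer returns a copy, so the write-back step is still needed.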