Effective C# Item 7: Prefer Immutable Atomic Value Types
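Immutable types are simple: once they are created, their values are fixed. If you validate the parameters used to construct an object, you know it is in a valid state from that point forward. You cannot change the object's internal state to make it invalid. By disallowing any state changes after construction, you save yourself a great deal of error checking that would otherwise be necessary. Immutable types are inherently thread safe: multiple readers can access the same contents at the same time. If the internal state cannot change, different threads never get the chance to see inconsistent views of the data. Immutable types can be exposed safely from your classes, because callers cannot modify the object's internal state. Immutable types also work well in hash-based collections. The value returned by Object.GetHashCode() must stay the same for a given instance (see Item 10), and that always holds for immutable types.
Not every type can be immutable. If every type were, you would have to clone objects to modify any program state. That is why this recommendation covers types that are both immutable and atomic. Decompose your objects into structures that naturally form a single entity. An Address type is one: an address is a single thing made up of several related fields, and changing one field very likely means changing the others. A customer type is not atomic. It probably contains many pieces of information: an address, a name, and one or more phone numbers. Any of these independent pieces can change. One customer might change phone numbers without moving. Another might move and keep the same phone number. Yet another might change his or her name without moving or changing phone numbers. A customer object is therefore not atomic; it is composed of several different immutable parts: an address, a name, and a collection of phone number/type pairs. Atomic types are single entities: you would naturally replace the entire contents of an atomic type. Changing just one of its component fields would be the exception.
Here is a typical implementation of a mutable address: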
// Mutable Address structure.
public struct Address
{
    private string _line1;
    private string _line2;
    private string _city;
    private string _state;
    private int _zipCode;
    // Rely on the default system-generated
    // constructor.
    public string Line1
    {
        get { return _line1; }
        set { _line1 = value; }
    }
    public string Line2
    {
        get { return _line2; }
        set { _line2 = value; }
    }
    public string City
    {
        get { return _city; }
        set { _city = value; }
    }
    public string State
    {
        get { return _state; }
        set
        {
            ValidateState( value );
            _state = value;
        }
    }
    public int ZipCode
    {
        get { return _zipCode; }
        set
        {
            ValidateZip( value );
            _zipCode = value;
        }
    }
    // other details omitted.
}
// Example usage:
Address a1 = new Address( );
a1.Line1 = "111 S. Main";
a1.City = "Anytown";
a1.State = "IL";
a1.ZipCode = 61111;
// Modify:
a1.City = "Ann Arbor"; // Zip, State invalid now.
a1.ZipCode = 48103;    // State still invalid now.
a1.State = "MI";       // Now fine.
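Changing internal state means it is possible to violate the object's invariants, at least temporarily. Once you replace the City field, a1 is in an invalid state: the city no longer matches the state or ZIP code fields. The code looks harmless enough, but suppose this fragment is part of a multithreaded program. Any context switch after the city changes and before the state changes gives another thread the chance to see an inconsistent view of the data.
Okay, so you are not writing a multithreaded program. You can still get into trouble. Imagine that the ZIP code is invalid and the set accessor throws an exception. You have made only some of the changes you intended, and you have left the system in an invalid state. To fix this, you would need to add a considerable amount of internal validation code to the address structure, and that code would add considerable size and complexity. To be fully exception safe, you would need to create defensive copies around any code block that changes more than one field. Thread safety would require adding explicit thread synchronization to every property accessor, both set and get. All in all, it would be a significant undertaking, and one that would likely grow over time as you add new features.
Instead, make the Address structure an immutable type. Start by changing all instance fields to read-only: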
public struct Address
{
    private readonly string _line1;
    private readonly string _line2;
    private readonly string _city;
    private readonly string _state;
    private readonly int _zipCode;
    // remaining details elided
}
You also need to remove the set accessor from every property:
public struct Address
{
    // ...
    public string Line1
    {
        get { return _line1; }
    }
    public string Line2
    {
        get { return _line2; }
    }
    public string City
    {
        get { return _city; }
    }
    public string State
    {
        get { return _state; }
    }
    public int ZipCode
    {
        get { return _zipCode; }
    }
}
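Now you have an immutable type. To make it useful, you need to add the constructors necessary to initialize the Address structure completely. The Address structure needs only one additional constructor, which specifies each field. A copy constructor is not needed, because the assignment operator is just as efficient. Remember that the default constructor is still accessible: there is a default address in which all the strings are null and the ZIP code is 0: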
public struct Address
{
    private readonly string _line1;
    private readonly string _line2;
    private readonly string _city;
    private readonly string _state;
    private readonly int _zipCode;
    public Address( string line1,
        string line2,
        string city,
        string state,
        int zipCode )
    {
        _line1 = line1;
        _line2 = line2;
        _city = city;
        _state = state;
        _zipCode = zipCode;
        ValidateState( state );
        ValidateZip( zipCode );
    }
    // etc.
}
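Using the immutable type requires a slightly different calling sequence to modify its state. You create a new object rather than modify the existing instance: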
// Create an address:
Address a1 = new Address( "111 S. Main",
    "", "Anytown", "IL", 61111 );
// To change, re-initialize:
a1 = new Address( a1.Line1,
    a1.Line2, "Ann Arbor", "MI", 48103 );
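The value of a1 is in one of two states: its original location in Anytown, or its updated location in Ann Arbor. You no longer modify the existing address and pass through the invalid temporary states of the previous example. Those interim states exist only while the Address constructor executes and are not visible outside that constructor. As soon as a new Address object is constructed, its value is fixed for all time. This is exception safe: a1 has either its original value or its new value. If an exception is thrown while the new Address object is being constructed, a1 keeps its original value.
(Translator's note: Why does an exception during construction leave a1 unaffected? Because a1 is the target of an assignment statement, it keeps its old value unless the constructor returns normally. That is also why the update goes through a constructor rather than through an extra update method: an update method could throw halfway through and leave the object in an incorrect state. The DateTime structure in .NET is a typical example of an immutable type. It offers no way to set the year, month, day, or day of the week on an existing value, because changing one of them in isolation could make the whole date invalid: set the day to 31, for example, and that month may have no 31st, and the day of the week may change too. It likewise offers no method that resets all the components of an existing value at once, and after reading this item you can see why. Looking at DateTime is a good way to understand why immutable types are worth using. Note: some books translate "immutable type" as 不变类型.)
To create an immutable type, you need to make sure there are no holes that would let clients change your internal state. Value types do not support derived types, so you do not need to defend against derived types modifying fields. But you do need to watch for any fields in an immutable type that are mutable reference types. When you implement constructors for these types, you need to make a defensive copy of the mutable reference type. (Translator's note: a defensive copy is a copy made when data is assigned, purely to protect that data. The Chinese text renders the term as 被动拷贝, "passive copy", to convey that the copy is made out of necessity rather than by choice.)
All these examples assume that Phone is an immutable value type, because we are concerned only with immutability in value types: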
// Almost immutable: there are holes that would
// allow state changes.
public struct PhoneList
{
    private readonly Phone[] _phones;
    public PhoneList( Phone[] ph )
    {
        _phones = ph;
    }
    public IEnumerator Phones
    {
        get
        {
            return _phones.GetEnumerator();
        }
    }
}
Phone[] phones = new Phone[10];
// initialize phones
PhoneList pl = new PhoneList( phones );
// Modify the phone list:
// also modifies the internals of the (supposedly)
// immutable object.
phones[5] = Phone.GeneratePhoneNumber( );
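The array class is a reference type. The array referenced inside the PhoneList structure refers to the same array storage (phones) allocated outside the object. Developers can modify your immutable structure through another variable that refers to the same storage. To remove this possibility, you need to make a defensive copy of the array. The previous example shows the pitfalls of a mutable collection. Even more possibilities for mischief exist if the Phone type is a mutable reference type: clients could modify the values in the collection even if the collection itself is protected against modification. Make this defensive copy in every constructor whenever your immutable type contains a mutable reference type: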
// Immutable: A copy is made at construction.
public struct PhoneList
{
    private readonly Phone[] _phones;
    public PhoneList( Phone[] ph )
    {
        _phones = new Phone[ ph.Length ];
        // Copies values because Phone is a value type.
        ph.CopyTo( _phones, 0 );
    }
    public IEnumerator Phones
    {
        get
        {
            return _phones.GetEnumerator();
        }
    }
}
Phone[] phones = new Phone[10];
// initialize phones
PhoneList pl = new PhoneList( phones );
// Modify the phone list:
// Does not modify the copy in pl.
phones[5] = Phone.GeneratePhoneNumber( );
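You need to follow the same rule when you return a mutable reference type. If you add a property that retrieves the entire array from the PhoneList struct, that accessor also needs to make a defensive copy. See Item 23 for details.
The complexity of a type dictates which of three strategies you use to initialize your immutable type. The Address structure defines one constructor that lets clients initialize an address. Defining a reasonable set of constructors is often the simplest approach.
You can also create factory methods to initialize the structure. Factories make it easier to create common values. The .NET Framework Color type follows this strategy to initialize system colors. The static methods Color.FromKnownColor() and Color.FromName() return a copy of a color value that represents the current value for a given system color.
Third, you can create a mutable companion class for cases in which multistep operations are needed to fully construct an immutable type. The .NET string class follows this strategy with its companion class System.Text.StringBuilder. You use the StringBuilder class to build a string through multiple operations. After performing all the operations needed to build the string, you retrieve the immutable string from the StringBuilder.
(Translator's note: A .NET string cannot be modified once it is initialized; any change to it produces a new string. Repeatedly manipulating a string therefore creates a lot of garbage for the collector, and StringBuilder is the way to offset that cost.)
Immutable types are simpler to code and easier to maintain. Do not blindly create get and set accessors for every property in your type. Your first choice for types that store data should be immutable, atomic value types. You can easily build more complicated structures from these entities.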
=================================
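Summary: I have now translated several items. Some sentences are genuinely hard to understand, and my translation of them reads awkwardly. If you run into an unclear sentence, skip it or consult the original text. My ability really is limited.
I also do not fully understand everything in the book; much of it I am still learning myself, so the translator's notes I have added are not guaranteed to be correct. Take the DateTime structure mentioned in this item: I cannot say for certain that it is an immutable type, but from my reading of this item and my understanding of DateTime, I believe it really is an instance of this principle. I have skimmed the later items; some are deep and some are light, and translating them will be hard going at times, but I will do my best to finish all of them as quickly as I can.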
Item 7: Prefer Immutable Atomic Value Types
Immutable types are simple: After they are created, they are constant. If you validate the parameters used to construct the object, you know that it is in a valid state from that point forward. You cannot change the object's internal state to make it invalid. You save yourself a lot of otherwise necessary error checking by disallowing any state changes after an object has been constructed. Immutable types are inherently thread safe: Multiple readers can access the same contents. If the internal state cannot change, there is no chance for different threads to see inconsistent views of the data. Immutable types can be exported from your objects safely. The caller cannot modify the internal state of your objects. Immutable types work better in hash-based collections. The value returned by Object.GetHashCode() must be an instance invariant (see Item 10); that's always true for immutable types.
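The point about hash-based collections can be made concrete with a small sketch. The MutableKey class and HashDemo helper below are hypothetical names invented for this illustration; they show what can go wrong when a hash code depends on state that changes while the object sits in a collection.

```csharp
using System.Collections.Generic;

// Hypothetical mutable key type, used only for this illustration.
public sealed class MutableKey
{
    public int Id { get; set; }

    public override bool Equals(object obj)
    {
        MutableKey other = obj as MutableKey;
        return other != null && other.Id == Id;
    }

    // The hash code depends on mutable state, so it is not an instance invariant.
    public override int GetHashCode()
    {
        return Id;
    }
}

public static class HashDemo
{
    // Returns whether the set can still find the key after the key is mutated.
    public static bool FoundAfterMutation()
    {
        MutableKey key = new MutableKey { Id = 1 };
        HashSet<MutableKey> set = new HashSet<MutableKey> { key };
        key.Id = 2;               // The hash code changes while the key is stored.
        return set.Contains(key); // The lookup now uses the new hash code.
    }
}
```

Calling HashDemo.FoundAfterMutation() returns false: the set filed the key under its old hash code, so the lookup misses it even though the very same object is still stored. An immutable key cannot get into this situation.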
Not every type can be immutable. If it were, you would need to clone objects to modify any program state. That's why this recommendation is for both atomic and immutable value types. Decompose your types to the structures that naturally form a single entity. An Address type does. An address is a single thing, composed of multiple related fields. A change in one field likely means changes to other fields. A customer type is not an atomic type. A customer type will likely contain many pieces of information: an address, a name, and one or more phone numbers. Any of these independent pieces of information might change. A customer might change phone numbers without moving. A customer might move, yet still keep the same phone number. A customer might change his or her name without moving or changing phone numbers. A customer object is not atomic; it is built from many different immutable types using composition: an address, a name, or a collection of phone number/type pairs. Atomic types are single entities: You would naturally replace the entire contents of an atomic type. The exception would be to change one of its component fields.
Here is a typical implementation of an address that is mutable:
// Mutable Address structure.
public struct Address
{
    private string _line1;
    private string _line2;
    private string _city;
    private string _state;
    private int _zipCode;
    // Rely on the default system-generated
    // constructor.
    public string Line1
    {
        get { return _line1; }
        set { _line1 = value; }
    }
    public string Line2
    {
        get { return _line2; }
        set { _line2 = value; }
    }
    public string City
    {
        get { return _city; }
        set { _city = value; }
    }
    public string State
    {
        get { return _state; }
        set
        {
            ValidateState( value );
            _state = value;
        }
    }
    public int ZipCode
    {
        get { return _zipCode; }
        set
        {
            ValidateZip( value );
            _zipCode = value;
        }
    }
    // other details omitted.
}
// Example usage:
Address a1 = new Address( );
a1.Line1 = "111 S. Main";
a1.City = "Anytown";
a1.State = "IL";
a1.ZipCode = 61111;
// Modify:
a1.City = "Ann Arbor"; // Zip, State invalid now.
a1.ZipCode = 48103;    // State still invalid now.
a1.State = "MI";       // Now fine.
Internal state changes mean that it's possible to violate object invariants, at least temporarily. After you have replaced the City field, you have placed a1 in an invalid state. The city has changed and no longer matches the state or ZIP code fields. The code looks harmless enough, but suppose that this fragment is part of a multithreaded program. Any context switch after the city changes and before the state changes would leave the potential for another thread to see an inconsistent view of the data.
Okay, so you're not writing a multithreaded program. You can still get into trouble. Imagine that the ZIP code was invalid and the set threw an exception. You've made only some of the changes you intended, and you've left the system in an invalid state. To fix this problem, you would need to add considerable internal validation code to the address structure. That validation code would add considerable size and complexity. To fully implement exception safety, you would need to create defensive copies around any code block in which you change more than one field. Thread safety would require adding significant thread-synchronization checks on each property accessor, both sets and gets. All in all, it would be a significant undertaking, and one that would likely be extended over time as you add new features.
Instead, make the Address structure an immutable type. Start by changing all instance fields to read-only:
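The half-finished update described above can be reproduced in a few lines. This is a minimal sketch, not the book's code: MutableAddress is a cut-down, hypothetical version of the mutable struct, and the range check in the ZipCode setter is an assumed stand-in for ValidateZip, whose body the text never shows.

```csharp
using System;

// Simplified mutable address. The ZIP range check is a hypothetical
// stand-in for the ValidateZip helper that the text leaves unspecified.
public struct MutableAddress
{
    private string _city;
    private int _zipCode;

    public string City
    {
        get { return _city; }
        set { _city = value; }
    }

    public int ZipCode
    {
        get { return _zipCode; }
        set
        {
            if (value < 1 || value > 99999)
                throw new ArgumentOutOfRangeException("value", "ZIP code out of range.");
            _zipCode = value;
        }
    }
}

public static class PartialUpdateDemo
{
    // Returns the address as it stands after a multi-field update fails halfway.
    public static MutableAddress UpdateWithBadZip()
    {
        MutableAddress a1 = new MutableAddress();
        a1.City = "Anytown";
        a1.ZipCode = 61111;
        try
        {
            a1.City = "Ann Arbor"; // Succeeds.
            a1.ZipCode = -1;       // Throws; the old ZIP code stays in place.
        }
        catch (ArgumentOutOfRangeException)
        {
            // Swallowed only to inspect the leftover state.
        }
        return a1;
    }
}
```

The value returned by PartialUpdateDemo.UpdateWithBadZip() has City set to "Ann Arbor" but ZipCode still 61111, which is exactly the mismatched state the paragraph warns about.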
public struct Address
{
    private readonly string _line1;
    private readonly string _line2;
    private readonly string _city;
    private readonly string _state;
    private readonly int _zipCode;
    // remaining details elided
}
You'll also want to remove all set accessors to each property:
public struct Address
{
    // ...
    public string Line1
    {
        get { return _line1; }
    }
    public string Line2
    {
        get { return _line2; }
    }
    public string City
    {
        get { return _city; }
    }
    public string State
    {
        get { return _state; }
    }
    public int ZipCode
    {
        get { return _zipCode; }
    }
}
Now you have an immutable type. To make it useful, you need to add all necessary constructors to initialize the Address structure completely. The Address structure needs only one additional constructor, specifying each field. A copy constructor is not needed because the assignment operator is just as efficient. Remember that the default constructor is still accessible. There is a default address where all the strings are null, and the ZIP code is 0:
public struct Address
{
    private readonly string _line1;
    private readonly string _line2;
    private readonly string _city;
    private readonly string _state;
    private readonly int _zipCode;
    public Address( string line1,
        string line2,
        string city,
        string state,
        int zipCode )
    {
        _line1 = line1;
        _line2 = line2;
        _city = city;
        _state = state;
        _zipCode = zipCode;
        ValidateState( state );
        ValidateZip( zipCode );
    }
    // etc.
}
Using the immutable type requires a slightly different calling sequence to modify its state. You create a new object rather than modify the existing instance:
// Create an address:
Address a1 = new Address( "111 S. Main",
    "", "Anytown", "IL", 61111 );
// To change, re-initialize:
a1 = new Address( a1.Line1,
    a1.Line2, "Ann Arbor", "MI", 48103 );
The value of a1 is in one of two states: its original location in Anytown, or its updated location in Ann Arbor. You do not modify the existing address to create any of the invalid temporary states from the previous example. Those interim states exist only during the execution of the Address constructor and are not visible outside of that constructor. As soon as a new Address object is constructed, its value is fixed for all time. It's exception safe: a1 has either its original value or its new value. If an exception is thrown during the construction of the new Address object, the original value of a1 is unchanged.
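A short sketch can confirm the exception-safety claim. ImmutableAddress below is a reduced, hypothetical version of the immutable struct, and the checks in its constructor are assumed stand-ins for ValidateState and ValidateZip; the real validation rules are not given in the text.

```csharp
using System;

public struct ImmutableAddress
{
    private readonly string _city;
    private readonly string _state;
    private readonly int _zipCode;

    public ImmutableAddress(string city, string state, int zipCode)
    {
        // Hypothetical checks standing in for ValidateState and ValidateZip.
        if (state == null || state.Length != 2)
            throw new ArgumentException("State must be a two-letter code.", "state");
        if (zipCode < 1 || zipCode > 99999)
            throw new ArgumentOutOfRangeException("zipCode");
        _city = city;
        _state = state;
        _zipCode = zipCode;
    }

    public string City { get { return _city; } }
    public string State { get { return _state; } }
    public int ZipCode { get { return _zipCode; } }
}

public static class ReplaceDemo
{
    // Attempts an invalid replacement and returns whatever a1 holds afterwards.
    public static ImmutableAddress ReplaceWithBadZip()
    {
        ImmutableAddress a1 = new ImmutableAddress("Anytown", "IL", 61111);
        try
        {
            // The constructor throws, so the assignment never happens.
            a1 = new ImmutableAddress("Ann Arbor", "MI", -1);
        }
        catch (ArgumentOutOfRangeException)
        {
            // Swallowed only to inspect the leftover state.
        }
        return a1;
    }
}
```

ReplaceDemo.ReplaceWithBadZip() returns the original Anytown, IL 61111 value untouched. Compare default(ImmutableAddress), which bypasses the constructor and yields null strings and a ZIP code of 0.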
To create an immutable type, you need to ensure that there are no holes that would allow clients to change your internal state. Value types do not support derived types, so you do not need to defend against derived types modifying fields. But you do need to watch for any fields in an immutable type that are mutable reference types. When you implement your constructors for these types, you need to make a defensive copy of that mutable type. All these examples assume that Phone is an immutable value type because we're concerned only with immutability in value types:
// Almost immutable: there are holes that would
// allow state changes.
public struct PhoneList
{
    private readonly Phone[] _phones;
    public PhoneList( Phone[] ph )
    {
        _phones = ph;
    }
    public IEnumerator Phones
    {
        get
        {
            return _phones.GetEnumerator();
        }
    }
}
Phone[] phones = new Phone[10];
// initialize phones
PhoneList pl = new PhoneList( phones );
// Modify the phone list:
// also modifies the internals of the (supposedly)
// immutable object.
phones[5] = Phone.GeneratePhoneNumber( );
The array class is a reference type. The array referenced inside the PhoneList structure refers to the same array storage (phones) allocated outside of the object. Developers can modify your immutable structure through another variable that refers to the same storage. To remove this possibility, you need to make a defensive copy of the array. The previous example shows the pitfalls of a mutable collection. Even more possibilities for mischief exist if the Phone type is a mutable reference type. Clients could modify the values in the collection, even if the collection is protected against any modification. This defensive copy should be made in all constructors whenever your immutable type contains a mutable reference type:
// Immutable: A copy is made at construction.
public struct PhoneList
{
    private readonly Phone[] _phones;
    public PhoneList( Phone[] ph )
    {
        _phones = new Phone[ ph.Length ];
        // Copies values because Phone is a value type.
        ph.CopyTo( _phones, 0 );
    }
    public IEnumerator Phones
    {
        get
        {
            return _phones.GetEnumerator();
        }
    }
}
Phone[] phones = new Phone[10];
// initialize phones
PhoneList pl = new PhoneList( phones );
// Modify the phone list:
// Does not modify the copy in pl.
phones[5] = Phone.GeneratePhoneNumber( );
You need to follow the same rules when you return a mutable reference type. If you add a property to retrieve the entire array from the PhoneList struct, that accessor would also need to create a defensive copy. See Item 23 for more details.
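One way such an accessor might look is sketched below. The Phone struct here is a hypothetical minimal stand-in (the text never defines Phone), and ToArray is an invented member name; the only point is that the array is copied on the way out as well as on the way in.

```csharp
// Hypothetical immutable value type standing in for the Phone type in the text.
public struct Phone
{
    private readonly string _number;
    public Phone(string number) { _number = number; }
    public string Number { get { return _number; } }
}

public struct PhoneList
{
    private readonly Phone[] _phones;

    public PhoneList(Phone[] ph)
    {
        // Defensive copy on the way in.
        _phones = (Phone[])ph.Clone();
    }

    // Hypothetical accessor: defensive copy on the way out.
    public Phone[] ToArray()
    {
        return (Phone[])_phones.Clone();
    }
}

public static class PhoneListDemo
{
    // Tampers with a returned array, then reads the list again.
    public static string FirstNumberAfterTampering()
    {
        PhoneList pl = new PhoneList(new[] { new Phone("555-0100") });
        Phone[] copy = pl.ToArray();
        copy[0] = new Phone("555-0199"); // Changes only the caller's copy.
        return pl.ToArray()[0].Number;
    }
}
```

PhoneListDemo.FirstNumberAfterTampering() returns "555-0100", because the caller only ever held a copy. Note that a default-initialized PhoneList has a null internal array, a case this sketch does not handle.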
The complexity of a type dictates which of three strategies you will use to initialize your immutable type. The Address structure defined one constructor to allow clients to initialize an address. Defining the reasonable set of constructors is often the simplest approach.
You can also create factory methods to initialize the structure. Factories make it easier to create common values. The .NET Framework Color type follows this strategy to initialize system colors. The static methods Color.FromKnownColor() and Color.FromName() return a copy of a color value that represents the current value for a given system color.
Third, you can create a mutable companion class for those instances in which multistep operations are necessary to fully construct an immutable type. The .NET string class follows this strategy with the System.Text.StringBuilder class. You use the StringBuilder class to create a string using multiple operations. After performing all the operations necessary to build the string, you retrieve the immutable string from the StringBuilder.
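The companion-class pattern is easy to see with StringBuilder itself. The BuilderDemo helper below is a hypothetical name for this illustration; it performs several mutating steps on the builder and then extracts one immutable string at the end.

```csharp
using System.Text;

public static class BuilderDemo
{
    // Builds a string in several steps, then returns the immutable result.
    public static string BuildAddressLine()
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("Ann Arbor");
        sb.Append(", ");
        sb.Append("MI");
        sb.Append(' ');
        sb.Append(48103);
        return sb.ToString();
    }
}
```

BuilderDemo.BuildAddressLine() returns "Ann Arbor, MI 48103". All the intermediate states live inside the mutable builder, and callers only ever see the finished string.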
Immutable types are simpler to code and easier to maintain. Don't blindly create get and set accessors for every property in your type. Your first choice for types that store data should be immutable, atomic value types. You easily can build more complicated structures from these entities.