Effective C# Item 41: Prefer DataSets to Custom Structures
DataSets have gotten a bad reputation for two reasons. First, XML serialized DataSets do not interact well with non-.NET code. Using DataSets as part of a web service API makes it more difficult to interact with systems that don't use the .NET Framework. Second, they are a very generic container. You can misuse a DataSet by circumventing some of the .NET Framework's type safety. But the DataSet still solves a large number of common requirements for modern systems. If you understand its strengths and avoid its weaknesses, you can make extensive use of the type.
The DataSet class is designed to be an offline cache of data stored in a relational database. You already know that it stores DataTables, which store rows and columns of data that can match the layout of a database. You know that the DataSet and its members support data binding. You might even have seen examples of how the DataSet supports relations between the DataTables it contains. It's even possible that you've seen examples of constraints that validate the data being placed in a DataSet.
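As a rough illustration of the relations and constraints mentioned above, here is a minimal sketch. The `Shop` DataSet, the `Customers` and `Orders` tables, and their columns are all hypothetical names chosen for this example; only the `System.Data` calls themselves come from the framework.

```csharp
using System;
using System.Data;

public static class RelationSketch
{
    // Builds a DataSet with two related tables (all names are hypothetical).
    public static DataSet Build()
    {
        DataSet ds = new DataSet("Shop");

        DataTable customers = ds.Tables.Add("Customers");
        DataColumn customerId = customers.Columns.Add("Id", typeof(int));
        customers.Columns.Add("Name", typeof(string));
        customers.PrimaryKey = new DataColumn[] { customerId };

        DataTable orders = ds.Tables.Add("Orders");
        orders.Columns.Add("OrderId", typeof(int));
        DataColumn orderCustomer = orders.Columns.Add("CustomerId", typeof(int));

        // By default, adding a relation also creates a foreign-key constraint.
        ds.Relations.Add("CustomerOrders", customerId, orderCustomer);

        customers.Rows.Add(1, "Ada");
        orders.Rows.Add(100, 1);
        return ds;
    }

    // Returns true when the DataSet refuses an order whose customer does not exist.
    public static bool RejectsOrphanOrder(DataSet ds)
    {
        try
        {
            ds.Tables["Orders"].Rows.Add(101, 42);
            return false;
        }
        catch (InvalidConstraintException)
        {
            return true;
        }
    }
}
```

Calling `GetChildRows("CustomerOrders")` on a customer row then walks the relation from parent to child without any hand-written lookup code.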
But there's even more than that. DataSets also support transactions through the AcceptChanges and RejectChanges methods, and they can be stored as DiffGrams that contain the history of changes to the data. Multiple DataSets can be merged to provide a common storage repository. DataSets support views, which enable you to examine portions of your data that satisfy search criteria. You can create views that cross several tables.
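The following sketch shows two of those features in isolation: rolling back an edit with RejectChanges and filtering rows through a DataView. The `Addresses` table and its sample data are assumptions made up for this example, and the view here covers a single table only.

```csharp
using System;
using System.Data;

public static class ChangeTrackingSketch
{
    // Builds a small table (hypothetical data) and accepts it as the baseline.
    public static DataTable BuildAddresses()
    {
        DataTable t = new DataTable("Addresses");
        t.Columns.Add("City", typeof(string));
        t.Columns.Add("State", typeof(string));
        t.Rows.Add("Ann Arbor", "MI");
        t.Rows.Add("Columbus", "OH");
        t.AcceptChanges();
        return t;
    }

    // Edits the first row, then rolls back to the last accepted values.
    public static string EditThenReject(DataTable t)
    {
        t.Rows[0]["City"] = "Detroit";
        t.RejectChanges();
        return (string)t.Rows[0]["City"];
    }

    // Counts the rows visible through a view filtered by state.
    public static int CountInState(DataTable t, string state)
    {
        DataView view = new DataView(t);
        view.RowFilter = "State = '" + state + "'";
        return view.Count;
    }
}
```

The point of the sketch is that neither the rollback nor the filtering required any bookkeeping code of your own.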
Yet, some of us want to develop our own storage structures rather than use the DataSet. The DataSet is a general container. Performance suffers a little to support that generality. A DataSet is not a strongly typed container. The collection of DataTables is a dictionary. The collection of columns in a table is also a dictionary. Items are stored as System.Object references. That leads us to write these kinds of constructs:
int val = ( int )MyDataSet.Tables[ "table1" ].
Rows[ 0 ][ "total" ];
To the strongly typed C# mind, this construct is troublesome. If you mistype table1 or total, you get a runtime error. An access to the data element requires a cast. If you multiply these problems by the number of times you access the elements of a DataSet, you will really want a strongly typed solution. So we try typed DataSets. On the surface, it's what we want:
int val = MyDataSet.table1[ 0 ].total;
It's perfect, until you look inside the generated C# that makes up the typed DataSet. It wraps the existing DataSet and provides strongly typed access in addition to the weakly typed access in the DataSet class. Your clients can still access the weakly typed API. That's less than optimal.
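To make the "wrapper" point concrete, here is a deliberately simplified, hand-written stand-in for the kind of code a typed DataSet contains. It is not the generated code itself (which is far more elaborate and includes typed row classes); `TotalsTable` and `GetTotal` are hypothetical names used only for this sketch.

```csharp
using System;
using System.Data;

// Hypothetical, hand-written approximation of a typed table.
public class TotalsTable : DataTable
{
    public TotalsTable() : base("table1")
    {
        Columns.Add("total", typeof(int));
    }

    // Strongly typed accessor layered over the weakly typed row storage.
    public int GetTotal(int rowIndex)
    {
        return (int)Rows[rowIndex]["total"];
    }
}
```

Because the class still derives from DataTable, callers can bypass `GetTotal` and write `(int)table.Rows[0]["total"]` directly, which is exactly the weakly typed back door described above.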
Live with it. To illustrate how much you give up, I'll show you how some of the features inside the DataSet class are implemented, in the context of creating your own custom collection. You're thinking that it can't be that hard. You're thinking that you don't need all the features of the DataSet, so it won't take that long. Okay, fine, I'll play along.
Imagine that you need to create a collection that stores addresses. An individual item must support data binding, so you create a struct with public properties:
public struct AddressRecord
{
    private string _street;
    public string Street
    {
        get { return _street; }
        set { _street = value; }
    }

    private string _city;
    public string City
    {
        get { return _city; }
        set { _city = value; }
    }

    private string _state;
    public string State
    {
        get { return _state; }
        set { _state = value; }
    }

    private string _zip;
    public string Zip
    {
        get { return _zip; }
        set { _zip = value; }
    }
}
Next, you need to create the collection. You want a type-safe collection, so you derive one from CollectionBase:
public class AddressList : CollectionBase
{
}
CollectionBase supports the IList interface, so you can use it as a data-binding source. Now you discover your first serious problem: All your data-binding actions fail when your list of addresses is empty. That did not happen with the DataSet. Data binding consists of late-binding code built on reflection. The control uses reflection to load the first element in the list, and then uses reflection to determine its type and all the properties that are members of that type. That's how a DataGrid learns what columns to add. It finds all the public properties of the first element in the collection, and those are displayed. When the collection is empty, that won't work. You have two possible solutions to this problem. The first is the ugly but simple solution: Never allow an empty list. The second is the elegant but more time-consuming solution: Implement the ITypedList interface. ITypedList provides two methods that describe the types in the collection. GetListName returns a human-readable string that describes the list. GetItemProperties returns a list of PropertyDescriptors that describe each property that should form a column in the grid:
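using System;
using System.Collections;
using System.ComponentModel;

public class AddressList : CollectionBase, ITypedList
{
    // Human-readable name for the list.
    public string GetListName(
        PropertyDescriptor[ ] listAccessors )
    {
        return "AddressList";
    }

    // Describes the item type, so binding works even when the list is empty.
    public PropertyDescriptorCollection
        GetItemProperties(
        PropertyDescriptor[ ] listAccessors )
    {
        Type t = typeof( AddressRecord );
        return TypeDescriptor.GetProperties( t );
    }
}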
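The reason GetItemProperties solves the empty-list problem is that TypeDescriptor can describe a type without needing an instance of it. A minimal sketch, using a hypothetical `AddressSample` class as a stand-in for AddressRecord:

```csharp
using System;
using System.ComponentModel;

// Hypothetical stand-in for AddressRecord, trimmed to two properties.
public class AddressSample
{
    public string Street { get; set; }
    public string City { get; set; }
}

public static class ColumnDiscoverySketch
{
    // Lists the bindable property names from the type alone; no item is needed.
    public static string[] ColumnNames()
    {
        PropertyDescriptorCollection props =
            TypeDescriptor.GetProperties(typeof(AddressSample));
        string[] names = new string[props.Count];
        for (int i = 0; i < props.Count; i++)
        {
            names[i] = props[i].Name;
        }
        return names;
    }
}
```

A grid that asks the list for its item properties can therefore build its columns before the first address is ever added.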
It's getting a little better. Now you have a collection that supports simple binding. You're missing a lot of features, though. The next requested feature is transaction support. If you had used a DataSet, your users would be able to cancel all changes to a single row in the DataGrid by pressing the Esc key. For example, a user could type the wrong city, press Esc, and have the original value restored. The DataGrid also supports error notification. You could attach a ColumnChanged event handler to perform any validation rules you need on a particular column. For instance, the state code must be a two-letter abbreviation. Using the DataSet framework, that's coded like this:
// Attach the handler, for example in your initialization code:
ds.Tables[ "Addresses" ].ColumnChanged += new
    DataColumnChangeEventHandler( ds_ColumnChanged );

// Flags the row and column when the State value is not two characters long.
private void ds_ColumnChanged( object sender,
    DataColumnChangeEventArgs e )
{
    if ( e.Column.ColumnName == "State" )
    {
        string newVal = e.ProposedValue.ToString( );
        if ( newVal.Length != 2 )
        {
            e.Row.SetColumnError( e.Column,
                "State abbreviation must be two letters" );
            e.Row.RowError = "Error on State";
        }
        else
        {
            // Clear any earlier error once the value is valid again.
            e.Row.SetColumnError( e.Column,
                "" );
            e.Row.RowError = "";
        }
    }
}
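To see the effect of such a handler without a grid, the sketch below wires the same kind of check to a standalone DataTable and reads the error back with GetColumnError. The class and method names are hypothetical, and the handler is condensed from the listing above rather than copied from it.

```csharp
using System;
using System.Data;

public static class StateValidationSketch
{
    // Builds a one-column table with the validation handler attached.
    public static DataTable BuildTable()
    {
        DataTable t = new DataTable("Addresses");
        t.Columns.Add("State", typeof(string));
        t.ColumnChanged += OnColumnChanged;
        return t;
    }

    private static void OnColumnChanged(object sender, DataColumnChangeEventArgs e)
    {
        if (e.Column.ColumnName != "State") return;
        string newVal = Convert.ToString(e.ProposedValue);
        string error = (newVal.Length == 2)
            ? ""
            : "State abbreviation must be two letters";
        e.Row.SetColumnError(e.Column, error);
    }

    // Adds a row, sets its State, and returns the resulting column error text.
    public static string ErrorAfterSetting(DataTable t, string state)
    {
        DataRow row = t.NewRow();
        t.Rows.Add(row);
        row["State"] = state;
        return row.GetColumnError("State");
    }
}
```

A bound DataGrid reads the same per-column error text to show its error indicator, so no extra UI code is involved.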
To support both concepts on your custom collection, you have quite a bit more work ahead of you. You need to modify your AddressRecord structure to support two new interfaces, IEditableObject and IDataErrorInfo. IEditableObject provides transaction support for your object. IDataErrorInfo provides the error-handling routines. To support the transactions, you must modify your data storage to provide your own rollback capability. You might have errors on multiple columns, so your storage must also include a collection of errors for each column. Here's the updated listing for the AddressRecord:
using System.Collections;
using System.ComponentModel;

public class AddressRecord : IEditableObject, IDataErrorInfo
{
    // Snapshot of the field values; copied wholesale to commit or roll back.
    private struct AddressRecordData
    {
        public string street;
        public string city;
        public string state;
        public string zip;
    }

    private AddressRecordData permanentRecord;
    private AddressRecordData tempRecord;
    private bool _inEdit = false;
    private IList _container;

    // Maps a property name to its current error message.
    private Hashtable errors = new Hashtable();

    public AddressRecord( AddressList container )
    {
        _container = container;
    }

    public string Street
    {
        get
        {
            return ( _inEdit ) ? tempRecord.street :
                permanentRecord.street;
        }
        set
        {
            if ( value.Length == 0 )
                errors[ "Street" ] = "Street cannot be empty";
            else
            {
                errors.Remove( "Street" );
            }
            if ( _inEdit )
                tempRecord.street = value;
            else
            {
                permanentRecord.street = value;
                // Reassign through the IList indexer so the container sees the change.
                int index = _container.IndexOf( this );
                _container[ index ] = this;
            }
        }
    }

    public string City
    {
        get
        {
            return ( _inEdit ) ? tempRecord.city :
                permanentRecord.city;
        }
        set
        {
            if ( value.Length == 0 )
                errors[ "City" ] = "City cannot be empty";
            else
            {
                errors.Remove( "City" );
            }
            if ( _inEdit )
                tempRecord.city = value;
            else
            {
                permanentRecord.city = value;
                int index = _container.IndexOf( this );
                _container[ index ] = this;
            }
        }
    }

    public string State
    {
        get
        {
            return ( _inEdit ) ? tempRecord.state :
                permanentRecord.state;
        }
        set
        {
            if ( value.Length == 0 )
                errors[ "State" ] = "State cannot be empty";
            else
            {
                errors.Remove( "State" );
            }
            if ( _inEdit )
                tempRecord.state = value;
            else
            {
                permanentRecord.state = value;
                int index = _container.IndexOf( this );
                _container[ index ] = this;
            }
        }
    }

    public string Zip
    {
        get
        {
            return ( _inEdit ) ? tempRecord.zip :
                permanentRecord.zip;
        }
        set
        {
            if ( value.Length == 0 )
                errors[ "Zip" ] = "Zip cannot be empty";
            else
            {
                errors.Remove( "Zip" );
            }
            if ( _inEdit )
                tempRecord.zip = value;
            else
            {
                permanentRecord.zip = value;
                int index = _container.IndexOf( this );
                _container[ index ] = this;
            }
        }
    }

    // IEditableObject: start a transaction by copying the committed values.
    public void BeginEdit( )
    {
        if ( ( ! _inEdit ) && ( errors.Count == 0 ) )
            tempRecord = permanentRecord;
        _inEdit = true;
    }

    // IEditableObject: commit the edited values.
    public void EndEdit( )
    {
        // Can't end editing if there are errors:
        if ( errors.Count > 0 )
            return;
        if ( _inEdit )
            permanentRecord = tempRecord;
        _inEdit = false;
    }

    // IEditableObject: discard the edited values and any pending errors.
    public void CancelEdit( )
    {
        errors.Clear( );
        _inEdit = false;
    }

    // IDataErrorInfo: error text for one column, or null if there is none.
    public string this[ string columnName ]
    {
        get
        {
            return errors[ columnName ] as string;
        }
    }

    // IDataErrorInfo: summary that lists every column currently in error.
    public string Error
    {
        get
        {
            if ( errors.Count > 0 )
            {
                System.Text.StringBuilder errString = new
                    System.Text.StringBuilder();
                foreach ( string s in errors.Keys )
                {
                    errString.Append( s );
                    errString.Append( ", " );
                }
                errString.Append( "Have errors" );
                return errString.ToString( );
            }
            else
                return "";
        }
    }
}
That's several pages of code, all to support features already implemented in the DataSet. In fact, this still doesn't have all the DataSet features working properly. Interactively adding new records to the collection and supporting transactions require some more hoops for BeginEdit, CancelEdit, and EndEdit. You need to detect when CancelEdit is called on a new object rather than a modified object. CancelEdit must remove the new object from the container if the object was created after the last BeginEdit. That requires more modifications to the AddressRecord and a couple of event handlers added to the AddressList class.
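For comparison, here is roughly what the same begin/cancel/commit behavior looks like when a DataRow does the work. The table, column, and sample values are assumptions for this sketch; BeginEdit, CancelEdit, and EndEdit are the DataRow's own methods.

```csharp
using System;
using System.Data;

public static class RowEditSketch
{
    private static DataRow NewCityRow(string city)
    {
        DataTable t = new DataTable("Addresses");
        t.Columns.Add("City", typeof(string));
        return t.Rows.Add(city);
    }

    // Starts an edit, changes the value, then cancels: the original value survives.
    public static string CancelRestores()
    {
        DataRow row = NewCityRow("Ann Arbor");
        row.BeginEdit();
        row["City"] = "Ann Arbour";
        row.CancelEdit();
        return (string)row["City"];
    }

    // Starts an edit, changes the value, then ends the edit: the new value is kept.
    public static string EndCommits()
    {
        DataRow row = NewCityRow("Ann Arbor");
        row.BeginEdit();
        row["City"] = "Detroit";
        row.EndEdit();
        return (string)row["City"];
    }
}
```

None of the snapshot fields or `_inEdit` flags from the custom AddressRecord are needed here, because the row keeps its own proposed and current versions.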
Finally, there's the IBindingList interface. This interface contains more than 20 methods and properties that controls query to describe the capabilities of the list. You must implement IBindingList for read-only lists or interactive sorting, or to support searching. That's before you get to anything involving navigation and hierarchies. I'm not even going to add an example of all that code.
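As a point of comparison, a DataTable's DefaultView already implements IBindingList, so the capabilities a control would query are available without writing any of those members. The sketch below assumes a hypothetical `Addresses` table with made-up data and simply reads a few of the capability flags and applies a sort.

```csharp
using System;
using System.ComponentModel;
using System.Data;

public static class BindingCapabilitiesSketch
{
    private static DataTable BuildTable()
    {
        DataTable t = new DataTable("Addresses");
        t.Columns.Add("City", typeof(string));
        t.Rows.Add("Columbus");
        t.Rows.Add("Ann Arbor");
        return t;
    }

    // Reads the capability flags a bound control would query.
    public static bool[] Capabilities()
    {
        IBindingList list = BuildTable().DefaultView;
        return new bool[]
        {
            list.SupportsSorting,
            list.SupportsSearching,
            list.SupportsChangeNotification
        };
    }

    // Sorts the view by City and returns the first city in sorted order.
    public static string FirstCitySorted()
    {
        DataView view = BuildTable().DefaultView;
        view.Sort = "City ASC";
        return (string)view[0]["City"];
    }
}
```

A custom collection would have to implement each of these members by hand before a grid could offer the same sorting and searching.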
Several pages later, ask yourself, do you still want to create your own specialized collection? Or do you want to use a DataSet? Unless your collection is part of a performance-critical set of algorithms or must have a portable format, use the DataSet, especially the typed DataSet. It will save you tremendous amounts of time. Yes, you can argue that the DataSet is not the best example of object-oriented design. Typed DataSets break even more rules. But this is one of those times when your productivity far outweighs what might be a more elegant hand-coded design.