Implementing Equality and Hashing in Objective-C

by Mike Ash  

Welcome back to a late edition of Friday Q&A. WWDC pushed the schedule back one week, but it's finally time for another one. This week, I'm going to discuss the implementation of equality and hashing in Cocoa, a topic suggested by Steven Degutis.


Equality
Object equality is a fundamental concept that gets used all over the place. In Cocoa, it's implemented with the isEqual: method. Something as simple as [array indexOfObject:] will use it, so it's important that your objects support it.

It's so important that Cocoa actually gives us a default implementation of it on NSObject. The default implementation just compares pointers. In other words, an object is only equal to itself, and is never equal to another object. The implementation is functionally identical to:

    - (BOOL)isEqual: (id)other
    {
        return self == other;
    }


While oversimplified in many cases, this is actually good enough for a lot of objects. For example, an NSView is never considered equal to another NSView, only to itself. For NSView, and many other classes which behave that way, the default implementation is enough. That's good news, because it means that if your class has that same equality semantic, you don't have to do anything, and get the correct behavior for free.

Implementing Custom Equality
Sometimes you need a deeper implementation of equality. It's common for objects, typically what you might refer to as a "value object", to be distinct from another object but be logically equal to it. For example:

    // use mutable strings because that guarantees distinct objects
    NSMutableString *s1 = [NSMutableString stringWithString: @"Hello, world"];
    NSMutableString *s2 = [NSMutableString stringWithFormat: @"%@, %@", @"Hello", @"world"];
    BOOL equal = [s1 isEqual: s2]; // gives you YES!


Of course NSMutableString implements this for you in this case. But what if you have a custom object that you want to be able to do the same thing?

    MyClass *c1 = ...;
    MyClass *c2 = ...;
    BOOL equal = [c1 isEqual: c2];


In this case you need to implement your own version of isEqual:.

Testing for equality is fairly straightforward most of the time. Gather up the relevant properties of your class, and test them all for equality. If any of them are not equal, then return NO. Otherwise, return YES.

One subtle point with this is that the class of your object is an important property to test as well. It's perfectly valid to test a MyClass for equality with an NSString, but that comparison should never return YES (unless MyClass is a subclass of NSString, of course).

A somewhat less subtle point is to ensure that you only test properties that are actually important to equality. Things like caches that do not influence your object's externally-visible value should not be tested.

Let's say your class looks like this:

    @interface MyClass : NSObject
    {
        int _length;
        char *_data;
        NSString *_name;
        NSMutableDictionary *_cache;
    }
    - (int)length;
    - (char *)data;
    - (NSString *)name;
    @end


Your equality implementation would then look like this:

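    - (BOOL)isEqual: (id)other
    {
        if (![other isKindOfClass: [MyClass class]])
            return NO;
        MyClass *o = other;  // static type picks MyClass's -length, not NSString's
        // note: no comparison of _cache
        return ([o length] == _length &&
                memcmp([o data], _data, _length) == 0 &&
                (_name == [o name] || [[o name] isEqual: _name]));  // also treats nil == nil as equal
    }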



Hashing

Hash tables are a commonly used data structure; among other things, they are used to implement NSDictionary and NSSet. They allow fast lookups of objects no matter how many objects you put in the container.

If you're familiar with how hash tables work, you may want to skip the next paragraph or two.

A hash table is basically a big array with special indexing. Objects are placed into an array with an index that corresponds to their hash. The hash is essentially a pseudorandom number generated from the object's properties. The idea is to make the index random enough to make it unlikely for two objects to have the same hash, but have it be fully reproducible. When an object is inserted, the hash is used to determine where it goes. When an object is looked up, its hash is used to determine where to look.

In more formal terms, the hash of an object is defined such that two objects have an identical hash if they are equal. Note that the reverse is not true, and can't be: two objects can have an identical hash and not be equal. You want to try to avoid this as much as possible, because when two unequal objects have the same hash (called a collision) then the hash table has to take special measures to handle this, which is slow. However, it's provably impossible to avoid it completely.
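To make the last two paragraphs concrete, here is a toy chained hash table written in plain C, which also compiles as Objective-C. It is only a sketch of the general technique, not how NSDictionary or NSSet are implemented. The names `weak_hash`, `StringSet`, `set_add` and `set_contains` are hypothetical. The hash is deliberately weak: it sums the bytes, so anagrams collide. That shows the hash only picks a bucket, and equality makes the final decision.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUCKET_COUNT 8
#define MAX_PER_BUCKET 4

/* Deliberately weak hash: the sum of the bytes. Anagrams such as
   "listen" and "silent" get the same hash, so we can watch a collision. */
size_t weak_hash(const char *s)
{
    size_t h = 0;
    for (; *s; s++)
        h += (unsigned char)*s;
    return h;
}

/* A toy fixed-size table: each bucket holds up to MAX_PER_BUCKET strings. */
typedef struct {
    const char *items[BUCKET_COUNT][MAX_PER_BUCKET];
    size_t counts[BUCKET_COUNT];
} StringSet;

/* Lookup: the hash chooses the bucket, equality (strcmp) confirms a match. */
bool set_contains(const StringSet *set, const char *s)
{
    size_t b = weak_hash(s) % BUCKET_COUNT;
    for (size_t i = 0; i < set->counts[b]; i++)
        if (strcmp(set->items[b][i], s) == 0)
            return true;
    return false;
}

/* Insertion: the same hash chooses the same bucket, and equal strings are
   not stored twice. Returns false only when the target bucket is full. */
bool set_add(StringSet *set, const char *s)
{
    if (set_contains(set, s))
        return true;
    size_t b = weak_hash(s) % BUCKET_COUNT;
    if (set->counts[b] == MAX_PER_BUCKET)
        return false;
    set->items[b][set->counts[b]++] = s;
    return true;
}
```

Suppose you insert only "listen" and then look up "silent". The lookup lands in the same bucket, but it still reports a miss because `strcmp` rejects the match. Only the collision costs a little extra work.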

In Cocoa, hashing is implemented with the hash method, which has this signature:

    - (NSUInteger)hash;


As with equality, NSObject gives you a default implementation that just uses your object's identity. Roughly speaking, it does this:

    - (NSUInteger)hash
    {
        return (NSUInteger)self;
    }

The actual value may differ, but the essential point is that it's based on the actual pointer value of self. And just as with equality, if object identity equality is all you need, then the default implementation will do fine for you.

Implementing Custom Hashing
Because of the semantics of hash, if you override isEqual: then you must override hash. If you don't, then you risk having two objects which are equal but which don't have the same hash. If you use these objects in a dictionary, set, or something else which uses a hash table, then hilarity will ensue.
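Here is a minimal C sketch of what goes wrong. It assumes a hypothetical `Person` struct whose equality is overridden but whose hash is still identity-based, like NSObject's default:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an object whose equality is value-based. */
typedef struct {
    const char *name;
} Person;

/* Overridden equality: compares the name property. */
bool person_equal(const Person *a, const Person *b)
{
    return strcmp(a->name, b->name) == 0;
}

/* Hash NOT overridden: identity-based, like NSObject's default. */
uintptr_t person_hash(const Person *p)
{
    return (uintptr_t)p;
}
```

Two equal objects get different hashes. A hash table would therefore usually look for the second object in a different bucket from the one holding the first, and the equality check would never run.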

Because the definition of the object's hash follows equality so closely, the implementation of hash likewise closely follows the implementation of isEqual:.

An exception to this is that there's no need to include your object's class in the definition of hash. That's basically a safeguard in isEqual: to ensure the rest of the check makes sense when used with a different object. Your hash is likely to be very different from the hash of a different class simply by virtue of hashing different properties and using different math to combine them.



Generating Property Hashes
Testing properties for equality is usually straightforward, but hashing them isn't always. How you hash a property depends on what kind of object it is.

For a numeric property, the hash can simply be the numeric value.
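Here is a hedged C sketch that uses `size_t` as a stand-in for NSUInteger; the function names are hypothetical. An integer can be returned as-is. A floating-point value is not directly an integer, and 0.0 == -0.0 even though their bit patterns differ. The sketch therefore folds the two zeros together before hashing the bits.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* size_t stands in for NSUInteger (an assumption of this sketch). */
size_t hash_long(long value)
{
    /* An integer is its own hash. */
    return (size_t)value;
}

size_t hash_double(double value)
{
    /* 0.0 == -0.0 but their bits differ; equal values must hash alike. */
    if (value == 0.0)
        value = 0.0;
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return (size_t)(bits ^ (bits >> 32));
}
```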

For an object property, you can send the object the hash method, and use what it returns.

For data-like properties, you'll want to use some sort of hash algorithm to generate the hash. You can use CRC32, or even something totally overkill like MD5. Another approach, somewhat less speedy but easy to use, is to wrap the data in an NSData and ask it for its hash, essentially offloading the work onto Cocoa. In the above example, you could compute the hash of _data like so:

    [[NSData dataWithBytes: _data length: _length] hash]
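If you would rather not allocate an NSData, you can apply a byte-oriented hash to the data directly. This sketch swaps in 64-bit FNV-1a. It is not one of the algorithms named above, just a simple non-cryptographic choice. `fnv1a_64` is a hypothetical name; for the example class you would call it as `fnv1a_64(_data, _length)`.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a: xor in each byte, then multiply by the FNV prime. */
uint64_t fnv1a_64(const void *data, size_t length)
{
    const unsigned char *bytes = data;
    uint64_t hash = 14695981039346656037ULL;   /* FNV offset basis */
    for (size_t i = 0; i < length; i++) {
        hash ^= bytes[i];
        hash *= 1099511628211ULL;               /* FNV prime */
    }
    return hash;
}
```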



Combining Property Hashes

So you know how to generate a hash for each property, but how do you put them together?

The easiest way is to simply add them together, or to combine them with the bitwise xor operator. However, this can hurt your hash's uniqueness, because these operations are symmetric (a ^ b == b ^ a), meaning that the separation between different properties gets lost. As an example, consider an object which contains a first and last name, with the following hash implementation:

    - (NSUInteger)hash
    {
        return [_firstName hash] ^ [_lastName hash];
    }


Now imagine you have two objects, one for "George Frederick" and one for "Frederick George". They will hash to the same value even though they're clearly not equal. And, although hash collisions can't be avoided completely, we should try to make them harder to obtain than this!
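The symmetry problem is easy to reproduce in plain C. This sketch uses a djb2-style string hash as a stand-in for `-[NSString hash]`; that substitution and the function names are assumptions made to keep the example self-contained. Xor also maps any two identical property hashes to 0, so every "John John" collides with every "Mary Mary".

```c
#include <stddef.h>

/* Toy string hash (djb2-style), standing in for -[NSString hash]. */
size_t string_hash(const char *s)
{
    size_t h = 5381;
    for (; *s; s++)
        h = h * 33 + (unsigned char)*s;
    return h;
}

/* The symmetric combination from the text: xor of the two property hashes. */
size_t name_hash_xor(const char *first, const char *last)
{
    return string_hash(first) ^ string_hash(last);
}
```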

How to best combine hashes is a complicated subject without any single answer. However, any asymmetric way of combining the values is a good start. I like to use a bitwise rotation in addition to the xor to combine them:

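    #define NSUINT_BIT (CHAR_BIT * sizeof(NSUInteger))
    // howmuch must be strictly between 0 and NSUINT_BIT: shifting by the full width is undefined
    #define NSUINTROTATE(val, howmuch) ((((NSUInteger)(val)) << (howmuch)) | (((NSUInteger)(val)) >> (NSUINT_BIT - (howmuch))))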
    
    - (NSUInteger)hash
    {
        return NSUINTROTATE([_firstName hash], NSUINT_BIT / 2) ^ [_lastName hash];
    }
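Here is the same idea as a C function instead of a macro, with `size_t` standing in for NSUInteger. The function reduces the rotation amount modulo the width and handles 0 separately, which avoids the undefined shift by the full width that the macro would hit with a rotation of 0. `rotate_left` and `combine2` are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

#define HASH_BITS (CHAR_BIT * sizeof(size_t))

/* Rotate left by n bits; n == 0 and n == HASH_BITS both return v unchanged. */
size_t rotate_left(size_t v, unsigned n)
{
    n %= HASH_BITS;
    if (n == 0)
        return v;
    return (v << n) | (v >> (HASH_BITS - n));
}

/* Asymmetric combination: rotate the first hash by half the width, then xor. */
size_t combine2(size_t first, size_t last)
{
    return rotate_left(first, HASH_BITS / 2) ^ last;
}
```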



Custom Hash Example

Now we can take all of the above and use it to produce a hash method for the example class. It follows the basic form of the equality method, and uses the above techniques to obtain and combine the hashes of the individual properties:

    - (NSUInteger)hash
    {
        NSUInteger dataHash = [[NSData dataWithBytes: _data length: _length] hash];
        return NSUINTROTATE(dataHash, NSUINT_BIT / 2) ^ [_name hash];
    }
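To check that the equality and hash methods agree, here is a plain C mirror of the example class. The `Value` struct and its functions are hypothetical, and FNV-1a stands in for the NSData and NSString hashes. Both functions look at the same properties and both skip the cache, so equal values are guaranteed to hash alike.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS (CHAR_BIT * sizeof(size_t))

/* Hypothetical C mirror of MyClass. */
typedef struct {
    int length;
    const char *data;
    const char *name;
    void *cache;          /* internal detail: not part of the value */
} Value;

static size_t rotate_left(size_t v, unsigned n)
{
    n %= HASH_BITS;
    return n == 0 ? v : (v << n) | (v >> (HASH_BITS - n));
}

/* FNV-1a over raw bytes, standing in for the NSData and NSString hashes. */
static size_t bytes_hash(const void *p, size_t len)
{
    const unsigned char *b = p;
    uint64_t h = 14695981039346656037ULL;
    for (size_t i = 0; i < len; i++) {
        h ^= b[i];
        h *= 1099511628211ULL;
    }
    return (size_t)h;
}

/* Equality: the same properties as isEqual:, with cache deliberately skipped. */
bool value_equal(const Value *a, const Value *b)
{
    return a->length == b->length &&
           memcmp(a->data, b->data, (size_t)a->length) == 0 &&
           strcmp(a->name, b->name) == 0;
}

/* Hash: built only from the properties equality uses, combined asymmetrically. */
size_t value_hash(const Value *v)
{
    size_t dataHash = bytes_hash(v->data, (size_t)v->length);
    size_t nameHash = bytes_hash(v->name, strlen(v->name));
    return rotate_left(dataHash, HASH_BITS / 2) ^ nameHash;
}
```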


If you have more properties, you can add more rotation and more xor operators, and it'll work out just the same. You'll want to adjust the amount of rotation for each property to make each one different.
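Here is one way to generalize to any number of properties, sketched in C. Property i is rotated by i * width / count bits, so each property gets a different rotation as long as there are no more properties than bits. This spacing rule and the name `combine_hashes` are assumptions of the sketch, not something the text prescribes.

```c
#include <limits.h>
#include <stddef.h>

#define HASH_BITS (CHAR_BIT * sizeof(size_t))

size_t rotate_left(size_t v, unsigned n)
{
    n %= HASH_BITS;
    return n == 0 ? v : (v << n) | (v >> (HASH_BITS - n));
}

/* xor together the property hashes, each rotated by its own amount. */
size_t combine_hashes(const size_t *hashes, size_t count)
{
    size_t result = 0;
    for (size_t i = 0; i < count; i++)
        result ^= rotate_left(hashes[i], (unsigned)(i * HASH_BITS / count));
    return result;
}
```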

A Note on Subclassing
You have to be careful when subclassing a class which implements custom equality and hashing. In particular, your subclass should not expose any new properties which equality is dependent upon. If it does, then it must not compare equal with any instances of the superclass.

To see why, consider a subclass of the first/last name class which includes a birthday, and includes that as part of its equality computation. It can't include it when comparing equality with an instance of the superclass, though, so its equality method would look like this:

    - (BOOL)isEqual: (id)other
    {
        // if the superclass doesn't like it then we're not equal
        if(![super isEqual: other])
            return NO;
        
        // if it's not an instance of the subclass, then trust the superclass
        // it's equal there, so we consider it equal here
        if(![other isKindOfClass: [MySubClass class]])
            return YES;
        
        // it's an instance of the subclass, the superclass properties are equal
        // so check the added subclass property
        return [[other birthday] isEqual: _birthday];
    }


Now you have an instance of the superclass for "John Smith", which I'll call A, and an instance of the subclass for "John Smith" with a birthday of 5/31/1982, which I'll call B. Because of the definition of equality above, A equals B, and B also equals itself, which is expected.

Now consider an instance of the subclass for "John Smith" with a birthday of 6/7/1994, which I'll call C. C is not equal to B, which is what we expect. C is equal to A, also expected. But now there's a problem. A equals both B and C, but B and C do not equal each other! This breaks the standard transitivity of the equality operator, and leads to extremely unexpected results.
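You can model the A/B/C scenario in plain C to see the broken transitivity directly. In this hypothetical model, a `Person` without a birthday plays the superclass instance. `person_equal` mirrors the `isEqual:` logic above: when either side lacks a birthday, only the names are compared.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: has_birthday == false means a superclass instance. */
typedef struct {
    const char *first;
    const char *last;
    bool has_birthday;
    const char *birthday;
} Person;

bool person_equal(const Person *a, const Person *b)
{
    if (strcmp(a->first, b->first) != 0 || strcmp(a->last, b->last) != 0)
        return false;             /* the superclass says no */
    if (!a->has_birthday || !b->has_birthday)
        return true;              /* one side is a plain superclass instance */
    return strcmp(a->birthday, b->birthday) == 0;
}
```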

In general this should not be a big problem. If your subclass adds properties which influence object equality, that's probably an indication of a design problem in your hierarchy anyway. Rather than working around it with weird implementations of isEqual:, consider redesigning your class hierarchy.

A Note on Dictionaries
If you want to use your object as a key in an NSDictionary, you need to implement hashing and equality, but you also need to implement -copyWithZone:, because NSDictionary copies its keys. Techniques for doing that are beyond the scope of today's post, but you should be aware that you need to go a little bit further in that case.

Conclusion
Cocoa provides default implementations of equality and hashing which work for many objects, but if you want your objects to be considered equal even when they're distinct objects in memory, you have to do a bit of extra work. Fortunately, it's not difficult to do, and once you implement them, your class will work seamlessly with many Cocoa collection classes.
