The diff algorithm in IGListKit, explained

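Recently our project adopted IGListKit, a third-party library that wraps UICollectionView and is mainly used to build feeds. One of its advantages is that a refresh does not reload the whole collection view. Instead, a diff algorithm computes the difference between the old and new arrays, and the collection view is updated only where they differ. The update logic lives in the category UICollectionView+IGListBatchUpdateData.m, in the following method: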

- (void)ig_applyBatchUpdateData:(IGListBatchUpdateData *)updateData {
    [self deleteItemsAtIndexPaths:updateData.deleteIndexPaths];
    [self insertItemsAtIndexPaths:updateData.insertIndexPaths];

    for (IGListMoveIndexPath *move in updateData.moveIndexPaths) {
        [self moveItemAtIndexPath:move.from toIndexPath:move.to];
    }

    for (IGListMoveIndex *move in updateData.moveSections) {
        [self moveSection:move.from toSection:move.to];
    }

    [self deleteSections:updateData.deleteSections];
    [self insertSections:updateData.insertSections];
}

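This method is called inside the batchUpdatesBlock of -performBatchUpdates:completion:. As the code shows, each update only inserts, deletes, or moves the affected views, which is very efficient. The rest of this article looks at how the diff algorithm computes these differences.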

Preliminaries

Simplifying the diff function

All of the diff logic lives in the file IGListDiff.mm. Its core function is declared as follows:

static id IGListDiffing(BOOL returnIndexPaths,
                        NSInteger fromSection,
                        NSInteger toSection,
                        NSArray<id<IGListDiffable>> *oldArray,
                        NSArray<id<IGListDiffable>> *newArray,
                        IGListDiffOption option,
                        IGListExperiment experiments)

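The function takes quite a few parameters, but the two that matter are oldArray and newArray. returnIndexPaths is normally NO, so it can be replaced by the constant NO. fromSection and toSection can be dropped for the purpose of analyzing the algorithm (assume everything happens in one section). option is normally IGListDiffEquality, so it can be replaced by that constant. experiments is never used anywhere in the flow, so it can simply be removed. After these substitutions and deletions, the declaration reduces to:

static id IGListDiffing(NSArray<id<IGListDiffable>> *oldArray,
                        NSArray<id<IGListDiffable>> *newArray)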

The IGListDiffing algorithm

Let us walk through IGListDiffing step by step.

Variable declarations

    const NSInteger newCount = newArray.count;
    const NSInteger oldCount = oldArray.count;
    
    NSMapTable *oldMap = [NSMapTable strongToStrongObjectsMapTable];
    NSMapTable *newMap = [NSMapTable strongToStrongObjectsMapTable];
    
    unordered_map<id<NSObject>, IGListEntry, IGListHashID, IGListEqualID> table;

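newCount and oldCount are kept for convenience. table is a hash table that the original code initializes later; it is moved up here to simplify the explanation. Its keys are diffIdentifiers and its values are entries, and lookups are O(1) on average. oldMap and newMap take no part in the diff itself. In the end they are just hash tables that map each element's diffIdentifier to its index in the array. To save loop iterations, the original code fills them inside the diff loops. Pulled out on their own, the initialization looks like this: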

     for (NSInteger i = 0; i < oldCount; i++) {
        addIndexToMap(i, oldArray[i], oldMap);
    }
    for (NSInteger i = 0; i < newCount; i++) {
        addIndexToMap( i, newArray[i], newMap);
    }

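Handling the special cases

If newCount or oldCount is 0, the result is simply "delete every old element" or "insert every new element", and the diff algorithm can be skipped.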

    if (newCount == 0) {
            [oldArray enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
                addIndexToMap( idx, obj, oldMap);
            }];
            return [[IGListIndexSetResult alloc] initWithInserts:[NSIndexSet new]
                                                         deletes:[NSIndexSet indexSetWithIndexesInRange:NSMakeRange(0, oldCount)]
                                                         updates:[NSIndexSet new]
                                                           moves:[NSArray new]
                                                     oldIndexMap:oldMap
                                                     newIndexMap:newMap];
        
    }
    
    if (oldCount == 0) {
            [newArray enumerateObjectsUsingBlock:^(id obj, NSUInteger idx, BOOL *stop) {
                addIndexToMap(idx, obj, newMap);
            }];
            return [[IGListIndexSetResult alloc] initWithInserts:[NSIndexSet indexSetWithIndexesInRange:NSMakeRange(0, newCount)]
                                                         deletes:[NSIndexSet new]
                                                         updates:[NSIndexSet new]
                                                           moves:[NSArray new]
                                                     oldIndexMap:oldMap
                                                     newIndexMap:newMap];
        
    }

Diff algorithm: Step 1

Iterate over the new array, create an entry for each element, and increment the entry's newCounter.

    vector<IGListRecord> newResultsArray(newCount);
    for (NSInteger i = 0; i < newCount; i++) {
        id<NSObject> key = IGListTableKey(newArray[i]);
        IGListEntry &entry = table[key];
        entry.newCounter++;

        // Push NSNotFound so that oldIndexes is never empty;
        // NSNotFound acts as a bottom-of-stack sentinel
        entry.oldIndexes.push(NSNotFound);

        newResultsArray[i].entry = &entry;
    }

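Note that IGListEntry &entry = table[key] yields a reference to the entry stored in the table (the entry is created if the key is not present yet). If the array contains several elements with the same key, their records in newResultsArray all point to the same entry.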
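The sharing described above can be reproduced in a few lines of plain C++. This is only a sketch: Entry is a simplified stand-in for IGListEntry, recordEntries is a hypothetical helper name, and std::string keys stand in for diffIdentifiers. It relies on the standard guarantee that pointers to std::unordered_map elements stay valid when later insertions cause a rehash, which is why storing &entry is safe.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified stand-in for IGListEntry (hypothetical; only one counter kept).
struct Entry {
    long newCounter = 0;
};

// Returns one Entry pointer per key, mirroring newResultsArray[i].entry.
std::vector<Entry *> recordEntries(const std::vector<std::string> &keys,
                                   std::unordered_map<std::string, Entry> &table) {
    std::vector<Entry *> entries;
    entries.reserve(keys.size());
    for (const std::string &key : keys) {
        // operator[] default-constructs the Entry on first use and returns
        // a reference to the element stored inside the map.
        Entry &entry = table[key];
        entry.newCounter++;
        // Rehashing never relocates elements, so this pointer stays valid.
        entries.push_back(&entry);
    }
    assert(entries.size() == keys.size());
    return entries;
}
```

For the keys "a", "b", "a", the first and third pointers are equal and that shared entry ends up with newCounter 2.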

After this step there is a newResultsArray of IGListRecord values. Each record's index is still NSNotFound, and its entry is an IGListEntry created in this step whose newCounter is greater than 0.

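Diff algorithm: Step 2

Iterate over the old array, create an entry for each element (or reuse the one created in Step 1), increment its oldCounter, and push the element's index onto the oldIndexes stack.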

    vector<IGListRecord> oldResultsArray(oldCount);
    for (NSInteger i = oldCount - 1; i >= 0; i--) {
        id<NSObject> key = IGListTableKey(oldArray[i]);
        IGListEntry &entry = table[key];
        entry.oldCounter++;

        // Push i onto the stack
        entry.oldIndexes.push(i);

        oldResultsArray[i].entry = &entry;
    }

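The loop runs in reverse. When several elements share a key, a series of indexes is pushed onto oldIndexes, and iterating backwards guarantees that the smallest index ends up on top.

After this step there is an oldResultsArray of IGListRecord values whose index is still NSNotFound. The entries referenced by oldResultsArray and newResultsArray fall into three cases:

The element exists only in the new array: newCounter > 0, oldCounter = 0, and the top of oldIndexes is NSNotFound.
The element exists only in the old array: newCounter = 0, oldCounter > 0, and the top of oldIndexes is not NSNotFound but the element's smallest index in the old array.
The element exists in both arrays: newCounter > 0, oldCounter > 0, the top of oldIndexes is the element's smallest index in the old array, and the records in oldResultsArray and newResultsArray point to the same entry.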
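The three cases can be checked with a standalone sketch of Steps 1 and 2. The names here (Entry, Table, buildTable, topOldIndex, kNotFound) are illustrative assumptions rather than IGListKit API, and std::string keys replace diffIdentifiers. kNotFound mirrors NSNotFound, which Foundation defines as NSIntegerMax.

```cpp
#include <cassert>
#include <limits>
#include <stack>
#include <string>
#include <unordered_map>
#include <vector>

// Stand-in for NSNotFound.
const long kNotFound = std::numeric_limits<long>::max();

// Simplified stand-in for IGListEntry (hypothetical field layout).
struct Entry {
    long oldCounter = 0;
    long newCounter = 0;
    std::stack<long> oldIndexes;
};

using Table = std::unordered_map<std::string, Entry>;

Table buildTable(const std::vector<std::string> &oldKeys,
                 const std::vector<std::string> &newKeys) {
    Table table;
    // Step 1: one sentinel per occurrence in the new array.
    for (const std::string &key : newKeys) {
        Entry &entry = table[key];
        entry.newCounter++;
        entry.oldIndexes.push(kNotFound);
    }
    // Step 2: walk the old array backwards so the smallest index ends on top.
    for (long i = static_cast<long>(oldKeys.size()) - 1; i >= 0; i--) {
        Entry &entry = table[oldKeys[i]];
        entry.oldCounter++;
        entry.oldIndexes.push(i);
    }
    return table;
}

// Smallest old index for the entry, or kNotFound if the key is new-only.
long topOldIndex(const Entry &entry) {
    assert(!entry.oldIndexes.empty());
    return entry.oldIndexes.top();
}
```

With old keys "a", "b", "a" and new keys "a", "c", the entry for "c" is new-only (top is kNotFound), "b" is old-only (top is 1), and "a" is in both (top is 0, the smaller of its old indexes 0 and 2).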

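Diff algorithm: Step 3

Iterate over the new array. For every element that appears in both arrays, the index of its IGListRecord is set to the element's position in the other array.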

    for (NSInteger i = 0; i < newCount; i++) {
        IGListEntry *entry = newResultsArray[i].entry;
        NSCAssert(!entry->oldIndexes.empty(), @"Old indexes is empty while iterating new item %li. Should have NSNotFound", (long)i);
        // Take the top of oldIndexes, i.e. the element's first remaining
        // index in oldArray, then pop it
        const NSInteger originalIndex = entry->oldIndexes.top();
        entry->oldIndexes.pop();

        if (originalIndex < oldCount) {
            const id<IGListDiffable> n = newArray[i];
            const id<IGListDiffable> o = oldArray[originalIndex];
            if (n != o && ![n isEqualToDiffableObject:o]) {
                // Flag as updated only when the keys match, n and o are
                // different objects, and isEqualToDiffableObject: returns NO
                entry->updated = YES;
            }
        }
        // Cross-link the two records. This branch is skipped when
        // originalIndex is NSNotFound
        if (originalIndex != NSNotFound
            && entry->newCounter > 0
            && entry->oldCounter > 0) {

            newResultsArray[i].index = originalIndex;
            oldResultsArray[originalIndex].index = i;
        }
    }

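PS: entry->updated = YES is rarely reached, and even when it is, I could not see much effect. The -ig_applyBatchUpdateData: method shown at the top has no reload operation. The reason is that _flushCollectionView converts every update into a delete plus an insert to work around a bug.

The real purpose of this step is at the end. Once it finishes, for every element present in both arrays, the index of its record in newResultsArray is the element's index in oldArray, and the index of its record in oldResultsArray is the element's index in newArray. These assignments are mainly used later to work out moves.

What if newArray and oldArray share an element that occurs several times? Real IGListKit usage normally avoids this. If it does happen, this step pairs the i-th occurrence of the element in oldArray with its i-th occurrence in newArray. The result is not always optimal, as discussed later.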
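The pairing of i-th occurrences is easy to see in a reduced sketch of Steps 1 to 3. The assumptions: matchIndexes, Matches, and kNotFound are hypothetical names, std::string keys replace diffIdentifiers, and the counters and the updated flag are left out. Dropping the counters does not change the matching here, because while iterating the new array newCounter is always positive and a popped index other than kNotFound implies oldCounter is positive.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <stack>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Stand-in for NSNotFound.
const long kNotFound = std::numeric_limits<long>::max();

// first:  for each new index, the matched old index (or kNotFound)
// second: for each old index, the matched new index (or kNotFound)
using Matches = std::pair<std::vector<long>, std::vector<long>>;

Matches matchIndexes(const std::vector<std::string> &oldKeys,
                     const std::vector<std::string> &newKeys) {
    std::unordered_map<std::string, std::stack<long>> oldIndexes;
    // Step 1: one sentinel per occurrence in the new array.
    for (const std::string &key : newKeys) {
        oldIndexes[key].push(kNotFound);
    }
    // Step 2: reverse walk, so the smallest old index sits on top.
    for (long i = static_cast<long>(oldKeys.size()) - 1; i >= 0; i--) {
        oldIndexes[oldKeys[i]].push(i);
    }
    // Step 3: each new occurrence pops the next unused old index.
    std::vector<long> newToOld(newKeys.size(), kNotFound);
    std::vector<long> oldToNew(oldKeys.size(), kNotFound);
    for (std::size_t i = 0; i < newKeys.size(); i++) {
        std::stack<long> &indexes = oldIndexes[newKeys[i]];
        assert(!indexes.empty());
        const long originalIndex = indexes.top();
        indexes.pop();
        if (originalIndex != kNotFound) {
            newToOld[i] = originalIndex;
            oldToNew[originalIndex] = static_cast<long>(i);
        }
    }
    return {newToOld, oldToNew};
}
```

For old keys "a", "x", "a" and new keys "a", "a", "a", the first new "a" is paired with old index 0, the second with old index 2, and the third finds only the sentinel and stays unmatched.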

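Diff algorithm: Step 4

Next come the insert, delete, update, and move collections. For efficiency IGListKit computes them all inside two loops; they are separated here to make them easier to follow.

First, declare the collections: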

    id mInserts, mMoves, mUpdates, mDeletes;

    mInserts = [NSMutableIndexSet new];
    mUpdates = [NSMutableIndexSet new];
    mDeletes = [NSMutableIndexSet new];
    mMoves = [NSMutableArray new];

Generating the deletes

    for (NSInteger i = 0; i < oldCount; i++) {
        const IGListRecord record = oldResultsArray[i];
        if (record.index == NSNotFound) {
            addIndexToCollection( mDeletes, i);
        }
    }

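This is straightforward: if a record in oldResultsArray still has index NSNotFound after the steps above, newArray does not contain that element, so it must be deleted.

Generating the inserts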

    for (NSInteger i = 0; i < newCount; i++) {
        const IGListRecord record = newResultsArray[i];
        if (record.index == NSNotFound) {
            addIndexToCollection(mInserts, i);
        } 
    }

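This one is just as simple: if a record in newResultsArray still has index NSNotFound after the steps above, oldArray does not contain that element, so it must be inserted.

Generating the updates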

    for (NSInteger i = 0; i < newCount; i++) {
        const IGListRecord record = newResultsArray[i];
        const NSInteger oldIndex = record.index;
        if (record.index == NSNotFound) {
        } else {
            if (record.entry->updated) {
                addIndexToCollection( mUpdates, oldIndex);
            }
        }
    }

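Anything flagged as updated earlier needs an update. The old index is recorded here, presumably because of UICollectionView's batch update rules; later on, each update is replaced by an insert plus a delete.

Generating the moves

The core of building the moves is: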

        id move;
        move = [[IGListMoveIndex alloc] initWithFrom:oldIndex to:newIndex];          
        [mMoves addObject:move];

oldIndex and newIndex are already known from the previous steps and can be used directly. In some cases, however, no move is needed. For example:

oldArray = @[@"1",@"2",@"3"];
newArray = @[@"2",@"3"];

Here a single delete turns oldArray into newArray; likewise, some cases need nothing but inserts. To handle this, IGListKit introduces runningOffset. The whole computation looks like this:

    vector<NSInteger> deleteOffsets(oldCount), insertOffsets(newCount);
    NSInteger runningOffset = 0;
    for (NSInteger i = 0; i < oldCount; i++) {
        deleteOffsets[i] = runningOffset;
        const IGListRecord record = oldResultsArray[i];
        // If the element is deleted, increment runningOffset
        if (record.index == NSNotFound) {
            runningOffset++;
        }
    }
    runningOffset = 0;

    for (NSInteger i = 0; i < newCount; i++) {
        insertOffsets[i] = runningOffset;
        const IGListRecord record = newResultsArray[i];
        // If the element is inserted, increment runningOffset
        if (record.index == NSNotFound) {
            runningOffset++;
        }
    }
    for (NSInteger i = 0; i < newCount; i++) {
        const IGListRecord record = newResultsArray[i];
        const NSInteger oldIndex = record.index;
        if (record.index == NSNotFound) {
        } else {
            // Number of inserts before this position
            const NSInteger insertOffset = insertOffsets[i];
            // Number of deletes before the old position
            const NSInteger deleteOffset = deleteOffsets[oldIndex];
            if ((oldIndex - deleteOffset + insertOffset) != i) {
                id move;
                move = [[IGListMoveIndex alloc] initWithFrom:oldIndex to:i];
                [mMoves addObject:move];
            }
        }
    }

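The idea: every delete that occurs earlier shifts the following elements one position to the left, and every insert that occurs earlier shifts them one position to the right. oldIndex - deleteOffset + insertOffset is the element's position once the deletes and inserts have been applied; if it equals i, no move is needed. In the example above, @"2" has oldIndex 1, deleteOffset 1, and insertOffset 0, so 1 - 1 + 0 = 0, which equals its new index, and no move is emitted.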
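The offset rule can be exercised on its own with the sketch below. It takes the two index arrays that Step 3 produces as plain vectors. computeMoves, Move, and kNotFound are hypothetical names for illustration, not IGListKit API.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

// Stand-in for NSNotFound.
const long kNotFound = std::numeric_limits<long>::max();

// Each move is (from old index, to new index).
using Move = std::pair<long, long>;

// newToOld[i]: old index matched to new index i, or kNotFound (an insert).
// oldToNew[j]: new index matched to old index j, or kNotFound (a delete).
std::vector<Move> computeMoves(const std::vector<long> &newToOld,
                               const std::vector<long> &oldToNew) {
    // deleteOffsets[j]: number of deletes strictly before old index j.
    std::vector<long> deleteOffsets(oldToNew.size());
    long runningOffset = 0;
    for (std::size_t j = 0; j < oldToNew.size(); j++) {
        deleteOffsets[j] = runningOffset;
        if (oldToNew[j] == kNotFound) runningOffset++;
    }
    // insertOffsets[i]: number of inserts strictly before new index i.
    std::vector<long> insertOffsets(newToOld.size());
    runningOffset = 0;
    for (std::size_t i = 0; i < newToOld.size(); i++) {
        insertOffsets[i] = runningOffset;
        if (newToOld[i] == kNotFound) runningOffset++;
    }
    std::vector<Move> moves;
    for (std::size_t i = 0; i < newToOld.size(); i++) {
        const long oldIndex = newToOld[i];
        if (oldIndex == kNotFound) continue;  // inserted, cannot be a move
        assert(oldIndex >= 0 && oldIndex < static_cast<long>(oldToNew.size()));
        // Where the element lands after deletes and inserts alone.
        const long shifted = oldIndex - deleteOffsets[oldIndex] + insertOffsets[i];
        if (shifted != static_cast<long>(i)) {
            moves.emplace_back(oldIndex, static_cast<long>(i));
        }
    }
    return moves;
}
```

For the three-to-two element example, newToOld is {1, 2} and oldToNew is {kNotFound, 0, 1}, and the function returns no moves. Swapping two elements (newToOld {1, 0}, oldToNew {1, 0}) yields two moves, 1 to 0 and 0 to 1.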

Returning the result

    return [[IGListIndexSetResult alloc] initWithInserts:mInserts
                                                     deletes:mDeletes
                                                     updates:mUpdates
                                                       moves:mMoves
                                                 oldIndexMap:oldMap
                                                 newIndexMap:newMap];

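With the diff computed, every collection is populated and can be wrapped in an IGListIndexSetResult and returned.

Arrays containing duplicate elements

As mentioned, if newArray and oldArray share an element that occurs several times, the i-th occurrence in oldArray is paired with the i-th occurrence in newArray. This pairing is not optimal. For example: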

oldArray = @[@"2",@"3",@"1"];
newArray = @[@"1",@"2",@"1"];

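By eye, deleting @"3" and inserting @"1" at index 0 is enough to turn oldArray into newArray. The diff algorithm instead produces four operations (move @"2" from index 0 to 1, move @"1" from index 2 to 0, delete @"3", insert @"1" at index 2). This is because index 2 of oldArray is paired with index 0 of newArray, which forces an unnecessary move of @"1".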
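The two-operation script can be verified mechanically. The sketch below assumes the convention that batch updates use: delete indexes refer to the old array and insert indexes refer to the new array. applyEdits is a hypothetical helper written for this check, it is not part of IGListKit, and it expects the delete indexes to be distinct.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Applies deletes (indexes into the old array) and then inserts
// (index into the new array, plus the value to place there).
std::vector<std::string> applyEdits(
    std::vector<std::string> items,
    std::vector<long> deletes,
    std::vector<std::pair<long, std::string>> inserts) {
    // Delete from the highest index down, so that removing one element
    // does not shift the positions of the ones still to be removed.
    std::sort(deletes.begin(), deletes.end(), std::greater<long>());
    for (long index : deletes) {
        assert(index >= 0 && index < static_cast<long>(items.size()));
        items.erase(items.begin() + index);
    }
    // Insert from the lowest index up, so that each index already refers
    // to its final position in the new array.
    std::sort(inserts.begin(), inserts.end());
    for (const auto &insert : inserts) {
        assert(insert.first >= 0 &&
               insert.first <= static_cast<long>(items.size()));
        items.insert(items.begin() + insert.first, insert.second);
    }
    return items;
}
```

Starting from "2", "3", "1", deleting index 1 and inserting "1" at index 0 gives "1", "2", "1", which is newArray, with no move at all.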

In practice this rarely comes up, and IGListKit discourages it (it deduplicates the objects and asserts).

Summary

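The diff algorithm is very efficient. With the loops left merged rather than split apart as in this article, IGListDiffing runs just five for loops, so both time and space complexity are O(n). The first three loops mark the state of every element, and the last two compute the operations needed to go from the old array to the new one. IGListKit uses it to update the collection view partially, which also improves app performance.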
