Swift - Dictionary


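Preface

Dictionary is an unordered collection that stores associations between keys and values. All keys must be of the same type, and all values must be of the same type. Each value is associated with a unique key, which acts as the identifier for that value within the dictionary. Unlike the items in an array, the items in a dictionary have no specified order. You use a dictionary when you need to look up values by their identifier (key), in much the same way that a real-world dictionary is used to look up the definition of a word.

This article does not cover how to use dictionaries. If you want to learn more about their usage, see the Dictionary chapter on Swift.gg.

Note
Swift's Dictionary type is bridged to Foundation's NSDictionary class.

For more information about using Dictionary with Foundation and Cocoa, see Bridging Between Dictionary and NSDictionary.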

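Hash tables

Let's start with a quick look at hash tables.

A hash table is a data structure that goes directly from a key to a storage location in memory. In other words, it computes a function of the key to map the data being looked up to a position in a table, which speeds up lookups. This mapping function is called the hash function, and the array that holds the records is called the hash table.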

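Hash functions

Common ways to construct a hash function:

Direct addressing method
Digit analysis method
Mid-square method
Folding method
Random number method
Division (remainder) method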
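
As a minimal sketch of the last method in the list, the division (remainder) method maps an integer key to a slot by taking the key modulo the table length. The helper name `divisionHash` and the table length 7 are assumptions made up for this illustration; they are not part of any Swift API.

```swift
// Division (remainder) method: slot = key mod tableLength.
// `divisionHash` is a hypothetical helper used only for illustration.
func divisionHash(_ key: Int, tableLength: Int) -> Int {
    precondition(tableLength > 0, "table length must be positive")
    // Swift's % keeps the sign of the dividend, so normalise negative keys.
    let remainder = key % tableLength
    return remainder >= 0 ? remainder : remainder + tableLength
}

let slots = [10, 17, 23].map { divisionHash($0, tableLength: 7) }
print(slots) // 10 % 7 = 3, 17 % 7 = 3, 23 % 7 = 2
```

Note that 10 and 17 land in the same slot, which is exactly the kind of collision a hash table has to handle.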

Hash collisions

Two common ways to resolve collisions:

Open addressing
Separate chaining
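
To make the second strategy concrete, here is a small sketch of separate chaining, where every bucket keeps a list of the entries that hashed to it. `ChainedTable` is a hypothetical toy type written for this article; as shown later, Swift's Dictionary does not work this way and uses open addressing instead.

```swift
// Separate chaining: every bucket holds a small list of (key, value) pairs.
// `ChainedTable` is a hypothetical toy type, not how Swift's Dictionary works.
struct ChainedTable {
    private var buckets: [[(key: Int, value: String)]]

    init(bucketCount: Int) {
        buckets = Array(repeating: [], count: bucketCount)
    }

    // Division method, normalised so that negative keys map to a valid index.
    private func index(for key: Int) -> Int {
        let remainder = key % buckets.count
        return remainder >= 0 ? remainder : remainder + buckets.count
    }

    mutating func set(_ value: String, for key: Int) {
        let i = index(for: key)
        if let j = buckets[i].firstIndex(where: { $0.key == key }) {
            buckets[i][j].value = value                     // key exists: overwrite
        } else {
            buckets[i].append((key: key, value: value))     // new key: extend the chain
        }
    }

    func get(_ key: Int) -> String? {
        return buckets[index(for: key)].first(where: { $0.key == key })?.value
    }
}

var table = ChainedTable(bucketCount: 7)
table.set("a", for: 10)
table.set("b", for: 17)   // 17 and 10 both map to bucket 3 and share a chain
print(table.get(10) ?? "nil", table.get(17) ?? "nil")
```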

Load factor

load factor = number of elements stored in the table / length of the hash table

1. Memory layout of Dictionary

First, let's initialize a dictionary:

var dict = ["key1" : "value1", "key2" : "value2", "key3" : "value3"]


1.1 dictionaryLiteral

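Looking at the SIL output, we can see that this calls Dictionary.init(dictionaryLiteral:). Even without the SIL this is easy to guess, because the dictionary is initialized from a literal. Next, let's look for the dictionaryLiteral initializer in Dictionary.swift in the Swift source.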


First, we can see that Dictionary is a struct. Its Key must conform to the Hashable protocol; that is, keys must be hashable.
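
As a quick illustration of the Hashable requirement, the sketch below uses a custom struct as a key. The type `GridPoint` is made up for this example; the compiler synthesizes the Hashable conformance because all of its stored properties are themselves Hashable.

```swift
// A custom key type: the compiler synthesises Hashable (and Equatable)
// because every stored property is itself Hashable.
// `GridPoint` is a hypothetical type used only for illustration.
struct GridPoint: Hashable {
    let x: Int
    let y: Int
}

var names: [GridPoint: String] = [:]
names[GridPoint(x: 0, y: 0)] = "origin"
names[GridPoint(x: 1, y: 0)] = "east"

print(names[GridPoint(x: 0, y: 0)] ?? "none") // prints "origin"
```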


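This literal initializer is the requirement of the ExpressibleByDictionaryLiteral protocol. Internally it does the following:

It creates an instance of _NativeDictionary
It inserts the elements one by one in a loop, trapping at runtime if a duplicate key is found
Finally, it calls Dictionary's init to finish the initialization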
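
To see that a literal and the initializer are the same thing, the sketch below builds one dictionary from a literal and one by calling `init(dictionaryLiteral:)` explicitly. Calling the initializer by hand is only done here for demonstration; normal code just writes the literal.

```swift
// The compiler turns a dictionary literal into a call to
// Dictionary.init(dictionaryLiteral:). Calling it by hand gives the same value.
let fromLiteral = ["key1": "value1", "key2": "value2", "key3": "value3"]
let explicit = Dictionary(dictionaryLiteral:
    ("key1", "value1"), ("key2", "value2"), ("key3", "value3"))

print(fromLiteral == explicit) // prints "true"
```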

1.2 _NativeDictionary

So what is _NativeDictionary? It is defined in NativeDictionary.swift.


We can see that _NativeDictionary is a wrapper around __RawDictionaryStorage that implements most of the dictionary's functionality.


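Now look at init(capacity:). It distinguishes between a dictionary that is empty at initialization and one that is not, so we will focus on the non-empty case.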

1.3 _DictionaryStorage

Next, let's look at _DictionaryStorage:


We can see that _DictionaryStorage is a class that inherits from __RawDictionaryStorage and conforms to _NSDictionaryCore.

1.4 __RawDictionaryStorage

Next, let's look at __RawDictionaryStorage, in DictionaryStorage.swift:


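We can see that __RawDictionaryStorage is a class that inherits from __SwiftNativeNSDictionary and defines the following properties:

Property	Type	Purpose
_count	Int	Number of elements currently stored
_capacity	Int	Maximum number of elements that can be stored without exceeding the maximum load factor
_scale	Int8	Scale of the dictionary; the number of buckets is 2^_scale
_reservedScale	Int8	Scale corresponding to the highest reserveCapacity(_:) call so far, or 0 if there was none. This may be used later to allow removals to resize the storage.
_extra	Int16	Currently unused; set to 0
_age	Int32	Mutation count, enabling stricter index validation
_seed	Int	Hash seed used to hash the elements of this dictionary instance; hashing starts from this seed value
_rawKeys	UnsafeMutableRawPointer	Pointer to the start of the array holding all keys
_rawValues	UnsafeMutableRawPointer	Pointer to the start of the array holding all values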

1.5 __SwiftNativeNSDictionary


The definition of __SwiftNativeNSDictionary can be found in Runtime.swift.

1.6 _NSDictionaryCore

The definition of _NSDictionaryCore can be found in ShadowProtocols.swift:


_NSDictionaryCore is a protocol. As mentioned earlier, Swift's Dictionary type is bridged to Foundation's NSDictionary class, and this protocol is the interface used for that bridging.

/// A shadow for the "core operations" of NSDictionary.
///
/// Covers a set of operations everyone needs to implement in order to
/// be a useful `NSDictionary` subclass.
@objc
internal protocol _NSDictionaryCore: _NSCopying, _NSFastEnumeration {
  // The following methods should be overridden when implementing an
  // NSDictionary subclass.

  // The designated initializer of `NSDictionary`.
  init(
    objects: UnsafePointer<AnyObject?>,
    forKeys: UnsafeRawPointer, count: Int)

  var count: Int { get }

  @objc(objectForKey:)
  func object(forKey aKey: AnyObject) -> AnyObject?

  func keyEnumerator() -> _NSEnumerator

  // We also override the following methods for efficiency.

  @objc(copyWithZone:)
  override func copy(with zone: _SwiftNSZone?) -> AnyObject

  @objc(getObjects:andKeys:count:)
  func getObjects(
    _ objects: UnsafeMutablePointer<AnyObject>?,
    andKeys keys: UnsafeMutablePointer<AnyObject>?,
    count: Int
  )

  @objc(countByEnumeratingWithState:objects:count:)
  override func countByEnumerating(
    with state: UnsafeMutablePointer<_SwiftNSFastEnumerationState>,
    objects: UnsafeMutablePointer<AnyObject>?, count: Int
  ) -> Int
}

1.7 Dictionary init

Now let's go back to Dictionary's init.

The literal initializer calls it as self.init(_native: native).


1.8 _Variant


After some searching in Dictionary.swift, we can see that _Variant is an enum with associated values.

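1.9 Summary of Dictionary's memory layout

From the analysis above, we can conclude that a pure-Swift dictionary is laid out as follows:

Dictionary contains a property _variant, an enum with associated values; it is initialized with a _NativeDictionary as its associated value
_NativeDictionary is a struct with a property _storage of type __RawDictionaryStorage
__RawDictionaryStorage is a class; _storage is initialized with an instance of its subclass _DictionaryStorage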

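Putting this together, a Dictionary value boils down to a single reference to a heap-allocated _DictionaryStorage object. That object holds the header fields listed in 1.4 and is followed by the hash table bitmap, the keys, and the values.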

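
The layering summarized above can be modelled with a few toy types. Everything named `Toy…` below is a hypothetical stand-in written for this article, not the standard library's code; the point is only that a struct wrapping a single-case enum wrapping a struct wrapping a class reference is still just one pointer wide, which is also the size `MemoryLayout` reports for a real dictionary.

```swift
// Toy model of the layering described above. All `Toy…` names are
// hypothetical; the real types live in the standard library sources.
final class ToyStorage {            // plays the role of _DictionaryStorage
    var count = 0
}
struct ToyNative {                  // plays the role of _NativeDictionary
    var storage: ToyStorage
}
enum ToyVariant {                   // plays the role of Dictionary._Variant
    case native(ToyNative)
}
struct ToyDictionary {              // plays the role of Dictionary
    var variant: ToyVariant
}

let pointerSize = MemoryLayout<UnsafeRawPointer>.size
print(MemoryLayout<ToyDictionary>.size == pointerSize)
print(MemoryLayout<[String: String]>.size == pointerSize)
```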

1.10 _DictionaryStorage.allocate(capacity:)

From the layout summarized above, the important piece is _DictionaryStorage. It is created by calling allocate(capacity:). Let's see what this method does; the following code is in DictionaryStorage.swift:

  @usableFromInline
  @_effects(releasenone)
  static internal func allocate(capacity: Int) -> _DictionaryStorage {
    let scale = _HashTable.scale(forCapacity: capacity)
    return allocate(scale: scale, age: nil, seed: nil)
  }

Here we meet _HashTable. Let's see what it is.

1.11 _HashTable

The definition of _HashTable can be found in HashTable.swift.


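_HashTable has two properties:

words: a pointer to a bitmap in which each bit marks whether the corresponding bucket holds an element
bucketMask: the mask, bucketCount - 1, that is 2^n - 1, where n comes from scale

Now let's look at the _HashTable.scale(forCapacity:) method: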

  internal static func scale(forCapacity capacity: Int) -> Int8 {
    let capacity = Swift.max(capacity, 1)
    // Calculate the minimum number of entries we need to allocate to satisfy
    // the maximum load factor. `capacity + 1` below ensures that we always
    // leave at least one hole.
    let minimumEntries = Swift.max(
      Int((Double(capacity) / maxLoadFactor).rounded(.up)),
      capacity + 1)
    // The actual number of entries we need to allocate is the lowest power of
    // two greater than or equal to the minimum entry count. Calculate its
    // exponent.
    let exponent = (Swift.max(minimumEntries, 2) - 1)._binaryLogarithm() + 1
    _internalInvariant(exponent >= 0 && exponent < Int.bitWidth)
    // The scale is the exponent corresponding to the bucket count.
    let scale = Int8(truncatingIfNeeded: exponent)
    _internalInvariant(self.capacity(forScale: scale) >= capacity)
    return scale
  }

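This method computes scale from the requested capacity. scale is the exponent of a power of two: the bucket count is 2^scale. This makes it cheap to turn a hash value into a position. Anyone who has read the low-level Objective-C or Swift sources knows that Apple often uses a value called mask. For example, when objc_msgSend looks up the method cache, it computes the index as index = sel & mask. So how is the value of mask chosen?

A mask usually has the form 2^n - 1.

This gives the identity x % y = x & (y - 1), where y = 2^n and x is non-negative: taking a number modulo 2^n is the same as a bitwise AND of that number with 2^n - 1.

For example:
3 % 4 = 3 & 3 = 3
5 % 4 = 5 & 3 = 1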
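
The identity is easy to check in a few lines of Swift. This is only a sanity check written for this article; the variable names are arbitrary. It also shows why the AND form is the convenient one for hash values, which may be negative.

```swift
// For y = 2^n and non-negative x, x % y equals x & (y - 1).
let bucketCount = 1 << 2          // 4 buckets, so the mask is 3 (0b11)
let mask = bucketCount - 1

for x in 0..<1_000 {
    precondition(x % bucketCount == x & mask)
}
print(3 % 4, 3 & 3)  // both are 3
print(5 % 4, 5 & 3)  // both are 1

// For a negative x the two differ: % keeps the sign of x, while & with a
// non-negative mask always stays in 0..<bucketCount.
let negative = -5
print(negative % bucketCount, negative & mask) // -1 and 3
```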

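To make computing positions cheap, the index can therefore be computed with a bitwise AND. That means that at initialization we need to round the requested size up to a nearby, slightly larger bucket count of the form 2^n.

For example, to store 3 elements we allocate 4 buckets: 2^scale = 4, so scale = 2. Computing this scale is exactly what scale(forCapacity:) does. The comments describe further details (such as the maximum load factor and always leaving at least one empty bucket), which we won't go into here.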
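
To experiment with the numbers, here is a re-implementation of the scale computation that uses only public API. `toyScale(forCapacity:)` and the constant `maxLoadFactor` are stand-ins written for this article, on the assumption that the maximum load factor is 3/4 as in the standard library source quoted later; the loop replaces the internal `_binaryLogarithm()` helper.

```swift
// A sketch of the scale computation using only public API.
// `toyScale` and `maxLoadFactor` are illustrative stand-ins for the
// internal _HashTable members quoted above.
let maxLoadFactor = 0.75

func toyScale(forCapacity capacity: Int) -> Int {
    let capacity = max(capacity, 1)
    // Enough buckets to respect the load factor, and always one spare hole.
    let minimumEntries = max(
        Int((Double(capacity) / maxLoadFactor).rounded(.up)),
        capacity + 1)
    // Smallest exponent whose power of two is >= minimumEntries.
    var scale = 0
    while (1 << scale) < minimumEntries { scale += 1 }
    return scale
}

print(toyScale(forCapacity: 3)) // 2, i.e. 4 buckets
print(toyScale(forCapacity: 4)) // 3, i.e. 8 buckets
```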

1.12 _DictionaryStorage.allocate(scale:, age:, seed:)

Now that we know how scale is computed, let's go back to _DictionaryStorage and look at allocate(scale:age:seed:):

static internal func allocate(
    scale: Int8,
    age: Int32?,
    seed: Int?
  ) -> _DictionaryStorage {
    // The entry count must be representable by an Int value; hence the scale's
    // peculiar upper bound.
    _internalInvariant(scale >= 0 && scale < Int.bitWidth - 1)

    let bucketCount = (1 as Int) &<< scale
    let wordCount = _UnsafeBitset.wordCount(forCapacity: bucketCount)
    let storage = Builtin.allocWithTailElems_3(
      _DictionaryStorage.self,
      wordCount._builtinWordValue, _HashTable.Word.self,
      bucketCount._builtinWordValue, Key.self,
      bucketCount._builtinWordValue, Value.self)

    let metadataAddr = Builtin.projectTailElems(storage, _HashTable.Word.self)
    let keysAddr = Builtin.getTailAddr_Word(
      metadataAddr, wordCount._builtinWordValue, _HashTable.Word.self,
      Key.self)
    let valuesAddr = Builtin.getTailAddr_Word(
      keysAddr, bucketCount._builtinWordValue, Key.self,
      Value.self)
    storage._count = 0
    storage._capacity = _HashTable.capacity(forScale: scale)
    storage._scale = scale
    storage._reservedScale = 0
    storage._extra = 0

    if let age = age {
      storage._age = age
    } else {
      // The default mutation count is simply a scrambled version of the storage
      // address.
      storage._age = Int32(
        truncatingIfNeeded: ObjectIdentifier(storage).hashValue)
    }

    storage._seed = seed ?? _HashTable.hashSeed(for: storage, scale: scale)
    storage._rawKeys = UnsafeMutableRawPointer(keysAddr)
    storage._rawValues = UnsafeMutableRawPointer(valuesAddr)

    // Initialize hash table metadata.
    storage._hashTable.clear()
    return storage
  }


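We can see that this allocates the object together with three tail-allocated regions (the bitmap words, the keys, and the values) and then initializes the corresponding properties of _DictionaryStorage.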
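
To get a feel for the sizes involved, the sketch below estimates the three tail-allocated regions for a given scale. `tailLayout` is a hypothetical helper written for this article. It assumes one bitmap bit per bucket packed into machine words, and it ignores alignment padding and the object header, so treat the numbers as an approximation rather than the exact allocation.

```swift
// Rough size of the three tail-allocated regions for a given scale.
// `tailLayout` is a hypothetical helper; it ignores alignment padding and
// the object header that precede the tail elements in the real storage.
func tailLayout<Key, Value>(
    scale: Int, key: Key.Type, value: Value.Type
) -> (bitmapWords: Int, keyBytes: Int, valueBytes: Int) {
    let bucketCount = 1 << scale
    // One bit per bucket, rounded up to whole machine words.
    let wordCount = (bucketCount + UInt.bitWidth - 1) / UInt.bitWidth
    return (wordCount,
            bucketCount * MemoryLayout<Key>.stride,
            bucketCount * MemoryLayout<Value>.stride)
}

let layout = tailLayout(scale: 3, key: String.self, value: String.self)
print(layout)
```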

1.13 Verifying the memory layout with lldb

Write a short piece of code:

var dict = ["1" : "a", "2" : "b", "3" : "c", "4" : "d"]


See the lldb output.

Why is the capacity 6 here? A look at the source explains it:

storage._capacity = _HashTable.capacity(forScale: scale)

extension _HashTable {
  /// The inverse of the maximum hash table load factor.
  private static var maxLoadFactor: Double {
    @inline(__always) get { return 3 / 4 }
  }

  internal static func capacity(forScale scale: Int8) -> Int {
    let bucketCount = (1 as Int) &<< scale
    return Int(Double(bucketCount) * maxLoadFactor)
  }
}

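maxLoadFactor is 3/4, so the capacity is 8 * 3/4 = 6. The 8 is 2^3: the dictionary was initialized with 4 key-value pairs, which need at least ceil(4 / 0.75) = 6 buckets to stay within the load factor, and the smallest 2^n that is at least 6 is 8, so scale is 3.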
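
The arithmetic can be reproduced with a small stand-in for capacity(forScale:). `toyCapacity(forScale:)` is written for this article and hard-codes the 3/4 load factor from the source above. The last lines print the public `capacity` of a four-element dictionary; the exact value is an implementation detail of the standard library version in use, so the sketch only relies on it being at least `count`.

```swift
// Stand-in for _HashTable.capacity(forScale:), with the 3/4 load factor
// from the source above hard-coded. `toyCapacity` is a hypothetical name.
func toyCapacity(forScale scale: Int) -> Int {
    let bucketCount = 1 << scale
    return Int(Double(bucketCount) * 0.75)
}
print(toyCapacity(forScale: 3)) // 8 * 3/4 = 6

// The public API exposes the same quantity; its exact value is an
// implementation detail, but it can never be smaller than count.
let dict = ["1": "a", "2": "b", "3": "c", "4": "d"]
print(dict.count, dict.capacity)
```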

2. get & set

We start from the subscript. First, find the subscript implementation:

  @inlinable
  public subscript(key: Key) -> Value? {
    get {
      return _variant.lookup(key)
    }
    set(newValue) {
      if let x = newValue {
        _variant.setValue(x, forKey: key)
      } else {
        removeValue(forKey: key)
      }
    }
    _modify {
      defer { _fixLifetime(self) }
      yield &_variant[key]
    }
  }
}
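
From the caller's side, the three accessors of this subscript correspond to ordinary dictionary operations. The snippet below is a usage sketch with made-up keys and values.

```swift
var dict = ["key1": "value1"]

let hit = dict["key1"]        // getter: Optional("value1")
let miss = dict["nope"]       // getter: nil, the key is absent

dict["key2"] = "value2"       // setter with a value: insert or overwrite
dict["key1"] = nil            // setter with nil: removes the entry
dict["key2"]?.append("!")     // in-place mutation through the subscript

print(dict) // ["key2": "value2!"]
```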

2.1 get

2.1.1 lookup

In the subscript, the getter calls lookup. _variant is an enum with associated values whose associated value is a _NativeDictionary, so we find lookup in _NativeDictionary:

  @inlinable
  @inline(__always)
  func lookup(_ key: Key) -> Value? {
    if count == 0 {
      // Fast path that avoids computing the hash of the key.
      return nil
    }
    let (bucket, found) = self.find(key)
    guard found else { return nil }
    return self.uncheckedValue(at: bucket)
  }

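The main work here is the call to find.
If the key is not found, it returns nil.
If it is found, it calls uncheckedValue to read the value at that bucket and returns it.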

  @inlinable
  @inline(__always)
  internal func uncheckedValue(at bucket: Bucket) -> Value {
    defer { _fixLifetime(self) }
    _internalInvariant(hashTable.isOccupied(bucket))
    return _values[bucket.offset]
  }

2.1.2 _NativeDictionary find

The code of find is as follows:

  @inlinable
  @inline(__always)
  internal func find(_ key: Key) -> (bucket: Bucket, found: Bool) {
    return _storage.find(key)
  }

This simply calls _storage.find.

2.1.3 __RawDictionaryStorage.find

The code is as follows:

internal final func find<Key: Hashable>(_ key: Key) -> (bucket: _HashTable.Bucket, found: Bool) {
    return find(key, hashValue: key._rawHashValue(seed: _seed))
  }

This in turn calls another find overload:

  @_alwaysEmitIntoClient
  @inline(never)
  internal final func find<Key: Hashable>(_ key: Key, hashValue: Int) -> (bucket: _HashTable.Bucket, found: Bool) {
      let hashTable = _hashTable
      var bucket = hashTable.idealBucket(forHashValue: hashValue)
      while hashTable._isOccupied(bucket) {
        if uncheckedKey(at: bucket) == key {
          return (bucket, true)
        }
        bucket = hashTable.bucket(wrappedAfter: bucket)
      }
      return (bucket, false)
  }

_hashTable is a computed property:

  // The _HashTable struct contains pointers into tail-allocated storage, so
  // this is unsafe and needs `_fixLifetime` calls in the caller.
  @inlinable
  @nonobjc
  internal final var _hashTable: _HashTable {
    @inline(__always) get {
      return _HashTable(words: _metadata, bucketCount: _bucketCount)
    }
  }

This simply constructs a _HashTable.

The code of idealBucket is as follows:

@inlinable
  @inline(__always)
  internal func idealBucket(forHashValue hashValue: Int) -> Bucket {
    return Bucket(offset: hashValue & bucketMask)
  }

idealBucket returns a Bucket whose offset is hashValue & bucketMask, i.e. the index that corresponds to the hash value.

The source of _isOccupied is as follows:

@inlinable
  @inline(__always)
  internal func _isOccupied(_ bucket: Bucket) -> Bool {
    _internalInvariant(isValid(bucket))
    return words[bucket.word].uncheckedContains(bucket.bit)
  }

_isOccupied returns whether the bit for the given bucket is set, i.e. whether that bucket currently holds an element.
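
The word/bit addressing can be sketched with a plain array of UInt. `ToyBitmap` is a hypothetical type written for this article; it assumes one bit per bucket and a word index of `bucket / UInt.bitWidth`, which mirrors the idea of `bucket.word` and `bucket.bit` but is not the standard library's implementation.

```swift
// One occupancy bit per bucket, packed into machine words.
// `ToyBitmap` is a hypothetical type used only for illustration.
struct ToyBitmap {
    private var words: [UInt]

    init(bucketCount: Int) {
        let wordCount = (bucketCount + UInt.bitWidth - 1) / UInt.bitWidth
        words = Array(repeating: 0, count: wordCount)
    }

    func isOccupied(_ bucket: Int) -> Bool {
        let word = bucket / UInt.bitWidth   // which word holds the bit
        let bit = bucket % UInt.bitWidth    // which bit inside that word
        return (words[word] & (UInt(1) << UInt(bit))) != 0
    }

    mutating func insert(_ bucket: Int) {
        words[bucket / UInt.bitWidth] |= UInt(1) << UInt(bucket % UInt.bitWidth)
    }
}

var bitmap = ToyBitmap(bucketCount: 128)
bitmap.insert(5)
bitmap.insert(70)
print(bitmap.isOccupied(5), bitmap.isOccupied(6)) // true false
```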

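Putting these steps together: find checks whether the key at the current bucket is the one being looked up. If it is, that bucket is returned; if not, it moves on to the next bucket, until it reaches an unoccupied one. It returns a tuple containing the bucket and whether the key was found.

From there we are back in lookup, which was analyzed above. This probing scheme is open addressing (specifically linear probing: on a collision, the next bucket is tried).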
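
The lookup path above can be sketched as a tiny open-addressing table. `ToyOpenTable` is a hypothetical type written for this article: it uses plain arrays instead of tail-allocated memory, a Bool array instead of a bitmap, never resizes, and does not support removal. Only the probing logic is meant to resemble the real find.

```swift
// A toy open-addressing table with linear probing.
// `ToyOpenTable` is hypothetical and deliberately simplified.
struct ToyOpenTable {
    private let mask: Int
    private var occupied: [Bool]
    private var keys: [String]
    private var values: [Int]
    private(set) var count = 0

    init(scale: Int) {
        let bucketCount = 1 << scale
        mask = bucketCount - 1
        occupied = Array(repeating: false, count: bucketCount)
        keys = Array(repeating: "", count: bucketCount)
        values = Array(repeating: 0, count: bucketCount)
    }

    // Start at the ideal bucket and walk forward (wrapping around) until the
    // key or an empty bucket is found.
    func find(_ key: String) -> (bucket: Int, found: Bool) {
        var bucket = key.hashValue & mask
        while occupied[bucket] {
            if keys[bucket] == key { return (bucket, true) }
            bucket = (bucket + 1) & mask
        }
        return (bucket, false)
    }

    func lookup(_ key: String) -> Int? {
        let (bucket, found) = find(key)
        return found ? values[bucket] : nil
    }

    mutating func setValue(_ value: Int, forKey key: String) {
        let (bucket, found) = find(key)
        if !found {
            // Keep at least one hole so that find always terminates.
            precondition(count < mask, "toy table is full; no resizing here")
            occupied[bucket] = true
            keys[bucket] = key
            count += 1
        }
        values[bucket] = value
    }
}

var toy = ToyOpenTable(scale: 3)
toy.setValue(1, forKey: "a")
toy.setValue(2, forKey: "b")
toy.setValue(3, forKey: "a")   // same key: overwrites in place
print(toy.lookup("a") ?? -1, toy.lookup("b") ?? -1, toy.count)
```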

2.2 set

Now let's look at set. If there is a value, it calls _variant.setValue; if the value is nil, it calls removeValue.

Let's look at the non-nil case first:

2.2.1 setValue

In _NativeDictionary we find setValue:

  @inlinable
  internal mutating func setValue(
    _ value: __owned Value,
    forKey key: Key,
    isUnique: Bool
  ) {
    let (bucket, found) = mutatingFind(key, isUnique: isUnique)
    if found {
      (_values + bucket.offset).pointee = value
    } else {
      _insert(at: bucket, key: key, value: value)
    }
  }

This is straightforward too: it searches first, overwrites the value if the key is found, and inserts otherwise.
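
The same found / not found split is visible through the public `updateValue(_:forKey:)` method, which returns the previous value when the key already existed and nil when a new entry was inserted. The names below are made up for the example.

```swift
var scores = ["alice": 1]

let previous = scores.updateValue(2, forKey: "alice") // key found: overwrite
let inserted = scores.updateValue(7, forKey: "bob")   // key missing: insert

print(previous as Any, inserted as Any) // Optional(1) nil
```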

2.2.2 mutatingFind

The code is as follows:

  @inlinable
  internal mutating func mutatingFind(
    _ key: Key,
    isUnique: Bool
  ) -> (bucket: Bucket, found: Bool) {
    let (bucket, found) = find(key)

    // Prepare storage.
    // If `key` isn't in the dictionary yet, assume that this access will end
    // up inserting it. (If we guess wrong, we might needlessly expand
    // storage; that's fine.) Otherwise this can only be a removal or an
    // in-place mutation.
    let rehashed = ensureUnique(
      isUnique: isUnique,
      capacity: count + (found ? 0 : 1))
    guard rehashed else { return (bucket, found) }
    let (b, f) = find(key)
    if f != found {
      KEY_TYPE_OF_DICTIONARY_VIOLATES_HASHABLE_REQUIREMENTS(Key.self)
    }
    return (b, found)
  }

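This again calls find(key) to search; see the get section above for the details. If the key is not found, an insertion is about to happen, so ensureUnique is asked for room for one more element, which may allocate new storage (growing it). If the storage was rehashed, the key is looked up again. In the end, a tuple is returned.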
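
The isUnique flag is what gives Dictionary its value semantics. The sketch below shows the observable effect from user code: after a copy, both variables may share one storage object, and the first mutation of a non-unique storage works on its own copy so that the other variable is unaffected. Whether and when the storage is physically shared is an implementation detail; only the final values are relied on here.

```swift
var original = ["a": 1]
var copy = original      // no element is copied yet; storage may be shared

copy["b"] = 2            // storage is not uniquely referenced: copy, then insert
original["a"] = 10       // mutates only the storage held by `original`

print(original) // ["a": 10]
print(copy.count) // 2
```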

2.2.3 _insert

If the key is not found, it is inserted:

  @inlinable
  internal func _insert(
    at bucket: Bucket,
    key: __owned Key,
    value: __owned Value) {
    _internalInvariant(count < capacity)
    hashTable.insert(bucket)
    uncheckedInitialize(at: bucket, toKey: key, value: value)
    _storage._count += 1
  }

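Insertion is simple:

First it asserts that there is spare capacity (count < capacity); any growth has already happened in mutatingFind via ensureUnique
Then it marks the bucket as occupied in the hash table
It calls uncheckedInitialize to store the key and value
It increments count by 1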
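
The effect of growing before inserting can be observed through the public `capacity` property. In the sketch below, the exact capacities are implementation details and are not asserted; the only things relied on are that the storage always has room for what it holds and that it must have grown by the time 100 elements are stored.

```swift
var dict = [Int: Int](minimumCapacity: 3)
let initialCapacity = dict.capacity
var sawGrowth = false

for i in 0..<100 {
    dict[i] = i * i
    // The storage must always have room for what it holds.
    precondition(dict.count <= dict.capacity)
    if dict.capacity > initialCapacity { sawGrowth = true }
}

print(initialCapacity, dict.capacity, sawGrowth)
```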

2.2.4 removeValue

Next, let's look at removeValue:

  @inlinable
  @discardableResult
  public mutating func removeValue(forKey key: Key) -> Value? {
    return _variant.removeValue(forKey: key)
  }

2.2.5 _variant.removeValue

This one was located with a breakpoint; it is in DictionaryVariant.swift:

extension Dictionary._Variant {
  @inlinable
  internal mutating func removeValue(forKey key: Key) -> Value? {
#if _runtime(_ObjC)
    guard isNative else {
      let cocoaKey = _bridgeAnythingToObjectiveC(key)
      let cocoa = asCocoa
      guard cocoa.lookup(cocoaKey) != nil else { return nil }
      var native = _NativeDictionary<Key, Value>(cocoa)
      let (bucket, found) = native.find(key)
      _precondition(found, "Bridging did not preserve equality")
      let old = native.uncheckedRemove(at: bucket, isUnique: true).value
      self = .init(native: native)
      return old
    }
#endif
    let (bucket, found) = asNative.find(key)
    guard found else { return nil }
    let isUnique = isUniquelyReferenced()
    return asNative.uncheckedRemove(at: bucket, isUnique: isUnique).value
  }
}

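It is a method defined in an extension of Dictionary._Variant.

First, when running with the Objective-C runtime, it checks whether the storage is native; a bridged Cocoa dictionary is converted to a native one before the removal
Otherwise it looks the key up with find, the same as above
If the key is not found, it returns nil
If it is found, it calls uncheckedRemove to remove the entry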
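
From the caller's side, the two outcomes of this method look as follows. This is a usage sketch with made-up keys.

```swift
var dict = ["a": 1, "b": 2]

let removed = dict.removeValue(forKey: "a")    // found: returns the old value
let missing = dict.removeValue(forKey: "zzz")  // not found: returns nil
dict["b"] = nil                                // the subscript setter ends up in removeValue too

print(removed as Any, missing as Any, dict.isEmpty) // Optional(1) nil true
```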

2.2.6 uncheckedRemove

This one is a method of _NativeDictionary:

  @inlinable
  @_semantics("optimize.sil.specialize.generic.size.never")
  internal mutating func uncheckedRemove(
    at bucket: Bucket,
    isUnique: Bool
  ) -> Element {
    _internalInvariant(hashTable.isOccupied(bucket))
    let rehashed = ensureUnique(isUnique: isUnique, capacity: capacity)
    _internalInvariant(!rehashed)
    let oldKey = (_keys + bucket.offset).move()
    let oldValue = (_values + bucket.offset).move()
    _delete(at: bucket)
    return (oldKey, oldValue)
  }

We won't analyze the calls inside it in detail; if you are interested, read the source code.
