Original: https://stackoverflow.com/questions/5232198/about-vectors-growth
Translation: joey
Answer 1:
The rate at which a vector's capacity grows is implementation-dependent. For example, libstdc++ and libc++ double the capacity, while Microsoft's STL grows it by a factor of 1.5. Implementations almost invariably choose exponential (geometric) growth in order to meet the amortized constant-time requirement for push_back(). What amortized constant time means, and how exponential growth achieves it, is interesting.
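The growth factor is implementation-dependent, so the easiest way to see it is to watch capacity() change as elements are appended. This is a minimal sketch. capacity_history is an illustrative name, and the sequence it returns depends on the standard library you compile against.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Records every distinct capacity a std::vector<int> passes through while
// n elements are pushed one at a time. The exact sequence is
// implementation-defined.
std::vector<std::size_t> capacity_history(std::size_t n) {
    std::vector<int> v;
    std::vector<std::size_t> caps;
    for (std::size_t i = 0; i < n; ++i) {
        v.push_back(static_cast<int>(i));
        assert(v.capacity() >= v.size());  // always holds for std::vector
        if (caps.empty() || caps.back() != v.capacity()) {
            caps.push_back(v.capacity());  // a reallocation just happened
        }
    }
    return caps;
}
```

With libstdc++ or libc++, which double the capacity, capacity_history(20) returns {1, 2, 4, 8, 16, 32}. A 1.5× library produces a longer, denser sequence. In both cases the number of reallocations grows only logarithmically with n.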
Every time a vector's capacity grows, the existing elements have to be copied into the new storage. Since C++11, they are moved instead when the element type's move constructor is noexcept. If you 'amortize' this cost over the lifetime of the vector, it turns out that growing the capacity by a constant factor (exponential growth) gives an amortized constant cost per push_back().
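The copying can be observed on a real std::vector with an element type that counts its own copy constructions. This is a sketch under stated assumptions. Counted, counted_copies and copies_per_element are illustrative names. The type deliberately has no move constructor, so every relocation is a copy. emplace_back constructs elements in place, so the only copies counted are the ones made during reallocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Global counter of copy constructions (global only to keep the sketch short).
std::size_t counted_copies = 0;

// A user-declared copy constructor suppresses the implicit move constructor,
// so std::vector has to copy every existing element when it reallocates.
struct Counted {
    int value;
    explicit Counted(int v) : value(v) {}
    Counted(const Counted& other) : value(other.value) { ++counted_copies; }
};

// Average number of reallocation copies per element after appending n elements.
double copies_per_element(std::size_t n) {
    counted_copies = 0;
    std::vector<Counted> v;
    for (std::size_t i = 0; i < n; ++i) {
        v.emplace_back(static_cast<int>(i));  // constructed in place, no copy
    }
    assert(v.size() == n);
    return static_cast<double>(counted_copies) / static_cast<double>(n);
}
```

On a doubling implementation, copies_per_element(1025) is 2047 / 1025 ≈ 1.997, because the 1025th element triggers a reallocation that copies 1024 elements.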
This probably seems a bit odd, so let me explain how it works...
size: 1 capacity 1 - No elements have been copied, so the average number of copies per element is 0.
size: 2 capacity 2 - When the vector's capacity was increased to 2, the first element had to be copied. Average copies per element is 1 / 2 = 0.5.
size: 3 capacity 4 - When the vector's capacity was increased to 4, the first two elements had to be copied. Average copies per element is (2 + 1 + 0) / 3 = 1. (Each term is the number of times that element has been copied so far: the first element twice, the second once, the third never. The same holds for the rows below.)
size: 4 capacity 4 - Average copies per element is (2 + 1 + 0 + 0) / 4 = 3 / 4 = 0.75.
size: 5 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0) / 5 = 7 / 5 = 1.4.
...
size: 8 capacity 8 - Average copies per element is (3 + 2 + 1 + 1 + 0 + 0 + 0 + 0) / 8 = 7 / 8 = 0.875.
size: 9 capacity 16 - Average copies per element is (4 + 3 + 2 + 2 + 1 + 1 + 1 + 1 + 0) / 9 = 15 / 9 = 1.67.
...
size: 16 capacity 16 - Average copies per element is 15 / 16 = 0.938.
size: 17 capacity 32 - Average copies per element is 31 / 17 = 1.82.
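The table above can be reproduced with a small simulation. This sketch models the vector described in the table, which starts at capacity 1 and doubles when full. It is not the code of any real std::vector, and copies_per_slot and average_copies are illustrative names. Entry i of copies_per_slot(size) is the number of times element i has been copied, which is one term of the sums in the table.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// copies[i] = how many times element i has been copied so far.
std::vector<std::size_t> copies_per_slot(std::size_t size) {
    std::vector<std::size_t> copies;
    std::size_t capacity = 1;
    for (std::size_t i = 0; i < size; ++i) {
        if (copies.size() == capacity) {
            capacity *= 2;                      // full: double the capacity
            for (std::size_t& c : copies) ++c;  // every existing element is copied once
        }
        copies.push_back(0);  // the new element has not been copied yet
    }
    return copies;
}

// Average copies per element for a vector holding `size` elements.
double average_copies(std::size_t size) {
    assert(size > 0);
    std::vector<std::size_t> c = copies_per_slot(size);
    std::size_t total = std::accumulate(c.begin(), c.end(), std::size_t{0});
    return static_cast<double>(total) / static_cast<double>(size);
}
```

For example, copies_per_slot(5) gives {3, 2, 1, 1, 0} and average_copies(17) gives 31 / 17, matching the rows above.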
As you can see, every time the capacity jumps, the total number of copies goes up by the previous size of the array. But because the array has to double in size before the capacity jumps again, the number of copies per element always stays below 2.
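The bound can be checked directly for the doubling scheme in the table. Let $n = 2^k + 1$ be the size just after the capacity jumps from $2^k$ to $2^{k+1}$. The reallocations so far have copied $1, 2, 4, \dots, 2^k$ elements, so:

```latex
\text{copies}(n) = \sum_{i=0}^{k} 2^i = 2^{k+1} - 1 = 2(n-1) - 1 = 2n - 3,
\qquad
\frac{\text{copies}(n)}{n} = 2 - \frac{3}{n} < 2 .
```

Between jumps the copy total stays fixed while $n$ grows, so the average only falls. For example, $n = 17$ gives $31/17 \approx 1.82$, matching the table.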
If you increased the capacity to 1.5 * N instead of 2 * N, you would get a very similar effect, except that the upper bound on copies per element would be higher (I think it would be 3).
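A geometric-series argument supports the guess of 3. Take an ideal growth factor g. When the size first exceeds capacity C, the reallocations so far have copied about C + C/g + C/g² + … < C·g/(g − 1) elements, while C + 1 elements are stored. The average therefore approaches g/(g − 1), which is 2 for g = 2 and 3 for g = 1.5.

The sketch below checks this numerically with a hypothetical model, not any particular library's code. The vector starts at capacity 1 and grows to max(old + 1, floor(old × factor)). worst_average_copies is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical growth model: new capacity = max(old + 1, floor(old * factor)).
// Returns the largest average copies-per-element seen while appending n elements.
double worst_average_copies(double factor, std::size_t n) {
    assert(factor > 1.0);
    std::size_t capacity = 1, size = 0, total_copies = 0;
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            total_copies += size;  // every existing element is copied once
            auto grown = static_cast<std::size_t>(std::floor(capacity * factor));
            capacity = std::max(capacity + 1, grown);
        }
        ++size;
        worst = std::max(worst, static_cast<double>(total_copies) / size);
    }
    return worst;
}
```

worst_average_copies(2.0, 100000) stays below 2. For 1.5, the result sits at roughly 3 and can come out marginally above it, because floor() makes each step slightly smaller than 1.5×.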
I suspect an implementation would choose 1.5 over 2 partly to save a bit of space, but also because 1.5 is closer to the golden ratio. I have an intuition (not currently backed by any hard data) that a growth rate in line with the golden ratio (because of its relationship to the Fibonacci sequence) will prove to be the most efficient for real-world loads, in terms of minimizing both extra space used and time.
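We can put a number on "a bit of space". Right after a reallocation from capacity C, only C + 1 of roughly g·C slots are in use. Up to about 1 − 1/g of the buffer can therefore sit idle: close to 50% for g = 2, versus about 33% for g = 1.5. The sketch below measures this under a hypothetical growth model, in which capacity starts at 1 and grows to max(old + 1, floor(old × factor)). worst_unused_fraction is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical growth model: new capacity = max(old + 1, floor(old * factor)).
// Returns the largest fraction of allocated slots left unused while
// appending n elements one at a time.
double worst_unused_fraction(double factor, std::size_t n) {
    assert(factor > 1.0);
    std::size_t capacity = 1, size = 0;
    double worst = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        if (size == capacity) {
            auto grown = static_cast<std::size_t>(std::floor(capacity * factor));
            capacity = std::max(capacity + 1, grown);
        }
        ++size;
        double unused = static_cast<double>(capacity - size) / capacity;
        worst = std::max(worst, unused);
    }
    return worst;
}
```

This quantifies only the space argument. It does not test the golden-ratio intuition.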