20 Beyond-the-Syllabus Python Interview Questions of 2017 (Part 2)

2018你还是没有猫

Related links:

20 Beyond-the-Syllabus Python Questions of 2017 (Part 1)
20 Beyond-the-Syllabus Python Questions of 2017 (Final Part)

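I know some people think there is nothing worth writing about questions that are already on Stack Overflow. But the answers here are my own, and I am recording and sharing how I worked through each problem. Thinking a question through yourself beats merely reading it on Stack Overflow, wouldn't you agree?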

Speed up millions of regex replacements in Python 3

The question: how do you speed up regex replacements in Python 3 when there are millions of them?

import re

# my20000words and sentences come from the question and are not defined here
compiled_words = [re.compile(r'\b' + word + r'\b') for word in my20000words]

for sentence in sentences:
  for word in compiled_words:
    sentence = re.sub(word, "", sentence)
  # put sentence into a growing list
# This nested loop is processing about 50 sentences per second

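Serious answer: combine the words into one alternation of the form "\b(word1|word2|word3)\b". The code in the question already compiles its patterns up front, so the cost is not recompilation; it is that every sentence gets scanned once per word, which adds up to millions of separate regex passes. With a single combined pattern each sentence is scanned only once.
Less serious answer: switch libraries and use FlashText. According to its authors, regex time grows roughly in proportion to the number of keywords, while FlashText's time stays nearly constant; see the comparison chart below.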

Figure: speed comparison for the replace operation (regex vs. FlashText)
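
The combined-pattern idea from the serious answer can be sketched as follows. This is a minimal illustration, not the code from the Stack Overflow thread: the helper name build_union_pattern and the tiny word list and sentences are made up for the example.

```python
import re


def build_union_pattern(words):
    """Compile one pattern that matches any of the given whole words."""
    # re.escape protects words that contain regex metacharacters.
    # Longer words go first so a word is preferred over its own prefix.
    escaped = sorted((re.escape(w) for w in words), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(escaped) + r")\b")


# Hypothetical stand-ins for my20000words and sentences from the question.
banned_words = ["foo", "bar", "baz"]
sentences = ["foo is not bar", "food and barley stay", "baz!"]

pattern = build_union_pattern(banned_words)

# One pass per sentence instead of one pass per (sentence, word) pair.
cleaned = [pattern.sub("", sentence) for sentence in sentences]
print(cleaned)
```

Because of the \b anchors, "food" and "barley" are left alone; only whole-word matches are removed.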

Union of 2 sets does not contain all items

Why does the union of two sets lose some of their elements?

set1 = {1, 2, 3}
set2 = {True, False}

print(set1 | set2)
# {False, 1, 2, 3}

print(set2 | set1)
# {False, True, 2, 3}

Right, why would a union lose data? Argh. Hold on, open your terminal and try this first.

>>> 1 == True
True
>>> 0 == False
True
>>> {0, False}
{0}
>>> {False, 0}
{False}
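
The session above comes down to two facts: bool is a subclass of int, so 1 and True compare equal and hash alike, and a set keeps whichever of two equal elements arrived first. A small sketch of that, with variable names chosen only for illustration:

```python
# True and 1 are equal and have the same hash, so a set sees them as one element.
same_value = (1 == True) and (hash(1) == hash(True))

set1 = {1, 2, 3}
set2 = {True, False}

# The left operand's elements go in first; an equal element arriving
# later from the right operand counts as a duplicate and is dropped.
left_first = set1 | set2   # keeps the int 1, drops True
right_first = set2 | set1  # keeps True, drops the int 1

kept_types_left = [type(x).__name__ for x in sorted(left_first)]
kept_types_right = [type(x).__name__ for x in sorted(right_first)]
print(same_value, kept_types_left, kept_types_right)
```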

Argh again. Now that the cause is known, what if you need both 1 and True in the same set?
You could try this:

>>> set1 = {(1, int), (2, int), (3, int)}
>>> set2 = {(True, bool), (False, bool)}
>>> set1 | set2
{(3, <class 'int'>), (1, <class 'int'>), (2, <class 'int'>),
 (True, <class 'bool'>), (False, <class 'bool'>)}
>>> set1 & set2
set()

  # or
>>> set1 = {'1', '2', '3'}
>>> set2 = {'True', 'False'}
>>> set1 | set2
{'2', '3', 'False', 'True', '1'}
>>> set1 & set2
set()

It is a bit clumsy, but at least it works.

What is the difference between i = i + 1 and i += 1 in a 'for' loop?

In a for loop, what is the difference between i = i + 1 and i += 1?

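This one really was... hmm... hard to look up. After digging through a lot of material, I finally found the difference in the official Python documentation: + and += call different methods. += calls object.__iadd__(self, other), while + calls object.__add__(self, other).
I am quoting the original text along with its URL, since a paraphrase of mine might distort it; it is mostly easy to follow.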

These methods are called to implement the augmented arithmetic assignments (+=, -=, *=, @=, /=, //=, %=, **=, <<=, >>=, &=, ^=, |=). These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, if x is an instance of a class with an __iadd__() method, x += y is equivalent to x = x.__iadd__(y). Otherwise, x.__add__(y) and y.__radd__(x) are considered, as with the evaluation of x + y. In certain situations, augmented assignment can result in unexpected errors (see Why does a_tuple[i] += ['item'] raise an exception when the addition works?), but this behavior is in fact part of the data model.
https://docs.python.org/3/reference/datamodel.html#emulating-numeric-types
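
What this means inside a for loop can be sketched with plain lists, whose __iadd__ modifies the object in place. The variable names below are illustrative and not taken from the original question.

```python
# Case 1: += calls list.__iadd__, which extends the existing list object,
# so the change is visible through the outer list.
rows_inplace = [[1], [2]]
for row in rows_inplace:
    row += [0]

# Case 2: + calls list.__add__, which builds a new list; the loop variable
# is rebound to it and the lists stored in the outer list are untouched.
rows_rebound = [[1], [2]]
for row in rows_rebound:
    row = row + [0]

print(rows_inplace)
print(rows_rebound)
```

For immutable types such as int, which define no __iadd__, the two spellings behave the same, because += falls back to __add__.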

Why is copying a shuffled list much slower?

Why is copying a randomly shuffled list slower?

The question is hard to follow as asked, so let me flesh it out first. Compare the running speed of the following two cases:

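from timeit import timeit
import random

ordered = list(range(10**6))
shuffled = list(range(10**6))
random.shuffle(shuffled)

# case 1: copy the shuffled list
# (the author's original Python 2 run, which also timed the shuffle itself, reported 5.84262761547 s)
print(timeit(lambda: list(shuffled), number=10))

# case 2: copy the ordered list
# (the author's original Python 2 run reported 1.07579151663 s)
print(timeit(lambda: list(ordered), number=10))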

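The question is why the two cases differ so much in speed. Argh, this one is really hard; honestly, an interviewer who asks it probably just does not want you to join the company.
I went through a lot of material (really a lot) and I think I roughly understand what is going on, but my explanation may not be quite precise. If you have a more accurate way to put it, please leave it in the comments.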

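What affects the speed of copying a list?
- The speed of the copy depends on the order in which the list's elements are laid out on the heap.
What kind of operation is a list copy?
- All Python objects live on the heap, so a list stores pointers to objects rather than the objects themselves.
- The list copy here is a shallow copy.
- Python numbers are objects too; the integer 1 you define is really a reference to the object 1. Python also uses reference counting, so when an object is put into a new container its reference count must be incremented. Python therefore cannot simply copy the pointers; it really has to visit each object's address in memory to update the count.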
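
The reference-count point in the last bullet can be observed directly. The following is a small sketch assuming CPython, where sys.getrefcount is available; the names marker, original and shallow are made up for the example.

```python
import sys

marker = object()          # any ordinary heap object will do
original = [marker]

before = sys.getrefcount(marker)
shallow = list(original)   # shallow copy: a new list holding the same object
after = sys.getrefcount(marker)

# The copy did not create a new element; it stored another pointer to the
# same object and had to touch that object to bump its reference count.
print(shallow[0] is marker, after - before)
```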

What does shuffle change about the original list at the physical level?

Take a look at this code:

import random
a = list(range(10**6, 100+10**6))
random.shuffle(a)
last = None
for item in a:
    if last is not None:
        print('diff', id(item) - id(last))
    last = item
# diff 736
# diff -64
# diff -17291008
# diff -128
# diff 288
# diff -224
# diff 17292032
# diff -1312
# diff 1088
# .
# .
# .

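Taken together, this already explains the slowdown: after the shuffle, consecutive list entries point to objects scattered across memory, so locality of reference gets worse and cache performance suffers. And copying a list is not just copying pointers; the copy still has to visit every object in order to update its reference count.
Just pretend I am writing out a math proof...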

Why do tuples take less space in memory than lists?

Why do tuples use less memory than lists?

>>> a = (1,2,3)
>>> a.__sizeof__()
48

>>> b = [1,2,3]
>>> b.__sizeof__()
64

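After the previous question this one feels really easy... The obvious answer is that the two have different data structures; spot the differences in the figure below.

Figure: data structures of list and tuple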

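The elements of a tuple cannot be changed; those of a list can.
list.ob_item is a pointer to a separately allocated array of pointers to the list's elements, whereas a tuple stores that array inline in the object itself.
list.allocated is the number of slots for which memory has been allocated.
A list stores more, so it takes more memory; that makes sense.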
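
The extra bookkeeping can be observed from Python itself. This is a rough sketch assuming CPython; exact byte counts vary by version and platform, so the code only compares sizes and does not rely on specific numbers. All variable names are illustrative.

```python
items_tuple = (1, 2, 3)
items_list = [1, 2, 3]

# The list carries extra fields (the ob_item pointer and allocated),
# so it is larger than a tuple with the same elements.
list_overhead = items_list.__sizeof__() - items_tuple.__sizeof__()

# allocated can exceed the length: appending grows capacity in steps,
# so the reported size does not change on every single append.
grown = []
sizes_seen = []
for value in range(20):
    grown.append(value)
    sizes_seen.append(grown.__sizeof__())

distinct_sizes = len(set(sizes_seen))
print(list_overhead, distinct_sizes, len(sizes_seen))
```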
