本文主要总结项目开发中和面试中的Python高级知识点,是进阶Python高级工程师必备要点。
主要内容:
No.1 一切皆对象
-
Python中函数和类可以赋值给一个变量
-
Python中函数和类可以存放到集合对象中
-
Python中函数和类可以作为一个函数的参数传递给函数
-
Python中函数和类可以作为返回值
Step.1
1 # 首先创建一个函数和一个Python3.x的新式类 2 class Demo(object): 3 def __init__(self): 4 print("Demo Class")
1 # 定义一个函数 2 def function(): 3 print("function")
1 # 在Python无论是函数,还是类,都是对象,他们可以赋值给一个变量 2 class_value = Demo 3 func_value = function
1 # 并且可以通过变量调用 2 class_value() # Demo Class 3 func_value() # function
1 # 将函数和类添加到集合中 2 obj_list = [] 3 obj_list.append(Demo) 4 obj_list.append(function) 5 # 遍历列表 6 for i in obj_list: 7 print(i) 8 #9 #
1 # 定义一个具体函数 2 def test_func(class_name, func_name): 3 class_name() 4 func_name()
1 # 将类名和函数名传入形参列表 2 test_func(Demo, function) 3 # Demo Class 4 # function
1 # 定义函数实现返回类和函数 2 def test_func2(): 3 return Demo 4 5 def test_func3(): 6 return function
1 # 执行函数 2 test_func2()() # Demo Class 3 test_func3()() # function
在Python中,object
的实例是type
,object
是顶层类,没有基类;type
的实例是type
,type
的基类是object
。Python中的内置类型的基类是object
,但是他们都是由type
实例化而来,具体的值由内置类型实例化而来。在Python2.x的语法中用户自定义的类没有明确指定基类就默认是没有基类,在Python3.x的语法中,指定基类为object
。
1 # object是谁实例化的? 2 print(type(object)) #3 4 # object继承自哪个类? 5 print(object.__bases__) # () 6 7 # type是谁实例化的? 8 print(type(type)) # 9 10 # type继承自哪个类? 11 print(type.__bases__) # ( ,) 12 13 # 定义一个变量 14 value = 100 15 16 # 100由谁实例化? 17 print(type(value)) #18 19 # int由谁实例化? 20 print(type(int)) # 21 22 # int继承自哪个类? 23 print(int.__bases__) # ( ,)
1 # Python 2.x的旧式类 2 class OldClass(): 3 pass 4 5 # Python 3.x的新式类 6 class NewClass(object): 7 pass
在Python中,对象有3个特征属性:
-
在内存中的地址,使用
id()
函数进行查看 -
对象的类型
-
对象的默认值
Step.1 None类型
在Python解释器启动时,会创建一个None类型的None对象,并且None对象全局只有一个。
Step.2 数值类型
-
ini类型
-
float类型
-
complex类型
-
bool类型
Step.3 迭代类型
在Python中,迭代类型可以使用循环来进行遍历。
Step.4 序列类型
-
list
-
tuple
-
str
-
array
-
range
-
bytes, bytearray, memoryvie(二进制序列)
Step.5 映射类型
-
dict
Step.6 集合类型
-
set
-
frozenset
Step.7 上下文管理类型
-
with语句
Step.8 其他类型
-
模块
-
class
-
实例
-
函数
-
方法
-
代码
-
object对象
-
type对象
-
ellipsis(省略号)
-
notimplemented
NO.4 魔法函数
Python中的魔法函数使用双下划线开始,以双下划线结尾。关于详细介绍请看我的文章——《全面总结Python中的魔法函数》。
No.5 鸭子类型与白鹅类型
鸭子类型是程序设计中的推断风格,在鸭子类型中关注对象如何使用而不是类型本身。鸭子类型像多态一样工作但是没有继承。鸭子类型的概念来自于:“当看到一只鸟走起来像鸭子、游泳起来像鸭子、叫起来也像鸭子,那么这只鸟就可以被称为鸭子。”
1 # 定义狗类 2 class Dog(object): 3 def eat(self): 4 print("dog is eatting...") 5 6 # 定义猫类 7 class Cat(object): 8 def eat(self): 9 print("cat is eatting...") 10 11 # 定义鸭子类 12 class Duck(object): 13 def eat(self): 14 print("duck is eatting...") 15 16 # 以上Python中多态的体现 17 18 # 定义动物列表 19 an_li = [] 20 # 将动物添加到列表 21 an_li.append(Dog) 22 an_li.append(Cat) 23 an_li.append(Duck) 24 25 # 依次调用每个动物的eat()方法 26 for i in an_li: 27 i().eat() 28 29 # dog is eatting... 30 # cat is eatting... 31 # duck is eatting...
No.6 协议、 抽象基类、abc模块和序列之间的继承关系
-
协议:Python中的非正式接口,是允许Python实现多态的方式,协议是非正式的,不具备强制性,由约定和文档定义。
-
接口:泛指实体把自己提供给外界的一种抽象化物(可以为另一实体),用以由内部操作分离出外部沟通方法,使其能被内部修改而不影响外界其他实体与其交互的方式。
我们可以使用猴子补丁
来实现协议,那么什么是猴子补丁呢?
猴子补丁就是在运行时修改模块或类,不去修改源代码,从而实现目标协议接口操作,这就是所谓的打猴子补丁。
-
在运行时替换方法、属性
-
在不修改源代码的情况下对程序本身添加之前没有的功能
-
在运行时对象中添加补丁,而不是在磁盘中的源代码上
应用案例:假设写了一个很大的项目,处处使用了json模块来解析json文件,但是后来发现ujson比json性能更高,修改源代码是要修改很多处的,所以只需要在程序入口加入:
1 import json 2 # pip install ujson 3 import ujson 4 5 def monkey_patch_json(): 6 json.__name__ = 'ujson' 7 json.dumps = ujson.dumps 8 json.loads = ujson.loads 9 10 monkey_patch_json()
-
抽象基类不能被实例化(不能创建对象),通常是作为基类供子类继承,子类中重写虚函数,实现具体的接口。
-
判定某个对象的类型
-
强制子类必须实现某些方法
1 import abc 2 3 # 定义缓存类 4 class Cache(metaclass=abc.ABCMeta): 5 6 @abc.abstractmethod 7 def get(self, key): 8 pass 9 10 @abc.abstractmethod 11 def set(self, key, value): 12 pass 13 14 # 定义redis缓存类实现Cache类中的get()和set()方法 15 class RedisCache(Cache): 16 17 def set(self, key): 18 pass 19 20 def get(self, key, value): 21 pass
-
collections.abc模块中各个抽象基类的UML类图
1 class A(object): 2 pass 3 4 class B(A): 5 pass 6 7 b = B() 8 9 print(isinstance(b, B)) 10 print(isinstance(b, A)) 11 print(type(b) is B) 12 print(type(b) is A) 13 14 # True 15 # True 16 # True 17 # False
-
实例变量只能通过类的实例进行调用
-
修改模板对象创建的对象的属性,模板对象的属性不会改变
-
修改模板对象的属性,由模板对象创建的对象的属性会改变
1 # 此处的类也是模板对象,Python中一切皆对象 2 class A(object): 3 4 #类变量 5 number = 12 6 7 def __init__(self): 8 # 实例变量 9 self.number_2 = 13 10 11 # 实例变量只能通过类的实例进行调用 12 print(A.number) # 12 13 print(A().number) # 12 14 print(A().number_2) # 13 15 16 # 修改模板对象创建的对象的属性,模板对象的属性不会改变 17 a = A() 18 a.number = 18 19 print(a.number) # 18 20 print(A().number) # 12 21 print(A.number) # 12 22 23 # 修改模板对象的属性,由模板对象创建的对象的属性会改变 24 A.number = 19 25 print(A.number) # 19 26 print(A().number) # 19
-
在Python 2.2之前只有经典类,到Python2.7还会兼容经典类,Python3.x以后只使用新式类,Python之前版本也会兼容新式类
-
Python 2.2 及其之前类没有基类,Python新式类需要显式继承自
object
,即使不显式继承也会默认继承自object
-
经典类在类多重继承的时候是采用从左到右深度优先原则匹配方法的.而新式类是采用C3算法
-
经典类没有MRO和instance.mro()调用的
假定存在以下继承关系:
1 class D(object): 2 def say_hello(self): 3 pass 4 5 class E(object): 6 pass 7 8 class B(D): 9 pass 10 11 class C(E): 12 pass 13 14 class A(B, C): 15 pass
采用DFS(深度优先搜索算法)当调用了A的say_hello()方法的时候,系统会去B中查找如果B中也没有找到,那么去D中查找,很显然D中存在这个方法,但是DFS对于以下继承关系就会有缺陷:
1 class D(object): 2 pass 3 4 class B(D): 5 pass 6 7 class C(D): 8 def say_hello(self): 9 pass 10 11 class A(B, C): 12 pass
在A的实例对象中调用say_hello方法时,系统会先去B中查找,由于B类中没有该方法的定义,所以会去D中查找,D类中也没有,系统就会认为该方法没有定义,其实该方法在C中定义了。所以考虑使用BFS(广度优先搜索算法),那么问题回到第一个继承关系,假定C和D具备重名方法,在调用A的实例的方法时,应该先在B中查找,理应调用D中的方法,但是使用BFS的时候,C类中的方法会覆盖D类中的方法。在Python 2.3以后的版本中,使用C3算法:
1 # 获取解析顺序的方法 2 类名.mro() 3 类名.__mro__ 4 inspect.getmro(类名)
使用C3算法后的第二种继承顺序:
1 class D(object): 2 pass 3 4 class B(D): 5 pass 6 7 class C(D): 8 def say_hello(self): 9 pass 10 11 class A(B, C): 12 pass 13 14 print(A.mro()) # [, , , , ]
使用C3算法后的第一种继承顺序:
1 class D(object): 2 pass 3 4 class E(object): 5 pass 6 7 class B(D): 8 pass 9 10 class C(E): 11 pass 12 13 class A(B, C): 14 pass 15 16 print(A.mro()) 17 # [, , , , , ]
No.10 类方法、实例方法和静态方法
1 class Demo(object): 2 # 类方法 3 @classmethod 4 def class_method(cls, number): 5 pass 6 7 # 静态方法 8 @staticmethod 9 def static_method(number): 10 pass 11 12 # 对象方法/实例方法 13 def object_method(self, number): 14 pass
实例方法只能通过类的实例来调用;静态方法是一个独立的、无状态的函数,紧紧依托于所在类的命名空间上;类方法在为了获取类中维护的数据,比如:
1 class Home(object): 2 3 # 房间中人数 4 __number = 0 5 6 @classmethod 7 def add_person_number(cls): 8 cls.__number += 1 9 10 @classmethod 11 def get_person_number(cls): 12 return cls.__number 13 14 def __new__(self): 15 Home.add_person_number() 16 # 重写__new__方法,调用object的__new__ 17 return super().__new__(self) 18 19 class Person(Home): 20 21 def __init__(self): 22 23 # 房间人员姓名 24 self.name = 'name' 25 26 # 创建人员对象时调用Home的__new__()方法 27 28 tom = Person() 29 print(type(tom)) #30 alice = Person() 31 bob = Person() 32 test = Person() 33 34 print(Home.get_person_number())
Python中使用双下划线+属性名称实现类似于静态语言中的private修饰来实现数据封装。
1 class User(object): 2 3 def __init__(self, number): 4 self.__number = number 5 self.__number_2 = 0 6 7 def set_number(self, number): 8 self.__number = number 9 10 def get_number(self): 11 return self.__number 12 13 def set_number_2(self, number2): 14 self.__number_2 = number2 15 # self.__number2 = number2 16 17 def get_number_2(self): 18 return self.__number_2 19 # return self.__number2 20 21 u = User(25) 22 print(u.get_number()) # 25 23 # 真的类似于Java的反射机制吗? 24 print(u._User__number) # 25 25 # 下面又是啥情况。。。想不明白了T_T 26 u.set_number_2(18) 27 print(u.get_number_2()) # 18 28 print(u._User__number_2) 29 # Anaconda 3.6.3 第一次是:u._User__number_2 第二次是:18 30 # Anaconda 3.6.5 结果都是 0 31 32 # 代码我改成了正确答案,感谢我大哥给我指正错误,我保留了错误痕迹 33 # 变量名称写错了,算是个写博客突发事故,这问题我找了一天,万分感谢我大哥,我太傻B了,犯了低级错误 34 # 留给和我一样的童鞋参考我的错我之处吧! 35 36 # 正确结果: 37 # 25 25 18 18
自省(introspection)是一种自我检查行为。在计算机编程中,自省是指这种能力:检查某些事物以确定它是什么、它知道什么以及它能做什么。自省向程序员提供了极大的灵活性和控制力。
-
dir([obj]):返回传递给它的任何对象的属性名称经过排序的列表(会有一些特殊的属性不包含在内)
-
getattr(obj, attr):返回任意对象的任何属性 ,调用这个方法将返回obj中名为attr值的属性的值
-
... ...
No.13 super函数
Python3.x 和 Python2.x 的一个区别是: Python 3 可以使用直接使用 super().xxx 代替 super(type[, object-or-type]).xxx 。
super()函数用来调用MRO(类方法解析顺序表)的下一个类的方法。
No.14 Mixin继承
在设计上将Mixin类作为功能混入继承自Mixin的类。使用Mixin类实现多重继承应该注意:
-
Mixin类必须表示某种功能
-
职责单一,如果要有多个功能,就要设计多个Mixin类
-
不依赖子类实现,Mixin类的存在仅仅是增加了子类的功能特性
-
即使子类没有继承这个Mixin类也可以工作
1 class Cat(object): 2 3 def eat(self): 4 print("I can eat.") 5 6 def drink(self): 7 print("I can drink.") 8 9 class CatFlyMixin(object): 10 11 def fly(self): 12 print("I can fly.") 13 14 class CatJumpMixin(object): 15 16 def jump(self): 17 print("I can jump.") 18 19 20 class TomCat(Cat, CatFlyMixin): 21 pass 22 23 class PersianCat(Cat, CatFlyMixin, CatJumpMixin): 24 pass 25 26 if __name__ == '__main__': 27 28 # 汤姆猫没有跳跃功能 29 tom = TomCat() 30 tom.fly() 31 tom.eat() 32 tom.drink() 33 34 # 波斯猫混入了跳跃功能 35 persian = PersianCat() 36 persian.drink() 37 persian.eat() 38 persian.fly() 39 persian.jump()
普通的异常捕获机制:
1 try: 2 pass 3 except Exception as err: 4 pass 5 else: 6 pass 7 finally: 8 pass
with简化了异常捕获写法:
1 class Demo(object): 2 3 def __enter__(self): 4 print("enter...") 5 return self 6 7 def __exit__(self, exc_type, exc_val, exc_tb): 8 print("exit...") 9 10 def echo_hello(self): 11 print("Hello, Hello...") 12 13 with Demo() as d: 14 d.echo_hello() 15 16 # enter... 17 # Hello, Hello... 18 # exit...
1 import contextlib 2 3 # 使用装饰器 4 @contextlib.contextmanager 5 def file_open(file_name): 6 # 此处写__enter___函数中定义的代码 7 print("enter function code...") 8 yield {} 9 # 此处写__exit__函数中定义的代码 10 print("exit function code...") 11 12 with file_open("json.json") as f: 13 pass 14 15 # enter function code... 16 # exit function code...
-
容器序列:list tuple deque
-
扁平序列:str bytes bytearray array.array
-
可变序列:list deque bytearray array
-
不可变序列:str tuple bytes
No.27 +、+=、extend()之间的区别于应用场景
首先看测试用例:
1 # 创建一个序列类型的对象 2 my_list = [1, 2, 3] 3 # 将现有的序列合并到my_list 4 extend_my_list = my_list + [4, 5] 5 6 print(extend_my_list) # [1, 2, 3, 4, 5] 7 # 将一个元组合并到这个序列 8 extend_my_list = my_list + (6, 7) 9 # 抛出异常 TypeError: can only concatenate list (not "tuple") to list 10 print(extend_my_list) 11 12 # 使用另一种方式合并 13 extend_my_list += (6, 7) 14 print(extend_my_list) # [1, 2, 3, 4, 5, 6, 7] 15 16 # 使用extend()函数进行合并 17 18 extend_my_list.extend((7, 8)) 19 print(extend_my_list) # [1, 2, 3, 4, 5, 6, 7, 7, 8]
由源代码片段可知:
1 class MutableSequence(Sequence): 2 3 __slots__ = () 4 5 """All the operations on a read-write sequence. 6 7 Concrete subclasses must provide __new__ or __init__, 8 __getitem__, __setitem__, __delitem__, __len__, and insert(). 9 10 """ 11 # extend()方法内部使用for循环来append()元素,它接收一个可迭代序列 12 def extend(self, values): 13 'S.extend(iterable) -- extend sequence by appending elements from the iterable' 14 for v in values: 15 self.append(v) 16 # 调用 += 运算的时候就是调用该函数,这个函数内部调用extend()方法 17 def __iadd__(self, values): 18 self.extend(values) 19 return self
1 import bisect 2 3 my_list = [] 4 bisect.insort(my_list, 2) 5 bisect.insort(my_list, 9) 6 bisect.insort(my_list, 5) 7 bisect.insort(my_list, 5) 8 bisect.insort(my_list, 1) 9 # insort()函数返回接收的元素应该插入到指定序列的索引位置 10 print(my_list) # [1, 2, 5, 5, 9]
deque是Python中一个双端队列,能在队列两端以的效率插入数据,位于collections模块中。
1 from collections import deque 2 # 定义一个双端队列,长度为3 3 d = deque(maxlen=3)
deque类的源码:
1 class deque(object): 2 """ 3 deque([iterable[, maxlen]]) --> deque object 4 一个类似列表的序列,用于对其端点附近的数据访问进行优化。 5 """ 6 def append(self, *args, **kwargs): 7 """ 在队列右端添加数据 """ 8 pass 9 10 def appendleft(self, *args, **kwargs): 11 """ 在队列左端添加数据 """ 12 pass 13 14 def clear(self, *args, **kwargs): 15 """ 清空所有元素 """ 16 pass 17 18 def copy(self, *args, **kwargs): 19 """ 浅拷贝一个双端队列 """ 20 pass 21 22 def count(self, value): 23 """ 统计指定value值的出现次数 """ 24 return 0 25 26 def extend(self, *args, **kwargs): 27 """ 使用迭代的方式扩展deque的右端 """ 28 pass 29 30 def extendleft(self, *args, **kwargs): 31 """ 使用迭代的方式扩展deque的左端 """ 32 pass 33 34 def index(self, value, start=None, stop=None): __doc__ 35 """ 36 返回第一个符合条件的索引的值 37 """ 38 return 0 39 40 def insert(self, index, p_object): 41 """ 在指定索引之前插入 """ 42 pass 43 44 def pop(self, *args, **kwargs): # real signature unknown 45 """ 删除并返回右端的一个元素 """ 46 pass 47 48 def popleft(self, *args, **kwargs): # real signature unknown 49 """ 删除并返回左端的一个元素 """ 50 pass 51 52 def remove(self, value): # real signature unknown; restored from __doc__ 53 """ 删除第一个与value相同的值 """ 54 pass 55 56 def reverse(self): # real signature unknown; restored from __doc__ 57 """ 翻转队列 """ 58 pass 59 60 def rotate(self, *args, **kwargs): # real signature unknown 61 """ 向右旋转deque N步, 如果N是个负数,那么向左旋转N的绝对值步 """ 62 pass 63 64 def __add__(self, *args, **kwargs): # real signature unknown 65 """ Return self+value. """ 66 pass 67 68 def __bool__(self, *args, **kwargs): # real signature unknown 69 """ self != 0 """ 70 pass 71 72 def __contains__(self, *args, **kwargs): # real signature unknown 73 """ Return key in self. """ 74 pass 75 76 def __copy__(self, *args, **kwargs): # real signature unknown 77 """ Return a shallow copy of a deque. """ 78 pass 79 80 def __delitem__(self, *args, **kwargs): # real signature unknown 81 """ Delete self[key]. """ 82 pass 83 84 def __eq__(self, *args, **kwargs): # real signature unknown 85 """ Return self==value. """ 86 pass 87 88 def __getattribute__(self, *args, **kwargs): # real signature unknown 89 """ Return getattr(self, name). """ 90 pass 91 92 def __getitem__(self, *args, **kwargs): # real signature unknown 93 """ Return self[key]. """ 94 pass 95 96 def __ge__(self, *args, **kwargs): # real signature unknown 97 """ Return self>=value. """ 98 pass 99 100 def __gt__(self, *args, **kwargs): # real signature unknown 101 """ Return self>value. """ 102 pass 103 104 def __iadd__(self, *args, **kwargs): # real signature unknown 105 """ Implement self+=value. """ 106 pass 107 108 def __imul__(self, *args, **kwargs): # real signature unknown 109 """ Implement self*=value. """ 110 pass 111 112 def __init__(self, iterable=(), maxlen=None): # known case of _collections.deque.__init__ 113 """ 114 deque([iterable[, maxlen]]) --> deque object 115 116 A list-like sequence optimized for data accesses near its endpoints. 117 # (copied from class doc) 118 """ 119 pass 120 121 def __iter__(self, *args, **kwargs): # real signature unknown 122 """ Implement iter(self). """ 123 pass 124 125 def __len__(self, *args, **kwargs): # real signature unknown 126 """ Return len(self). """ 127 pass 128 129 def __le__(self, *args, **kwargs): # real signature unknown 130 """ Return self<=value. """ 131 pass 132 133 def __lt__(self, *args, **kwargs): # real signature unknown 134 """ Return self""" 135 pass 136 137 def __mul__(self, *args, **kwargs): # real signature unknown 138 """ Return self*value.n """ 139 pass 140 141 @staticmethod # known case of __new__ 142 def __new__(*args, **kwargs): # real signature unknown 143 """ Create and return a new object. See help(type) for accurate signature. """ 144 pass 145 146 def __ne__(self, *args, **kwargs): # real signature unknown 147 """ Return self!=value. """ 148 pass 149 150 def __reduce__(self, *args, **kwargs): # real signature unknown 151 """ Return state information for pickling. """ 152 pass 153 154 def __repr__(self, *args, **kwargs): # real signature unknown 155 """ Return repr(self). """ 156 pass 157 158 def __reversed__(self): # real signature unknown; restored from __doc__ 159 """ D.__reversed__() -- return a reverse iterator over the deque """ 160 pass 161 162 def __rmul__(self, *args, **kwargs): # real signature unknown 163 """ Return self*value. """ 164 pass 165 166 def __setitem__(self, *args, **kwargs): # real signature unknown 167 """ Set self[key] to value. """ 168 pass 169 170 def __sizeof__(self): # real signature unknown; restored from __doc__ 171 """ D.__sizeof__() -- size of D in memory, in bytes """ 172 pass 173 174 maxlen = property(lambda self: object(), lambda self, v: None, lambda self: None) # default 175 """maximum size of a deque or None if unbounded""" 176 177 178 __hash__ = None
-
列表推导式
列表生成式要比操作列表效率高很多,但是列表生成式的滥用会导致代码可读性降低,并且列表生成式可以替换map()
和reduce()
函数。
1 # 构建列表 2 my_list = [x for x in range(9)] 3 print(my_list) # [0, 1, 2, 3, 4, 5, 6, 7, 8] 4 5 # 构建0-8中为偶数的列表 6 my_list = [x for x in range(9) if(x%2==0)] 7 print(my_list) # [0, 2, 4, 6, 8] 8 9 # 构建0-8为奇数的列表,并将每个数字做平方运算 10 11 def function(number): 12 return number * number 13 14 my_list = [function(x) for x in range(9) if x%2!=0] 15 print(my_list) # [1, 9, 25, 49]
生成器表达式就是把列表表达式的中括号变成小括号。
1 # 构造一个生成器 2 gen = (i for i in range(9)) 3 4 # 生成器可以被遍历 5 for i in gen: 6 print(i)
1 # 将生成器转换为列表 2 li = list(gen) 3 print(li)
- 字典推导式
1 d = { 2 'tom': 18, 3 'alice': 16, 4 'bob': 20, 5 } 6 dict = {key: value for key, value in d.items()} 7 print(dict) # {'tom': 18, 'alice': 16, 'bob': 20}
- Set集合推导式
1 my_set = {i for i in range(9)} 2 print(my_set) # {0, 1, 2, 3, 4, 5, 6, 7, 8}
Set和Dict的背后实现都是Hash(哈希)表,有的书本上也较散列表。Hash表原理可以参考我的算法与数学
博客栏目,下面给出几点总结:
-
Set和Dict的效率高于List。
-
Se和Dict的Key必须是可哈希的元素。
-
在Python中,不可变对象都是可哈希的,比如:str、fronzenset、tuple,需要实现
__hash__()
函数。 -
Dict内存空间占用多,但是速度快,Python中自定义对象或Python内部对象都是Dict包装的。
-
Dict和Set的元素存储顺序和元素的添加顺序有关,但是添加元素时有可能改变已有的元素顺序。
-
List会随着元素数量的增加,查找元素的时间也会增大。
-
Dict和Set不会随着元素数量的增加而查找时间延长。
No.32 Python中的集合类模块collections
defaultdict
defaultdict
在dict
的基础上添加了default_factroy
方法,它的作用是当key不存在的时候自动生成相应类型的value,defalutdict
参数可以指定成list
、set
、int
等各种类型。
1 from collections import defaultdict 2 3 my_list = [ 4 ("Tom", 18), 5 ("Tom", 20), 6 ("Alice", 15), 7 ("Bob", 21), 8 ] 9 10 def_dict = defaultdict(list) 11 12 for key, val in my_list: 13 def_dict[key].append(val) 14 15 print(def_dict.items()) 16 # dict_items([('Tom', [18, 20]), ('Alice', [15]), ('Bob', [21])]) 17 18 # 如果不考虑重复元素可以使用如下方式 19 def_dict_2 = defaultdict(set) 20 21 for key, val in my_list: 22 def_dict_2[key].add(val) 23 24 print(def_dict_2.items()) 25 # dict_items([('Tom', {18, 20}), ('Alice', {15}), ('Bob', {21})])
1 class defaultdict(Dict[_KT, _VT], Generic[_KT, _VT]): 2 default_factory = ... # type: Callable[[], _VT] 3 4 @overload 5 def __init__(self, **kwargs: _VT) -> None: ... 6 @overload 7 def __init__(self, default_factory: Optional[Callable[[], _VT]]) -> None: ... 8 @overload 9 def __init__(self, default_factory: Optional[Callable[[], _VT]], **kwargs: _VT) -> None: ... 10 @overload 11 def __init__(self, default_factory: Optional[Callable[[], _VT]], 12 map: Mapping[_KT, _VT]) -> None: ... 13 @overload 14 def __init__(self, default_factory: Optional[Callable[[], _VT]], 15 map: Mapping[_KT, _VT], **kwargs: _VT) -> None: ... 16 @overload 17 def __init__(self, default_factory: Optional[Callable[[], _VT]], 18 iterable: Iterable[Tuple[_KT, _VT]]) -> None: ... 19 @overload 20 def __init__(self, default_factory: Optional[Callable[[], _VT]], 21 iterable: Iterable[Tuple[_KT, _VT]], **kwargs: _VT) -> None: ... 22 def __missing__(self, key: _KT) -> _VT: ... 23 # TODO __reversed__ 24 def copy(self: _DefaultDictT) -> _DefaultDictT: ...
OrderDict最大的特点就是元素位置有序,它是dict
的子类。OrderDict在内部维护一个字典元素的有序列表。
1 from collections import OrderedDict 2 3 my_dict = { 4 "Bob": 20, 5 "Tim": 20, 6 "Amy": 18, 7 } 8 # 通过key来排序 9 order_dict = OrderedDict(sorted(my_dict.items(), key=lambda li: li[1])) 10 print(order_dict) # OrderedDict([('Amy', 18), ('Bob', 20), ('Tim', 20)])
1 class OrderedDict(dict): 2 'Dictionary that remembers insertion order' 3 # An inherited dict maps keys to values. 4 # The inherited dict provides __getitem__, __len__, __contains__, and get. 5 # The remaining methods are order-aware. 6 # Big-O running times for all methods are the same as regular dictionaries. 7 8 # The internal self.__map dict maps keys to links in a doubly linked list. 9 # The circular doubly linked list starts and ends with a sentinel element. 10 # The sentinel element never gets deleted (this simplifies the algorithm). 11 # The sentinel is in self.__hardroot with a weakref proxy in self.__root. 12 # The prev links are weakref proxies (to prevent circular references). 13 # Individual links are kept alive by the hard reference in self.__map. 14 # Those hard references disappear when a key is deleted from an OrderedDict. 15 16 def __init__(*args, **kwds): 17 '''Initialize an ordered dictionary. The signature is the same as 18 regular dictionaries. Keyword argument order is preserved. 19 ''' 20 if not args: 21 raise TypeError("descriptor '__init__' of 'OrderedDict' object " 22 "needs an argument") 23 self, *args = args 24 if len(args) > 1: 25 raise TypeError('expected at most 1 arguments, got %d' % len(args)) 26 try: 27 self.__root 28 except AttributeError: 29 self.__hardroot = _Link() 30 self.__root = root = _proxy(self.__hardroot) 31 root.prev = root.next = root 32 self.__map = {} 33 self.__update(*args, **kwds) 34 35 def __setitem__(self, key, value, 36 dict_setitem=dict.__setitem__, proxy=_proxy, Link=_Link): 37 'od.__setitem__(i, y) <==> od[i]=y' 38 # Setting a new item creates a new link at the end of the linked list, 39 # and the inherited dictionary is updated with the new key/value pair. 40 if key not in self: 41 self.__map[key] = link = Link() 42 root = self.__root 43 last = root.prev 44 link.prev, link.next, link.key = last, root, key 45 last.next = link 46 root.prev = proxy(link) 47 dict_setitem(self, key, value) 48 49 def __delitem__(self, key, dict_delitem=dict.__delitem__): 50 'od.__delitem__(y) <==> del od[y]' 51 # Deleting an existing item uses self.__map to find the link which gets 52 # removed by updating the links in the predecessor and successor nodes. 53 dict_delitem(self, key) 54 link = self.__map.pop(key) 55 link_prev = link.prev 56 link_next = link.next 57 link_prev.next = link_next 58 link_next.prev = link_prev 59 link.prev = None 60 link.next = None 61 62 def __iter__(self): 63 'od.__iter__() <==> iter(od)' 64 # Traverse the linked list in order. 65 root = self.__root 66 curr = root.next 67 while curr is not root: 68 yield curr.key 69 curr = curr.next 70 71 def __reversed__(self): 72 'od.__reversed__() <==> reversed(od)' 73 # Traverse the linked list in reverse order. 74 root = self.__root 75 curr = root.prev 76 while curr is not root: 77 yield curr.key 78 curr = curr.prev 79 80 def clear(self): 81 'od.clear() -> None. Remove all items from od.' 82 root = self.__root 83 root.prev = root.next = root 84 self.__map.clear() 85 dict.clear(self) 86 87 def popitem(self, last=True): 88 '''Remove and return a (key, value) pair from the dictionary. 89 90 Pairs are returned in LIFO order if last is true or FIFO order if false. 91 ''' 92 if not self: 93 raise KeyError('dictionary is empty') 94 root = self.__root 95 if last: 96 link = root.prev 97 link_prev = link.prev 98 link_prev.next = root 99 root.prev = link_prev 100 else: 101 link = root.next 102 link_next = link.next 103 root.next = link_next 104 link_next.prev = root 105 key = link.key 106 del self.__map[key] 107 value = dict.pop(self, key) 108 return key, value 109 110 def move_to_end(self, key, last=True): 111 '''Move an existing element to the end (or beginning if last==False). 112 113 Raises KeyError if the element does not exist. 114 When last=True, acts like a fast version of self[key]=self.pop(key). 115 116 ''' 117 link = self.__map[key] 118 link_prev = link.prev 119 link_next = link.next 120 soft_link = link_next.prev 121 link_prev.next = link_next 122 link_next.prev = link_prev 123 root = self.__root 124 if last: 125 last = root.prev 126 link.prev = last 127 link.next = root 128 root.prev = soft_link 129 last.next = link 130 else: 131 first = root.next 132 link.prev = root 133 link.next = first 134 first.prev = soft_link 135 root.next = link 136 137 def __sizeof__(self): 138 sizeof = _sys.getsizeof 139 n = len(self) + 1 # number of links including root 140 size = sizeof(self.__dict__) # instance dictionary 141 size += sizeof(self.__map) * 2 # internal dict and inherited dict 142 size += sizeof(self.__hardroot) * n # link objects 143 size += sizeof(self.__root) * n # proxy objects 144 return size 145 146 update = __update = MutableMapping.update 147 148 def keys(self): 149 "D.keys() -> a set-like object providing a view on D's keys" 150 return _OrderedDictKeysView(self) 151 152 def items(self): 153 "D.items() -> a set-like object providing a view on D's items" 154 return _OrderedDictItemsView(self) 155 156 def values(self): 157 "D.values() -> an object providing a view on D's values" 158 return _OrderedDictValuesView(self) 159 160 __ne__ = MutableMapping.__ne__ 161 162 __marker = object() 163 164 def pop(self, key, default=__marker): 165 '''od.pop(k[,d]) -> v, remove specified key and return the corresponding 166 value. If key is not found, d is returned if given, otherwise KeyError 167 is raised. 168 169 ''' 170 if key in self: 171 result = self[key] 172 del self[key] 173 return result 174 if default is self.__marker: 175 raise KeyError(key) 176 return default 177 178 def setdefault(self, key, default=None): 179 'od.setdefault(k[,d]) -> od.get(k,d), also set od[k]=d if k not in od' 180 if key in self: 181 return self[key] 182 self[key] = default 183 return default 184 185 @_recursive_repr() 186 def __repr__(self): 187 'od.__repr__() <==> repr(od)' 188 if not self: 189 return '%s()' % (self.__class__.__name__,) 190 return '%s(%r)' % (self.__class__.__name__, list(self.items())) 191 192 def __reduce__(self): 193 'Return state information for pickling' 194 inst_dict = vars(self).copy() 195 for k in vars(OrderedDict()): 196 inst_dict.pop(k, None) 197 return self.__class__, (), inst_dict or None, None, iter(self.items()) 198 199 def copy(self): 200 'od.copy() -> a shallow copy of od' 201 return self.__class__(self) 202 203 @classmethod 204 def fromkeys(cls, iterable, value=None): 205 '''OD.fromkeys(S[, v]) -> New ordered dictionary with keys from S. 206 If not specified, the value defaults to None. 207 208 ''' 209 self = cls() 210 for key in iterable: 211 self[key] = value 212 return self 213 214 def __eq__(self, other): 215 '''od.__eq__(y) <==> od==y. Comparison to another OD is order-sensitive 216 while comparison to a regular mapping is order-insensitive. 217 218 ''' 219 if isinstance(other, OrderedDict): 220 return dict.__eq__(self, other) and all(map(_eq, self, other)) 221 return dict.__eq__(self, other)
list
存储数据的时候,内部实现是数组,数组的查找速度是很快的,但是插入和删除数据的速度堪忧。deque双端列表内部实现是双端队列。deuque
适用队列和栈,并且是线程安全的。
deque
提供append()
和pop()
函数实现在deque
尾部添加和弹出数据,提供appendleft()
和popleft()
函数实现在deque
头部添加和弹出元素。这4个函数的时间复杂度都是的,但是list
的时间复杂度高达O(n)。
1 from collections import deque 2 3 # 创建一个队列长度为20的deque 4 dQ = deque(range(10), maxlen=20) 5 print(dQ) 6 # deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], maxlen=20)
1 class deque(object): 2 """ 3 deque([iterable[, maxlen]]) --> deque object 4 5 A list-like sequence optimized for data accesses near its endpoints. 6 """ 7 def append(self, *args, **kwargs): # real signature unknown 8 """ Add an element to the right side of the deque. """ 9 pass 10 11 def appendleft(self, *args, **kwargs): # real signature unknown 12 """ Add an element to the left side of the deque. """ 13 pass 14 15 def clear(self, *args, **kwargs): # real signature unknown 16 """ Remove all elements from the deque. """ 17 pass 18 19 def copy(self, *args, **kwargs): # real signature unknown 20 """ Return a shallow copy of a deque. """ 21 pass 22 23 def count(self, value): # real signature unknown; restored from __doc__ 24 """ D.count(value) -> integer -- return number of occurrences of value """ 25 return 0 26 27 def extend(self, *args, **kwargs): # real signature unknown 28 """ Extend the right side of the deque with elements from the iterable """ 29 pass 30 31 def extendleft(self, *args, **kwargs): # real signature unknown 32 """ Extend the left side of the deque with elements from the iterable """ 33 pass 34 35 def index(self, value, start=None, stop=None): # real signature unknown; restored from __doc__ 36 """ 37 D.index(value, [start, [stop]]) -> integer -- return first index of value. 38 Raises ValueError if the value is not present. 39 """ 40 return 0 41 42 def insert(self, index, p_object): # real signature unknown; restored from __doc__ 43 """ D.insert(index, object) -- insert object before index """ 44 pass 45 46 def pop(self, *args, **kwargs): # real signature unknown 47 """ Remove and return the rightmost element. """ 48 pass 49 50 def popleft(self, *args, **kwargs): # real signature unknown 51 """ Remove and return the leftmost element. """ 52 pass 53 54 def remove(self, value): # real signature unknown; restored from __doc__ 55 """ D.remove(value) -- remove first occurrence of value. """ 56 pass 57 58 def reverse(self): # real signature unknown; restored from __doc__ 59 """ D.reverse() -- reverse *IN PLACE* """ 60 pass 61 62 def rotate(self, *args, **kwargs): # real signature unknown 63 """ Rotate the deque n steps to the right (default n=1). If n is negative, rotates left. """ 64 pass 65 66 def __add__(self, *args, **kwargs): # real signature unknown 67 """ Return self+value. """ 68 pass 69 70 def __bool__(self, *args, **kwargs): # real signature unknown 71 """ self != 0 """ 72 pass 73 74 def __contains__(self, *args, **kwargs): # real signature unknown 75 """ Return key in self. """ 76 pass 77 78 def __copy__(self, *args, **kwargs): # real signature unknown 79 """ Return a shallow copy of a deque. """ 80 pass 81 82 def __delitem__(self, *args, **kwargs): # real signature unknown 83 """ Delete self[key]. """ 84 pass 85 86 def __eq__(self, *args, **kwargs): # real signature unknown 87 """ Return self==value. """ 88 pass 89 90 def __getattribute__(self, *args, **kwargs): # real signature unknown 91 """ Return getattr(self, name). """ 92 pass 93 94 def __getitem__(self, *args, **kwargs): # real signature unknown 95 """ Return self[key]. """ 96 pass 97 98 def __ge__(self, *args, **kwargs): # real signature unknown 99 """ Return self>=value. """ 100 pass 101 102 def __gt__(self, *args, **kwargs): # real signature unknown 103 """ Return self>value. """ 104 pass 105 106 def __iadd__(self, *args, **kwargs): # real signature unknown 107 """ Implement self+=value. """ 108 pass 109 110 def __imul__(self, *args, **kwargs): # real signature unknown 111 """ Implement self*=value. """ 112 pass 113 114 def __init__(self, iterable=(), maxlen=None): # known case of _collections.deque.__init__ 115 """ 116 deque([iterable[, maxlen]]) --> deque object 117 118 A list-like sequence optimized for data accesses near its endpoints. 119 # (copied from class doc) 120 """ 121 pass 122 123 def __iter__(self, *args, **kwargs): # real signature unknown 124 """ Implement iter(self). """ 125 pass 126 127 def __len__(self, *args, **kwargs): # real signature unknown 128 """ Return len(self). """ 129 pass 130 131 def __le__(self, *args, **kwargs): # real signature unknown 132 """ Return self<=value. """ 133 pass 134 135 def __lt__(self, *args, **kwargs): # real signature unknown 136 """ Return self""" 137 pass 138 139 def __mul__(self, *args, **kwargs): # real signature unknown 140 """ Return self*value.n """ 141 pass 142 143 @staticmethod # known case of __new__ 144 def __new__(*args, **kwargs): # real signature unknown 145 """ Create and return a new object. See help(type) for accurate signature. """ 146 pass 147 148 def __ne__(self, *args, **kwargs): # real signature unknown 149 """ Return self!=value. """ 150 pass 151 152 def __reduce__(self, *args, **kwargs): # real signature unknown 153 """ Return state information for pickling. """ 154 pass 155 156 def __repr__(self, *args, **kwargs): # real signature unknown 157 """ Return repr(self). """ 158 pass 159 160 def __reversed__(self): # real signature unknown; restored from __doc__ 161 """ D.__reversed__() -- return a reverse iterator over the deque """ 162 pass 163 164 def __rmul__(self, *args, **kwargs): # real signature unknown 165 """ Return self*value. """ 166 pass 167 168 def __setitem__(self, *args, **kwargs): # real signature unknown 169 """ Set self[key] to value. """ 170 pass 171 172 def __sizeof__(self): # real signature unknown; restored from __doc__ 173 """ D.__sizeof__() -- size of D in memory, in bytes """ 174 pass 175 176 maxlen = property(lambda self: object(), lambda self, v: None, lambda self: None) # default 177 """maximum size of a deque or None if unbounded""" 178 179 180 __hash__ = None
用来统计元素出现的次数。
应用场景:
1 class Counter(dict): 2 '''Dict subclass for counting hashable items. Sometimes called a bag 3 or multiset. Elements are stored as dictionary keys and their counts 4 are stored as dictionary values. 5 6 >>> c = Counter('abcdeabcdabcaba') # count elements from a string 7 8 >>> c.most_common(3) # three most common elements 9 [('a', 5), ('b', 4), ('c', 3)] 10 >>> sorted(c) # list all unique elements 11 ['a', 'b', 'c', 'd', 'e'] 12 >>> ''.join(sorted(c.elements())) # list elements with repetitions 13 'aaaaabbbbcccdde' 14 >>> sum(c.values()) # total of all counts 15 15 16 17 >>> c['a'] # count of letter 'a' 18 5 19 >>> for elem in 'shazam': # update counts from an iterable 20 ... c[elem] += 1 # by adding 1 to each element's count 21 >>> c['a'] # now there are seven 'a' 22 7 23 >>> del c['b'] # remove all 'b' 24 >>> c['b'] # now there are zero 'b' 25 0 26 27 >>> d = Counter('simsalabim') # make another counter 28 >>> c.update(d) # add in the second counter 29 >>> c['a'] # now there are nine 'a' 30 9 31 32 >>> c.clear() # empty the counter 33 >>> c 34 Counter() 35 36 Note: If a count is set to zero or reduced to zero, it will remain 37 in the counter until the entry is deleted or the counter is cleared: 38 39 >>> c = Counter('aaabbc') 40 >>> c['b'] -= 2 # reduce the count of 'b' by two 41 >>> c.most_common() # 'b' is still in, but its count is zero 42 [('a', 3), ('c', 1), ('b', 0)] 43 44 ''' 45 # References: 46 # http://en.wikipedia.org/wiki/Multiset 47 # http://www.gnu.org/software/smalltalk/manual-base/html_node/Bag.html 48 # http://www.demo2s.com/Tutorial/Cpp/0380__set-multiset/Catalog0380__set-multiset.htm 49 # http://code.activestate.com/recipes/259174/ 50 # Knuth, TAOCP Vol. II section 4.6.3 51 52 def __init__(*args, **kwds): 53 '''Create a new, empty Counter object. And if given, count elements 54 from an input iterable. Or, initialize the count from another mapping 55 of elements to their counts. 56 57 >>> c = Counter() # a new, empty counter 58 >>> c = Counter('gallahad') # a new counter from an iterable 59 >>> c = Counter({'a': 4, 'b': 2}) # a new counter from a mapping 60 >>> c = Counter(a=4, b=2) # a new counter from keyword args 61 62 ''' 63 if not args: 64 raise TypeError("descriptor '__init__' of 'Counter' object " 65 "needs an argument") 66 self, *args = args 67 if len(args) > 1: 68 raise TypeError('expected at most 1 arguments, got %d' % len(args)) 69 super(Counter, self).__init__() 70 self.update(*args, **kwds) 71 72 def __missing__(self, key): 73 'The count of elements not in the Counter is zero.' 74 # Needed so that self[missing_item] does not raise KeyError 75 return 0 76 77 def most_common(self, n=None): 78 '''List the n most common elements and their counts from the most 79 common to the least. If n is None, then list all element counts. 80 81 >>> Counter('abcdeabcdabcaba').most_common(3) 82 [('a', 5), ('b', 4), ('c', 3)] 83 84 ''' 85 # Emulate Bag.sortedByCount from Smalltalk 86 if n is None: 87 return sorted(self.items(), key=_itemgetter(1), reverse=True) 88 return _heapq.nlargest(n, self.items(), key=_itemgetter(1)) 89 90 def elements(self): 91 '''Iterator over elements repeating each as many times as its count. 92 93 >>> c = Counter('ABCABC') 94 >>> sorted(c.elements()) 95 ['A', 'A', 'B', 'B', 'C', 'C'] 96 97 # Knuth's example for prime factors of 1836: 2**2 * 3**3 * 17**1 98 >>> prime_factors = Counter({2: 2, 3: 3, 17: 1}) 99 >>> product = 1 100 >>> for factor in prime_factors.elements(): # loop over factors 101 ... product *= factor # and multiply them 102 >>> product 103 1836 104 105 Note, if an element's count has been set to zero or is a negative 106 number, elements() will ignore it. 107 108 ''' 109 # Emulate Bag.do from Smalltalk and Multiset.begin from C++. 110 return _chain.from_iterable(_starmap(_repeat, self.items())) 111 112 # Override dict methods where necessary 113 114 @classmethod 115 def fromkeys(cls, iterable, v=None): 116 # There is no equivalent method for counters because setting v=1 117 # means that no element can have a count greater than one. 118 raise NotImplementedError( 119 'Counter.fromkeys() is undefined. Use Counter(iterable) instead.') 120 121 def update(*args, **kwds): 122 '''Like dict.update() but add counts instead of replacing them. 123 124 Source can be an iterable, a dictionary, or another Counter instance. 125 126 >>> c = Counter('which') 127 >>> c.update('witch') # add elements from another iterable 128 >>> d = Counter('watch') 129 >>> c.update(d) # add elements from another counter 130 >>> c['h'] # four 'h' in which, witch, and watch 131 4 132 133 ''' 134 # The regular dict.update() operation makes no sense here because the 135 # replace behavior results in the some of original untouched counts 136 # being mixed-in with all of the other counts for a mismash that 137 # doesn't have a straight-forward interpretation in most counting 138 # contexts. Instead, we implement straight-addition. Both the inputs 139 # and outputs are allowed to contain zero and negative counts. 140 141 if not args: 142 raise TypeError("descriptor 'update' of 'Counter' object " 143 "needs an argument") 144 self, *args = args 145 if len(args) > 1: 146 raise TypeError('expected at most 1 arguments, got %d' % len(args)) 147 iterable = args[0] if args else None 148 if iterable is not None: 149 if isinstance(iterable, Mapping): 150 if self: 151 self_get = self.get 152 for elem, count in iterable.items(): 153 self[elem] = count + self_get(elem, 0) 154 else: 155 super(Counter, self).update(iterable) # fast path when counter is empty 156 else: 157 _count_elements(self, iterable) 158 if kwds: 159 self.update(kwds) 160 161 def subtract(*args, **kwds): 162 '''Like dict.update() but subtracts counts instead of replacing them. 163 Counts can be reduced below zero. Both the inputs and outputs are 164 allowed to contain zero and negative counts. 165 166 Source can be an iterable, a dictionary, or another Counter instance. 167 168 >>> c = Counter('which') 169 >>> c.subtract('witch') # subtract elements from another iterable 170 >>> c.subtract(Counter('watch')) # subtract elements from another counter 171 >>> c['h'] # 2 in which, minus 1 in witch, minus 1 in watch 172 0 173 >>> c['w'] # 1 in which, minus 1 in witch, minus 1 in watch 174 -1 175 176 ''' 177 if not args: 178 raise TypeError("descriptor 'subtract' of 'Counter' object " 179 "needs an argument") 180 self, *args = args 181 if len(args) > 1: 182 raise TypeError('expected at most 1 arguments, got %d' % len(args)) 183 iterable = args[0] if args else None 184 if iterable is not None: 185 self_get = self.get 186 if isinstance(iterable, Mapping): 187 for elem, count in iterable.items(): 188 self[elem] = self_get(elem, 0) - count 189 else: 190 for elem in iterable: 191 self[elem] = self_get(elem, 0) - 1 192 if kwds: 193 self.subtract(kwds) 194 195 def copy(self): 196 'Return a shallow copy.' 197 return self.__class__(self) 198 199 def __reduce__(self): 200 return self.__class__, (dict(self),) 201 202 def __delitem__(self, elem): 203 'Like dict.__delitem__() but does not raise KeyError for missing values.' 204 if elem in self: 205 super().__delitem__(elem) 206 207 def __repr__(self): 208 if not self: 209 return '%s()' % self.__class__.__name__ 210 try: 211 items = ', '.join(map('%r: %r'.__mod__, self.most_common())) 212 return '%s({%s})' % (self.__class__.__name__, items) 213 except TypeError: 214 # handle case where values are not orderable 215 return '{0}({1!r})'.format(self.__class__.__name__, dict(self)) 216 217 # Multiset-style mathematical operations discussed in: 218 # Knuth TAOCP Volume II section 4.6.3 exercise 19 219 # and at http://en.wikipedia.org/wiki/Multiset 220 # 221 # Outputs guaranteed to only include positive counts. 222 # 223 # To strip negative and zero counts, add-in an empty counter: 224 # c += Counter() 225 226 def __add__(self, other): 227 '''Add counts from two counters. 228 229 >>> Counter('abbb') + Counter('bcc') 230 Counter({'b': 4, 'c': 2, 'a': 1}) 231 232 ''' 233 if not isinstance(other, Counter): 234 return NotImplemented 235 result = Counter() 236 for elem, count in self.items(): 237 newcount = count + other[elem] 238 if newcount > 0: 239 result[elem] = newcount 240 for elem, count in other.items(): 241 if elem not in self and count > 0: 242 result[elem] = count 243 return result 244 245 def __sub__(self, other): 246 ''' Subtract count, but keep only results with positive counts. 247 248 >>> Counter('abbbc') - Counter('bccd') 249 Counter({'b': 2, 'a': 1}) 250 251 ''' 252 if not isinstance(other, Counter): 253 return NotImplemented 254 result = Counter() 255 for elem, count in self.items(): 256 newcount = count - other[elem] 257 if newcount > 0: 258 result[elem] = newcount 259 for elem, count in other.items(): 260 if elem not in self and count < 0: 261 result[elem] = 0 - count 262 return result 263 264 def __or__(self, other): 265 '''Union is the maximum of value in either of the input counters. 266 267 >>> Counter('abbb') | Counter('bcc') 268 Counter({'b': 3, 'c': 2, 'a': 1}) 269 270 ''' 271 if not isinstance(other, Counter): 272 return NotImplemented 273 result = Counter() 274 for elem, count in self.items(): 275 other_count = other[elem] 276 newcount = other_count if count < other_count else count 277 if newcount > 0: 278 result[elem] = newcount 279 for elem, count in other.items(): 280 if elem not in self and count > 0: 281 result[elem] = count 282 return result 283 284 def __and__(self, other): 285 ''' Intersection is the minimum of corresponding counts. 286 287 >>> Counter('abbb') & Counter('bcc') 288 Counter({'b': 1}) 289 290 ''' 291 if not isinstance(other, Counter): 292 return NotImplemented 293 result = Counter() 294 for elem, count in self.items(): 295 other_count = other[elem] 296 newcount = count if count < other_count else other_count 297 if newcount > 0: 298 result[elem] = newcount 299 return result 300 301 def __pos__(self): 302 'Adds an empty counter, effectively stripping negative and zero counts' 303 result = Counter() 304 for elem, count in self.items(): 305 if count > 0: 306 result[elem] = count 307 return result 308 309 def __neg__(self): 310 '''Subtracts from an empty counter. Strips positive and zero counts, 311 and flips the sign on negative counts. 312 313 ''' 314 result = Counter() 315 for elem, count in self.items(): 316 if count < 0: 317 result[elem] = 0 - count 318 return result 319 320 def _keep_positive(self): 321 '''Internal method to strip elements with a negative or zero count''' 322 nonpositive = [elem for elem, count in self.items() if not count > 0] 323 for elem in nonpositive: 324 del self[elem] 325 return self 326 327 def __iadd__(self, other): 328 '''Inplace add from another counter, keeping only positive counts. 329 330 >>> c = Counter('abbb') 331 >>> c += Counter('bcc') 332 >>> c 333 Counter({'b': 4, 'c': 2, 'a': 1}) 334 335 ''' 336 for elem, count in other.items(): 337 self[elem] += count 338 return self._keep_positive() 339 340 def __isub__(self, other): 341 '''Inplace subtract counter, but keep only results with positive counts. 342 343 >>> c = Counter('abbbc') 344 >>> c -= Counter('bccd') 345 >>> c 346 Counter({'b': 2, 'a': 1}) 347 348 ''' 349 for elem, count in other.items(): 350 self[elem] -= count 351 return self._keep_positive() 352 353 def __ior__(self, other): 354 '''Inplace union is the maximum of value from either counter. 355 356 >>> c = Counter('abbb') 357 >>> c |= Counter('bcc') 358 >>> c 359 Counter({'b': 3, 'c': 2, 'a': 1}) 360 361 ''' 362 for elem, other_count in other.items(): 363 count = self[elem] 364 if other_count > count: 365 self[elem] = other_count 366 return self._keep_positive() 367 368 def __iand__(self, other): 369 '''Inplace intersection is the minimum of corresponding counts. 370 371 >>> c = Counter('abbb') 372 >>> c &= Counter('bcc') 373 >>> c 374 Counter({'b': 1}) 375 376 ''' 377 for elem, count in self.items(): 378 other_count = other[elem] 379 if other_count < count: 380 self[elem] = other_count 381 return self._keep_positive()
命名tuple中的元素来使程序更具可读性 。
1 from collections import namedtuple 2 3 City = namedtuple('City', 'name title popu coor') 4 tokyo = City('Tokyo', '下辈子让我做系守的姑娘吧!下辈子让我做东京的帅哥吧!', 36.933, (35.689722, 139.691667)) 5 print(tokyo) 6 # City(name='Tokyo', title='下辈子让我做系守的姑娘吧!下辈子让我做东京的帅哥吧!', popu=36.933, coor=(35.689722, 139.691667))
1 def namedtuple(typename, field_names, *, verbose=False, rename=False, module=None): 2 """Returns a new subclass of tuple with named fields. 3 4 >>> Point = namedtuple('Point', ['x', 'y']) 5 >>> Point.__doc__ # docstring for the new class 6 'Point(x, y)' 7 >>> p = Point(11, y=22) # instantiate with positional args or keywords 8 >>> p[0] + p[1] # indexable like a plain tuple 9 33 10 >>> x, y = p # unpack like a regular tuple 11 >>> x, y 12 (11, 22) 13 >>> p.x + p.y # fields also accessible by name 14 33 15 >>> d = p._asdict() # convert to a dictionary 16 >>> d['x'] 17 11 18 >>> Point(**d) # convert from a dictionary 19 Point(x=11, y=22) 20 >>> p._replace(x=100) # _replace() is like str.replace() but targets named fields 21 Point(x=100, y=22) 22 23 """ 24 25 # Validate the field names. At the user's option, either generate an error 26 # message or automatically replace the field name with a valid name. 27 if isinstance(field_names, str): 28 field_names = field_names.replace(',', ' ').split() 29 field_names = list(map(str, field_names)) 30 typename = str(typename) 31 if rename: 32 seen = set() 33 for index, name in enumerate(field_names): 34 if (not name.isidentifier() 35 or _iskeyword(name) 36 or name.startswith('_') 37 or name in seen): 38 field_names[index] = '_%d' % index 39 seen.add(name) 40 for name in [typename] + field_names: 41 if type(name) is not str: 42 raise TypeError('Type names and field names must be strings') 43 if not name.isidentifier(): 44 raise ValueError('Type names and field names must be valid ' 45 'identifiers: %r' % name) 46 if _iskeyword(name): 47 raise ValueError('Type names and field names cannot be a ' 48 'keyword: %r' % name) 49 seen = set() 50 for name in field_names: 51 if name.startswith('_') and not rename: 52 raise ValueError('Field names cannot start with an underscore: ' 53 '%r' % name) 54 if name in seen: 55 raise ValueError('Encountered duplicate field name: %r' % name) 56 seen.add(name) 57 58 # Fill-in the class template 59 class_definition = _class_template.format( 60 typename = typename, 61 field_names = tuple(field_names), 62 num_fields = len(field_names), 63 arg_list = repr(tuple(field_names)).replace("'", "")[1:-1], 64 repr_fmt = ', '.join(_repr_template.format(name=name) 65 for name in field_names), 66 field_defs = '\n'.join(_field_template.format(index=index, name=name) 67 for index, name in enumerate(field_names)) 68 ) 69 70 # Execute the template string in a temporary namespace and support 71 # tracing utilities by setting a value for frame.f_globals['__name__'] 72 namespace = dict(__name__='namedtuple_%s' % typename) 73 exec(class_definition, namespace) 74 result = namespace[typename] 75 result._source = class_definition 76 if verbose: 77 print(result._source) 78 79 # For pickling to work, the __module__ variable needs to be set to the frame 80 # where the named tuple is created. Bypass this step in environments where 81 # sys._getframe is not defined (Jython for example) or sys._getframe is not 82 # defined for arguments greater than 0 (IronPython), or where the user has 83 # specified a particular module. 84 if module is None: 85 try: 86 module = _sys._getframe(1).f_globals.get('__name__', '__main__') 87 except (AttributeError, ValueError): 88 pass 89 if module is not None: 90 result.__module__ = module 91 92 return result
用来合并多个字典。
1 from collections import ChainMap 2 3 cm = ChainMap( 4 {"Apple": 18}, 5 {"Orange": 20}, 6 {"Mango": 22}, 7 {"pineapple": 24}, 8 ) 9 print(cm) 10 # ChainMap({'Apple': 18}, {'Orange': 20}, {'Mango': 22}, {'pineapple': 24})
1 class ChainMap(MutableMapping): 2 ''' A ChainMap groups multiple dicts (or other mappings) together 3 to create a single, updateable view. 4 5 The underlying mappings are stored in a list. That list is public and can 6 be accessed or updated using the *maps* attribute. There is no other 7 state. 8 9 Lookups search the underlying mappings successively until a key is found. 10 In contrast, writes, updates, and deletions only operate on the first 11 mapping. 12 13 ''' 14 15 def __init__(self, *maps): 16 '''Initialize a ChainMap by setting *maps* to the given mappings. 17 If no mappings are provided, a single empty dictionary is used. 18 19 ''' 20 self.maps = list(maps) or [{}] # always at least one map 21 22 def __missing__(self, key): 23 raise KeyError(key) 24 25 def __getitem__(self, key): 26 for mapping in self.maps: 27 try: 28 return mapping[key] # can't use 'key in mapping' with defaultdict 29 except KeyError: 30 pass 31 return self.__missing__(key) # support subclasses that define __missing__ 32 33 def get(self, key, default=None): 34 return self[key] if key in self else default 35 36 def __len__(self): 37 return len(set().union(*self.maps)) # reuses stored hash values if possible 38 39 def __iter__(self): 40 return iter(set().union(*self.maps)) 41 42 def __contains__(self, key): 43 return any(key in m for m in self.maps) 44 45 def __bool__(self): 46 return any(self.maps) 47 48 @_recursive_repr() 49 def __repr__(self): 50 return '{0.__class__.__name__}({1})'.format( 51 self, ', '.join(map(repr, self.maps))) 52 53 @classmethod 54 def fromkeys(cls, iterable, *args): 55 'Create a ChainMap with a single dict created from the iterable.' 56 return cls(dict.fromkeys(iterable, *args)) 57 58 def copy(self): 59 'New ChainMap or subclass with a new copy of maps[0] and refs to maps[1:]' 60 return self.__class__(self.maps[0].copy(), *self.maps[1:]) 61 62 __copy__ = copy 63 64 def new_child(self, m=None): # like Django's Context.push() 65 '''New ChainMap with a new map followed by all previous maps. 66 If no map is provided, an empty dict is used. 67 ''' 68 if m is None: 69 m = {} 70 return self.__class__(m, *self.maps) 71 72 @property 73 def parents(self): # like Django's Context.pop() 74 'New ChainMap from maps[1:].' 75 return self.__class__(*self.maps[1:]) 76 77 def __setitem__(self, key, value): 78 self.maps[0][key] = value 79 80 def __delitem__(self, key): 81 try: 82 del self.maps[0][key] 83 except KeyError: 84 raise KeyError('Key not found in the first mapping: {!r}'.format(key)) 85 86 def popitem(self): 87 'Remove and return an item pair from maps[0]. Raise KeyError is maps[0] is empty.' 88 try: 89 return self.maps[0].popitem() 90 except KeyError: 91 raise KeyError('No keys found in the first mapping.') 92 93 def pop(self, key, *args): 94 'Remove *key* from maps[0] and return its value. Raise KeyError if *key* not in maps[0].' 95 try: 96 return self.maps[0].pop(key, *args) 97 except KeyError: 98 raise KeyError('Key not found in the first mapping: {!r}'.format(key)) 99 100 def clear(self): 101 'Clear maps[0], leaving maps[1:] intact.' 102 self.maps[0].clear()
UserDict是MutableMapping
和Mapping
的子类,它继承了MutableMapping.update
和Mapping.get
两个重要的方法 。
1 from collections import UserDict 2 3 class DictKeyToStr(UserDict): 4 def __missing__(self, key): 5 if isinstance(key, str): 6 raise KeyError(key) 7 return self[str(key)] 8 9 def __contains__(self, key): 10 return str(key) in self.data 11 12 def __setitem__(self, key, item): 13 self.data[str(key)] = item 14 # 该函数可以不实现 15 ''' 16 def get(self, key, default=None): 17 try: 18 return self[key] 19 except KeyError: 20 return default 21 '''
1 class UserDict(MutableMapping): 2 3 # Start by filling-out the abstract methods 4 def __init__(*args, **kwargs): 5 if not args: 6 raise TypeError("descriptor '__init__' of 'UserDict' object " 7 "needs an argument") 8 self, *args = args 9 if len(args) > 1: 10 raise TypeError('expected at most 1 arguments, got %d' % len(args)) 11 if args: 12 dict = args[0] 13 elif 'dict' in kwargs: 14 dict = kwargs.pop('dict') 15 import warnings 16 warnings.warn("Passing 'dict' as keyword argument is deprecated", 17 DeprecationWarning, stacklevel=2) 18 else: 19 dict = None 20 self.data = {} 21 if dict is not None: 22 self.update(dict) 23 if len(kwargs): 24 self.update(kwargs) 25 def __len__(self): return len(self.data) 26 def __getitem__(self, key): 27 if key in self.data: 28 return self.data[key] 29 if hasattr(self.__class__, "__missing__"): 30 return self.__class__.__missing__(self, key) 31 raise KeyError(key) 32 def __setitem__(self, key, item): self.data[key] = item 33 def __delitem__(self, key): del self.data[key] 34 def __iter__(self): 35 return iter(self.data) 36 37 # Modify __contains__ to work correctly when __missing__ is present 38 def __contains__(self, key): 39 return key in self.data 40 41 # Now, add the methods in dicts but not in MutableMapping 42 def __repr__(self): return repr(self.data) 43 def copy(self): 44 if self.__class__ is UserDict: 45 return UserDict(self.data.copy()) 46 import copy 47 data = self.data 48 try: 49 self.data = {} 50 c = copy.copy(self) 51 finally: 52 self.data = data 53 c.update(self) 54 return c 55 @classmethod 56 def fromkeys(cls, iterable, value=None): 57 d = cls() 58 for key in iterable: 59 d[key] = value 60 return d
Python与Java的变量本质上不一样,Python的变量本事是个指针。当Python解释器执行number=1
的时候,实际上先在内存中创建一个int
对象,然后将number
指向这个int
对象的内存地址,也就是将number
“贴”在int
对象上,测试用例如下:
1 number = [1, 2, 3] 2 demo = number 3 demo.append(4) 4 print(number) 5 # [1, 2, 3, 4]
1 class Person(object): 2 pass 3 4 p_0 = Person() 5 6 p_1 = Person() 7 8 print(p_0 is p_1) # False 9 print(p_0 == p_1) # False 10 print(id(p_0)) # 2972754016464 11 print(id(p_1)) # 2972754016408 12 13 li_a = [1, 2, 3, 4] 14 li_b = [1, 2, 3, 4] 15 16 print(li_a is li_b) # False 17 print(li_a == li_b) # True 18 print(id(li_a)) # 2972770077064 19 print(id(li_b)) # 2972769996680 20 21 a = 1 22 b = 1 23 24 print(a is b) # True 25 print(a == b) # True 26 print(id(a)) # 1842179136 27 print(id(b)) # 1842179136
Python中的del
语句并不等同于C++中的delete
,Python中的del
是将这个对象的指向删除,当这个对象没有任何指向的时候,Python虚拟机才会删除这个对象。
No.34 Python元类编程
property动态属性
1 class Home(object): 2 3 def __init__(self, age): 4 self.__age = age 5 6 @property 7 def age(self): 8 return self.__age 9 10 if __name__ == '__main__': 11 12 home = Home(21) 13 print(home.age) # 21
__getattr__和__getattribute__函数的使用
__getattr__在查找属性的时候,找不到该属性就会调用这个函数。
1 class Demo(object): 2 3 def __init__(self, user, passwd): 4 self.user = user 5 self.password = passwd 6 7 def __getattr__(self, item): 8 return 'Not find Attr.' 9 10 if __name__ == '__main__': 11 12 d = Demo('Bob', '123456') 13 14 print(d.User)
__getattribute__
class Demo(object): def __init__(self, user, passwd): self.user = user self.password = passwd def __getattr__(self, item): return 'Not find Attr.' def __getattribute__(self, item): print('Hello.') if __name__ == '__main__': d = Demo('Bob', '123456') print(d.User) # Hello. # None
在一个类中实现__get__()
、__set__()
和__delete__()
都是属性描述符。
1 import numbers 2 3 class IntField(object): 4 5 def __init__(self): 6 self.v = 0 7 8 def __get__(self, instance, owner): 9 return self.v 10 11 def __set__(self, instance, value): 12 if(not isinstance(value, numbers.Integral)): 13 raise ValueError("Int value need.") 14 self.v = value 15 16 def __delete__(self, instance): 17 pass
-
值得注意的是,只要实现这三种方法中的任何一个都是描述符。
-
仅实现
__get__()
方法的叫做非数据描述符,只有在初始化之后才能被读取。 -
同时实现
__get__()
和__set__()
方法的叫做数据描述符,属性是可读写的。
属性访问的优先规则
对象的属性一般是在__dict__
中存储,在Python中,__getattribute__()
实现了属性访问的相关规则。
假定存在实例obj
,属性number
在obj
中的查找过程是这样的:
-
搜索基类列表
type(b).__mro__
,直到找到该属性,并赋值给descr
。 -
判断
descr
的类型,如果是数据描述符则调用descr.__get__(b, type(b))
,并将结果返回。 -
如果是其他的(非数据描述符、普通属性、没找到的类型)则查找实例
obj
的实例属性,也就是obj.__dict__
。 -
如果在
obj.__dict__
没有找到相关属性,就会重新回到descr
的判断上。 -
如果再次判断
descr
类型为非数据描述符,就会调用descr.__get__(b, type(b))
,并将结果返回,结束执行。 -
如果
descr
是普通属性,直接就返回结果。 -
如果第二次没有找到,为空,就会触发
AttributeError
异常,并且结束查找。
用流程图表示:
-
__new__()
函数用来控制对象的生成过程,在对象上生成之前调用。 -
__init__()
函数用来对对象进行完善,在对象生成之后调用。 -
如果
__new__()
函数不返回对象,就不会调用__init__()
函数。
自定义元类
在Python中一切皆对象,类用来描述如何生成对象,在Python中类也是对象,原因是它具备创建对象的能力。当Python解释器执行到class
语句的时候,就会创建这个所谓类的对象。既然类是个对象,那么就可以动态的创建类。这里我们用到type()
函数,下面是此函数的构造函数源码:
1 def __init__(cls, what, bases=None, dict=None): # known special case of type.__init__ 2 """ 3 type(object_or_name, bases, dict) 4 type(object) -> the object's type 5 type(name, bases, dict) -> a new type 6 # (copied from class doc) 7 """ 8 pass
1 def bar(): 2 print("Hello...") 3 4 user = type('User', (object, ), { 5 'name': 'Bob', 6 'age': 20, 7 'bar': bar, 8 }) 9 10 user.bar() # Hello... 11 print(user.name, user.age) # Bob 20
1 class MetaClass(type): 2 def __new__(cls, *args, **kwargs): 3 return super().__new__(cls, *args, **kwargs) 4 5 class User(metaclass=MetaClass): 6 pass
元类的主要用途就是创建API,比如Python中的ORM框架。
“元类就是深度的魔法,99%的用户应该根本不必为此操心。如果你想搞清楚究竟是否需要用到元类,那么你就不需要它。那些实际用到元类的人都非常清楚地知道他们需要做什么,而且根本不需要解释为什么要用元类。”
当容器中的元素很多的时候,不可能全部读取到内存,那么就需要一种算法来推算下一个元素,这样就不必创建很大的容器,生成器就是这个作用。
Python中的生成器使用yield
返回值,每次调用yield
会暂停,因此生成器不会一下子全部执行完成,是当需要结果时才进行计算,当函数执行到yield
的时候,会返回值并且保存当前的执行状态,也就是函数被挂起了。我们可以使用next()
函数和send()
函数恢复生成器,将列表推导式的[]
换成()
就会变成一个生成器:
1 my_iter = (x for x in range(10)) 2 3 for i in my_iter: 4 print(i)
Python虚拟机中有一个栈帧的调用栈,栈帧保存了指定的代码的信息和上下文,每一个栈帧都有自己的数据栈和块栈,由于这些栈帧保存在堆内存中,使得解释器有中断和恢复栈帧的能力:
1 import inspect 2 3 frame = None 4 5 def foo(): 6 global frame 7 frame = inspect.currentframe() 8 9 def bar(): 10 foo() 11 12 bar() 13 14 print(frame.f_code.co_name) # foo 15 print(frame.f_back.f_code.co_name) # bar
迭代器是一种不同于for循环的访问集合内元素的一种方式,一般用来遍历数据,迭代器提供了一种惰性访问数据的方式。
可以使用for循环的有以下几种类型:
-
集合数据类型
-
生成器,包括生成器和带有
yield
的生成器函数
这些可以直接被for循环调用的对象叫做可迭代对象,可以使用isinstance()
判断一个对象是否为可Iterable
对象。集合数据类型如list
、dict
、str
等是Iterable
但不是Iterator
,可以通过iter()
函数获得一个Iterator
对象。send()
和next()
的区别就在于send()
可传递参数给yield()
表达式,这时候传递的参数就会作为yield
表达式的值,而yield
的参数是返回给调用者的值,也就是说send
可以强行修改上一个yield表达式值。
关于Python网络、并发、爬虫的原理详解请看我博客的其他文章。