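defaultdict() and namedtuple() are two very practical extension types in the collections module. One builds on the built-in dict type, the other on the built-in tuple type. Each adds useful features on top of its base type and is a good fit for specific situations.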
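defaultdict() returns a dictionary-like object. It differs from dict in two main ways: it takes a default_factory as its first argument, and it overrides __missing__() so that looking up an absent key calls that factory instead of raising KeyError.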
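    from collections import defaultdict

    s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

    # defaultdict: a missing key is created with an empty list automatically
    d = defaultdict(list)
    for k, v in s:
        d[k].append(v)
    print(list(d.items()))

    # plain dict: setdefault() supplies the empty list explicitly
    d_2 = {}
    for k, v in s:
        d_2.setdefault(k, []).append(v)
    print(list(d_2.items()))

    # plain dict without setdefault(): raises KeyError on the first lookup
    d_3 = {}
    for k, v in s:
        d_3[k].append(v)
    print(d_3.items())

Output:

    [('red', [1]), ('blue', [2, 4]), ('yellow', [1, 3])]
    [('red', [1]), ('blue', [2, 4]), ('yellow', [1, 3])]
    Traceback (most recent call last):
      File "C:/Users/Administrator/Desktop/Python Scripts/collection_eg.py", line 22, in <module>
        d_3[k].append(v)
    KeyError: 'yellow'

This output was captured on an older Python 3. Since Python 3.7 dicts preserve insertion order, so the items would print in the order yellow, blue, red.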
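The statement d = defaultdict(list) creates a defaultdict (you can think of it as a dict) whose missing keys default to an empty list. The contrast with d_3 shows the difference: d[k] works on a defaultdict even while d is still empty, whereas a plain dict raises KeyError. The effect is the same as the setdefault() handling used for d_2 in the example.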
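The lookup behavior of defaultdict(list) can be approximated with a plain dict subclass that implements __missing__(). This is only a sketch for illustration: the class name ListDefaultDict is hypothetical, and the real defaultdict is not implemented this way internally.

```python
from collections import defaultdict


class ListDefaultDict(dict):
    """Hypothetical stand-in that mimics defaultdict(list) for illustration."""

    def __missing__(self, key):
        # dict.__getitem__ calls __missing__ when the key is absent.
        # Store a fresh list under the key and hand it back to the caller.
        value = self[key] = []
        return value


s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]

sketch = ListDefaultDict()
real = defaultdict(list)
for k, v in s:
    sketch[k].append(v)
    real[k].append(v)

print(sketch == real)          # both hold the same keys and lists
print(real.default_factory)    # the factory passed to defaultdict()
```

Note that only d[key] lookups trigger the factory. Calling d.get(key) or testing key in d does not create an entry.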
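It is mainly used when building a dictionary from data. When you need to generate a dictionary from some data and want every value to be of one particular type, consider defaultdict.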
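The same idea works with other value types. The sketch below uses defaultdict(int) to count items and defaultdict(set) to group items without duplicates; the sample data is made up for illustration.

```python
from collections import defaultdict

# Illustrative data, not taken from the article.
words = ['red', 'blue', 'red', 'green', 'blue', 'red']
pairs = [('fruit', 'apple'), ('fruit', 'pear'), ('fruit', 'apple'), ('veg', 'leek')]

# int() returns 0, so every new key starts counting from zero.
counts = defaultdict(int)
for w in words:
    counts[w] += 1

# set() returns an empty set, so duplicates are dropped per category.
groups = defaultdict(set)
for category, item in pairs:
    groups[category].add(item)

print(dict(counts))
print(sorted(groups['fruit']))
```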
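namedtuple() is a factory function that creates subclasses of tuple. Compared with a plain tuple, the resulting type has extra features: its instances behave like tuples but also expose their fields as named attributes. Such an object looks like an instance of a class with data attributes, except that the attributes are read-only.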
    >>> from collections import namedtuple
    >>> TPoint = namedtuple('TPoint', ['x', 'y'])
    >>> p = TPoint(x=10, y=10)
    >>> p
    TPoint(x=10, y=10)
    >>> p.x
    10
    >>> p.y
    10
    >>> p[0]
    10
    >>> type(p)
    <class '__main__.TPoint'>
    >>> for i in p:
    ...     print(i)
    ...
    10
    10
    >>>
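TPoint = namedtuple('TPoint', ['x', 'y']) creates a type named TPoint with the attributes x and y.
The example above shows that p's fields can be read as p.x and p.y, and that p also supports indexing and iteration with for, exactly like a tuple.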
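Because a namedtuple instance is still a tuple, the usual tuple operations keep working. A minimal sketch, reusing the TPoint type from the example above with different sample values:

```python
from collections import namedtuple

TPoint = namedtuple('TPoint', ['x', 'y'])
p = TPoint(x=10, y=20)

# Unpacking works exactly as it does for a plain tuple.
x, y = p
print(x, y)

# Comparison is by value, so a namedtuple equals the matching plain tuple.
print(p == (10, 20))

# The instance really is a tuple, and _fields lists the attribute names.
print(isinstance(p, tuple))
print(len(p), p._fields)
```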
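Passing verbose=True prints the generated class definition, which gives the full picture of what a namedtuple provides. Note that the verbose parameter was removed in Python 3.7, so this call only works on earlier versions.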
    TPoint = namedtuple('TPoint', ['x', 'y'], verbose=True)

    from builtins import property as _property, tuple as _tuple
    from operator import itemgetter as _itemgetter
    from collections import OrderedDict

    class TPoint(tuple):
        'TPoint(x, y)'

        __slots__ = ()

        _fields = ('x', 'y')

        def __new__(_cls, x, y):
            'Create new instance of TPoint(x, y)'
            return _tuple.__new__(_cls, (x, y))

        @classmethod
        def _make(cls, iterable, new=tuple.__new__, len=len):
            'Make a new TPoint object from a sequence or iterable'
            result = new(cls, iterable)
            if len(result) != 2:
                raise TypeError('Expected 2 arguments, got %d' % len(result))
            return result

        def __repr__(self):
            'Return a nicely formatted representation string'
            return self.__class__.__name__ + '(x=%r, y=%r)' % self

        def _asdict(self):
            'Return a new OrderedDict which maps field names to their values'
            return OrderedDict(zip(self._fields, self))

        __dict__ = property(_asdict)

        def _replace(_self, **kwds):
            'Return a new TPoint object replacing specified fields with new values'
            result = _self._make(map(kwds.pop, ('x', 'y'), _self))
            if kwds:
                raise ValueError('Got unexpected field names: %r' % list(kwds))
            return result

        def __getnewargs__(self):
            'Return self as a plain tuple. Used by copy and pickle.'
            return tuple(self)

        x = _property(_itemgetter(0), doc='Alias for field number 0')

        y = _property(_itemgetter(1), doc='Alias for field number 1')
This listing shows the methods a namedtuple type provides. It is also plain to see that the generated class inherits directly from tuple.
A few important methods:
1. Turning existing data into a namedtuple instance with _make():
    >>> TPoint = namedtuple('TPoint', ['x', 'y'])
    >>> t = [11, 22]
    >>> p = TPoint._make(t)
    >>> p
    TPoint(x=11, y=22)
    >>>
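_make() accepts any iterable, and for a list it gives the same result as unpacking with *. The sketch below also shows that an iterable of the wrong length raises TypeError; the variable names are chosen for illustration.

```python
from collections import namedtuple

TPoint = namedtuple('TPoint', ['x', 'y'])
t = [11, 22]

from_make = TPoint._make(t)   # build from an existing sequence
from_star = TPoint(*t)        # equivalent: unpack the sequence into arguments
from_gen = TPoint._make(n * 2 for n in (1, 2))  # any iterable is accepted

length_error = None
try:
    TPoint._make([1, 2, 3])   # three values for a two-field type
except TypeError as exc:
    length_error = exc

print(from_make, from_star, from_gen)
print(type(length_error).__name__)
```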
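2. Instances of a class created by namedtuple are read-only. To change a value, call _replace(), which returns a new instance with the given fields updated and leaves the original untouched.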
    >>> p
    TPoint(x=11, y=22)
    >>> p.y
    22
    >>> p.y = 33
    Traceback (most recent call last):
      File "<pyshell#18>", line 1, in <module>
        p.y = 33
    AttributeError: can't set attribute
    >>> p._replace(y=33)
    TPoint(x=11, y=33)
    >>>
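One point that is easy to miss in the session above is that _replace() does not modify p. A minimal sketch that keeps both objects around to make this visible (the name moved is chosen for illustration):

```python
from collections import namedtuple

TPoint = namedtuple('TPoint', ['x', 'y'])
p = TPoint(11, 22)

# Direct assignment fails because the fields are read-only.
error = None
try:
    p.y = 33
except AttributeError as exc:
    error = exc

# _replace() builds a new instance; rebind the name if you want to "update" p.
moved = p._replace(y=33)

print(p)       # the original is unchanged
print(moved)   # the new instance carries the updated field
```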
3. Converting dictionary data into a namedtuple by unpacking the dict as keyword arguments:
    >>> d = {'x': 44, 'y': 55}
    >>> dp = TPoint(**d)
    >>> dp
    TPoint(x=44, y=55)
    >>>
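The conversion also works in the other direction through _asdict(). A minimal sketch of the round trip follows; it wraps the result in dict() because the exact mapping type returned by _asdict() depends on the Python version. The dictionary keys must match the field names exactly, otherwise the constructor raises TypeError.

```python
from collections import namedtuple

TPoint = namedtuple('TPoint', ['x', 'y'])

d = {'x': 44, 'y': 55}
dp = TPoint(**d)              # dict -> namedtuple
back = dict(dp._asdict())     # namedtuple -> plain dict

key_error = None
try:
    TPoint(**{'x': 1, 'z': 2})   # 'z' is not a field of TPoint
except TypeError as exc:
    key_error = exc

print(dp)
print(back == d)
print(type(key_error).__name__)
```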
namedtuple is most often used to handle rows coming from a CSV file or returned by a database. The pattern combines map() with the _make() method of the namedtuple type.
    from collections import namedtuple

    EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

    # CSV file (Python 3: open in text mode with newline="")
    import csv
    with open("employees.csv", newline="") as f:
        for emp in map(EmployeeRecord._make, csv.reader(f)):
            print(emp.name, emp.title)

    # SQLite database
    import sqlite3
    conn = sqlite3.connect('/companydata')
    cursor = conn.cursor()
    cursor.execute('SELECT name, age, title, department, paygrade FROM employees')
    for emp in map(EmployeeRecord._make, cursor.fetchall()):
        print(emp.name, emp.title)

    # MySQL database
    import mysql.connector
    user = 'herbert'
    pwd = '######'
    host = '127.0.0.1'
    db = 'world'
    cnx = mysql.connector.connect(user=user, password=pwd, host=host, database=db)
    cur = cnx.cursor()  # the cursor must be created before executing a query
    cur.execute("SELECT Name, CountryCode, District, Population FROM CITY where CountryCode = 'CHN' AND Population > 500000")
    CityRecord = namedtuple('City', 'Name, Country, District, Population')
    for city in map(CityRecord._make, cur.fetchall()):
        print(city.Name, city.Population)
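The snippets above depend on an existing file and existing databases. To try the pattern without them, here is a self-contained sketch that uses an in-memory SQLite database. The table layout follows the query above, but the two sample rows are invented for illustration.

```python
import sqlite3
from collections import namedtuple

EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade')

# ':memory:' creates a throwaway database that lives only in RAM.
conn = sqlite3.connect(':memory:')
conn.execute(
    'CREATE TABLE employees '
    '(name TEXT, age INTEGER, title TEXT, department TEXT, paygrade INTEGER)'
)
# Hypothetical sample rows, not taken from the article.
conn.executemany(
    'INSERT INTO employees VALUES (?, ?, ?, ?, ?)',
    [
        ('Alice', 34, 'Engineer', 'R&D', 5),
        ('Bob', 45, 'Manager', 'Sales', 7),
    ],
)

cursor = conn.execute(
    'SELECT name, age, title, department, paygrade FROM employees ORDER BY name'
)
# Each row is a plain tuple; _make() turns it into an EmployeeRecord.
employees = [EmployeeRecord._make(row) for row in cursor]
conn.close()

for emp in employees:
    print(emp.name, emp.title)
```

The column order in the SELECT statement has to match the field order of the namedtuple, since _make() assigns values by position.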