[Python Module Study] 4. The collections module
collections is a module in Python's standard library that provides a number of useful container classes.
namedtuple() | factory function for creating tuple subclasses with named fields
deque | list-like container with fast appends and pops on either end
ChainMap | dict-like class for creating a single view of multiple mappings
Counter | dict subclass for counting hashable objects
OrderedDict | dict subclass that remembers the order entries were added
defaultdict | dict subclass that calls a factory function to supply missing values
UserDict | wrapper around dictionary objects for easier dict subclassing
UserList | wrapper around list objects for easier list subclassing
UserString | wrapper around string objects for easier string subclassing
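1. namedtuple()
namedtuple is a factory function that creates a custom tuple type with a fixed number of elements, whose elements can be referenced by attribute name instead of by index. You can use namedtuple to define a data type that keeps the immutability of a tuple while allowing access by attribute, which is very convenient.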
>>> from collections import namedtuple
>>> # Basic usage
>>> Point = namedtuple('Point', ['x', 'y'])
>>> p = Point(11, y=22)     # instantiate with positional or keyword arguments
>>> p[0] + p[1]             # indexable like a plain tuple
33
>>> x, y = p                # unpack like a plain tuple
>>> x, y
(11, 22)
>>> p.x + p.y               # fields are also accessible by name
33
>>> p                       # readable __repr__ with a name=value style
Point(x=11, y=22)
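somenamedtuple._make(iterable)      # build a namedtuple from an existing iterable
somenamedtuple._asdict()            # map each field name to its value
somenamedtuple._replace(**kwargs)   # tuples are immutable, so _replace returns a new instance with the given fields changed
somenamedtuple._source              # ... (this attribute was removed in Python 3.7)
somenamedtuple._fields              # the field names of the namedtuple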
>>> # somenamedtuple._make(iterable): build a namedtuple from an existing iterable
>>> t = [11, 22]
>>> Point._make(t)
Point(x=11, y=22)

>>> # somenamedtuple._asdict(): map each field name to its value
>>> p = Point(x=11, y=22)
>>> p._asdict()             # Python 3.8+ returns a regular dict: {'x': 11, 'y': 22}
OrderedDict([('x', 11), ('y', 22)])

>>> # somenamedtuple._replace(**kwargs): tuples are immutable, but _replace returns a modified copy
>>> p = Point(x=11, y=22)
>>> p._replace(x=33)
Point(x=33, y=22)

>>> # somenamedtuple._source  # ...

>>> # somenamedtuple._fields: the field names of the namedtuple
>>> p._fields               # view the field names
('x', 'y')
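The immutability mentioned above can be checked directly. The following is a minimal sketch, reusing the Point type from the examples; the variable names assignment_failed and moved are only illustrative.

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
p = Point(11, 22)

# Assigning to a field fails because the instance is an immutable tuple.
try:
    p.x = 33
    assignment_failed = False
except AttributeError:
    assignment_failed = True

# _replace builds a new Point and leaves the original unchanged.
moved = p._replace(x=33)

print(assignment_failed)     # True
print(p, moved)              # Point(x=11, y=22) Point(x=33, y=22)
print(isinstance(p, tuple))  # True
```

Because _replace returns a new object, the original p can still be shared safely between callers.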
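2. deque()
When data is stored in a list, access by index is fast, but inserting and deleting elements is slow: a list is stored linearly (as a contiguous array), so with large amounts of data, insertions and deletions anywhere other than the end become inefficient. deque is a double-ended queue designed for efficient insertion and deletion at both ends, and is well suited for queues and stacks.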
>>> from collections import deque
>>> d = deque('ghi')                 # make a new deque with three items
>>> for elem in d:                   # iterate over the deque's elements
...     print(elem.upper())
G
H
I

>>> d.append('j')                    # add a new entry to the right side
>>> d.appendleft('f')                # add a new entry to the left side
>>> d                                # show the representation of the deque
deque(['f', 'g', 'h', 'i', 'j'])

>>> d.pop()                          # return and remove the rightmost item
'j'
>>> d.popleft()                      # return and remove the leftmost item
'f'
>>> list(d)                          # list the contents of the deque
['g', 'h', 'i']
>>> d[0]                             # peek at leftmost item
'g'
>>> d[-1]                            # peek at rightmost item
'i'

>>> list(reversed(d))                # list the contents of a deque in reverse
['i', 'h', 'g']
>>> 'h' in d                         # search the deque
True
>>> d.extend('jkl')                  # add multiple elements at once
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])
>>> d.rotate(1)                      # right rotation
>>> d
deque(['l', 'g', 'h', 'i', 'j', 'k'])
>>> d.rotate(-1)                     # left rotation
>>> d
deque(['g', 'h', 'i', 'j', 'k', 'l'])

>>> deque(reversed(d))               # make a new deque in reverse order
deque(['l', 'k', 'j', 'i', 'h', 'g'])
>>> d.clear()                        # empty the deque
>>> d.pop()                          # cannot pop from an empty deque
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in -toplevel-
    d.pop()
IndexError: pop from an empty deque

>>> d.extendleft('abc')              # extendleft() reverses the input order
>>> d
deque(['c', 'b', 'a'])
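Since the text says deque suits both queues and stacks, here is a small sketch of the two usage patterns. The variable names (queue, stack, served, popped) are illustrative only.

```python
from collections import deque

# FIFO queue: add on the right, remove from the left.
queue = deque()
for task in ["a", "b", "c"]:
    queue.append(task)
served = [queue.popleft() for _ in range(3)]

# LIFO stack: add and remove on the same (right) end.
stack = deque()
for item in [1, 2, 3]:
    stack.append(item)
popped = [stack.pop() for _ in range(3)]

print(served)  # ['a', 'b', 'c']
print(popped)  # [3, 2, 1]
```

The only difference between the two patterns is which end the elements are removed from.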
4. Counter()
Basics: a Counter can be initialized from a tuple, dict, list, or str.
>>> from collections import Counter
>>> c = Counter()                           # a new, empty counter
>>> c = Counter('gallahad')                 # a new counter from an iterable
>>> c = Counter({'red': 4, 'blue': 2})      # a new counter from a mapping
>>> c = Counter(cats=4, dogs=8)             # a new counter from keyword args
>>> c = Counter(['eggs', 'ham'])
>>> c['bacon']                              # count of a missing element is zero
0
A Counter object behaves like a dictionary, except that looking up a missing item returns 0 instead of raising KeyError:
>>> c = Counter(['eggs', 'ham'])
>>> c['bacon']      # 'bacon' is not present
0
>>> c['eggs']       # 'eggs' is present
1
Setting an element's count to 0 does not remove it from the counter; use del to remove the element:
>>> c
Counter({'eggs': 1, 'ham': 1})
>>> c['eggs'] = 0
>>> c               # 'eggs' is still present
Counter({'ham': 1, 'eggs': 0})
>>> del c['eggs']
>>> c               # 'eggs' is gone
Counter({'ham': 1})
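Counter objects support the following three methods that plain dictionaries do not: elements(), most_common(), and subtract().
elements() returns an iterator in which each element is repeated as many times as its count. The order was arbitrary in older versions; since Python 3.7 elements are returned in the order they were first encountered. If an element's count is less than 1, elements() ignores it: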
>>> c = Counter(a=2, b=4, c=0, d=-2, e=1)
>>> c
Counter({'b': 4, 'a': 2, 'e': 1, 'c': 0, 'd': -2})
>>> list(c.elements())
['a', 'a', 'b', 'b', 'b', 'b', 'e']
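most_common() returns a list of the n elements with the highest counts. If n is omitted or None, most_common() returns all elements in the counter. Elements with equal counts were ordered arbitrarily in older versions; since Python 3.7 they are ordered by first occurrence: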
>>> Counter('abracadabra').most_common(3)   # ties keep first-encountered order on Python 3.7+
[('a', 5), ('b', 2), ('r', 2)]
>>> Counter('abracadabra').most_common()
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
>>> Counter('abracadabra').most_common(None)
[('a', 5), ('b', 2), ('r', 2), ('c', 1), ('d', 1)]
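subtract() subtracts the counts of elements taken from an iterable or from another mapping (or counter). It is similar to dict.update(), but it subtracts counts instead of replacing them. Both inputs and outputs may be zero or negative: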
>>> c = Counter(a=4, b=2, c=0, d=-2)
>>> d = Counter(a=1, b=2, c=-3, d=4)
>>> c.subtract(d)
>>> c
Counter({'a': 3, 'c': 3, 'b': 0, 'd': -6})
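update() adds the counts of elements taken from an iterable or from another mapping (or counter). It is similar to dict.update(), but it adds counts instead of replacing them. Note also that the iterable is expected to be a sequence of elements, not a sequence of (key, value) pairs: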
>>> c = Counter(a=4, b=2, c=0, d=-2)    # start again from the original counts
>>> c
Counter({'a': 4, 'b': 2, 'c': 0, 'd': -2})
>>> d = Counter(a=1, b=2, c=-3, d=4)
>>> d
Counter({'d': 4, 'b': 2, 'a': 1, 'c': -3})
>>> c.update(d)
>>> c
Counter({'a': 5, 'b': 4, 'd': 2, 'c': -3})
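The difference between passing an iterable and passing a mapping to update(), and the contrast with dict.update(), can be sketched as follows. The names after_iterable, after_mapping, and plain are only for illustration.

```python
from collections import Counter

c = Counter(a=1)

# An iterable is treated as a sequence of elements to count.
c.update("aab")          # adds a: 2, b: 1
after_iterable = dict(c)

# A mapping is treated as element -> count, and the counts are added.
c.update({"a": 10})
after_mapping = dict(c)

# Contrast with a plain dict, where update() replaces the value.
plain = {"a": 1}
plain.update({"a": 10})

print(after_iterable)  # {'a': 3, 'b': 1}
print(after_mapping)   # {'a': 13, 'b': 1}
print(plain)           # {'a': 10}
```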
Common operations on Counter objects:
>>> c = Counter(a=5, b=4, c=-3, d=2)
>>> sum(c.values())                 # total of all counts
8
>>> list(c)                         # list the unique elements
['a', 'b', 'c', 'd']
>>> set(c)                          # convert to a set (display order may vary)
{'a', 'b', 'c', 'd'}
>>> dict(c)                         # convert to a regular dict
{'a': 5, 'b': 4, 'c': -3, 'd': 2}
>>> list(c.items())                 # convert to a list of (elem, cnt) pairs
[('a', 5), ('b', 4), ('c', -3), ('d', 2)]
>>> c.most_common()[:-4:-1]         # the n least common elements (here n = 3)
[('c', -3), ('d', 2), ('b', 4)]
>>> c += Counter()                  # remove zero and negative counts
>>> c
Counter({'a': 5, 'b': 4, 'd': 2})
>>> Counter(dict(c.items()))        # convert a list of (elem, cnt) pairs back to a Counter
Counter({'a': 5, 'b': 4, 'd': 2})
>>> c.clear()                       # empty the counter
>>> c
Counter()
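Counter objects support mathematical operations that produce multisets (counters in which every count is greater than 0). Addition and subtraction add or subtract the counts of corresponding elements; intersection and union return the minimum and maximum of corresponding counts. Each operation accepts inputs with signed counts, but the output excludes elements whose count is zero or negative: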
>>> c = Counter(a=3, b=1, c=-2)
>>> d = Counter(a=1, b=2, c=4)
>>> c + d       # sum
Counter({'a': 4, 'b': 3, 'c': 2})
>>> c - d       # difference
Counter({'a': 2})
>>> c & d       # intersection
Counter({'a': 1, 'b': 1})
>>> c | d       # union
Counter({'c': 4, 'a': 3, 'b': 2})
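5. OrderedDict()
OrderedDict is like a regular dictionary except that it remembers the order in which entries were inserted; iterating over an ordered dictionary returns entries in the order their keys were first added. (Since Python 3.7 the built-in dict also preserves insertion order, but OrderedDict still provides order-aware methods such as move_to_end.)
Basics: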
>>> from collections import OrderedDict
>>> d = {"banana": 3, "apple": 2, "pear": 1, "orange": 4}
>>> # dict sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 2), ('banana', 3), ('orange', 4), ('pear', 1)])
>>> # dict sorted by value
>>> OrderedDict(sorted(d.items(), key=lambda t: t[1]))
OrderedDict([('pear', 1), ('apple', 2), ('banana', 3), ('orange', 4)])
>>> # dict sorted by length of key string
>>> a = OrderedDict(sorted(d.items(), key=lambda t: len(t[0])))
>>> a
OrderedDict([('pear', 1), ('apple', 2), ('orange', 4), ('banana', 3)])
>>> del a['apple']
>>> a
OrderedDict([('pear', 1), ('orange', 4), ('banana', 3)])
>>> a["apple"] = 2
>>> a
OrderedDict([('pear', 1), ('orange', 4), ('banana', 3), ('apple', 2)])
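One consequence of remembering order is that equality between two OrderedDict objects is order-sensitive, while comparison with a plain dict is not. A small sketch (the names a and b are illustrative):

```python
from collections import OrderedDict

a = OrderedDict([("x", 1), ("y", 2)])
b = OrderedDict([("y", 2), ("x", 1)])

print(a == b)              # False: OrderedDict comparison is order-sensitive
print(dict(a) == dict(b))  # True: plain dict comparison ignores order
print(a == dict(b))        # True: comparing with a plain dict ignores order
```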
popitem(last=True) - The popitem method removes and returns a (key, value) pair. If last=True, pairs are returned in LIFO order; otherwise they are returned in FIFO order.
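As a brief sketch of the two modes of popitem (the variable names last_pair and first_pair are illustrative):

```python
from collections import OrderedDict

d = OrderedDict.fromkeys("abcde")

last_pair = d.popitem()             # LIFO: removes the newest entry
first_pair = d.popitem(last=False)  # FIFO: removes the oldest entry

print(last_pair)    # ('e', None)
print(first_pair)   # ('a', None)
print("".join(d))   # bcd
```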
move_to_end(key, last=True) - The move_to_end method moves an existing key to the end of the OrderedDict when last=True, or to the beginning when last=False.
>>> d = OrderedDict.fromkeys('abcde')
>>> d.move_to_end('b')
>>> ''.join(d.keys())
'acdeb'
>>> d.move_to_end('b', last=False)
>>> ''.join(d.keys())
'bacde'
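Together, move_to_end and popitem(last=False) are enough to sketch a small least-recently-used cache. The class below is a hypothetical illustration and not part of the collections module; the names LRUCache, capacity, get, and put are assumptions made for this example.

```python
from collections import OrderedDict

class LRUCache:
    """Hypothetical helper: keeps at most `capacity` entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key, default=None):
        if key not in self.data:
            return default
        self.data.move_to_end(key)          # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)          # newest entries live at the end
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)   # evict the least recently used

cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")            # "a" is now the most recently used key
cache.put("c", 3)         # evicts "b"
print(list(cache.data))   # ['a', 'c']
```

For real code, functools.lru_cache in the standard library is usually the simpler choice for caching function results.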
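Example usage:
When an element is deleted, a sorted ordered dictionary stays sorted; but a newly added element is appended to the end, so the sorted order is no longer maintained.
You can create an ordered dictionary variant that remembers the order in which keys were last inserted: if a new entry overwrites an existing one, the original insertion position is dropped and the entry moves to the end: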
>>> class LastUpdatedOrderedDict(OrderedDict):
...     def __setitem__(self, key, value):
...         if key in self:
...             del self[key]
...         OrderedDict.__setitem__(self, key, value)
...
>>> obj = LastUpdatedOrderedDict()
>>> obj["apple"] = 2
>>> obj["windows"] = 3
>>> obj
LastUpdatedOrderedDict([('apple', 2), ('windows', 3)])
>>> obj["apple"] = 1
>>> obj
LastUpdatedOrderedDict([('windows', 3), ('apple', 1)])
An ordered dictionary can be combined with the Counter class so that the counter remembers the order in which elements were first seen:
from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    def __repr__(self):
        return "%s(%r)" % (self.__class__.__name__, OrderedDict(self))

    def __reduce__(self):
        # The second item must be a tuple of constructor arguments.
        return self.__class__, (OrderedDict(self),)

# Counter combined with OrderedDict
obj = OrderedCounter()
wordList = ["b", "a", "c", "a", "c", "a"]
for word in wordList:
    obj[word] += 1
print(obj)
# OrderedCounter(OrderedDict([('b', 1), ('a', 3), ('c', 2)]))

# Plain Counter
cnt = Counter()
wordList = ["b", "a", "c", "a", "c", "a"]
for word in wordList:
    cnt[word] += 1
print(cnt)
# Counter({'a': 3, 'c': 2, 'b': 1})
6. defaultdict
defaultdict is a subclass of the built-in dict type. It behaves like dict except that it overrides one method, __missing__(key), and adds a writable instance attribute, default_factory.
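The role of __missing__ and default_factory can be seen in a short sketch. __missing__ is only triggered by indexing with d[key], not by get() or the in operator; the variable names below are illustrative.

```python
from collections import defaultdict

d = defaultdict(list)

# get() and `in` do not trigger __missing__, so no key is created.
got = d.get("x")
has_x_after_get = "x" in d

# Indexing a missing key calls __missing__, which stores default_factory().
value = d["x"]
has_x_after_index = "x" in d

# default_factory is writable; setting it to None disables the defaults.
d.default_factory = None
try:
    d["y"]
    raised = False
except KeyError:
    raised = True

print(got, has_x_after_get)      # None False
print(value, has_x_after_index)  # [] True
print(raised)                    # True
```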
Basics:
In the examples below, default_factory is list, int, and set respectively.
>>> from collections import defaultdict
>>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)]
>>> d = defaultdict(list)
>>> for k, v in s:
...     d[k].append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]
>>> d = {}
>>> for k, v in s:
...     d.setdefault(k, []).append(v)
...
>>> sorted(d.items())
[('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])]

>>> s = 'mississippi'
>>> d = defaultdict(int)
>>> for k in s:
...     d[k] += 1
...
>>> sorted(d.items())
[('i', 4), ('m', 1), ('p', 2), ('s', 4)]
>>> a = defaultdict(set)
>>> a['1'].add(1)
>>> a['1'].add('a')
>>> a['2'].add(2)
>>> a['2'].add('b')
>>> a
defaultdict(<class 'set'>, {'1': {1, 'a'}, '2': {2, 'b'}})
>>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)]
>>> d = defaultdict(set)
>>> for k, v in s:
...     d[k].add(v)
...
>>> sorted(d.items())
[('blue', {2, 4}), ('red', {1, 3})]
After initialization and before any data has been added, looking up a missing key yields the factory's default value:
>>> l = defaultdict(list)
>>> s = defaultdict(set)
>>> i = defaultdict(int)
>>> l['test']
[]
>>> s['test']
set()
>>> i['test']
0
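You can also pass a lambda (or any other zero-argument callable) as the factory, so that the default is a value of your own choosing: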
>>> from collections import defaultdict
>>> def constant_factory(value):
...     return lambda: value
...
>>> d = defaultdict(constant_factory('<missing>'))
>>> d.update(name='John', action='ran')
>>> '%(name)s %(action)s to %(object)s' % d
'John ran to <missing>'
References: 老顽童; the official Python documentation.