2018-08-26

The Bitmap Algorithm

Most algorithm books describe a handful of techniques aimed specifically at processing massive data sets. The six usually listed are:

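No.  Algorithm
1    Divide and conquer / hash mapping + hash counting + heap/quick/merge sort
2    Two-level bucket partitioning
3    Bloom filter / Bitmap
4    Trie / database / inverted index
5    External sorting
6    Distributed processing with Hadoop/MapReduce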

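This post covers the Bitmap. A bitmap uses a single bit to record the value associated with an element (whether it is present), while the element itself serves as the key (the position of that bit). The technique is used for fast lookup, deduplication, sorting, and data compression.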
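As a quick illustration of the deduplication use case, here is a minimal sketch. It assumes the inputs are non-negative integers no larger than a known maximum; the function name `dedupe` and the parameter `max_value` are hypothetical, chosen only for this example.

```python
def dedupe(values, max_value):
    """Return values with duplicates dropped, keeping first-seen order."""
    seen = bytearray(max_value // 8 + 1)  # one bit per possible value
    unique = []
    for v in values:
        byte, bit = divmod(v, 8)          # which byte, and which bit inside it
        mask = 1 << bit
        if not seen[byte] & mask:         # bit still clear: first time we see v
            seen[byte] |= mask            # mark v as present
            unique.append(v)
    return unique


result = dedupe([3, 9, 3, 0, 9, 15], max_value=15)
print(result)  # [3, 9, 0, 15]
```

The key is the number itself and the value is one bit, so 16 possible values need only 2 bytes of bookkeeping.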

Suppose we want to sort 5 elements (4, 7, 2, 5, 3) that all lie in the range 0-7. Since there are 8 possible values, we need only 8 bits (1 byte).

Element   -  -  2  3  4  5  -  7
Occupied  ×  ×  √  √  √  √  ×  √
Address   0  1  2  3  4  5  6  7
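The table above can be reproduced with one integer used as an 8-bit map. This is only a sketch under the assumption that every input lies in 0-7; `bitmap_sort_small` is a hypothetical helper name.

```python
def bitmap_sort_small(values, size=8):
    """Sort distinct integers in [0, size) by marking one bit per value."""
    bits = 0
    for v in values:
        if not 0 <= v < size:
            raise ValueError(f"{v} is outside 0..{size - 1}")
        bits |= 1 << v                    # mark address v as occupied
    # Walk the addresses in ascending order and keep the occupied ones.
    return [i for i in range(size) if bits >> i & 1]


sorted_values = bitmap_sort_small([4, 7, 2, 5, 3])
print(sorted_values)  # [2, 3, 4, 5, 7]
```

Note that a plain bitmap records presence only, so repeated inputs would collapse into a single entry.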

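That completes the sort: reading the occupied addresses from 0 to 7 gives 2, 3, 4, 5, 7. The hard part of this method is mapping a number to its bit, because the width of a type depends on the platform and environment. Assume int is 32 bits, and suppose we now have 320 values to sort. We declare int a[1+10]: a[0] represents the 32 numbers 0-31, a[1] represents the 32 numbers 32-63, and so on. It helps to picture this as a two-dimensional array, although it is really a one-dimensional array of words.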

It follows that a decimal number n maps to a[n/32][n%32], that is, bit n%32 of the word a[n/32] (using integer division).
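The n/32 and n%32 mapping can be sketched as follows. Python integers are not fixed-width, so this example simply assumes 32-bit words by convention; the class name `BitSet` and the constant `WORD_BITS` are hypothetical names introduced for illustration.

```python
WORD_BITS = 32  # assumed word width, mirroring a 32-bit int


class BitSet:
    """A bitmap stored as a list of 32-bit words."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Same sizing as int a[1 + capacity/32] in the text.
        self.words = [0] * (1 + capacity // WORD_BITS)

    def _check(self, n):
        if not 0 <= n < self.capacity:
            raise ValueError(f"{n} is outside 0..{self.capacity - 1}")

    def add(self, n):
        self._check(n)
        # Word index is n // 32, bit index inside that word is n % 32.
        self.words[n // WORD_BITS] |= 1 << (n % WORD_BITS)

    def __contains__(self, n):
        self._check(n)
        return bool(self.words[n // WORD_BITS] >> (n % WORD_BITS) & 1)

    def __iter__(self):
        # Yield the stored numbers in ascending order.
        for n in range(self.capacity):
            if n in self:
                yield n


bs = BitSet(320)
for n in (319, 32, 0, 63, 32):
    bs.add(n)

print(len(bs.words))  # 11
print(list(bs))       # [0, 32, 63, 319]
```

Here 32 and 63 both land in `words[1]`, as bit 0 and bit 31 respectively, which matches the statement that a[1] covers 32-63.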

# encoding: utf-8

from collections import namedtuple
from copy import copy

Colour = namedtuple('Colour', 'r,g,b')
Colour.copy = lambda self: copy(self)

black = Colour(0, 0, 0)
white = Colour(255, 255, 255)  # Colour ranges are not enforced.


class Bitmap:
    """A raster bitmap: a height x width grid of Colour pixels."""

    def __init__(self, width=40, height=40, background=white):
        assert width > 0 and height > 0 and type(background) == Colour
        self.width = width
        self.height = height
        self.background = background
        # Rows are indexed by y, columns by x.
        self.map = [[background.copy() for w in range(width)] for h in range(height)]

    def fillrect(self, x, y, width, height, colour=black):
        """Fill the rectangle whose corner is (x, y) with colour."""
        assert x >= 0 and y >= 0 and width > 0 and height > 0 and type(colour) == Colour
        for h in range(height):
            for w in range(width):
                self.map[y+h][x+w] = colour.copy()

    def chardisplay(self):
        """Print the bitmap as text: ' ' for background, '@' otherwise."""
        txt = [''.join(' ' if bit == self.background else '@'
                       for bit in row)
               for row in self.map]
        # Boxing
        txt = ['|' + row + '|' for row in txt]
        txt.insert(0, '+' + '-' * self.width + '+')
        txt.append('+' + '-' * self.width + '+')
        # Reverse so that row y = 0 is printed at the bottom.
        print('\n'.join(reversed(txt)))

    def set(self, x, y, colour=black):
        assert type(colour) == Colour
        self.map[y][x] = colour

    def get(self, x, y):
        return self.map[y][x]


bitmap = Bitmap(20, 10)
bitmap.fillrect(4, 5, 6, 3)
assert bitmap.get(5, 5) == black
assert bitmap.get(0, 1) == white
bitmap.set(0, 1, black)
assert bitmap.get(0, 1) == black
bitmap.chardisplay()
