Dynamic Count Filter (DCF) Bloom Filter: Python Implementation

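I needed this for a competition project. The only implementations I could find online were in C. My project is written in Python and I wanted to keep everything in Python, so I wrote one using numpy.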

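Reference: https://blog.csdn.net/jiaomeng/article/details/1619321
It contains C implementations of several Bloom filter variants (Bloom Filter, CBF and DCF), and they are very good.
By the way, the code provided by the author of that post can only be downloaded through a VPN.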
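Of the variants in that post, the counting Bloom filter (CBF) is the one DCF extends. Each slot holds a small fixed-width counter instead of a single bit, which makes counting and deletion possible. The catch is that a counter that fills up can only saturate. Here is a minimal CBF sketch to make the difference concrete. It is my own illustration, not code from the linked post. The 4-bit counter width, the hashlib-based hash functions and the class name CountingBloomFilter are all assumptions.

```python
import hashlib


class CountingBloomFilter:
    """Minimal counting Bloom filter with fixed-width saturating counters."""

    def __init__(self, m, k, counter_bits=4):
        self.m = m
        self.k = k
        self.max_count = 2**counter_bits - 1  # 15 for 4-bit counters
        self.counters = [0] * m

    def _indices(self, item):
        # Assumption: k hash functions derived from SHA-256 by prefixing a seed
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indices(item):
            # A full counter can only stay at its maximum; this is the limit DCF works around
            if self.counters[idx] < self.max_count:
                self.counters[idx] += 1

    def remove(self, item):
        # Only valid for items that were previously added; saturated counters
        # are left alone because their true value is unknown
        for idx in self._indices(item):
            if 0 < self.counters[idx] < self.max_count:
                self.counters[idx] -= 1

    def __contains__(self, item):
        return all(self.counters[idx] > 0 for idx in self._indices(item))


cbf = CountingBloomFilter(m=1024, k=3)
cbf.add("apple")
print("apple" in cbf)  # True
cbf.remove("apple")
print("apple" in cbf)  # False
```

After an item has been added 15 times, its counters stop growing, and any larger count is lost. DCF's extra overflow vector (OFV) is what keeps track of those larger counts.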

Note: I'm not an expert, so corrections are welcome if you spot any mistakes…

Here is the source code:

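import numpy as np
import mmh3


class DCF:
    """Dynamic Count Filter: m slots, each with an x-bit CBFV counter and a
    y-bit OFV (overflow) counter, stored as big-endian bool bit vectors."""

    def __init__(self, m, x, y, k):
        # np.bool was removed in NumPy 1.24; the builtin bool is the portable dtype
        self.CBFV = np.zeros((m, x), dtype=bool)
        self.OFV = np.zeros((m, y), dtype=bool)
        self.hash_count = k
        self.len = m
        self.CBFV_bit = x
        self.OFV_bit = y
        # Largest values an x-bit / y-bit counter can hold (2**x itself needs x+1 bits)
        self.CBFV_max = 2**x - 1
        self.OFV_max = 2**y - 1

    def add(self, item):
        for i in range(self.hash_count):
            # Seed i turns MurmurHash3 into the i-th of k hash functions
            index = mmh3.hash(item, i) % self.len
            count = compute(self.CBFV[index])
            if count == self.CBFV_max:
                # CBFV counter is full: record the extra occurrence in the OFV counter
                f = compute(self.OFV[index])
                if f == self.OFV_max:
                    # OFV width is fixed at y bits; slots updated earlier in this call stay updated
                    raise OverflowError("OFV counter overflow, choose a larger y")
                self.OFV[index] = decompute(f + 1, self.OFV_bit)
            else:
                self.CBFV[index] = decompute(count + 1, self.CBFV_bit)

    def find_times(self, item):
        """Estimated number of times item was added (never an underestimate)."""
        final = None
        for i in range(self.hash_count):
            index = mmh3.hash(item, i) % self.len
            count = compute(self.CBFV[index])
            if count == 0:
                # An empty slot means item was never added
                return 0
            if count == self.CBFV_max:
                # Slot total = full CBFV counter + overflow count
                count += compute(self.OFV[index])
            # The smallest slot total is the tightest upper bound
            if final is None or count < final:
                final = count
        return final


def compute(li):
    """Big-endian bool bit vector -> int."""
    value = 0
    for bit in li:
        value = value * 2 + int(bit)
    return value


def decompute(num, L):
    """Non-negative int -> L-bit big-endian bool bit vector."""
    if not 0 <= num < 2**L:
        raise ValueError(f"{num} does not fit in {L} bits")
    li = np.zeros(L, dtype=bool)
    i = L - 1
    while num != 0:
        li[i] = num % 2
        num //= 2
        i -= 1
    return li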

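# I didn't bother changing the parameters from my project; adjust them for your own use.
# Each variable means the same as the letter of the same name in the original paper referenced below.

m = 2**17   # number of slots (counters)
k = 9       # number of hash functions
x = 4       # bits per CBFV counter
y = 7       # bits per OFV (overflow) counter
bloom = DCF(m, x, y, k)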
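
The compute and decompute helpers convert between a big-endian bit vector (most significant bit first) and an integer. The same conversion can also be written with shifts on plain Python ints. The standalone sketch below uses lists of bools instead of numpy arrays. The names bits_to_int and int_to_bits are made up for this example.

```python
def bits_to_int(bits):
    """Big-endian bit list -> int, e.g. [True, False, True, True] -> 11."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    return value


def int_to_bits(num, width):
    """int -> big-endian list of `width` bools; rejects values that don't fit."""
    if not 0 <= num < (1 << width):
        raise ValueError(f"{num} does not fit in {width} bits")
    return [bool((num >> shift) & 1) for shift in range(width - 1, -1, -1)]


# A 4-bit counter tops out at 15; 16 would need a fifth bit
print(bits_to_int(int_to_bits(15, 4)))  # 15
```

The range check in int_to_bits is there because an x-bit counter can hold at most 2**x - 1 (15 for x = 4). The value 16 has no valid 4-bit encoding.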

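If you just want to try the counting scheme without numpy or mmh3, here is a stripped-down sketch of what the DCF class above does. Each slot's CBFV counter fills up to its x-bit maximum. Further occurrences go to that slot's OFV counter, and a slot's count is the sum of the two. This is my simplification, not the paper's exact design. hashlib stands in for mmh3, counters are plain ints instead of bit vectors, the OFV width is fixed at y bits, and the class name SimpleDCF is hypothetical.

```python
import hashlib


class SimpleDCF:
    """DCF-style counter filter: x-bit CBFV counters plus y-bit OFV overflow counters."""

    def __init__(self, m, x, y, k):
        self.m, self.k = m, k
        self.cbfv_max = 2**x - 1  # largest value an x-bit counter holds
        self.ofv_max = 2**y - 1   # largest value a y-bit counter holds
        self.cbfv = [0] * m
        self.ofv = [0] * m

    def _indices(self, item):
        # Assumption: SHA-256 with the seed mixed into the input replaces mmh3.hash(item, seed)
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for idx in self._indices(item):
            if self.cbfv[idx] < self.cbfv_max:
                self.cbfv[idx] += 1
            elif self.ofv[idx] < self.ofv_max:
                # CBFV counter is full, so the occurrence goes to the overflow counter
                self.ofv[idx] += 1
            else:
                raise OverflowError("OFV counter full; use a larger y")

    def find_times(self, item):
        # Every slot an item maps to was incremented on each insertion,
        # so the minimum slot total is an upper bound on the true count
        return min(self.cbfv[idx] + self.ofv[idx] for idx in self._indices(item))


bloom = SimpleDCF(m=1024, x=4, y=7, k=3)
for _ in range(20):
    bloom.add("apple")
print(bloom.find_times("apple"))  # never less than 20
```

With x = 4 the first 15 insertions fill each CBFV counter, and the remaining 5 go to the OFV counters.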
Reference: the original paper, J. Aguilar-Saborit et al., "Dynamic Count Filters", ACM SIGMOD Record, 2006: http://delivery.acm.org/10.1145/1130000/1122000/p26-aguilar-saborit.pdf?ip=202.112.129.243&id=1122000&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2E478E8F2EC4A762F8%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&acm=1526393650_7a7f5c8becc5625a7b316db92667e87a
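
A slot counts as "set" whenever its CBFV counter is non-zero. Membership queries on this filter therefore have the same false-positive behaviour as a plain Bloom filter with m slots and k hash functions. Below is a rough sizing sketch for the example parameters in the listing above (m = 2**17, k = 9, x = 4, y = 7). It uses the standard Bloom filter approximation p ≈ (1 - e^(-kn/m))^k rather than anything from the DCF paper. The function names are hypothetical.

```python
import math


def false_positive_rate(m, k, n):
    """Standard Bloom filter approximation (1 - e^(-k*n/m))**k."""
    return (1 - math.exp(-k * n / m)) ** k


def optimal_capacity(m, k):
    """Number of items n for which k = (m/n) * ln 2 is the optimal hash count."""
    return m * math.log(2) / k


m, k, x, y = 2**17, 9, 4, 7  # parameters from the example listing
n = optimal_capacity(m, k)
print(f"items at which k={k} is optimal: {n:.0f}")
print(f"false-positive rate there: {false_positive_rate(m, k, n):.4f}")
# numpy stores each bool element in one byte, so the CBFV and OFV arrays
# take m * (x + y) bytes rather than m * (x + y) bits
print(f"CBFV + OFV memory: {m * (x + y) / 2**20:.3f} MiB")
```

With these parameters, k = 9 is optimal at roughly 10,000 distinct items, where the false-positive rate is about 0.2%. The memory line also shows a cost of the bool-array layout. numpy spends one byte per bit, so the two arrays take 1.375 MiB. Packed bits would need about 176 KiB.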

One more note: I'm not an expert, so if you find any mistakes please point them out in the comments…
