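I needed this for a competition project and could only find C implementations online. Since the project is in Python, I decided to keep everything in Python and wrote one with numpy.
Reference: https://blog.csdn.net/jiaomeng/article/details/1619321
It has C implementations of several Bloom filter variants (Bloom Filter & CBF & DCF), and it's great.
btw, the code that blogger provides at the link above can only be downloaded through a VPN.
Note: I'm far from an expert, so corrections are welcome if you spot any mistakes…
Source code below: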
import numpy as np
import mmh3
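class DCF:
    def __init__(self, m, x, y, k):
        # m slots; each slot has an x-bit base counter (CBFV) and a y-bit overflow counter (OFV).
        # np.bool_ works across NumPy versions; the plain np.bool alias is missing in some releases.
        self.CBFV = np.zeros((m, x), dtype=np.bool_)
        self.OFV = np.zeros((m, y), dtype=np.bool_)
        self.hash_count = k
        self.len = m
        self.CBFV_bit = x
        self.OFV_bit = y

    def add(self, item):
        cbfv_max = 2**self.CBFV_bit - 1  # largest value an x-bit counter can hold
        ofv_max = 2**self.OFV_bit - 1    # largest value a y-bit counter can hold
        for i in range(self.hash_count):
            # mmh3.hash may return a negative int; Python's % still gives a non-negative index
            index = mmh3.hash(item, i) % self.len
            count = compute(self.CBFV[index])
            if count == cbfv_max:
                # base counter is saturated: count further insertions in the overflow vector
                f = compute(self.OFV[index])
                if f < ofv_max:
                    self.OFV[index] = decompute(f + 1, self.OFV_bit)
            else:
                self.CBFV[index] = decompute(count + 1, self.CBFV_bit)

    def find_times(self, item):
        cbfv_max = 2**self.CBFV_bit - 1
        final = 0
        for i in range(self.hash_count):
            index = mmh3.hash(item, i) % self.len
            count = compute(self.CBFV[index])
            if count == 0:
                return 0  # an empty slot means the item was never added
            if count == cbfv_max:
                count += compute(self.OFV[index])
            # keep the smallest counter seen over the k slots
            if final == 0 or count < final:
                final = count
        return final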
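
def compute(li):
    # decode a bit vector (most significant bit first) into an integer
    total = 0
    for bit in li:
        total = total * 2 + int(bit)
    return total


def decompute(num, L):
    # encode num as a bit vector of length L, most significant bit first;
    # bits that do not fit into L positions are dropped
    li = np.zeros(L, dtype=np.bool_)
    i = L - 1
    while num != 0 and i >= 0:
        li[i] = num % 2
        num = num // 2
        i -= 1
    return li
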
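# These are the values from my project (I didn't bother changing them); adjust them for your own use.
# Each variable means the same as the letter of the same name in the original paper linked below.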
m = 2**17
k = 9
x = 4
y = 7
bloom = DCF(m, x, y, k)
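
To make the counting scheme easier to try out without installing numpy or mmh3, here is a minimal sketch of the same two-level counter idea using plain integer lists. It is not the code above and not the paper's full algorithm: the names `TinyDCF` and `_indexes`, and the use of `hashlib.blake2b` in place of MurmurHash, are assumptions made only so that the example runs with the standard library.

```python
import hashlib


class TinyDCF:
    """Hypothetical, stdlib-only sketch of the base-counter + overflow-counter idea."""

    def __init__(self, m, x, y, k):
        self.m, self.k = m, k
        self.base_max = 2**x - 1   # largest value of an x-bit base counter
        self.over_max = 2**y - 1   # largest value of a y-bit overflow counter
        self.base = [0] * m        # plays the role of CBFV
        self.over = [0] * m        # plays the role of OFV

    def _indexes(self, item):
        # Assumption: blake2b with a per-hash salt stands in for mmh3.hash(item, seed).
        for seed in range(self.k):
            digest = hashlib.blake2b(
                item.encode("utf-8"),
                digest_size=8,
                salt=seed.to_bytes(4, "little"),
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def add(self, item):
        for idx in self._indexes(item):
            if self.base[idx] < self.base_max:
                self.base[idx] += 1
            elif self.over[idx] < self.over_max:
                self.over[idx] += 1  # base counter saturated, spill into overflow

    def find_times(self, item):
        # The estimate is the smallest counter over the k slots;
        # hash collisions can only push it up, never down.
        return min(self.base[idx] + self.over[idx] for idx in self._indexes(item))


dcf = TinyDCF(m=2**10, x=4, y=7, k=3)
for _ in range(20):
    dcf.add("apple")
print(dcf.find_times("apple"))  # never less than 20
print(dcf.find_times("pear"))   # usually 0; collisions can make it larger
```

In this sketch a slot tops out at (2**x - 1) + (2**y - 1), so with x = 4 and y = 7 a single slot can count up to 15 + 127 = 142 insertions before it stops growing. Storing counters as integers instead of bit rows trades the exact memory layout for readability, which seemed like the better choice for an illustration.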
Reference: the original paper: http://delivery.acm.org/10.1145/1130000/1122000/p26-aguilar-saborit.pdf?ip=202.112.129.243&id=1122000&acc=ACTIVE%20SERVICE&key=BF85BBA5741FDC6E%2E478E8F2EC4A762F8%2E4D4702B0C3E38B35%2E4D4702B0C3E38B35&acm=1526393650_7a7f5c8becc5625a7b316db92667e87a
One more note: I'm far from an expert, so if you find mistakes, please point them out in the comments…