OpenCV Tutorial Series
Deep Learning: How to Improve Dataset Quality
Code location: FindSimilarPicsMultithreading.py
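After tidying up the image collection over the previous several posts, we can finally run this part. It compares the pixel-intensity distributions (histograms) of the images. Once the demo runs successfully, it presents the matches it finds as a web page.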
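To make "comparing pixel distributions" concrete, here is a minimal pure-Python sketch. It assumes 8-bit grayscale pixels given as a plain list of ints and does two things:

- It builds a 256-bin histogram, the per-channel count that cv2.calcHist produces with 256 bins over [0, 256).
- It applies the alternative chi-square formula that OpenCV documents for cv2.HISTCMP_CHISQR_ALT: d = 2 * sum((h1 - h2)^2 / (h1 + h2)).

The names gray_histogram and chi_square_alt are hypothetical. This illustrates the formula and is not OpenCV's own implementation.

```python
# Illustrative sketch, not OpenCV code: a 256-bin grayscale histogram and the
# "alternative chi-square" distance documented for cv2.HISTCMP_CHISQR_ALT.


def gray_histogram(pixels, bins=256):
    """Count how many pixels have each intensity value 0..bins-1."""
    hist = [0] * bins
    for p in pixels:
        hist[p] += 1
    return hist


def chi_square_alt(h1, h2):
    """2 * sum((a - b)^2 / (a + b)); bins empty in both histograms are skipped."""
    total = 0.0
    for a, b in zip(h1, h2):
        s = a + b
        if s != 0:  # avoid dividing by zero for bins no pixel falls into
            total += (a - b) ** 2 / s
    return 2 * total


img_a = [0, 0, 128, 255]   # four "pixels" of a tiny grayscale image
img_b = [0, 0, 128, 255]   # same content
img_c = [0, 64, 128, 255]  # one pixel changed from 0 to 64

ha, hb, hc = (gray_histogram(x) for x in (img_a, img_b, img_c))
print(chi_square_alt(ha, hb))  # 0.0: identical distributions
print(chi_square_alt(ha, hc))  # about 2.667, i.e. 2 * (1/3 + 1)
```

Identical distributions give 0, and the distance grows as the histograms diverge. This is why the script below treats a small score as a match. A histogram ignores where pixels are located, so two different pictures with the same intensity distribution would also score 0.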
$ file 0fab5c6288a6c43560c8b0a71fc632cb.jpeg d2aa75db5503af7bd7eb522919a26161.jpeg
0fab5c6288a6c43560c8b0a71fc632cb.jpeg: JPEG image data, JFIF standard 1.01, aspect ratio, density 1x1, segment length 16, baseline, precision 8, 1024x639, frames 3
d2aa75db5503af7bd7eb522919a26161.jpeg: JPEG image data, baseline, precision 8, 1024x639, frames 3
$ ll 0fab5c6288a6c43560c8b0a71fc632cb.jpeg d2aa75db5503af7bd7eb522919a26161.jpeg
-rw-rw-r-- 1 king king 255617 5月 23 15:44 0fab5c6288a6c43560c8b0a71fc632cb.jpeg
-rw-rw-r-- 1 king king 255599 5月 23 15:44 d2aa75db5503af7bd7eb522919a26161.jpeg
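The basic file information shows that the two files differ in size by only 18 bytes (255617 vs. 255599) and have the same resolution (1024x639). So they are not byte-for-byte identical, yet both images show the same content.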
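Because the file sizes differ, any check that works on the raw file bytes will classify this pair as different. Comparing a hash of each file is one such check. The short sketch below illustrates this with Python's standard hashlib module. The byte strings are made-up stand-ins rather than real JPEG data, and md5_of_bytes is a hypothetical helper. This is why comparing the decoded pixels, as the script does, is the more useful test for this kind of duplicate.

```python
import hashlib


def md5_of_bytes(data: bytes) -> str:
    """Hex MD5 digest of a byte string (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


# Stand-in bytes, NOT real JPEG data: the second copy has one extra byte,
# much like two encodings of one picture whose file sizes differ slightly.
original = b"\xff\xd8 stand-in JPEG payload \xff\xd9"
re_encoded = original[:-2] + b"\x00" + original[-2:]

print(md5_of_bytes(original) == md5_of_bytes(original))    # True
print(md5_of_bytes(original) == md5_of_bytes(re_encoded))  # False
```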
The code is as follows:
import cv2
import os
DirList = [
'/home/king/PycharmProjects/nsfw_data_scrapper/raw_data/drawings',
# '/home/king/PycharmProjects/nsfw_data_scrapper/raw_data/hentai',
# '/home/king/PycharmProjects/nsfw_data_scrapper/raw_data/neutral',
# '/home/king/PycharmProjects/nsfw_data_scrapper/raw_data/porn',
# '/home/king/PycharmProjects/nsfw_data_scrapper/raw_data/sexy'
]
m = dict()
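# Minimal HTML skeleton for the result page (assumed markup)
html_start = '<html>\n' \
             '<head><meta charset="utf-8"></head>\n\n' \
             '<body>\n\n'
html_end = '</body>\n' \
           '</html>\n'
# One <div> per matching pair; the two %s placeholders receive the image paths
div_string = '<div>\n' \
             '<img src="%s" width="400">\n' \
             '<img src="%s" width="400">\n' \
             '</div>\n\n'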
html = open('/home/king/Desktop/a.html', 'w')
html.write(html_start)
same = 0
num = 0
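for path in DirList:
    for filename in os.listdir(path):
        fullName = os.path.join(path, filename)
        if os.path.isfile(fullName):
            # Read as 8-bit grayscale; imread returns None if the file cannot be decoded
            im = cv2.imread(fullName, cv2.IMREAD_GRAYSCALE)
            if im is None:
                continue
            # 256-bin intensity histogram of the whole image (no mask)
            m[fullName] = cv2.calcHist([im], [0], None, [256], [0, 256])
            num = num + 1
            if num % 500 == 0:
                print(num)  # progress indicator
print('to compareHist')
keyList = list(m.keys())
# Compare every pair exactly once (j > i): n*(n-1)/2 comparisons in total
for i in range(len(keyList)):
    for j in range(i + 1, len(keyList)):
        cmp = cv2.compareHist(m[keyList[i]], m[keyList[j]], cv2.HISTCMP_CHISQR_ALT)
        score = cmp * 100
        #print(score)
        # A smaller chi-square distance means more similar histograms
        if score < 1:
            html.write(div_string % (keyList[i], keyList[j]))
            print(keyList[i], keyList[j])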
html.write(html_end)
html.close()
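The script compares raw bin counts, so the chi-square distance grows with the number of pixels. As a result, two copies of one picture saved at different resolutions would not be matched. The sketch below shows a hypothetical variation, not the author's code, that normalizes each histogram to sum to 1 before the pairwise search:

- It uses plain Python lists and toy 4-bin histograms instead of OpenCV arrays.
- normalize and find_similar_pairs are illustrative names.
- The 0.01 threshold mirrors the script's `score < 1` with `score = cmp * 100`, but it would need retuning on real normalized data.

```python
from itertools import combinations

# Hypothetical variation, not the post's code: normalize histograms first so
# the distance no longer depends on how many pixels an image has.


def normalize(hist):
    """Scale bin counts so they sum to 1 (empty histograms are left as-is)."""
    total = sum(hist)
    return [v / total for v in hist] if total else list(hist)


def chi_square_alt(h1, h2):
    """2 * sum((a - b)^2 / (a + b)), skipping bins empty in both histograms."""
    return 2 * sum((a - b) ** 2 / (a + b) for a, b in zip(h1, h2) if a + b)


def find_similar_pairs(hists, threshold):
    """Return (name1, name2, distance) for every pair below the threshold."""
    normed = {name: normalize(h) for name, h in hists.items()}
    pairs = []
    for n1, n2 in combinations(sorted(normed), 2):  # each unordered pair once
        d = chi_square_alt(normed[n1], normed[n2])
        if d < threshold:
            pairs.append((n1, n2, d))
    return pairs


# Toy 4-bin histograms: "big" is "small" with every count multiplied by 4,
# i.e. the same intensity distribution at a higher resolution.
hists = {
    "small.jpg": [10, 20, 30, 40],
    "big.jpg": [40, 80, 120, 160],
    "other.jpg": [40, 30, 20, 10],
}
print(chi_square_alt(hists["small.jpg"], hists["big.jpg"]))  # 360.0 on raw counts
print(find_similar_pairs(hists, threshold=0.01))  # [('big.jpg', 'small.jpg', 0.0)]
```

In the OpenCV script, a comparable effect could presumably be obtained by calling cv2.normalize(hist, hist, 1, 0, cv2.NORM_L1) on each calcHist result before compareHist, again with a retuned threshold.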