着手做验证码后又查了查相关的文章,发现做这个东西不用ocr的话思路都是一样的,估计是大家相互参考的问题吧,都是先降噪,然后二值化,然后建立自己的样本库,再进行比对。
在降噪进行二值化前,需要处理下颜色的问题,中间查到有人用php在做验证码,对于进行验证码识别的具体步骤和注意点,下面两篇文章讲的是最清楚的,比前面的文章讲的都要仔细。
http://blog.csdn.net/ugg/archive/2009/03/03/3953137.aspx
http://blog.csdn.net/ugg/archive/2009/03/09/3972368.aspx
大概的处理验证码,几个步骤:
首先的一步是单值化,去噪点,然后生成单个数字的库。
1.先进行去噪点,二值化。
参考这里讲的,很清楚,直接可以拿来用于图像二值化。
http://dream-people.iteye.com/blog/378907
上面是我们需要进行处理的验证码图片
#! /usr/bin/env python #coding=utf-8 #coding:utf-8 import sys,os from PIL import Image,ImageDraw #二值数组 t2val = {} def twoValue(image,G): for y in xrange(0,image.size[1]): for x in xrange(0,image.size[0]): g = image.getpixel((x,y)) if g > G: t2val[(x,y)] = 1 else: t2val[(x,y)] = 0 # 降噪 # 根据一个点A的RGB值,与周围的8个点的RBG值比较,设定一个值N(0 <N <8),当A的RGB值与周围8个点的RGB相等数小于N时,此点为噪点 # G: Integer 图像二值化阀值 # N: Integer 降噪率 0 <N <8 # Z: Integer 降噪次数 # 输出 # 0:降噪成功 # 1:降噪失败 def clearNoise(image,N,Z): for i in xrange(0,Z): t2val[(0,0)] = 1 t2val[(image.size[0] - 1,image.size[1] - 1)] = 1 for x in xrange(1,image.size[0] - 1): for y in xrange(1,image.size[1] - 1): nearDots = 0 L = t2val[(x,y)] if L == t2val[(x - 1,y - 1)]: nearDots += 1 if L == t2val[(x - 1,y)]: nearDots += 1 if L == t2val[(x- 1,y + 1)]: nearDots += 1 if L == t2val[(x,y - 1)]: nearDots += 1 if L == t2val[(x,y + 1)]: nearDots += 1 if L == t2val[(x + 1,y - 1)]: nearDots += 1 if L == t2val[(x + 1,y)]: nearDots += 1 if L == t2val[(x + 1,y + 1)]: nearDots += 1 if nearDots < N: t2val[(x,y)] = 1 def saveImage(filename,size): image = Image.new("1",size) draw = ImageDraw.Draw(image) for x in xrange(0,size[0]): for y in xrange(0,size[1]): draw.point((x,y),t2val[(x,y)]) image.save(filename) image = Image.open("0.gif").convert('L') twoValue(image,200) clearNoise(image,4,1) saveImage("t5.jpg",image.size) def prtrbg(image): for y in xrange(0,image.size[1]): for x in xrange(0,image.size[0]): g = image.getpixel((x,y)) if g>0: g=1 else: g=0 print g , print " " ims = Image.open("t5.jpg") ims = ims.convert('1') prtrbg(ims) #print ims.tostring()
代码直接copy上面博客的内容,还都是javaeye的邻居。
降噪处理得到的图片如下
基本上噪点都去掉了,再进行数值化。
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 1 0 0 1 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1 0 0 0 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 0 0 1 0 0 1 1 0 0 0 1 1 1 0 0 1 0 0 0 1 1 0 0 0 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 0 0 1 1 1 1 0 0 0 1 0 0 1 1 0 0 1 1 1 0 0 0 0 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 0 0 1 1 1 0 0 0 1 1 0 0 1 1 0 0 1 1 1 0 0 0 1 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 1 0 0 1 1 1 0 0 0 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 1 0 0 1 1 0 0 0 1 1 0 0 0 1 0 0 0 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1