A script for scraping traffic violation records
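Now that I can drive, I have slowly stopped being so well-behaved, and the violation notices have started to pile up, yet I have no way of hearing about them right away. There is a website: http://www.hzti.com/service/qry/violation_veh.aspx?pgid=&type=1&node=249
where off-site (camera-recorded) violations can be looked up,
but:
1. I am lazy and not keen on going there to check.
2. I can never remember the vehicle identification number.
3. Typing the CAPTCHA every time is a hassle.
So what I want is to be notified automatically whenever there is a new violation.
I'm a programmer, after all: do it yourself and you'll want for nothing. :)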
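Approach:
1. Scrape the information from http://www.hzti.com/service/qry/violation_veh.aspx?pgid=&type=1&node=249
2. Send an SMS with the Fetion command-line tool (http://www.it-adv.net/), which is free.
Difficulties:
1. Breaking the CAPTCHA
I know nothing about image processing, but luckily the CAPTCHA is very simple: only pure black (#000000) pixels belong to the code, and everything else is noise. The characters also sit at fixed positions.
2. The site is built on .NET, so the POST carries a pile of irrelevant data. Fortunately the posted data is not encrypted; otherwise it would be hard to work out what each parameter means.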
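For readers who want to replay such a POST from Java rather than curl, here is a minimal sketch of assembling a form-encoded body. It assumes the "irrelevant data" consists of hidden state fields such as `__VIEWSTATE`, which ASP.NET WebForms pages normally post back; the post does not list the actual fields, so the names and values below are placeholders.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class FormBodySketch {

    // Joins key/value pairs into an application/x-www-form-urlencoded body.
    static String encode(Map<String, String> fields) {
        StringBuilder body = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : fields.entrySet()) {
                if (body.length() > 0) {
                    body.append('&');
                }
                body.append(URLEncoder.encode(e.getKey(), "UTF-8"))
                    .append('=')
                    .append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to exist on every JVM
            throw new IllegalStateException(e);
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("__VIEWSTATE", "dDwt/abc=");  // placeholder value copied from the page
        fields.put("txtCheckCode", "1234");      // hypothetical field name for the CAPTCHA
        System.out.println(encode(fields));
    }
}
```

The hidden fields have to be sent back exactly as the page delivered them, which is why they are copied verbatim and only URL-encoded.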
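How the CAPTCHA is cracked:
1. Learning: have the program download a batch of CAPTCHA images, remove the noise, cut out the individual characters, and pick the clearest image for each of the ten digits 0-9. These become the samples.
2. Cracking: download the CAPTCHA to be solved, remove the noise, cut out the characters, and compare each one pixel by pixel with the samples. A similarity above 95% counts as a match.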
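The denoise and cut steps can be tried without touching the network. The sketch below runs them on a synthetic image; the class and method names are made up for illustration, and the cell geometry (x offset 9, width 7, height 22, four digits) is taken from the constants used later in this post.

```java
import java.awt.image.BufferedImage;

class CaptchaSliceSketch {
    static final int BLACK = 0xFF000000;
    static final int WHITE = 0xFFFFFFFF;

    // Turns every pixel that is not pure black into white, in place.
    static void denoise(BufferedImage img) {
        for (int x = 0; x < img.getWidth(); x++) {
            for (int y = 0; y < img.getHeight(); y++) {
                if (img.getRGB(x, y) != BLACK) {
                    img.setRGB(x, y, WHITE);
                }
            }
        }
    }

    // Cuts the digit cells at fixed offsets: cell c starts at x = c * xBase (c = 1..digits).
    static BufferedImage[] slice(BufferedImage img, int digits, int xBase, int width, int height) {
        BufferedImage[] cells = new BufferedImage[digits];
        for (int c = 1; c <= digits; c++) {
            cells[c - 1] = img.getSubimage(c * xBase, 0, width, height);
        }
        return cells;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a downloaded CAPTCHA
        BufferedImage img = new BufferedImage(50, 22, BufferedImage.TYPE_INT_ARGB);
        img.setRGB(10, 5, BLACK);       // a pixel that belongs to a digit
        img.setRGB(11, 5, 0xFF3366CC);  // coloured noise
        denoise(img);
        BufferedImage[] cells = slice(img, 4, 9, 7, 22);
        System.out.println(cells.length + " cells of "
                + cells[0].getWidth() + "x" + cells[0].getHeight());
    }
}
```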
Reading the image and removing the noise:
// Read the image and return it with the noise removed
BufferedImage getBufferedImage(InputStream in) throws Exception {
    BufferedImage img = ImageIO.read(in);
    for (int i = 0; i < img.getWidth(); i++) {
        for (int j = 0; j < img.getHeight(); j++) {
            // only pure black pixels belong to the CAPTCHA; paint the rest white
            if (!CODE_RGB.equals(Integer.toHexString(img.getRGB(i, j)))) {
                img.setRGB(i, j, -1);
            }
        }
    }
    in.close();
    return img;
}
Extracting the pixel data (RGB values):
int[][] getImageFeature(BufferedImage img) throws Exception {
    int[][] feature = new int[WIDTH][HEIGHT];
    for (int i = 0; i < WIDTH; i++) {
        for (int j = 0; j < HEIGHT; j++) {
            // skip coordinates that fall outside the image
            if (img.getWidth() <= i || img.getHeight() <= j) {
                continue;
            }
            feature[i][j] = img.getRGB(i, j);
        }
    }
    return feature;
}
Download a batch of images, cut them up, save the pieces, and pick the clearest image of each digit 0-9 by hand:
public void learn(int count) throws Exception {
    for (int c = 0; c < count; c++) {
        BufferedImage img = getBufferedImage(new URL(IMG_URL).openStream());
        // save the samples
        for (int i = 1; i < 5; i++) {
            BufferedImage bi = img.getSubimage(i * X_BASE, Y_BASE, WIDTH, HEIGHT);
            String name = String.valueOf(c) + "_" + String.valueOf(i);
            ImageIO.write(bi, IMAGE_FORMAT, new File(IMG_REF + "learn/" + name + ".bmp"));
        }
    }
}
Note:
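GifImageWriter has a bug: writing out a sub-image obtained from a BufferedImage never comes out right. Someone reported it to Sun a long time ago, and it still does not seem to be fixed.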
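The code in this post sidesteps the problem by writing BMP. Another option, sketched below, is to copy the region into a fresh BufferedImage so that the image handed to ImageIO no longer shares its parent's raster. Whether that avoids the GifImageWriter bug described here is an assumption and has not been verified against it; `copyRegion` is a made-up helper name.

```java
import java.awt.image.BufferedImage;

class SubimageCopySketch {

    // Copies a w x h region starting at (x, y) into a brand-new image.
    // Unlike getSubimage, the result has its own pixel storage.
    static BufferedImage copyRegion(BufferedImage src, int x, int y, int w, int h) {
        BufferedImage copy = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        int[] pixels = src.getRGB(x, y, w, h, null, 0, w);
        copy.setRGB(0, 0, w, h, pixels, 0, w);
        return copy;
    }

    public static void main(String[] args) {
        BufferedImage src = new BufferedImage(20, 10, BufferedImage.TYPE_INT_RGB);
        src.setRGB(5, 3, 0xFF112233);
        BufferedImage cell = copyRegion(src, 4, 2, 7, 5);
        System.out.println(Integer.toHexString(cell.getRGB(1, 1)));
    }
}
```

The copy could then be passed to ImageIO.write in place of the sub-image.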
The cracking code (calling it "cracking" is a bit embarrassing... it really has nothing to do with cracking):
public String crack(InputStream in) throws Exception {
    BufferedImage img = getBufferedImage(in);
    StringBuilder ret = new StringBuilder();

    for (int c = 1; c < 5; c++) {
        BufferedImage bi = img.getSubimage(c * X_BASE, Y_BASE, WIDTH, HEIGHT);
        int[][] feature = getImageFeature(bi);

        for (ImageCode ic : imageCodes) {
            // count the pixels that agree with this sample
            int success = 0;
            for (int i = 0; i < WIDTH; i++) {
                for (int j = 0; j < HEIGHT; j++) {
                    if (feature[i][j] == ic.getFeature()[i][j]) {
                        success++;
                    }
                }
            }
            // matched: take this digit and move on to the next character
            if (success * 100.0 / (WIDTH * HEIGHT) > 95) {
                ret.append(ic.getCode());
                break;
            }
        }
    }

    return ret.toString();
}
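Here imageCodes holds the feature data of the ten sample images for 0-9, prepared at initialization, together with the digit each one stands for.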
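The loop above compares each sample against a fixed 95% threshold. A possible variation, sketched here on tiny synthetic grids, scores every sample and keeps the highest one that clears the threshold, which makes an ambiguous cell resolve to a single digit. This is not the author's code; the class and method names and the 2x2 "digits" are illustrative only.

```java
class BestMatchSketch {

    // Percentage of positions at which two equally sized grids hold the same value.
    static double similarity(int[][] a, int[][] b) {
        int same = 0;
        int total = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = 0; j < a[i].length; j++) {
                if (a[i][j] == b[i][j]) {
                    same++;
                }
                total++;
            }
        }
        return total == 0 ? 0.0 : same * 100.0 / total;
    }

    // Label of the best-scoring sample above the threshold, or null if none qualifies.
    static String bestMatch(int[][] feature, String[] labels, int[][][] samples, double threshold) {
        String best = null;
        double bestScore = threshold;
        for (int k = 0; k < samples.length; k++) {
            double score = similarity(feature, samples[k]);
            if (score > bestScore) {
                bestScore = score;
                best = labels[k];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[][] one = {{1, 0}, {1, 0}};
        int[][] seven = {{1, 1}, {0, 1}};
        String[] labels = {"1", "7"};
        int[][][] samples = {one, seven};
        System.out.println(bestMatch(one, labels, samples, 95));
    }
}
```

Returning null for an unmatched cell lets the caller decide whether to fetch a fresh CAPTCHA instead of submitting a short code.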
After that, things are easy. Use LiveHttpHeader to find the data submitted in the POST, then replay it from a shell script. Done.
The main tool is the curl command:
# fetch the CAPTCHA image for this session and let the Java tool read it
curl -b "ASP.NET_SessionId=$SESSION_ID" "$IMAGE_CODE_URL" -o $TMP_CODE_BASE
code=`java -cp lib/traffic.jar TrafficImgCracker $TMP_CODE_BASE`

# submit the query with the recognized code
curl -A "$AGENT" -e $TRAFFIC_URL -b "ASP.NET_SessionId=$SESSION_ID;isLoginedWeb=T;ImageV=$code" -d "$QUERY_PARAM" "$TRAFFIC_URL" -o $TMP_CONTENT_BASE

# keep the result cells, strip the tags, collapse whitespace
now=`cat $TMP_CONTENT_BASE | grep '<td class="xxcxsspoptds">' | sed 's/<[^>]*>//g' | sed 's/\s\+/ /g'`
his=`cat $HIS_CONTENT_BASE | grep '<td class="xxcxsspoptds">' | sed 's/<[^>]*>//g' | sed 's/\s\+/ /g'`

# notify only when the result differs from the previous run
if [ "$now" != "$his" ]; then
    msg=`cat $TMP_CONTENT_BASE | grep '<td class="xxcxsspoptds">' | sed 's/<[^>]*>//g' | sed 's/\s\+/ /g' | head -n 6`
    echo $msg
    # send
fi
I have never been good at shell, so the code is probably pretty ugly...
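For anyone who would rather keep the comparison in Java, the grep/sed pipeline above (keep the matching cells, strip tags, collapse whitespace, compare with the previous run) can be sketched as follows. The class and method names are hypothetical, the sample HTML is invented for the demo, and unlike the sed version this one also trims each line.

```java
import java.util.ArrayList;
import java.util.List;

class ViolationDiffSketch {

    // Keeps the lines containing the marker, strips tags, and collapses whitespace.
    static List<String> extractCells(String html, String marker) {
        List<String> cells = new ArrayList<>();
        for (String line : html.split("\n")) {
            if (line.contains(marker)) {
                cells.add(line.replaceAll("<[^>]*>", "").replaceAll("\\s+", " ").trim());
            }
        }
        return cells;
    }

    // True when the extracted cells differ between the current and the previous page.
    static boolean changed(String nowHtml, String oldHtml, String marker) {
        return !extractCells(nowHtml, marker).equals(extractCells(oldHtml, marker));
    }

    public static void main(String[] args) {
        String marker = "<td class=\"xxcxsspoptds\">";
        // Invented sample rows, not real output from the site
        String old = "<tr>\n" + marker + "  2010-10-01   example </td>\n</tr>\n";
        String now = old + marker + "2010-10-09 another</td>\n";
        System.out.println(changed(now, old, marker));
    }
}
```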
Finally, send the message with the Fetion command:
$FETION_HOME/fetion --mobile=12345678901 --pwd=bugaosuni --to=12345678901 --msg-utf8="$msg"
A quick plug:
The Fetion command-line tool is a handy thing; jobs like system alerts can all be handled with it.
Finally, here is the complete image-cracking code:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

import javax.imageio.ImageIO;

/**
 * Image analysis tool for the traffic site
 *
 * @author <a href="mailto:[email protected]">Stone.J</a> 2010-10-10
 */
public class TrafficImgCracker {

    private static final String IMG_URL = "http://www.hzti.com/government/CreateCheckCode.aspx";
    private static final String IMG_REF = "/home/stone/tmp/imgref/";

    private static final int X_BASE = 9;
    private static final int Y_BASE = 0;
    private static final int WIDTH = 7;
    private static final int HEIGHT = 22;
    private static final String IMAGE_FORMAT = "bmp";
    private static final String CODE_RGB = "ff000000";

    private List<ImageCode> imageCodes = new ArrayList<ImageCode>(10);

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("java TrafficImgCracker $file_path");
            System.exit(1);
        }

        String file = args[0];

        TrafficImgCracker cracker = new TrafficImgCracker();
        cracker.init();
        System.out.print(cracker.crack(file));
    }

    /**
     * Load the sample images; the file name (without extension) is the digit.
     *
     * @throws Exception
     */
    public void init() throws Exception {
        File ref = new File(IMG_REF);
        File[] files = ref.listFiles();
        for (File f : files) {
            if (!f.isFile()) {
                continue;
            }
            String code = f.getName().substring(0, f.getName().indexOf("."));
            BufferedImage img = getBufferedImage(new FileInputStream(f));
            int[][] feature = getImageFeature(img);
            // add to image codes
            imageCodes.add(new ImageCode(code, feature));
        }
    }

    /**
     * crack
     *
     * @param file path of the CAPTCHA image
     * @return code
     * @throws Exception
     */
    public String crack(String file) throws Exception {
        return crack(new FileInputStream(file));
    }

    /**
     * crack image code
     *
     * @param in image input stream
     * @return code
     */
    public String crack(InputStream in) throws Exception {
        BufferedImage img = getBufferedImage(in);
        StringBuilder ret = new StringBuilder();

        for (int c = 1; c < 5; c++) {
            BufferedImage bi = img.getSubimage(c * X_BASE, Y_BASE, WIDTH, HEIGHT);
            int[][] feature = getImageFeature(bi);

            for (ImageCode ic : imageCodes) {
                // count the pixels that agree with this sample
                int success = 0;
                for (int i = 0; i < WIDTH; i++) {
                    for (int j = 0; j < HEIGHT; j++) {
                        if (feature[i][j] == ic.getFeature()[i][j]) {
                            success++;
                        }
                    }
                }
                // matched: take this digit and move on to the next character
                if (success * 100.0 / (WIDTH * HEIGHT) > 95) {
                    ret.append(ic.getCode());
                    break;
                }
            }
        }

        return ret.toString();
    }

    /**
     * learn
     *
     * @param count learning count
     * @throws Exception
     */
    public void learn(int count) throws Exception {
        for (int c = 0; c < count; c++) {
            BufferedImage img = getBufferedImage(new URL(IMG_URL).openStream());
            // save the samples
            for (int i = 1; i < 5; i++) {
                BufferedImage bi = img.getSubimage(i * X_BASE, Y_BASE, WIDTH, HEIGHT);
                String name = String.valueOf(c) + "_" + String.valueOf(i);
                ImageIO.write(bi, IMAGE_FORMAT, new File(IMG_REF + "learn/" + name + ".bmp"));
            }
        }
    }

    // Read the image and return it with the noise removed
    private BufferedImage getBufferedImage(InputStream in) throws Exception {
        BufferedImage img = ImageIO.read(in);
        for (int i = 0; i < img.getWidth(); i++) {
            for (int j = 0; j < img.getHeight(); j++) {
                // only pure black pixels belong to the CAPTCHA; paint the rest white
                if (!CODE_RGB.equals(Integer.toHexString(img.getRGB(i, j)))) {
                    img.setRGB(i, j, -1);
                }
            }
        }
        in.close();
        return img;
    }

    // Extract the image feature (the RGB value of every pixel)
    private int[][] getImageFeature(BufferedImage img) throws Exception {
        int[][] feature = new int[WIDTH][HEIGHT];
        for (int i = 0; i < WIDTH; i++) {
            for (int j = 0; j < HEIGHT; j++) {
                // skip coordinates that fall outside the image
                if (img.getWidth() <= i || img.getHeight() <= j) {
                    continue;
                }
                feature[i][j] = img.getRGB(i, j);
            }
        }
        return feature;
    }

    /**
     * image code bean
     *
     * @author <a href="mailto:[email protected]">Stone.J</a> 2010-10-10
     */
    public static class ImageCode {

        private String code;
        private int[][] feature;

        public ImageCode(String code, int[][] feature) {
            this.code = code;
            this.feature = feature;
        }

        public String getCode() {
            return code;
        }

        public void setCode(String code) {
            this.code = code;
        }

        public int[][] getFeature() {
            return feature;
        }

        public void setFeature(int[][] feature) {
            this.feature = feature;
        }
    }

}
====================================================================
Edit, October 29:
I have been learning Python lately and wrote a Python version. It really is a lot more concise:
import os
from PIL import Image  # Pillow; the old PIL spelling was "import Image"


class TrafficImgCracker(object):
    """
    CAPTCHA cracker for the traffic information site
    """

    def __init__(self, base):
        """
        base:  directory holding the reference images
        codes: list of (code, feature) tuples, one per reference image
        """
        self.base = base
        self.codes = []

        for name in [n for n in os.listdir(base) if n.endswith('bmp')]:
            # the first character of the file name is the digit it stands for
            self.codes.append((name[0], list(Image.open(os.path.join(base, name)).getdata())))

    def crack(self, imgfile):
        img = Image.open(imgfile)
        ret = []
        for i in range(1, 5):
            cell = img.crop((i * 9, 0, i * 9 + 7, 22))
            # keep value 40 (presumably the digit colour) and map everything else to 15
            feature = [f if (f == 40) else 15 for f in cell.getdata()]
            # an unmatched cell becomes '?' so that join() does not fail on None
            ret.append(self._getcode(feature) or '?')
        return "".join(ret)

    def _getcode(self, feature):
        for code in self.codes:
            ref = list(zip(code[1], feature))
            if len([x for x in ref if x[0] == x[1]]) * 100.0 / len(ref) > 95:
                return code[0]
        return None