(Regular Expressions) An Email Address Crawler

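Over the past couple of days I looked at a few questions about regular expressions, and it occurred to me that the email address crawler I wrote a while back is a decent example: it is simple, easy to follow, and suitable for beginners. Anyone who needs it is welcome to use it as a reference.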

For background on crawlers and regular expressions, please search Baidu or read blog posts on the topic.

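The main code is below. It is mostly the raw source, so feel free to ignore the commented-out // lines. (Experts, the exit is on your right, no need to thank me~)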

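import java.io.BufferedReader;
import java.io.FileReader;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MailCheck {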
    // Downloads the page at addressUrl and returns every email-like string found,
    // one per line, followed by a summary line with the total count.
    public static StringBuffer getWebMail(String addressUrl) throws Exception{
//      URL url = new URL("http://192.168.56.1:8080/myweb/mail.html");
//      URL url = new URL("http://www.douban.com/group/topic/24022171/");
        URL url = new URL(addressUrl);
        URLConnection conn = url.openConnection();
        String mailreg = "\\w+@\\w+(\\.\\w+)+";// relatively loose match
//      String reg = "[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+";// somewhat stricter
        Pattern p = Pattern.compile(mailreg);
//      File file = new File("F:\\java_p\\MailFromWeb.txt");
//      FileOutputStream out = new FileOutputStream(file);
        StringBuffer sbuf = new StringBuffer();
        int count = 0;
        // try-with-resources closes the stream and the reader even if reading fails
        try (InputStream in = conn.getInputStream();
             BufferedReader bufin = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while((line = bufin.readLine())!=null){
                Matcher m = p.matcher(line);
                while(m.find()){
//                  System.out.println(m.group());
                    ++count;
                    sbuf.append(m.group());
                    sbuf.append("\r\n");
//                  byte[] b = m.group().getBytes();
//                  for (int i = 0; i < b.length; i++) {
//                      out.write(b[i]);
//                  }
//                  out.write("\r\n".getBytes()); // line break
                }
            }
        }
//      out.close();
        sbuf.append("Found "+count+" email addresses in total");
        return sbuf;
    }
    // Not yet revised
    // Reads the local file at addressLocal and returns every email-like string found,
    // one per line, followed by a summary line with the total count.
    public static StringBuffer getLocalMail(String addressLocal) throws Exception{
        String mailreg = "\\w+@\\w+(\\.\\w+)+";// relatively loose match
        Pattern p = Pattern.compile(mailreg);
        StringBuffer sbuf = new StringBuffer();
//      File file = new File("F:\\java_p\\mail2.txt");
//      FileOutputStream out = new FileOutputStream(file);
        int count = 0;
//      BufferedReader buff = new BufferedReader(new FileReader("F:\\java_p\\webmail.txt"));
        // try-with-resources closes the reader even if reading fails
        try (BufferedReader buff = new BufferedReader(new FileReader(addressLocal))) {
            String line;
            while((line = buff.readLine())!=null){
                Matcher m = p.matcher(line);
                while(m.find()){
//                  System.out.println(m.group()+"----"+(++count));
                    ++count;
                    sbuf.append(m.group());
                    sbuf.append("\r\n");
//                  byte[] b = m.group().getBytes();
//                  for (int i = 0; i < b.length; i++) {
//                      out.write(b[i]);
//                  }
//                  out.write("\r\n".getBytes()); // line break
                }
            }
        }
//      System.out.println("Found "+count+" email addresses in total");
//      out.close();
        sbuf.append("Found "+count+" email addresses in total");
        return sbuf;
    }
}
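
The listing keeps two patterns: the loose one that is actually used, and a stricter one left commented out. To see how they differ in practice, here is a minimal sketch that runs both over the same sample text. The class name EmailPatternDemo, the helper findAll, and the sample string are my own illustrations and are not part of the uploaded tool; only the two pattern strings come from the code above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of MailCheck.
public class EmailPatternDemo {
    // Loose pattern used by MailCheck: \w matches letters, digits and underscore.
    public static final Pattern LOOSE =
            Pattern.compile("\\w+@\\w+(\\.\\w+)+");
    // Stricter pattern from the commented-out line: every label after a dot
    // must consist of letters only.
    public static final Pattern STRICT =
            Pattern.compile("[a-zA-Z0-9_]+@[a-zA-Z0-9]+(\\.[a-zA-Z]+)+");

    // Collects every match of the pattern in the text, in order of appearance.
    public static List<String> findAll(Pattern pattern, String text) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Made-up sample text for illustration only.
        String sample = "tom_01@example.com, version@1.2, john.doe@example.com";
        System.out.println("loose:  " + findAll(LOOSE, sample));
        System.out.println("strict: " + findAll(STRICT, sample));
    }
}
```

On this sample the loose pattern also picks up version@1.2, which is not an email address, while the stricter one skips it because the part after the dot has to be letters. Neither pattern allows a dot in the part before the @, so both return only doe@example.com for john.doe@example.com. The stricter pattern also rejects addresses with a numeric label after a dot, such as a@vip.163.com, so "stricter" here does not mean "complete".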

A simple interface (I am a beginner, so experts please go easy on me):
Crawling a web address:
[Screenshot 1]

[Screenshot 2]

Crawling a local file:
[Screenshot 3]

I have uploaded the email address crawler tool to CSDN. Anyone who needs it can download it from the link below:
http://download.csdn.net/detail/shangguanyunlan/9349699

To satisfy beginners' curiosity (experts, please ignore), I have uploaded the source code as well:
http://download.csdn.net/detail/shangguanyunlan/9349765
