Web crawling for beginners: a Java implementation

1. Find the email addresses in a given local file

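import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) throws IOException {
        List<String> list = getMails();
        for (String mail : list) {
            System.out.println(mail);
        }
    }

    // Reads the file line by line and collects every substring that matches the email pattern
    public static List<String> getMails() throws IOException {
        // word characters, an @, a domain label, then one or more ".label" parts
        String regex = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(regex);
        List<String> list = new ArrayList<>();
        // A BugReport.txt file has been placed on drive D:
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader br = new BufferedReader(new FileReader("d:\\BugReport.txt"))) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = p.matcher(line);
                // a single line may contain several addresses
                while (m.find()) {
                    list.add(m.group());
                }
            }
        }
        return list;
    }
}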
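To see what the pattern \w+@\w+(\.\w+)+ actually accepts without placing a file on drive D:, the same matching step can be run against an in-memory string. This is a minimal sketch: the class name EmailRegexDemo, the helper findEmails and the sample text are illustrative assumptions, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailRegexDemo {
    // Same pattern as in the program above:
    //   \w+        one or more word characters (ASCII letters, digits, underscore) before the @
    //   @          a literal at sign
    //   \w+        the first domain label, e.g. "qq" or "163"
    //   (\.\w+)+   one or more ".label" groups, e.g. ".com" or ".com.cn"
    private static final Pattern EMAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Hypothetical helper: returns every match of EMAIL in the text, in order of appearance
    public static List<String> findEmails(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        String sample = "contact: alice@qq.com, bob_1@mail.example.com.cn; "
                + "first.last@example.com; not an email: foo@bar";
        for (String mail : findEmails(sample)) {
            System.out.println(mail);
        }
    }
}
```

Running it prints alice@qq.com, bob_1@mail.example.com.cn and last@example.com. The last result shows a limitation of the pattern: \w does not include the dot, so for first.last@example.com only the part after the dot is captured, and foo@bar is rejected because at least one ".label" is required after the domain.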

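2. Find the email addresses on an arbitrary web page. As an example, we take a Baidu Tieba thread in which users left their email addresses and extract every address on that page: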

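import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestDemo {
    public static void main(String[] args) throws IOException {
        List<String> list = getMailsByWEB();
        for (String mail : list) {
            System.out.println(mail);
        }
    }

    // Downloads the page and scans its HTML source line by line for email addresses
    public static List<String> getMailsByWEB() throws IOException {
        URL url = new URL("http://tieba.baidu.com/p/2314539885");
        String regex = "\\w+@\\w+(\\.\\w+)+";
        Pattern p = Pattern.compile(regex);
        List<String> list = new ArrayList<>();
        // url.openStream() returns the raw response body; try-with-resources closes the connection stream
        try (BufferedReader br = new BufferedReader(new InputStreamReader(url.openStream()))) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = p.matcher(line);
                while (m.find()) {
                    list.add(m.group());
                }
            }
        }
        return list;
    }
}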
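The two programs differ only in where the text comes from (a FileReader versus the stream returned by url.openStream()), so the matching loop can be written once against a java.io.Reader. The sketch below is one possible variant, not the original author's code: the class MailExtractor and its methods extract and extractFromUrl are hypothetical names, and it swaps the List for a LinkedHashSet so that an address appearing several times on a page (for example in quoted replies) is reported only once, in order of first appearance.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MailExtractor {
    private static final Pattern EMAIL = Pattern.compile("\\w+@\\w+(\\.\\w+)+");

    // Reads the source line by line and collects every distinct email address,
    // keeping first-appearance order. The reader is closed when done.
    public static Set<String> extract(Reader source) throws IOException {
        Set<String> mails = new LinkedHashSet<>();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                Matcher m = EMAIL.matcher(line);
                while (m.find()) {
                    mails.add(m.group());
                }
            }
        }
        return mails;
    }

    // Hypothetical wrapper for web pages, same idea as getMailsByWEB() above
    public static Set<String> extractFromUrl(String address) throws IOException {
        return extract(new InputStreamReader(URI.create(address).toURL().openStream()));
    }

    public static void main(String[] args) throws IOException {
        // Offline demo: a StringReader stands in for the local file or the downloaded page
        String page = "reply 1: alice@qq.com\nreply 2: bob@163.com\nreply 3: alice@qq.com\n";
        for (String mail : extract(new StringReader(page))) {
            System.out.println(mail);
        }
    }
}
```

The demo prints alice@qq.com and bob@163.com once each. To reproduce the two original programs, call extract(new FileReader("d:\\BugReport.txt")) or extractFromUrl("http://tieba.baidu.com/p/2314539885"). Like the original code, both decode with the platform default charset; since the pattern only matches ASCII word characters, this usually does not change which addresses are found.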
