Big Data Crawler Basics (4): Installing, Configuring, and Using Maven (Part 2) -- A Simple Java Crawler

eclipse maven

Environment:

Windows 10 Pro x64, JDK 1.8, Eclipse Mars

1. Install and configure the Maven plugin

Window -> Preferences -> Maven -> Installations -> Add

See the first link under References below for details.

2. Create a new Maven project
File -> New -> Project -> Maven Project -> maven-archetype-quickstart (selected by default)
groupId: com.mvntest
artifactId: crawler
Finish
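3. Create the crawler program TTT.java
Right-click src/main/java -> New -> Class, enter TTT as the name, press Enter, and paste the crawler code (listed under TTT.java at the end of this post) into the new file. The code declares package com.mvntest.crawler, so the class has to be created in that package.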
4. Add the httpclient dependency
4.1 Find the httpclient 4.5.2 dependency
The Maven pom.xml snippet for httpclient 4.5.2 can be found on the following site:
http://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.2



<dependency>
    <groupId>org.apache.httpcomponents</groupId>
    <artifactId>httpclient</artifactId>
    <version>4.5.2</version>
</dependency>



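4.2 Add the dependency to pom.xml in Eclipse
Double-click pom.xml in the project pane on the left to open the POM editor, select the Dependencies tab at the bottom, click Add, enter the groupId, artifactId, and version from above, and click OK.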
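
The groupId, artifactId, and version entered in the form are the dependency's Maven coordinates. As a rough illustration of what they identify, the sketch below maps them to the conventional location of the jar in the local repository. The class MavenCoordinates and its method relativeJarPath are hypothetical names made up for this example, not part of any Maven API, and the default repository location <user.home>/.m2/repository is assumed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of Maven's API.
class MavenCoordinates {
    final String groupId;
    final String artifactId;
    final String version;

    MavenCoordinates(String groupId, String artifactId, String version) {
        this.groupId = groupId;
        this.artifactId = artifactId;
        this.version = version;
    }

    // Path of the jar relative to the repository root, using forward slashes:
    // <groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
    String relativeJarPath() {
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        MavenCoordinates c =
                new MavenCoordinates("org.apache.httpcomponents", "httpclient", "4.5.2");
        // Assumption: the local repository is at its default location.
        Path repo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        Path jar = repo.resolve(c.relativeJarPath());
        System.out.println(jar + " exists: " + Files.exists(jar));
    }
}
```

Running it after the dependency has been downloaded should report that the jar exists; before the first build it normally will not.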
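4.3 maven install
Right-click pom.xml -> Run As -> Maven install
If the build fails with an error saying that a JRE is being used rather than a JDK, the project's JRE reference is wrong and must be pointed at the JRE inside the JDK.
Right-click the project -> Properties -> Java Build Path -> Libraries
Select JRE System Library -> Edit
In the dialog, choose Alternate JRE, click Installed JREs, click Search, select the JRE directory under the JDK, then click Apply and OK.
Run Maven install again.
The build should now succeed.
[Alternatively, change the JRE path once under Window -> Preferences -> Java -> Installed JREs so that it applies to every project.]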
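
A likely cause of the JRE/JDK error is that the Java runtime used for the build does not include a compiler. The sketch below is one way to check which runtime a project is using and whether it provides one. JdkCheck and hasSystemCompiler are names invented for this example; ToolProvider.getSystemJavaCompiler() is the standard javax.tools call and returns null when no compiler is available.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Illustrative check; the class and method names are made up for this post.
class JdkCheck {
    // True when the running Java runtime ships a compiler, i.e. it is a JDK.
    static boolean hasSystemCompiler() {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        return compiler != null;
    }

    public static void main(String[] args) {
        // java.home shows which installation the program is running on.
        System.out.println("java.home = " + System.getProperty("java.home"));
        System.out.println("compiler available = " + hasSystemCompiler());
    }
}
```

Run it as a Java application inside the project. If it reports that no compiler is available, the project is still bound to a plain JRE.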
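4.4 Run the project
If Maven clean is not available, use: right-click the Maven project -> Run As -> Maven build -> enter clean package in Goals -> OK.
Download the dependencies:
right-click the project -> Run As -> Maven install
Run:
right-click the project -> Run As -> Java Application -> TTT
The fetched page is printed to the console. Done.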
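
The console output comes from reading the HTTP response body line by line, which is the job of the convertStreamToString helper in TTT.java at the end of this post. A minimal sketch of that reading step in isolation, runnable without a network connection, might look like this. StreamToStringDemo and readAll are hypothetical names, a byte array stands in for the response stream, and UTF-8 is an assumption about the page encoding.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Illustrative only; names are invented for this example.
class StreamToStringDemo {
    // Reads the whole stream as UTF-8 text and normalises line endings to "\n".
    // try-with-resources closes the reader and the underlying stream.
    static String readAll(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // A fake response body stands in for the network stream.
        byte[] body = "<html>\r\n<body>hello</body>\r\n</html>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(body)));
    }
}
```

Naming the charset explicitly keeps the result independent of the platform default encoding.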


References:
http://jingyan.baidu.com/article/295430f136e8e00c7e0050b9.html
http://www.iteye.com/topic/1123225
http://www.blogjava.net/fancydeepin/archive/2012/06/12/380605.html
http://mvnrepository.com/artifact/commons-httpclient/commons-httpclient/3.1
http://bbs.csdn.net/topics/390172911
TTT.java
package com.mvntest.crawler;

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

public class TTT
{

    /**
     * Fetches a single page and prints its body to the console.
     *
     * @param args unused
     * @throws IOException if the request fails or the response cannot be read
     */
    public static void main(String[] args) throws IOException
    {
        // Create the HttpClient instance. DefaultHttpClient is deprecated since
        // HttpClient 4.3, so HttpClients.createDefault() is used instead.
        try (CloseableHttpClient httpclient = HttpClients.createDefault()) {
            // Create the GET request
            HttpGet httpget = new HttpGet("http://mvnrepository.com/artifact/org.apache.httpcomponents/httpclient/4.5.2");
            // Closing the response releases the underlying connection
            try (CloseableHttpResponse response = httpclient.execute(httpget)) {
                HttpEntity entity = response.getEntity();
                if (entity != null) {
                    InputStream instream = entity.getContent();
                    String str = convertStreamToString(instream);
                    System.out.println("Do something");
                    System.out.println(str);
                }
            }
        }
    }

    /**
     * Reads the whole stream into a string, one line at a time, then closes it.
     * The bytes are decoded with the platform default charset.
     */
    public static String convertStreamToString(InputStream is) {
        BufferedReader reader = new BufferedReader(new InputStreamReader(is));
        StringBuilder sb = new StringBuilder();

        String line = null;
        try {
            while ((line = reader.readLine()) != null) {
                sb.append(line).append("\n");
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                is.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
        return sb.toString();
    }
}


