How to Read the Full Content of a Web Page in Java

Reading via URL

In a previous project, I ran into the following problem: I needed to read the content of a web page, but found that only part of it was being read.
Here is the code:

	import java.io.BufferedReader;
	import java.io.IOException;
	import java.io.InputStream;
	import java.io.InputStreamReader;
	import java.net.MalformedURLException;
	import java.net.URL;

	public static void read1(String urlStr) {
		URL url = null;
		InputStream is = null;
		InputStreamReader isr = null;
		BufferedReader br = null;
		StringBuffer sb = new StringBuffer();
		try {
			url = new URL(urlStr);
			is = url.openStream();
			isr = new InputStreamReader(is, "utf-8");
			br = new BufferedReader(isr);
			char[] c = new char[1024];
			int len;
			// read() returns how many chars were actually read; append only that many.
			// The original version appended the whole buffer via new String(c), which
			// also copies stale characters left over from the previous iteration.
			while ((len = br.read(c)) != -1) {
				sb.append(c, 0, len);
			}
		} catch (MalformedURLException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		} finally {
			// guard against a NullPointerException if the reader was never opened
			if (br != null) {
				try {
					br.close();
				} catch (IOException e) {
					e.printStackTrace();
				}
			}
		}
		System.out.println(sb.toString());
	}

After saving the fetched content to a local file, I found that it occupied exactly 4096 bytes, which happens to be the size of one memory page on my system (page sizes differ between systems, but are generally an integer multiple of 512 bytes). It looked as if only part of the resource was being read when it was requested from the server. So how can the entire resource be read in a continuous stream? At the time, I did not find a solution along this path.
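For reference, below is a minimal sketch of a read loop over the raw InputStream that honors the byte count returned by each read() call (readFully is just a name chosen for this illustration; whether this resolves the truncation described above would need to be verified against the same URL):

	import java.io.ByteArrayOutputStream;
	import java.io.IOException;
	import java.io.InputStream;
	import java.net.URL;

	public static String readFully(String urlStr) throws IOException {
		ByteArrayOutputStream bos = new ByteArrayOutputStream();
		// try-with-resources closes the stream even if an exception is thrown
		try (InputStream is = new URL(urlStr).openStream()) {
			byte[] buf = new byte[4096];
			int len;
			// read() returns the number of bytes actually read, or -1 at end of stream
			while ((len = is.read(buf)) != -1) {
				bos.write(buf, 0, len);
			}
		}
		// decode all the accumulated bytes at once
		return bos.toString("utf-8");
	}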
But it occurred to me that the full content was accessible when visiting the page in a browser. So I switched to org.apache.commons.httpclient.HttpClient, and sure enough, it retrieved the complete content.

Reading via HttpClient

	import java.io.BufferedReader;
	import java.io.InputStream;
	import java.io.InputStreamReader;

	import org.apache.commons.httpclient.HttpClient;
	import org.apache.commons.httpclient.methods.GetMethod;

	public static String doGet(String url) {
		String respStr = "";
		GetMethod getMethod = new GetMethod(url);
		try {
			HttpClient httpClient = new HttpClient();
			httpClient.executeMethod(getMethod);
			// stream the response body instead of buffering it in a single call
			InputStream inputStream = getMethod.getResponseBodyAsStream();
			BufferedReader br = new BufferedReader(new InputStreamReader(inputStream, "utf-8"));
			StringBuffer stringBuffer = new StringBuffer();
			String line;
			// note: readLine() strips line terminators, so newlines are dropped here
			while ((line = br.readLine()) != null) {
				stringBuffer.append(line);
			}
			respStr = stringBuffer.toString();
		} catch (Exception e) {
			throw new RuntimeException(e);
		} finally {
			// release the connection back to the HttpClient connection manager
			getMethod.releaseConnection();
		}
		return respStr;
	}
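A quick usage sketch (the URL here is only a placeholder):

	public static void main(String[] args) {
		// print the fetched page; replace the URL with a real target
		System.out.println(doGet("https://www.example.com/"));
	}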
