jsoup Parsing and Web Crawling

First, take a look at the web page below:

[Figure 1: screenshot of the page to be crawled]

Now we have a requirement: the project team wants us to crawl the "sub-subject name" (子专业名称) from this page. The approach is to fetch the page with Apache HttpClient, cut out the breadcrumb fragment between the two navcrumbId markers, and parse that fragment with jsoup. Straight to the code.

import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

/**
 * Gets the sub-subject name from the breadcrumb of the course page.
 * @param url the course page URL
 * @return the sub-subject name, or null if it cannot be extracted
 */
public static String getSonSubjectName(String url) {
    String sonSubjectName = null;
    if (url == null || "".equals(url.trim())) {
        return sonSubjectName;
    }
    // try-with-resources closes the client and the response even if an exception is thrown
    try (CloseableHttpClient httpClient = HttpClients.createDefault();            // create the HttpClient instance
         CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) { // execute the GET request
        HttpEntity entity = response.getEntity();
        String content = EntityUtils.toString(entity, "UTF-8");  // full HTML of the page
        // Cut out the fragment between the first and the second breadcrumb markers
        int firstEndIndex = content.indexOf("navcrumbId-1");
        int secondEndIndex = content.indexOf("navcrumbId-2");
        if (firstEndIndex < 0 || secondEndIndex < 0) {
            return sonSubjectName;  // breadcrumb markers not found in the page
        }
        String resultStr = content.substring(firstEndIndex, secondEndIndex);
        Document document = Jsoup.parse(resultStr);                        // parse the fragment into a Document
        Elements elements = document.getElementsByClass("navcrumb-item");  // breadcrumb nodes
        sonSubjectName = elements.get(1).text();                           // the second item is the sub-subject name
    } catch (Exception e) {
        // logger is assumed to be a class-level logger field of WyUtil (e.g. org.slf4j.Logger)
        logger.error("WyUtil.getSonSubjectName()----error", e);
    }
    return sonSubjectName;
}
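Side note: jsoup can also fetch the page by itself, so the Apache HttpClient round trip and the manual substring slicing are not strictly required. Below is a minimal sketch of that variant; the navcrumb-item class and the "take the second breadcrumb item" rule come from the code above, while the method name, user agent, and timeout are illustrative assumptions.

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

public static String getSonSubjectNameWithJsoup(String url) {
    try {
        // Jsoup.connect() performs the GET request and parses the response in one step
        Document doc = Jsoup.connect(url)
                .userAgent("Mozilla/5.0")  // browser-like UA; some sites reject bare requests
                .timeout(10_000)           // connect/read timeout in milliseconds
                .get();
        // Select all breadcrumb items by class and take the second one, as above
        Elements crumbs = doc.select(".navcrumb-item");
        return crumbs.size() > 1 ? crumbs.get(1).text() : null;
    } catch (Exception e) {
        return null;  // on network or parse errors, fall back to null like the original method
    }
}

The HttpClient version keeps more control over the request (headers, connection reuse), so which variant to use depends on what the rest of the project already pulls in.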

Write a quick main method to test it:

public static void main(String[] args) {
    System.out.println(getSonSubjectName("https://study.163.com/course/introduction.htm?courseId=1006073263&_trace_c_p_k2_=bca7cf19265c4b66b5e9cdcd63e59bbc"));
}

Run it, and the sub-subject name from the breadcrumb is printed to the console. Nice.

 
