Writing a crawler in Java to scrape NetEase Cloud Music playlist information

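I had always been curious about web crawlers; they seemed mysterious to me. A friend of mine works on crawlers, so when I recently had some free time I learned from him and tried writing a small program.
First, create the HttpClient object and declare the HttpResponse object: the client sends the request and the response receives the data. These classes come from Apache HttpClient (4.3 or later).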
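	// The client sends requests; the response holds what the server returns
	CloseableHttpClient httpClient = null;
	CloseableHttpResponse httpResponse = null;
	try {
		// 10 s each for connecting, waiting for data, and leasing a connection from the pool
		RequestConfig requestConfig = RequestConfig.custom().setConnectTimeout(10000).setSocketTimeout(10000)
				.setConnectionRequestTimeout(10000).build();
		httpClient = HttpClients.createDefault();
		// ... the request-building and execution steps that follow go here, inside the same try ...
	} finally {
		// Release the response and the client even if the request fails (both calls accept null)
		HttpClientUtils.closeQuietly(httpResponse);
		HttpClientUtils.closeQuietly(httpClient);
	}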
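For comparison, here is a minimal sketch of the same client setup using the JDK's built-in java.net.http.HttpClient (Java 11+). Note that this swaps in a different library from the Apache HttpClient used in this post, and TopListClient is a hypothetical name. Only the connect timeout maps directly: the JDK client has no counterpart to setConnectionRequestTimeout, and the closest thing to setSocketTimeout is a per-request response timeout set on the request itself.

```java
import java.net.http.HttpClient;
import java.time.Duration;

// Hypothetical factory: a JDK (java.net.http) counterpart of the Apache client set up above.
final class TopListClient {
    static HttpClient create() {
        return HttpClient.newBuilder()
                // Counterpart of setConnectTimeout(10000): fail if the connection is not up within 10 s
                .connectTimeout(Duration.ofSeconds(10))
                // Apache's default client follows redirects for GET; the JDK client follows none by default
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
    }
}
```

The redirect policy is set explicitly because a JDK client built without it uses Redirect.NEVER.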



Next, configure the request that fetches the data from the site.
	// Chart (toplist) page to fetch
	HttpGet httpGet = new HttpGet("http://music.163.com/discover/toplist?id=3778678");
	// Apply the timeouts configured above
	httpGet.setConfig(requestConfig);

	// Headers that make the request look like an ordinary browser visit
	httpGet.setHeader("Host", "music.163.com");
	httpGet.setHeader("Referer", "http://music.163.com/");
	httpGet.setHeader("User-Agent",
			"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36");
This packs the URL, the request headers (Host, Referer, User-Agent) and the timeout settings into the HttpGet object.
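The same request can be sketched with java.net.http.HttpRequest, again swapping in the JDK client rather than Apache's HttpGet. TopListRequest is a hypothetical name, and the sketch assumes a recent JDK. The JDK client refuses a caller-supplied Host header and derives it from the URI instead, so that header is left out here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Hypothetical builder: the toplist GET request from above, expressed with java.net.http.
final class TopListRequest {
    static final String TOPLIST_URL = "http://music.163.com/discover/toplist?id=3778678";

    static HttpRequest build() {
        return HttpRequest.newBuilder(URI.create(TOPLIST_URL))
                // Closest counterpart to setSocketTimeout(10000): fail if no response arrives within 10 s
                .timeout(Duration.ofSeconds(10))
                // No "Host" header: it is restricted in the JDK client and taken from the URI
                .header("Referer", "http://music.163.com/")
                .header("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                        + "(KHTML, like Gecko) Chrome/60.0.3112.101 Safari/537.36")
                .GET()
                .build();
    }
}
```

Sending it would then be `client.send(request, HttpResponse.BodyHandlers.ofString())`, which decodes the body with the charset from the Content-Type header and falls back to UTF-8.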
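	// Send the request; httpResponse carries the status line, headers and body
	httpResponse = httpClient.execute(httpGet);
	// Read the whole body (the full HTML page, not only song names) as a String; "UTF-8" is the fallback charset
	String musicName = EntityUtils.toString(httpResponse.getEntity(), "UTF-8");
	// logger is assumed to be a logging field of the enclosing class (not shown in the original)
	logger.info(musicName);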
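Executing the request returns the response, and org.apache.http.util.EntityUtils converts its content into a String, which is written to the log file here. The logged output contains the song names along with the rest of the page.
The next step is to parse this data, since all we actually want is the chart information. Below is an excerpt of the captured output: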
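The "UTF-8" argument to EntityUtils.toString is only a fallback: a charset declared in the response's Content-Type header takes precedence. The JDK-only sketch below approximates that rule so it can be tried without a network connection. BodyDecoder.decode is a hypothetical helper, and its simple header parsing is an assumption, not Apache's exact algorithm.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class BodyDecoder {
    // Hypothetical helper: a charset parameter in Content-Type wins; otherwise fall back to UTF-8,
    // roughly the way EntityUtils.toString(entity, "UTF-8") picks its charset.
    static String decode(byte[] body, String contentType) {
        Charset charset = StandardCharsets.UTF_8;
        if (contentType != null) {
            for (String param : contentType.split(";")) {
                String p = param.trim();
                if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                    String name = p.substring(8).replace("\"", "").trim();
                    try {
                        charset = Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        // Illegal or unsupported charset name: keep the UTF-8 fallback
                    }
                }
            }
        }
        return new String(body, charset);
    }
}
```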

<textarea style="display:none;">[{
	"copyrightId": 14026,
	"mvid": 0,
	"transNames": null,
	"status": 0,
	"ftype": 0,
	"privilege": {
		"st": 0,
		"flag": 0,
		"subp": 1,
		"fl": 320000,
		"fee": 0,
		"dl": 320000,
		"cp": 1,
		"cs": false,
		"toast": false,
		"maxbr": 999000,
		"id": 515803379,
		"pl": 320000,
		"sp": 7,
		"payed": 0
	},
	"djid": 0,
	"album": {
		"id": 36681200,
		"name": "别",
		"picUrl": "http://p1.music.126.net/NUUQurj2vr85-ugkwORjWQ==/109951163052989882.jpg",
		"tns": [],
		"pic_str": "109951163052989882",
		"pic": 109951163052989882
	},
	"artists": [{
		"id": 5781,
		"name": "薛之谦",
		"tns": [],
		"alias": []
	}],
	"no": 0,
	"alias": [],
	"score": 100.0,
	"commentThreadId": "R_SO_4_515803379",
	"fee": 0,
	"name": "别",
	"id": 515803379,
	"type": 0,
	"duration": 215664
},
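For the parsing step, the JSON array inside the hidden textarea first has to be cut out of the logged HTML, for example by taking the text between the textarea's opening tag and its closing tag (assumed to be </textarea>); that step is not shown. The sketch below then reads the array and produces one line per song. A JSON library such as Jackson or Gson would normally do the reading; a small hand-rolled parser is used here only so the example has no dependencies. MiniJson and TopListSummary are hypothetical names, and treating "duration" as milliseconds is an assumption based on the value 215664 in the excerpt.

```java
import java.util.*;

// Minimal recursive-descent JSON reader so the sketch needs no third-party library.
// Objects become LinkedHashMap, arrays ArrayList, numbers Double (huge integers such as "pic" lose precision).
final class MiniJson {
    private final String s;
    private int i;

    private MiniJson(String s) { this.s = s; }

    static Object parse(String text) {
        MiniJson p = new MiniJson(text);
        Object v = p.value();
        p.ws();
        if (p.i != p.s.length()) throw new IllegalArgumentException("trailing data at " + p.i);
        return v;
    }

    private void ws() { while (i < s.length() && Character.isWhitespace(s.charAt(i))) i++; }

    // Skips whitespace, then consumes c and returns true if it is the next character.
    private boolean accept(char c) {
        ws();
        if (i < s.length() && s.charAt(i) == c) { i++; return true; }
        return false;
    }

    private void expect(char c) {
        if (!accept(c)) throw new IllegalArgumentException("expected '" + c + "' at " + i);
    }

    private Object value() {
        ws();
        if (s.startsWith("{", i)) return object();
        if (s.startsWith("[", i)) return array();
        if (s.startsWith("\"", i)) return string();
        if (s.startsWith("true", i)) { i += 4; return Boolean.TRUE; }
        if (s.startsWith("false", i)) { i += 5; return Boolean.FALSE; }
        if (s.startsWith("null", i)) { i += 4; return null; }
        int start = i;
        while (i < s.length() && "+-0123456789.eE".indexOf(s.charAt(i)) >= 0) i++;
        if (start == i) throw new IllegalArgumentException("unexpected input at " + i);
        return Double.valueOf(s.substring(start, i));
    }

    private Map<String, Object> object() {
        Map<String, Object> m = new LinkedHashMap<>();
        expect('{');
        if (accept('}')) return m;
        do {
            String key = string();
            expect(':');
            m.put(key, value());
        } while (accept(','));
        expect('}');
        return m;
    }

    private List<Object> array() {
        List<Object> list = new ArrayList<>();
        expect('[');
        if (accept(']')) return list;
        do { list.add(value()); } while (accept(','));
        expect(']');
        return list;
    }

    // Reads a quoted string; truncated input ends in StringIndexOutOfBoundsException.
    private String string() {
        expect('"');
        StringBuilder sb = new StringBuilder();
        for (char c = s.charAt(i++); c != '"'; c = s.charAt(i++)) {
            if (c != '\\') { sb.append(c); continue; }
            char e = s.charAt(i++);
            int k = "btnfr".indexOf(e);
            if (k >= 0) sb.append("\b\t\n\f\r".charAt(k));
            else if (e == 'u') { sb.append((char) Integer.parseInt(s.substring(i, i + 4), 16)); i += 4; }
            else sb.append(e); // escaped quote, backslash or slash
        }
        return sb.toString();
    }
}

// Hypothetical helper: one line per song, "name - artist1/artist2 (m:ss)".
// Assumes "duration" is in milliseconds (215664 -> 3:35) and json is the array from the textarea.
final class TopListSummary {
    static List<String> summarize(String json) {
        List<String> lines = new ArrayList<>();
        for (Object item : (List<?>) MiniJson.parse(json)) {
            Map<?, ?> song = (Map<?, ?>) item;
            List<String> artists = new ArrayList<>();
            for (Object a : (List<?>) song.get("artists")) artists.add((String) ((Map<?, ?>) a).get("name"));
            long seconds = ((Double) song.get("duration")).longValue() / 1000;
            lines.add(String.format("%s - %s (%d:%02d)", song.get("name"), String.join("/", artists), seconds / 60, seconds % 60));
        }
        return lines;
    }
}
```

Only the top-level "name" is used for the song title, so the album's "name" and the artists' "name" fields do not get mixed up. For the first entry of the excerpt this gives 别 - 薛之谦 (3:35). If the page HTML-escapes characters inside the textarea (for example as &quot;), they would need to be unescaped before parsing; this sketch does not do that.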
