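Earlier we obtained the unique IDs of provinces, cities and districts. With such an ID we can look up the weather for whichever area we want to check.
We continue the analysis with HttpWatch. Suppose we want the weather for the Nanjing urban district (南京市区) of Nanjing, Jiangsu. From the earlier city-ID analysis, Jiangsu's ID is 10119, Nanjing's is 01 and the urban district's is 01. The query ID is therefore 101190101, and the weather URL is: http://www.weather.com.cn/weather/101190101.shtml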
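The query ID is the province, city and district IDs joined end to end. The sketch below shows this. `buildWeatherUrl` is a hypothetical helper, and it assumes the IDs are already zero-padded strings as in the example above:

```java
public class WeatherUrl {
    // Hypothetical helper: joins province + city + district IDs into the
    // nine-digit query ID and builds the weather.com.cn page URL from it.
    public static String buildWeatherUrl(String provinceId, String cityId, String districtId) {
        String queryId = provinceId + cityId + districtId;
        return "http://www.weather.com.cn/weather/" + queryId + ".shtml";
    }

    public static void main(String[] args) {
        // Jiangsu (10119) + Nanjing (01) + Nanjing urban district (01)
        System.out.println(buildWeatherUrl("10119", "01", "01"));
        // prints http://www.weather.com.cn/weather/101190101.shtml
    }
}
```

Keeping the IDs as strings rather than integers preserves the leading zeros in "01".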
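The page that comes back follows the HTML standard quite closely.
We can extract the information with HTMLParser, or alternatively with regular expressions: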
// Filter for the weather tables: <table class="yuBaoTable">
NodeFilter prosFilter = new TagNameFilter("table");
NodeFilter prosAttrFilter = new HasAttributeFilter("class",
        "yuBaoTable");
AndFilter filters = new AndFilter(prosFilter, prosAttrFilter);
// Get the matching weather nodes (parser is an org.htmlparser.Parser built from the page HTML)
NodeList nodes = parser.extractAllNodesThatMatch(filters);
for (int i = 0; i < nodes.size(); i++) {
    Node node = nodes.elementAt(i);
    String weatherInfo = node.toPlainTextString();
    weatherInfo = weatherInfo.replaceAll("\\s", "");
}
The extracted text contains a lot of whitespace, so we strip it out with weatherInfo.replaceAll("\\s", "").
The final result is:
15日星期三白天小雨高温23℃东风3-4级夜间阴低温19℃东风3-4级15日,星期三,白天:小雨,高温:高温23℃,风:东风,风力3-4级,夜间:阴,温度:低温19℃,风:东风,风力:3-4级
16日星期四白天阴高温28℃东风3-4级夜间多云低温21℃东南风3-4级16日,星期四,白天:阴,高温:高温28℃,风:东风,风力3-4级,夜间:多云,温度:低温21℃,风:东南风,风力:3-4级
17日星期五白天多云高温31℃东南风3-4级夜间阴低温23℃东南风3-4级17日,星期五,白天:多云,高温:高温31℃,风:东南风,风力3-4级,夜间:阴,温度:低温23℃,风:东南风,风力:3-4级
18日星期六白天雷阵雨高温28℃东风3-4级夜间中雨低温22℃东北风3-4级18日,星期六,白天:雷阵雨,高温:高温28℃,风:东风,风力3-4级,夜间:中雨,温度:低温22℃,风:东北风,风力:3-4级
19日星期日白天中雨高温25℃东北风3-4级夜间阵雨低温21℃东北风3-4级19日,星期日,白天:中雨,高温:高温25℃,风:东北风,风力3-4级,夜间:阵雨,温度:低温21℃,风:东北风,风力:3-4级
20日星期一白天阴高温29℃北风3-4级夜间多云低温22℃东风3-4级20日,星期一,白天:阴,高温:高温29℃,风:北风,风力3-4级,夜间:多云,温度:低温22℃,风:东风,风力:3-4级
21日星期二白天阴高温31℃东风3-4级21日,星期二,白天:阴,高温:高温31℃,风:东风,风力3-4级
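As mentioned above, regular expressions can stand in for HTMLParser. The sketch below grabs each forecast table, strips its tags and then removes whitespace. It is a minimal version under two assumptions: the tables are not nested, and each one carries exactly `class="yuBaoTable"`. The class and method names are made up for illustration.

One detail relates to the `replaceAll("\\s", "")` step. In Java, `\s` matches only ASCII whitespace (`[ \t\n\x0B\f\r]`) by default. Non-breaking spaces (U+00A0) and full-width spaces (U+3000) therefore survive. The embedded `(?U)` flag (`Pattern.UNICODE_CHARACTER_CLASS`, Java 7+) makes `\s` cover them too.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexTableExtractor {
    // Captures the body of each <table ... class="yuBaoTable" ...> element.
    // Assumption: tables are not nested, so the lazy .*? stops at the right </table>.
    private static final Pattern TABLE = Pattern.compile(
            "<table[^>]*class=\"yuBaoTable\"[^>]*>(.*?)</table>", Pattern.DOTALL);

    public static List<String> extractWeatherTexts(String html) {
        List<String> result = new ArrayList<>();
        Matcher m = TABLE.matcher(html);
        while (m.find()) {
            String text = m.group(1)
                    .replaceAll("<[^>]+>", "")   // drop the inner tags
                    .replace("&nbsp;", " ")      // decode the most common entity
                    .replaceAll("(?U)\\s", "");  // strip ASCII and Unicode whitespace
            result.add(text);
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<html><body>"
                + "<table class=\"yuBaoTable\"><tr><td>21日</td>\n<td>星期二</td>"
                + "<td>白天&nbsp;阴\u3000</td></tr></table>"
                + "</body></html>";
        System.out.println(extractWeatherTexts(html)); // [21日星期二白天阴]
    }
}
```

A regex is brittle against markup changes (attribute order, extra classes, nested tables). That brittleness is the usual reason to prefer a real parser such as HTMLParser for the table lookup.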
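To make the data easier to work with, we create a class for each day's weather.
Its attributes are: date, weekday, daytime weather, daytime high temperature, daytime wind direction, daytime wind force, nighttime weather, nighttime low temperature, nighttime wind direction and nighttime wind force.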
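The author's `Day` class is not shown. The sketch below is a hypothetical version with those ten attributes. Its field order follows the `new Day(day, week, h_weather, ...)` call in `GetWeather` below. Empty strings mark a missing half, because that is what `GetWeather` passes. The English labels in `toString` are an illustrative choice.

```java
// Hypothetical per-day weather class; field order matches GetWeather's constructor call.
public class Day {
    private final String day;        // date
    private final String week;       // weekday
    private final String hWeather;   // daytime weather
    private final String high;       // daytime high temperature
    private final String hWind;      // daytime wind direction
    private final String hWindLevel; // daytime wind force
    private final String lWeather;   // night weather
    private final String low;        // night low temperature
    private final String lWind;      // night wind direction
    private final String lWindLevel; // night wind force

    public Day(String day, String week, String hWeather, String high,
               String hWind, String hWindLevel, String lWeather, String low,
               String lWind, String lWindLevel) {
        this.day = day;
        this.week = week;
        this.hWeather = hWeather;
        this.high = high;
        this.hWind = hWind;
        this.hWindLevel = hWindLevel;
        this.lWeather = lWeather;
        this.low = low;
        this.lWind = lWind;
        this.lWindLevel = lWindLevel;
    }

    // An empty weather field means that half of the day is absent.
    public boolean hasDaytime() { return !hWeather.isEmpty(); }
    public boolean hasNight() { return !lWeather.isEmpty(); }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(day).append(' ').append(week);
        if (hasDaytime()) {
            sb.append(" | day: ").append(hWeather).append(", ").append(high)
              .append(", ").append(hWind).append(' ').append(hWindLevel);
        }
        if (hasNight()) {
            sb.append(" | night: ").append(lWeather).append(", ").append(low)
              .append(", ").append(lWind).append(' ').append(lWindLevel);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Day d = new Day("21", "星期二", "阴", "高温31℃", "东风", "3-4级", "", "", "", "");
        System.out.println(d); // 21 星期二 | day: 阴, 高温31℃, 东风 3-4级
    }
}
```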
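Annoyingly, not every day has both daytime and nighttime information. The 21st, for example, has only daytime data.
So extracting this information with regular expressions splits into three cases:
1. Both daytime and nighttime
2. Daytime only
3. Nighttime only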
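The three-way split can be sketched as a small classifier. It uses the same markers as `GetWeather` below: "白" (from 白天, daytime) and "夜" (from 夜间, night). `WeatherCase` and `Layout` are hypothetical names. Like the original code, the sketch assumes every row has at least one half:

```java
public class WeatherCase {
    // The three layouts a day's whitespace-free text can take
    public enum Layout { DAY_AND_NIGHT, DAY_ONLY, NIGHT_ONLY }

    public static Layout classify(String weatherInfo) {
        boolean hasDay = weatherInfo.contains("白");   // 白天: daytime
        boolean hasNight = weatherInfo.contains("夜"); // 夜间: night
        if (hasDay && hasNight) return Layout.DAY_AND_NIGHT;
        if (hasDay) return Layout.DAY_ONLY;
        return Layout.NIGHT_ONLY; // assumption: at least one half is present
    }

    public static void main(String[] args) {
        System.out.println(classify("21日星期二白天阴高温31℃东风3-4级")); // DAY_ONLY
    }
}
```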
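Don't lose your cool yet: here comes the really maddening part.
Not every forecast includes a wind direction and a wind force.
We got the following weather report: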
15日星期三白天雷阵雨高温32℃无持续风向微风夜间中雨低温19℃无持续风向微风
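Instead of a direction and force, it says 无持续风向 ("no persistent wind direction") and 微风 ("light breeze").
Fine: when extracting the information with regular expressions, we have to split off yet another case!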
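`MyRegex` is not shown in the article. The two patterns below are therefore hypothetical stand-ins for `MyRegex.WeatherRegex` and `MyRegex.WeatherRegexNoWind` in the code that follows. They cover only the daytime-and-night case. The group layout is inferred from how `GetWeather` rebuilds the strings. For example, `high = "高温" + group(4) + "℃"` implies group 4 is the bare number, and `h_wind = group(5) + "风"` implies group 5 is the bare direction. As in the original, the presence of "风向" selects the no-wind pattern:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WeatherRegexSketch {
    // Hypothetical patterns. Groups: 1 date, 2 weekday, 3 day weather, 4 high,
    // 5 day wind, 6 day force, 7 night weather, 8 low, 9 night wind, 10 night force.
    static final String WITH_WIND =
            "(\\d+)日(星期.)白天(.+?)高温(-?\\d+)℃(.+?)风(\\d+-\\d+)级"
          + "夜间(.+?)低温(-?\\d+)℃(.+?)风(\\d+-\\d+)级";
    static final String NO_WIND =
            "(\\d+)日(星期.)白天(.+?)高温(-?\\d+)℃(.+?)风向(.+?)风"
          + "夜间(.+?)低温(-?\\d+)℃(.+?)风向(.+?)风";

    // "风向" only occurs in the "无持续风向" form, so it selects NO_WIND.
    public static String[] parse(String weatherInfo) {
        String regex = weatherInfo.contains("风向") ? NO_WIND : WITH_WIND;
        Matcher m = Pattern.compile(regex).matcher(weatherInfo);
        if (!m.find()) return null;
        String[] groups = new String[10];
        for (int g = 1; g <= 10; g++) groups[g - 1] = m.group(g);
        return groups;
    }

    public static void main(String[] args) {
        String[] g = parse("15日星期三白天雷阵雨高温32℃无持续风向微风夜间中雨低温19℃无持续风向微风");
        System.out.println(String.join(",", g)); // 15,星期三,雷阵雨,32,无持续,微,中雨,19,无持续,微
    }
}
```

In both patterns, each lazy `(.+?)` stops at the first following literal (高温, 风, 夜间...). This is what keeps multi-character directions such as 东南 inside one group.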
The code is as follows.
The parameter content is the HTML returned by the city's weather page:
// Imports needed by the enclosing class. Day and MyRegex are the
// author's own classes (MyRegex holds the regex strings and is not shown).
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

import org.htmlparser.Node;
import org.htmlparser.NodeFilter;
import org.htmlparser.Parser;
import org.htmlparser.filters.AndFilter;
import org.htmlparser.filters.HasAttributeFilter;
import org.htmlparser.filters.TagNameFilter;
import org.htmlparser.util.NodeList;
import org.htmlparser.util.ParserException;

public static Vector<Day> GetWeather(String content) {
    Vector<Day> days = new Vector<Day>();
    // Parse the weather page to get the forecast for the next seven days
    Parser parser;
    try {
        parser = new Parser(content);
        // Filter for the forecast tables: <table class="yuBaoTable">
        NodeFilter prosFilter = new TagNameFilter("table");
        NodeFilter prosAttrFilter = new HasAttributeFilter("class",
                "yuBaoTable");
        AndFilter filters = new AndFilter(prosFilter, prosAttrFilter);
        // Collect the matching tables, one per day
        NodeList nodes = parser.extractAllNodesThatMatch(filters);
        for (int i = 0; i < nodes.size(); i++) {
            Node node = nodes.elementAt(i);
            String weatherInfo = node.toPlainTextString();
            weatherInfo = weatherInfo.replaceAll("\\s", "");
            // System.out.print(weatherInfo);
            // The text comes in three layouts: daytime and night, daytime
            // only, or night only. In each layout, "风向" appears only in the
            // "无持续风向" (no persistent wind direction) form, which needs
            // the *NoWind pattern instead of the regular one.
            if (weatherInfo.indexOf("夜") != -1
                    && weatherInfo.indexOf("白") != -1) {
                // Case 1: daytime and night
                String day = "", week = "", h_weather = "", high = "",
                        h_wind = "", h_wind_level = "", l_weather = "",
                        low = "", l_wind = "", l_wind_level = "";
                if (weatherInfo.indexOf("风向") != -1) {
                    Pattern adayPattern = Pattern
                            .compile(MyRegex.WeatherRegexNoWind);
                    Matcher adayMatcher = adayPattern.matcher(weatherInfo);
                    while (adayMatcher.find()) {
                        day = adayMatcher.group(1);
                        week = adayMatcher.group(2);
                        h_weather = adayMatcher.group(3);
                        high = "高温" + adayMatcher.group(4) + "℃";
                        h_wind = adayMatcher.group(5) + "风向"; // daytime wind direction
                        h_wind_level = adayMatcher.group(6) + "风"; // daytime wind force
                        l_weather = adayMatcher.group(7);
                        low = "低温" + adayMatcher.group(8) + "℃";
                        l_wind = adayMatcher.group(9) + "风向";
                        l_wind_level = adayMatcher.group(10) + "风";
                    }
                } else {
                    Pattern adayPattern = Pattern
                            .compile(MyRegex.WeatherRegex);
                    Matcher adayMatcher = adayPattern.matcher(weatherInfo);
                    while (adayMatcher.find()) {
                        day = adayMatcher.group(1);
                        week = adayMatcher.group(2);
                        h_weather = adayMatcher.group(3);
                        high = "高温" + adayMatcher.group(4) + "℃";
                        h_wind = adayMatcher.group(5) + "风"; // daytime wind direction
                        h_wind_level = adayMatcher.group(6) + "级"; // daytime wind force
                        l_weather = adayMatcher.group(7);
                        low = "低温" + adayMatcher.group(8) + "℃";
                        l_wind = adayMatcher.group(9) + "风";
                        l_wind_level = adayMatcher.group(10) + "级";
                    }
                }
                Day aday = new Day(day, week, h_weather, high, h_wind,
                        h_wind_level, l_weather, low, l_wind, l_wind_level);
                days.add(aday);
                // System.out.println(aday);
            } else if (weatherInfo.indexOf("白") != -1) {
                // Case 2: daytime only
                String day = "", week = "", h_weather = "", high = "",
                        h_wind = "", h_wind_level = "";
                if (weatherInfo.indexOf("风向") != -1) {
                    Pattern adayPattern = Pattern
                            .compile(MyRegex.WeatherDayRegexNoWind);
                    Matcher adayMatcher = adayPattern.matcher(weatherInfo);
                    while (adayMatcher.find()) {
                        day = adayMatcher.group(1);
                        week = adayMatcher.group(2);
                        h_weather = adayMatcher.group(3);
                        high = "高温" + adayMatcher.group(4) + "℃";
                        h_wind = adayMatcher.group(5) + "风向"; // daytime wind direction
                        h_wind_level = adayMatcher.group(6) + "风"; // daytime wind force
                    }
                } else {
                    Pattern adayPattern = Pattern
                            .compile(MyRegex.WeatherDayRegex);
                    Matcher adayMatcher = adayPattern.matcher(weatherInfo);
                    while (adayMatcher.find()) {
                        day = adayMatcher.group(1);
                        week = adayMatcher.group(2);
                        h_weather = adayMatcher.group(3);
                        high = "高温" + adayMatcher.group(4) + "℃";
                        h_wind = adayMatcher.group(5) + "风"; // daytime wind direction
                        h_wind_level = adayMatcher.group(6) + "级"; // daytime wind force
                    }
                }
                Day aday = new Day(day, week, h_weather, high, h_wind,
                        h_wind_level, "", "", "", "");
                days.add(aday);
                // System.out.println(aday);
            } else {
                // Case 3: night only
                String day = "", week = "", l_weather = "", low = "",
                        l_wind = "", l_wind_level = "";
                if (weatherInfo.indexOf("风向") != -1) {
                    Pattern adayPattern = Pattern
                            .compile(MyRegex.WeatherNightRegexNoWind);
                    Matcher adayMatcher = adayPattern.matcher(weatherInfo);
                    while (adayMatcher.find()) {
                        day = adayMatcher.group(1);
                        week = adayMatcher.group(2);
                        l_weather = adayMatcher.group(3);
                        low = "低温" + adayMatcher.group(4) + "℃";
                        l_wind = adayMatcher.group(5) + "风向";
                        l_wind_level = adayMatcher.group(6) + "风";
                    }
                } else {
                    Pattern adayPattern = Pattern
                            .compile(MyRegex.WeatherNightRegex);
                    Matcher adayMatcher = adayPattern.matcher(weatherInfo);
                    while (adayMatcher.find()) {
                        day = adayMatcher.group(1);
                        week = adayMatcher.group(2);
                        l_weather = adayMatcher.group(3);
                        low = "低温" + adayMatcher.group(4) + "℃";
                        l_wind = adayMatcher.group(5) + "风";
                        l_wind_level = adayMatcher.group(6) + "级";
                    }
                }
                // Build the Day after either pattern has run, as in the other two cases
                Day aday = new Day(day, week, "", "", "", "",
                        l_weather, low, l_wind, l_wind_level);
                days.add(aday);
                // System.out.println(aday);
            }
        }
    } catch (ParserException e) {
        e.printStackTrace();
    }
    return days;
}
This part is essentially just extracting and parsing the information.
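The six branches in `GetWeather` come from crossing two questions: which halves are present, and which wind format each half uses. One possible simplification, not the author's code, is to answer them separately. First split each row at 白天/夜间. Then parse each half with one pattern whose alternation accepts both "东风3-4级" and "无持续风向微风". `HalfDayParser` and its methods are hypothetical names. The sketch assumes that the night half, when present, follows the daytime half:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HalfDayParser {
    // weather, 高温/低温 temperature, then either "X风N-N级" or "X风向Y风"
    private static final Pattern HALF = Pattern.compile(
            "(.+?)[高低]温(-?\\d+)℃(?:([^风\\d]+风)(\\d+-\\d+级)|([^风]+风向)([^风]+风))");

    /** Returns {weather, temperature, windDirection, windForce}, or null if no match. */
    public static String[] parseHalf(String segment) {
        Matcher m = HALF.matcher(segment);
        if (!m.lookingAt()) return null;          // anchored at the segment start
        boolean noWind = m.group(6) != null;      // second alternative matched
        return noWind
                ? new String[] {m.group(1), m.group(2), m.group(5), m.group(6)}
                : new String[] {m.group(1), m.group(2), m.group(3), m.group(4)};
    }

    /** Returns {daytime, night}; a missing half is null. */
    public static String[][] parseRow(String weatherInfo) {
        int dayAt = weatherInfo.indexOf("白天");
        int nightAt = weatherInfo.indexOf("夜间");
        String[] dayPart = null, nightPart = null;
        if (dayAt >= 0) {
            int end = nightAt >= 0 ? nightAt : weatherInfo.length();
            dayPart = parseHalf(weatherInfo.substring(dayAt + 2, end));
        }
        if (nightAt >= 0) {
            nightPart = parseHalf(weatherInfo.substring(nightAt + 2));
        }
        return new String[][] {dayPart, nightPart};
    }

    public static void main(String[] args) {
        String[][] r = parseRow("15日星期三白天雷阵雨高温32℃无持续风向微风夜间中雨低温19℃无持续风向微风");
        System.out.println(String.join(",", r[0]) + " / " + String.join(",", r[1]));
        // 雷阵雨,32,无持续风向,微风 / 中雨,19,无持续风向,微风
    }
}
```

The character classes `[^风\d]` and `[^风]` stop the wind groups at the first 风. This keeps the "X风N-N级" alternative from wandering into the "无持续风向微风" text, which then falls through to the second alternative.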
The interface currently used for testing: