A Suspected Bug in HtmlParser
In a recent project I used HtmlParser (version 1.5). While using it (for example, when fetching the URL http://athena2002.vip.china.alibaba.com/ ), I ran into this exception:
Exception in thread "main" java.lang.IllegalArgumentException: invalid cookie name: Discard
    at org.htmlparser.http.Cookie.<init>(Cookie.java:136)
    at org.htmlparser.http.ConnectionManager.parseCookies(ConnectionManager.java:1126)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:621)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:792)
    at org.htmlparser.Parser.<init>(Parser.java:251)
    at org.htmlparser.Parser.<init>(Parser.java:261)
Inspecting the code, I found:
org.htmlparser.http.Cookie:

    public Cookie (String name, String value)
    {
        if (!isToken (name) || name.equalsIgnoreCase ("Comment") // rfc2019
            || name.equalsIgnoreCase ("Discard") // 2019++
            || name.equalsIgnoreCase ("Domain")
            || name.equalsIgnoreCase ("Expires") // (old cookies)
            || name.equalsIgnoreCase ("Max-Age") // rfc2019
            || name.equalsIgnoreCase ("Path")
            || name.equalsIgnoreCase ("Secure")
            || name.equalsIgnoreCase ("Version"))
            throw new IllegalArgumentException ("invalid cookie name: " + name);
        mName = name;
        mValue = value;
        mComment = null;
        mDomain = null;
        mExpiry = null; // not persisted
        mPath = "/";
        mSecure = false;
        mVersion = 0;
    }
As soon as the name is found to be "Discard", the constructor throws an exception.
Meanwhile, the cookie-parsing code in org.htmlparser.http.ConnectionManager.parseCookies (URLConnection connection) contains this fragment:
    if (key.equals ("domain"))
        cookie.setDomain (value);
    else if (key.equals ("path"))
        cookie.setPath (value);
    else if (key.equals ("secure"))
        cookie.setSecure (true);
    else if (key.equals ("comment"))
        cookie.setComment (value);
    else if (key.equals ("version"))
        cookie.setVersion (Integer.parseInt (value));
    else if (key.equals ("max-age"))
    {
        Date date = new Date ();
        long then = date.getTime () + Integer.parseInt (value) * 1000;
        date.setTime (then);
        cookie.setExpiryDate (date);
    }
    else
    {   // error,? unknown attribute,
        // maybe just another cookie not separated by a comma
        cookie = new Cookie (name, value); // this is where it fails
        cookies.addElement (cookie);
    }
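There is no special handling for Discard, so it falls through to the final else branch and is passed to the Cookie constructor as a cookie name.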
Left with no better option, I overrode this method and added handling for Discard: it simply does a continue :)
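The idea behind that workaround can be sketched roughly as follows. This is a simplified, stand-alone illustration and not HtmlParser's actual parseCookies() code: the class DiscardSkipSketch and its methods are hypothetical names, and the sketch assumes a single header value whose parts are separated by semicolons only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-alone sketch; not HtmlParser's actual parseCookies().
class DiscardSkipSketch {

    // Attribute names that belong to the current cookie rather than starting a new one.
    private static final List<String> KNOWN_ATTRIBUTES = Arrays.asList(
            "comment", "domain", "expires", "max-age", "path", "secure", "version");

    // Returns the names of the cookies found in one semicolon-separated header value.
    static List<String> cookieNames(String header) {
        List<String> names = new ArrayList<>();
        boolean haveCookie = false;
        for (String part : header.split(";")) {
            String token = part.trim();
            if (token.isEmpty())
                continue;
            int eq = token.indexOf('=');
            String name = (eq < 0) ? token : token.substring(0, eq).trim();
            String key = name.toLowerCase(Locale.ROOT);
            if (key.equals("discard"))
                continue; // the workaround: skip Discard instead of treating it as a cookie name
            if (haveCookie && KNOWN_ATTRIBUTES.contains(key))
                continue; // a real parser would apply the attribute to the current cookie here
            names.add(name); // anything else is taken as a new cookie
            haveCookie = true;
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(cookieNames("JSESSIONID=abc; Path=/; Discard; Version=1"));
    }
}
```

With this check in place, Discard never reaches the branch that constructs a new cookie, so the constructor's name validation is not triggered by it.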
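Today, while writing this blog post, I tested against the 1.6 code and found that the problem does not occur. After analyzing the code, I found two changes:
1. ConnectionManager now checks a condition before calling parseCookies:

    if (getCookieProcessingEnabled ())
        parseCookies (ret);

By default, this condition is false.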
2. parseCookies now catches the exception:
    // error,? unknown attribute,
    // maybe just another cookie
    // not separated by a comma
    try
    {
        cookie = new Cookie (name,
            value);
        cookies.addElement (cookie);
    }
    catch (IllegalArgumentException iae)
    {
        // should print a warning
        // for now just bail
        break;
    }
This makes the problem go away, but evidently the underlying Discard issue still has not been recognized.
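As I understand it, the most reasonable fix would be:
1. Add a boolean discard property (with its accessor methods) to org.htmlparser.http.Cookie.
2. In org.htmlparser.http.ConnectionManager.parseCookies(), handle Discard: when the attribute is present, set cookie.discard = true.
For an explanation of Discard, see http://www.faqs.org/rfcs/rfc2965.html: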
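The two steps above might look something like the following minimal sketch. It is only an illustration of the proposal under stated assumptions: SketchCookie is a hypothetical stand-in for org.htmlparser.http.Cookie, the getDiscard/setDiscard accessors are names I made up, and the parser handles a single cookie and interprets only the Discard attribute.

```java
import java.util.Locale;

// Hypothetical sketch of the proposal; SketchCookie is not org.htmlparser.http.Cookie.
class SketchCookie {
    private final String name;
    private final String value;
    private boolean discard; // proposed new flag, false unless Discard is seen

    SketchCookie(String name, String value) {
        this.name = name;
        this.value = value;
    }

    String getName() { return name; }
    String getValue() { return value; }
    boolean getDiscard() { return discard; }
    void setDiscard(boolean discard) { this.discard = discard; }

    // Parses "name=value; attr; attr=value" for a single cookie.
    static SketchCookie parse(String header) {
        String[] parts = header.split(";");
        String first = parts[0].trim();
        int eq = first.indexOf('=');
        if (eq <= 0)
            throw new IllegalArgumentException("expected name=value: " + first);
        SketchCookie cookie = new SketchCookie(
                first.substring(0, eq).trim(), first.substring(eq + 1).trim());
        for (int i = 1; i < parts.length; i++) {
            String key = parts[i].trim().toLowerCase(Locale.ROOT);
            if (key.equals("discard"))
                cookie.setDiscard(true); // step 2: record the attribute on the cookie
            // other attributes (domain, path, ...) are omitted in this sketch
        }
        return cookie;
    }

    public static void main(String[] args) {
        SketchCookie c = SketchCookie.parse("sid=42; Path=/; Discard");
        System.out.println(c.getName() + " discard=" + c.getDiscard());
    }
}
```

Recording the attribute on the cookie, rather than skipping it, keeps the information available to any caller that needs to drop the cookie when the session ends.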
Discard
OPTIONAL. The Discard attribute instructs the user agent to
discard the cookie unconditionally when the user agent terminates