A Suspected Bug in HtmlParser

A recent project of mine used HtmlParser (version 1.5). While using it (for example, when fetching the URL http://athena2002.vip.china.alibaba.com/ ), I ran into this exception:
Exception in thread "main" java.lang.IllegalArgumentException: invalid cookie name: Discard
    at org.htmlparser.http.Cookie.<init>(Cookie.java:136)
    at org.htmlparser.http.ConnectionManager.parseCookies(ConnectionManager.java:1126)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:621)
    at org.htmlparser.http.ConnectionManager.openConnection(ConnectionManager.java:792)
    at org.htmlparser.Parser.<init>(Parser.java:251)
    at org.htmlparser.Parser.<init>(Parser.java:261)
Looking at the source, I found this constructor in org.htmlparser.http.Cookie:
    public Cookie (String name, String value)
    {
        if (!isToken (name) || name.equalsIgnoreCase ("Comment") // rfc2019
                || name.equalsIgnoreCase ("Discard") // 2019++
                || name.equalsIgnoreCase ("Domain")
                || name.equalsIgnoreCase ("Expires") // (old cookies)
                || name.equalsIgnoreCase ("Max-Age") // rfc2019
                || name.equalsIgnoreCase ("Path")
                || name.equalsIgnoreCase ("Secure")
                || name.equalsIgnoreCase ("Version"))
            throw new IllegalArgumentException ("invalid cookie name: " + name);
        mName = name;
        mValue = value;
        mComment = null;
        mDomain = null;
        mExpiry = null; // not persisted
        mPath = "/";
        mSecure = false;
        mVersion = 0;
    }
As soon as the name is "Discard" (compared case-insensitively), the constructor throws an exception.
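To make the check easier to experiment with, here is a minimal, self-contained sketch of it. The class ReservedNameDemo and its methods are hypothetical names made up for this illustration, not part of HtmlParser, and the isToken test from the real constructor is left out.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the name check in org.htmlparser.http.Cookie (1.5).
// The real constructor also calls isToken (name), which is omitted here.
class ReservedNameDemo {
    // Attribute names that the constructor refuses to accept as cookie names.
    static final List<String> RESERVED = Arrays.asList(
            "Comment", "Discard", "Domain", "Expires",
            "Max-Age", "Path", "Secure", "Version");

    // True if the name matches a reserved attribute, ignoring case.
    static boolean isReserved(String name) {
        for (String reserved : RESERVED) {
            if (reserved.equalsIgnoreCase(name)) {
                return true;
            }
        }
        return false;
    }

    // Mirrors the constructor's behaviour: throw on a reserved name.
    static void checkName(String name) {
        if (isReserved(name)) {
            throw new IllegalArgumentException("invalid cookie name: " + name);
        }
    }

    public static void main(String[] args) {
        checkName("JSESSIONID"); // an ordinary cookie name passes
        try {
            checkName("discard"); // an attribute mistaken for a cookie name
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // invalid cookie name: discard
        }
    }
}
```

The point of the sketch is that the check itself is reasonable; the trouble only starts when a caller passes an attribute name such as Discard to it as if it were a cookie name.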

Meanwhile, the cookie-parsing code in org.htmlparser.http.ConnectionManager.parseCookies (URLConnection connection) contains this fragment:
    if (key.equals ("domain"))
        cookie.setDomain (value);
    else if (key.equals ("path"))
        cookie.setPath (value);
    else if (key.equals ("secure"))
        cookie.setSecure (true);
    else if (key.equals ("comment"))
        cookie.setComment (value);
    else if (key.equals ("version"))
        cookie.setVersion (Integer.parseInt (value));
    else if (key.equals ("max-age"))
    {
        Date date = new Date ();
        long then = date.getTime () + Integer.parseInt (value) * 1000;
        date.setTime (then);
        cookie.setExpiryDate (date);
    }
    else
    {   // error,? unknown attribute,
        // maybe just another cookie not separated by a comma
        cookie = new Cookie (name, value); // this is where things go wrong
        cookies.addElement (cookie);
    }
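There is no special handling for Discard, so it falls through to the final else branch and is passed to the Cookie constructor as if it were a cookie name.
With no better option, I overrode this method and added handling for Discard: simply continue :)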

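While writing this post today, I tested against the 1.6 source and the problem did not occur. Reading the code showed two changes:
1. ConnectionManager now checks a condition before calling parseCookies:
    if (getCookieProcessingEnabled ())
        parseCookies (ret);
By default this condition is false.
2. parseCookies now catches the exception: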
    // error,? unknown attribute,
    // maybe just another cookie
    // not separated by a comma
    try
    {
        cookie = new Cookie (name,
            value);
        cookies.addElement (cookie);
    }
    catch (IllegalArgumentException iae)
    {
        // should print a warning
        // for now just bail
        break;
    }
This makes the exception go away, but the authors clearly still have not recognized the underlying Discard problem.

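As I understand it, the most reasonable fix would be:
1. Add a boolean discard property to org.htmlparser.http.Cookie.
2. In org.htmlparser.http.ConnectionManager.parseCookies(), handle the Discard attribute: when it is present, set the cookie's discard flag to true.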
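The two steps above can be sketched roughly as follows. This is not HtmlParser code: DiscardAwareCookie and DiscardAwareParser are hypothetical classes invented for the illustration. The sketch assumes a single cookie per header value with attributes separated by semicolons, and it silently ignores attributes it does not know instead of treating them as new cookies.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical cookie type carrying the proposed discard flag.
class DiscardAwareCookie {
    final String name;
    final String value;
    String path = "/";
    boolean secure = false;
    boolean discard = false; // proposed addition: set when Discard is present

    DiscardAwareCookie(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

// Hypothetical parser showing where Discard would be handled.
class DiscardAwareParser {
    // Parses one Set-Cookie style header value holding a single cookie.
    static List<DiscardAwareCookie> parse(String header) {
        List<DiscardAwareCookie> cookies = new ArrayList<>();
        DiscardAwareCookie current = null;
        for (String token : header.split(";")) {
            token = token.trim();
            if (token.isEmpty()) {
                continue;
            }
            int eq = token.indexOf('=');
            String name = eq < 0 ? token : token.substring(0, eq).trim();
            String value = eq < 0 ? "" : token.substring(eq + 1).trim();
            String key = name.toLowerCase(Locale.ROOT);
            if (current == null) {
                // The first name=value pair is the cookie itself.
                current = new DiscardAwareCookie(name, value);
                cookies.add(current);
            } else if (key.equals("discard")) {
                // Treat Discard as an attribute, not as a cookie name.
                current.discard = true;
            } else if (key.equals("path")) {
                current.path = value;
            } else if (key.equals("secure")) {
                current.secure = true;
            }
            // Other attributes are ignored in this sketch.
        }
        return cookies;
    }

    public static void main(String[] args) {
        List<DiscardAwareCookie> parsed =
                parse("JSESSIONID=abc; Path=/app; Discard; Secure");
        DiscardAwareCookie c = parsed.get(0);
        System.out.println(c.name + " discard=" + c.discard);
    }
}
```

Recording the flag rather than skipping the attribute keeps the information available to whatever later decides which cookies to persist.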

For the definition of Discard, see http://www.faqs.org/rfcs/rfc2965.html:
Discard
OPTIONAL. The Discard attribute instructs the user agent to
discard the cookie unconditionally when the user agent terminates.
