HtmlUnit (Part 1): Fixing a File Handle Leak

Main page
http://htmlunit.sourceforge.net/

Documentation
http://www.w3.org/TR/html401/interact/forms.html#adef-tabindex

We use HtmlUnit 2.6. After adopting this open-source project, we ran into a file handle leak.
Our system threw "too many open files" exceptions, and we were sure they were caused by sockets stuck in the TCP state CLOSE_WAIT: the number of CLOSE_WAIT connections grew steadily from the moment JBoss started and never leveled off.

The log from the server showed many connections like these:
TCP yo-in-f190.ie100.net:http (CLOSE_WAIT)
TCP iad04s01-in-f99.ie100.net:http (CLOSE_WAIT)
TCP a72-246-208-9.deploy.akamaitechnologies.com:http (CLOSE_WAIT)
TCP a72-246-113-163.deploy.akamaitechnologies.com:http (CLOSE_WAIT)
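To see which remote hosts hold the most CLOSE_WAIT sockets, a small Java sketch can tally lines in the format shown above. It assumes every line looks like "TCP host:port (CLOSE_WAIT)"; the class and method names are hypothetical, and output with more columns would need different parsing.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CloseWaitTally {
    // Counts CLOSE_WAIT entries per remote host, assuming lines shaped like
    // "TCP host:port (CLOSE_WAIT)" as in the server log above.
    static Map<String, Integer> countCloseWait(List<String> lines) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.endsWith("(CLOSE_WAIT)")) {
                continue; // skip ESTABLISHED, TIME_WAIT and other states
            }
            String[] parts = trimmed.split("\\s+");
            if (parts.length < 2) {
                continue; // not in the expected format
            }
            String endpoint = parts[1];              // e.g. "yo-in-f190.ie100.net:http"
            int colon = endpoint.lastIndexOf(':');
            String host = colon >= 0 ? endpoint.substring(0, colon) : endpoint;
            Integer old = counts.get(host);
            counts.put(host, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
            "TCP yo-in-f190.ie100.net:http (CLOSE_WAIT)",
            "TCP yo-in-f190.ie100.net:http (CLOSE_WAIT)",
            "TCP a72-246-208-9.deploy.akamaitechnologies.com:http (CLOSE_WAIT)",
            "TCP example.org:http (ESTABLISHED)");
        // prints {yo-in-f190.ie100.net=2, a72-246-208-9.deploy.akamaitechnologies.com=1}
        System.out.println(countCloseWait(sample));
    }
}
```

A host that keeps climbing in this tally points at the client code that opens connections to it and never closes them.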

First, we changed some server settings:
1. We shortened the TCP keepalive time (TCP_KEEPALIVE_TIME) to a short value. On my test Linux server I added the following lines to /etc/sysctl.conf and restarted the network:
net.ipv4.tcp_keepalive_time = 600
net.ipv4.tcp_keepalive_probes = 2
net.ipv4.tcp_keepalive_intvl = 2
2. We raised the open-file limit (ulimit) to 8000 so the process could hold more file handles.

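But that did not help: neither change fixed the problem. In hindsight this is expected, because the file descriptor behind a CLOSE_WAIT socket is released only when the application itself closes the socket; kernel keepalive settings cannot do that, and a higher file limit only delays the failure. So we concluded the problem was in HtmlUnit itself.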
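The reason kernel settings cannot help can be reproduced on localhost with plain java.net sockets. In this sketch (class and method names are hypothetical), the server side closes its end, the client reads end-of-stream, and yet the client socket stays open, in CLOSE_WAIT, until the program itself calls close().

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class CloseWaitDemo {
    // Returns true if our socket is still open after the peer closed its end.
    // Between the peer's FIN and our own close(), the kernel reports this
    // socket as CLOSE_WAIT, and its file descriptor stays allocated.
    static boolean socketLingersAfterPeerClose() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        ServerSocket server = new ServerSocket(0, 1, loopback);
        Socket client = new Socket(loopback, server.getLocalPort());
        client.setSoTimeout(5000);          // avoid hanging forever if something goes wrong
        Socket accepted = server.accept();
        accepted.close();                   // the remote side closes first
        InputStream in = client.getInputStream();
        int eof = in.read();                // -1 means end of stream: the peer is gone
        boolean stillOpen = eof == -1 && !client.isClosed();
        client.close();                     // only this releases our descriptor
        server.close();
        return stillOpen;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(socketLingersAfterPeerClose()); // prints true
    }
}
```

A connection pool that never closes such sockets leaks one descriptor per connection, which matches the pattern in the log above.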

We then modified the HtmlUnit 2.6 source. Our changes were as follows:
1. Configuration file
src/main/java/com/gargoylesoftware/htmlunit/http_connection_pool.properties, with these settings:
DEFAULT_MAX_CONNECTIONS_PER_HOST=50
#Timeout in milliseconds
CONNECTION_TIMEOUT=300000
SO_TIMEOUT=300000
MAX_TOTAl_CONNECTIONS=500
#RECEIVE_BUFFER_SIZE=65535
#SEND_BUFFER_SIZE=65535
IDLE_TIMEOUT=30000
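If a key appears more than once in a properties file, java.util.Properties silently keeps the later value, so a repeated key such as DEFAULT_MAX_CONNECTIONS_PER_HOST is easy to miss. A quick check with a hypothetical helper:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class DuplicateKeyDemo {
    // Loads properties text and returns the value Properties ends up with for key.
    // For a repeated key, the entry read last overwrites the earlier one.
    static String effectiveValue(String propertiesText, String key) throws IOException {
        Properties prop = new Properties();
        prop.load(new StringReader(propertiesText)); // load(Reader) exists since Java 6
        return prop.getProperty(key);
    }

    public static void main(String[] args) throws IOException {
        String text = "DEFAULT_MAX_CONNECTIONS_PER_HOST=50\n"
                    + "#Timeout in milliseconds\n"
                    + "DEFAULT_MAX_CONNECTIONS_PER_HOST=60000\n";
        System.out.println(effectiveValue(text, "DEFAULT_MAX_CONNECTIONS_PER_HOST")); // prints 60000
    }
}
```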

2. Web connection class
src/main/java/com/gargoylesoftware/htmlunit/HttpWebConnection.java
We modified createHttpClient() so that the whole system shares a single MultiThreadedHttpConnectionManager:
protected HttpClient createHttpClient() {
    // Before: every HttpClient got its own connection pool
    // final MultiThreadedHttpConnectionManager connectionManager = new MultiThreadedHttpConnectionManager();
    // After: all HttpClients share one pool created by our factory (same package)
    final MultiThreadedHttpConnectionManager connectionManager =
            MultiThreadedHttpConnectionManagerFactory.getInstance();
    final HttpClient client = new HttpClient(connectionManager);
    final HostConfiguration hostConf = client.getHostConfiguration();
    // Send "Connection: close" with every request, asking the server to
    // close the connection after each response
    final List<Header> headers = new ArrayList<Header>();
    headers.add(new Header("Connection", "close"));
    hostConf.getParams().setParameter("http.default-headers", headers);
    return client;
}

3. Factory class that creates the shared MultiThreadedHttpConnectionManager
src/main/java/com/gargoylesoftware/htmlunit/MultiThreadedHttpConnectionManagerFactory.java:

package com.gargoylesoftware.htmlunit;

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.params.HttpConnectionManagerParams;
import org.apache.commons.httpclient.util.IdleConnectionTimeoutThread;

public class MultiThreadedHttpConnectionManagerFactory
{
    // volatile is required for double-checked locking to be safe
    private static volatile MultiThreadedHttpConnectionManager instance;

    public static MultiThreadedHttpConnectionManager getInstance()
    {
        if (null == instance)
        {
            synchronized (MultiThreadedHttpConnectionManagerFactory.class)
            {
                if (null == instance)
                {
                    Properties prop = loadProperties();

                    HttpConnectionManagerParams param = new HttpConnectionManagerParams();
                    param.setDefaultMaxConnectionsPerHost(Integer.parseInt(prop.getProperty("DEFAULT_MAX_CONNECTIONS_PER_HOST", "50")));
                    param.setSoTimeout(Integer.parseInt(prop.getProperty("SO_TIMEOUT", "30000")));
                    param.setConnectionTimeout(Integer.parseInt(prop.getProperty("CONNECTION_TIMEOUT", "30000")));
                    param.setMaxTotalConnections(Integer.parseInt(prop.getProperty("MAX_TOTAl_CONNECTIONS", "500")));

                    MultiThreadedHttpConnectionManager newM = new MultiThreadedHttpConnectionManager();
                    newM.setParams(param);

                    // Register an idle-connection thread: every 30 seconds it closes
                    // pooled connections idle longer than CONNECTION_TIMEOUT milliseconds
                    IdleConnectionTimeoutThread idleThread = new IdleConnectionTimeoutThread();
                    idleThread.setTimeoutInterval(1000 * 30);
                    idleThread.setConnectionTimeout(Integer.parseInt(prop.getProperty("CONNECTION_TIMEOUT", "30000")));
                    idleThread.addConnectionManager(newM);
                    idleThread.start();

                    // Publish the manager only after it is fully configured
                    instance = newM;
                }
            }
        }

        return instance;
    }

    // Loads http_connection_pool.properties from the classpath and always closes the stream.
    // If the file is missing, an empty Properties is returned and the defaults above apply.
    private static Properties loadProperties()
    {
        Properties prop = new Properties();
        InputStream is = MultiThreadedHttpConnectionManagerFactory.class.getResourceAsStream("http_connection_pool.properties");
        if (is == null)
        {
            return prop;
        }
        try
        {
            prop.load(is);
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                is.close();
            }
            catch (IOException ignored)
            {
                // nothing useful to do here
            }
        }
        return prop;
    }
}
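Double-checked locking is only safe when the field is volatile (Java 5 and later). An alternative that needs no locking code is the initialization-on-demand holder idiom. This sketch uses a hypothetical PoolStub in place of MultiThreadedHttpConnectionManager so that it runs without commons-httpclient; it is not the code we deployed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class HolderSingletonDemo {
    // Hypothetical stand-in for MultiThreadedHttpConnectionManager.
    static final class PoolStub {
        static final AtomicInteger CREATED = new AtomicInteger();
        PoolStub() { CREATED.incrementAndGet(); }
    }

    // Holder is initialized only when getInstance() first touches it, and the
    // JVM's class-initialization lock guarantees this happens exactly once.
    private static final class Holder {
        static final PoolStub INSTANCE = new PoolStub();
    }

    static PoolStub getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) throws InterruptedException {
        final PoolStub[] seen = new PoolStub[8];
        Thread[] threads = new Thread[seen.length];
        for (int i = 0; i < threads.length; i++) {
            final int slot = i;
            threads[i] = new Thread(new Runnable() {
                public void run() { seen[slot] = getInstance(); }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        for (PoolStub p : seen) {
            if (p != seen[0]) throw new IllegalStateException("two instances");
        }
        System.out.println("instances created: " + PoolStub.CREATED.get()); // prints 1
    }
}
```

With a real manager, Holder.INSTANCE would be initialized by a static method that creates the MultiThreadedHttpConnectionManager, applies the parameters, and starts the IdleConnectionTimeoutThread.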

4. Using the customized jar (htmlunit2.6-cusomer.jar) correctly
After we finish loading pages, we do this cleanup:
if (this.wc != null) {
    List<TopLevelWindow> windows = this.wc.getTopLevelWindows();
    // In debug mode, log every URL each top-level window visited
    if (Log.isDebugEnabled(this)) {
        if (windows != null && !windows.isEmpty()) {
            for (int i = 0; i < windows.size(); i++) {
                TopLevelWindow window = windows.get(i);
                History histories = window.getHistory();
                for (int j = 0; j < histories.getLength(); j++) {
                    URL url = histories.getUrl(j);
                    Log.info(this, "Window=" + window.getName() + " : url=" + url.toString());
                }
            }
        }
    }
    // Close every window so HtmlUnit can clean them up, then drop the client
    this.wc.closeAllWindows();
    this.wc = null;
}
Here this.wc is our WebClient. Following the recommendation on the official website, we call closeAllWindows() when we are done with it.
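The cleanup above runs only if the code reaches it. Wrapping the page work in try/finally guarantees closeAllWindows() is called even when loading a page throws. A sketch with a hypothetical Browser interface standing in for WebClient (loadPages is a made-up name):

```java
public class CleanupDemo {
    // Hypothetical stand-in for the WebClient calls used above.
    interface Browser {
        void loadPages() throws Exception;
        void closeAllWindows();
    }

    // Always closes the windows, even when loading a page throws;
    // the original exception still propagates to the caller.
    static void runAndCleanUp(Browser browser) throws Exception {
        try {
            browser.loadPages();
        } finally {
            browser.closeAllWindows();
        }
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        Browser failing = new Browser() {
            public void loadPages() throws Exception { throw new Exception("page failed"); }
            public void closeAllWindows() { closed[0] = true; }
        };
        try {
            runAndCleanUp(failing);
        } catch (Exception expected) {
            System.out.println("load failed: " + expected.getMessage()); // prints "load failed: page failed"
        }
        System.out.println("windows closed: " + closed[0]);           // prints "windows closed: true"
    }
}
```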

The most important change is this:
IdleConnectionTimeoutThread idleThread = new IdleConnectionTimeoutThread();
idleThread.setTimeoutInterval(1000 * 30);
idleThread.setConnectionTimeout(Integer.parseInt(prop.getProperty("CONNECTION_TIMEOUT", "30000")));
idleThread.addConnectionManager(instance);
idleThread.start();

A background thread watches the idle connections in the pool and closes them. This solved the problem. I hope this is useful if you use HtmlUnit too.
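The idea behind IdleConnectionTimeoutThread can be sketched in plain Java: remember when each pooled connection was last released and, on a timer, close the ones idle longer than a limit. PooledConn, closeIdle and startReaper are hypothetical names; commons-httpclient does the real work by calling closeIdleConnections(timeout) on each registered connection manager.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class IdleReaperDemo {
    // Hypothetical pooled connection: only tracks when it was last released.
    static final class PooledConn {
        final String host;
        final long lastUsedMillis;
        boolean closed;
        PooledConn(String host, long lastUsedMillis) {
            this.host = host;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final List<PooledConn> idle = new ArrayList<PooledConn>();

    synchronized void release(PooledConn c) { idle.add(c); }

    synchronized int idleCount() { return idle.size(); }

    // Closes and drops every idle connection unused for longer than idleTimeoutMillis.
    synchronized int closeIdle(long nowMillis, long idleTimeoutMillis) {
        int closedCount = 0;
        for (Iterator<PooledConn> it = idle.iterator(); it.hasNext();) {
            PooledConn c = it.next();
            if (nowMillis - c.lastUsedMillis > idleTimeoutMillis) {
                c.closed = true;      // a real pool would close the socket here
                it.remove();
                closedCount++;
            }
        }
        return closedCount;
    }

    // Runs closeIdle every intervalMillis on a daemon thread.
    ScheduledExecutorService startReaper(long intervalMillis, final long idleTimeoutMillis) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor(new ThreadFactory() {
            public Thread newThread(Runnable r) {
                Thread t = new Thread(r, "idle-reaper");
                t.setDaemon(true);    // do not keep the JVM alive
                return t;
            }
        });
        exec.scheduleWithFixedDelay(new Runnable() {
            public void run() { closeIdle(System.currentTimeMillis(), idleTimeoutMillis); }
        }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
        return exec;
    }

    public static void main(String[] args) {
        IdleReaperDemo pool = new IdleReaperDemo();
        PooledConn stale = new PooledConn("a72-246-208-9.deploy.akamaitechnologies.com", 0L);
        PooledConn fresh = new PooledConn("yo-in-f190.ie100.net", 90000L);
        pool.release(stale);
        pool.release(fresh);
        // At t=100s with a 30s limit, only the connection idle for 100s is closed
        int closed = pool.closeIdle(100000L, 30000L);
        // prints "closed=1 stillIdle=1 staleClosed=true"
        System.out.println("closed=" + closed + " stillIdle=" + pool.idleCount() + " staleClosed=" + stale.closed);
    }
}
```

A real pool would also remove a connection from the idle list when it is handed out again; the sketch leaves that out to keep the timeout logic visible.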



