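In a recent project, a Go backend service under heavy concurrency started failing with connect: cannot assign requested address. Checking the network state showed a large number of connections in TIME_WAIT. This post analyzes why that happens and how to use http.Client sensibly and correctly.
At a concurrency of 4000, the connect: cannot assign requested address error showed up almost immediately. Here is the code that caused it: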
// The original imports are not shown; osjson is presumably an import
// alias for encoding/json.
type HttpClient struct {
	host       *url.URL
	HTTPClient *http.Client
}

func NewHttpClient(host string) *HttpClient {
	var hostURL *url.URL
	var err error
	if host != "" {
		hostURL, err = url.Parse(host)
		if err != nil {
			panic(err.Error())
		}
	}
	// HTTPClient is left nil here, so httpClient() always falls back
	// to http.DefaultClient.
	return &HttpClient{host: hostURL}
}

func (c *HttpClient) httpClient() *http.Client {
	if c.HTTPClient == nil {
		return http.DefaultClient
	}
	return c.HTTPClient
}

func (c *HttpClient) Do(req *http.Request, v interface{}) (*http.Response, error) {
	response, err := c.httpClient().Do(req)
	if err != nil {
		return nil, err
	}
	// The body is closed when Do returns, so callers must not read
	// response.Body from the returned response.
	defer response.Body.Close()
	if v != nil {
		err = osjson.NewDecoder(response.Body).Decode(v)
		if err == io.EOF {
			err = nil // ignore EOF, empty response body
		}
	}
	return response, err
}
The http.Client used here is the DefaultClient that ships with the http package. Let's look at how its transport is initialized:
// DefaultTransport is the default implementation of Transport and is
// used by DefaultClient. It establishes network connections as needed
// and caches them for reuse by subsequent calls. It uses HTTP proxies
// as directed by the environment variables HTTP_PROXY, HTTPS_PROXY
// and NO_PROXY (or the lowercase versions thereof).
var DefaultTransport RoundTripper = &Transport{
	Proxy: ProxyFromEnvironment,
	DialContext: defaultTransportDialContext(&net.Dialer{
		Timeout:   30 * time.Second,
		KeepAlive: 30 * time.Second,
	}),
	ForceAttemptHTTP2:     true,
	MaxIdleConns:          100,
	IdleConnTimeout:       90 * time.Second,
	TLSHandshakeTimeout:   10 * time.Second,
	ExpectContinueTimeout: 1 * time.Second,
}
// DefaultMaxIdleConnsPerHost is the default value of Transport's
// MaxIdleConnsPerHost.
const DefaultMaxIdleConnsPerHost = 2
func (t *Transport) maxIdleConnsPerHost() int {
	if v := t.MaxIdleConnsPerHost; v != 0 {
		return v
	}
	return DefaultMaxIdleConnsPerHost
}
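As you can see, this http.Client is shared, but it was never configured: it simply falls back to DefaultClient. Before analyzing further, let's get clear on three important Transport parameters: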
// MaxIdleConns controls the maximum number of idle (keep-alive)
// connections across all hosts. Zero means no limit.
MaxIdleConns int
// MaxIdleConnsPerHost, if non-zero, controls the maximum idle
// (keep-alive) connections to keep per-host. If zero,
// DefaultMaxIdleConnsPerHost is used.
MaxIdleConnsPerHost int
// MaxConnsPerHost optionally limits the total number of
// connections per host, including connections in the dialing,
// active, and idle states. On limit violation, dials will block.
//
// Zero means no limit.
MaxConnsPerHost int
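MaxIdleConns: the maximum number of idle connections across all hosts. 0 means no limit.
MaxIdleConnsPerHost: the maximum number of idle connections per host. If 0, DefaultMaxIdleConnsPerHost is used.
MaxConnsPerHost: the maximum number of connections per host, counting active, idle, and dialing connections. Once the limit is reached, new dials block. 0 means no limit.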
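Now look back at how DefaultClient is initialized:
MaxIdleConns: 100 idle connections in total.
MaxIdleConnsPerHost: 0, so the default of 2 applies.
MaxConnsPerHost: 0, which means no limit.
So under high concurrency, because the number of connections per host is unlimited, the transport keeps dialing new connections. But only two idle connections per host are kept, so every other connection has to be closed once its request finishes. Because the client is the side that closes them, each closed connection lingers in TIME_WAIT on the client and keeps holding its local port. That is where the pile of TIME_WAIT sockets comes from, and once the client's local ports are exhausted, connect fails with cannot assign requested address.
So in a high-concurrency production service, do not send requests through an unconfigured DefaultClient. Set the maximum number of connections per host; once the idle, active, and dialing connections reach that value, further dials are suspended. The idle limit per host (MaxIdleConnsPerHost) should generally be no larger than MaxConnsPerHost.
The real trap in DefaultClient is the mismatch between the two per-host settings: MaxIdleConnsPerHost defaults to 2, while MaxConnsPerHost is left at its default of 0 (unlimited). MaxConnsPerHost should not have been left at 0; had it also been capped (at 2 as well, for instance), excess dials would block instead of burning through ports.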
Another Go backend service showed the same symptoms: connect: cannot assign requested address and a large number of TIME_WAIT connections.
Checking how its client was initialized:
func NewHttpClientManager() *HttpClientManager {
	mgr := &HttpClientManager{}
	mgr.transport = &http.Transport{
		MaxIdleConns:        6000,
		MaxConnsPerHost:     1200,
		MaxIdleConnsPerHost: 1200,
		IdleConnTimeout:     60 * time.Second,
	}
	...
}

func (mgr *HttpClientManager) createHttpClient() *HttpClientInfo {
	client := &http.Client{
		Transport: mgr.transport,
	}
	...
}
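It turned out that the client initialization was not the problem this time.
After some digging, I found this post and the doc comment on Response.Body:
https://blog.csdn.net/sinat_36436112/article/details/118698978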
// Body represents the response body.
//
// The response body is streamed on demand as the Body field
// is read. If the network connection fails or the server
// terminates the response, Body.Read calls return an error.
//
// The http Client and Transport guarantee that Body is always
// non-nil, even on responses without a body or responses with
// a zero-length body. It is the caller's responsibility to
// close Body. The default HTTP client's Transport may not
// reuse HTTP/1.x "keep-alive" TCP connections if the Body is
// not read to completion and closed.
//
// The Body is automatically dechunked if the server replied
// with a "chunked" Transfer-Encoding.
//
// As of Go 1.12, the Body will also implement io.Writer
// on a successful "101 Switching Protocols" response,
// as used by WebSockets and HTTP/2's "h2c" mode.
Body io.ReadCloser
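Closing the body is the caller's responsibility. And if the body is not read to completion before it is closed, the underlying HTTP/1.x TCP connection may not be reused.
Now look at the business code: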
func (server *Server) HandleReq(w http.ResponseWriter, req *http.Request) {
	...
	resp, err := server.PassReq(req)
	if err != nil {
		log.Error(err.Error())
		return
	}
	if resp.StatusCode != 404 || .... {
		_, err = io.Copy(w, resp.Body)
		if err != nil {
			log.Error(err.Error())
		}
		_ = resp.Body.Close()
		return
	}
	_ = resp.Body.Close()
	...
}
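When the request falls through to the second Close (the status code is 404), the body is closed directly without ever being read, so the TCP connection is not reused. The fixed version: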
func (server *Server) HandleReq(w http.ResponseWriter, req *http.Request) {
	...
	resp, err := server.PassReq(req)
	if err != nil {
		log.Error(err.Error())
		return
	}
	// On every return path: discard up to 4 KB of unread body, then close.
	defer func() {
		_, _ = io.CopyN(io.Discard, resp.Body, 1024*4)
		_ = resp.Body.Close()
	}()
	if resp.StatusCode != 404 || .... {
		_, err = io.Copy(w, resp.Body)
		if err != nil {
			log.Error(err.Error())
		}
		return
	}
	...
}
The addition is:
defer func() {
	_, _ = io.CopyN(io.Discard, resp.Body, 1024*4)
	_ = resp.Body.Close()
}()
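This guarantees that any remaining data is read out before the body is closed. If the body has already been read to the end, reading it again here is harmless: the read just returns EOF immediately. Note that CopyN drains at most 4 KB here, so a body with more than 4 KB still unread is closed early and its connection is not reused.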