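My Dubbo application: at first, starting one or two instances works fine, but once I start more of them, I get large numbers of the following: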
    2019-05-08 20:41:24.869 ERROR 2548 --- [TaskScheduler-1] o.s.c.a.nacos.discovery.NacosWatch : Error watching Nacos Service change
    java.lang.IllegalStateException: failed to req API:/nacos/v1/ns/service/list after all servers([192.168.11.196:8848]) tried: failed to req API:http://192.168.11.196:8848/nacos/v1/ns/service/list. code:500 msg: java.net.BindException: Address already in use: connect
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:380) ~[nacos-client-1.0.0.jar:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:346) ~[nacos-client-1.0.0.jar:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.reqAPI(NamingProxy.java:294) ~[nacos-client-1.0.0.jar:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.getServiceList(NamingProxy.java:276) ~[nacos-client-1.0.0.jar:na]
        at com.alibaba.nacos.client.naming.net.NamingProxy.getServiceList(NamingProxy.java:252) ~[nacos-client-1.0.0.jar:na]
        at com.alibaba.nacos.client.naming.NacosNamingService.getServicesOfServer(NacosNamingService.java:525) ~[nacos-client-1.0.0.jar:na]
        at org.springframework.cloud.alibaba.nacos.discovery.NacosWatch.nacosServicesWatch(NacosWatch.java:127) ~[spring-cloud-alibaba-nacos-discovery-0.9.0.RELEASE.jar:0.9.0.RELEASE]
        at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.1.5.RELEASE.jar:5.1.5.RELEASE]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_201]
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[na:1.8.0_201]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_201]
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[na:1.8.0_201]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[na:1.8.0_201]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[na:1.8.0_201]
        at java.lang.Thread.run(Thread.java:748) ~[na:1.8.0_201]
As you can see, the error comes out of NamingProxy's reqAPI method.
The direct cause in the code is:
    public String callServer(String api, Map<String, String> params, String curServer, String method) throws NacosException {
        long start = System.currentTimeMillis();
        long end = 0L;
        this.checkSignature(params);
        List<String> headers = this.builderHeaders();
        if (!curServer.contains(":")) {
            curServer = curServer + ":" + this.serverPort;
        }
        String url = HttpClient.getPrefix() + curServer + api;
        HttpResult result = HttpClient.request(url, headers, params, "UTF-8", method); // here
        end = System.currentTimeMillis();
        MetricsMonitor.getNamingRequestMonitor(method, url, String.valueOf(result.code)).observe((double) (end - start));
        if (200 == result.code) {
            return result.content;
        } else if (304 == result.code) {
            return "";
        } else {
            // here!! any result other than 200/304 is turned into this NacosException
            throw new NacosException(500, "failed to req API:" + HttpClient.getPrefix() + curServer + api + ". code:" + result.code + " msg: " + result.content);
        }
    }
Stepping one level further in:
    public static HttpClient.HttpResult request(String url, List<String> headers, Map<String, String> paramValues, String encoding, String method) {
        HttpURLConnection conn = null;
        HttpClient.HttpResult var7;
        try {
            String encodedContent = encodingParams(paramValues, encoding);
            url = url + (StringUtils.isEmpty(encodedContent) ? "" : "?" + encodedContent);
            conn = (HttpURLConnection) (new URL(url)).openConnection();
            setHeaders(conn, headers, encoding);
            conn.setConnectTimeout(CON_TIME_OUT_MILLIS);
            conn.setReadTimeout(TIME_OUT_MILLIS);
            conn.setRequestMethod(method);
            conn.setDoOutput(true);
            if ("POST".equals(method) || "PUT".equals(method)) {
                byte[] b = encodedContent.getBytes();
                conn.setRequestProperty("Content-Length", String.valueOf(b.length));
                conn.getOutputStream().write(b, 0, b.length);
                conn.getOutputStream().flush();
                conn.getOutputStream().close();
            }
            conn.connect(); // when the error occurs, execution gets this far and then jumps to the catch block below
            LogUtils.NAMING_LOGGER.debug("Request from server: " + url);
            var7 = getResult(conn);
            return var7;
        } catch (Exception var13) {
            try {
                if (conn != null) {
                    LogUtils.NAMING_LOGGER.warn("failed to request " + conn.getURL() + " from " + InetAddress.getByName(conn.getURL().getHost()).getHostAddress());
                }
            } catch (Exception var12) {
                LogUtils.NAMING_LOGGER.error("[NA] failed to request ", var12);
            }
            LogUtils.NAMING_LOGGER.error("[NA] failed to request ", var13);
            // the local exception is wrapped into a result with code 500
            var7 = new HttpClient.HttpResult(500, var13.toString(), Collections.emptyMap());
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
        return var7;
    }
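One detail in the catch block matters a lot: the "code:500" in the log is not a response from the Nacos server. Any exception thrown locally, including the BindException, is wrapped into an HttpResult with code 500 and the exception text as its content, which callServer then reports as "code:500 msg: java.net.BindException ...". A minimal sketch of that pattern follows. It is not the Nacos class: Result and SyntheticErrorDemo are hypothetical names, and it provokes a connection-refused error against an unused local port instead of a BindException, because port exhaustion is hard to reproduce on demand.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.URL;

public class SyntheticErrorDemo {

    // Hypothetical stand-in for Nacos's HttpClient.HttpResult: just a status code and a body.
    static final class Result {
        final int code;
        final String content;

        Result(int code, String content) {
            this.code = code;
            this.content = content;
        }
    }

    // Same shape as the catch block in HttpClient#request: a local exception becomes
    // a result with code 500 and the exception text as the "body".
    static Result request(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(3000);
            conn.connect();
            return new Result(conn.getResponseCode(), "");
        } catch (IOException e) {
            return new Result(500, e.toString());
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    // Picks a local port that nothing is listening on (bind to port 0, then release it).
    static int unusedPort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }

    public static void main(String[] args) throws IOException {
        Result r = request("http://127.0.0.1:" + unusedPort() + "/nacos/v1/ns/service/list");
        // No server ever saw this request, yet the caller still gets "code:500".
        System.out.println("code:" + r.code + " msg: " + r.content);
    }
}
```

This is why the server-side logs stay clean: the failure happens before a single byte leaves the client.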
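At first I kept assuming the problem was on the server side: HttpClient sends a GET request, so surely the server just wasn't responding properly! So I spent ages poking at the Nacos server and even stepped through its source code. Eventually I realized I had been looking in the wrong direction!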
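Lots of trial and error, no result. Gradually I noticed that the "Address already in use: connect" error didn't follow any pattern: sometimes a single application hit it, sometimes not, and sometimes it only appeared after starting many of them. The frequency varied too: one application might log 1-2 of them in 10 minutes, another several dozen.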
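Searching the web for "Address already in use: connect" turned up mostly JUnit-related errors, which looked nothing like mine (only later did I learn the underlying cause is actually the same!). Slowly it dawned on me that com.alibaba.nacos.client.naming.net.HttpClient#request creates a new HttpURLConnection for every call without using a connection pool, a problem I had run into before. Too many HttpURLConnection instances can exhaust system resources. Debugging showed that Nacos seems to call, at a very high rate, com.alibaba.nacos.client.naming.net.NamingProxy#callServer(java.lang.String, java.util.Map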
tcp6 0 0 192.168.11.196:49632 192.168.11.196:8848 TIME_WAIT
netstat -na | grep TIME_WAIT | wc -l returned 28028. That can't be right! Scary, isn't it!
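A slightly more targeted version of that count, restricted to TIME_WAIT connections whose remote end is the Nacos port 8848, can be sketched in Java. TimeWaitCounter is a hypothetical helper; it assumes netstat is on the PATH and relies on the foreign address being the column right before the state, which holds both for the Linux line above (proto, recv-q, send-q, local, foreign, state) and for Windows netstat -an output (proto, local, foreign, state).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class TimeWaitCounter {

    // Counts lines in TIME_WAIT whose foreign (remote) address ends with ":" + remotePort.
    // The foreign address is taken to be the token immediately before "TIME_WAIT".
    static long countTimeWait(List<String> netstatLines, int remotePort) {
        String suffix = ":" + remotePort;
        return netstatLines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(tokens -> {
                    for (int i = 1; i < tokens.length; i++) {
                        if (tokens[i].equals("TIME_WAIT")) {
                            return tokens[i - 1].endsWith(suffix);
                        }
                    }
                    return false;
                })
                .count();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Assumes a netstat binary is available (net-tools on Linux, built in on Windows).
        Process p = new ProcessBuilder("netstat", "-an").redirectErrorStream(true).start();
        List<String> lines;
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            lines = r.lines().collect(Collectors.toList());
        }
        p.waitFor();
        System.out.println("TIME_WAIT to port 8848: " + countTimeWait(lines, 8848));
    }
}
```

Matching on ":8848" as a suffix (rather than "8848") avoids counting ports such as 18848 by accident.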
    [root@localhost logs]# ulimit -a
    core file size          (blocks, -c) 0
    data seg size           (kbytes, -d) unlimited
    scheduling priority             (-e) 0
    file size               (blocks, -f) unlimited
    pending signals                 (-i) 63250
    max locked memory       (kbytes, -l) 64
    max memory size         (kbytes, -m) unlimited
    open files                      (-n) 1024
    pipe size            (512 bytes, -p) 8
    POSIX message queues     (bytes, -q) 819200
    real-time priority              (-r) 0
    stack size              (kbytes, -s) 8192
    cpu time               (seconds, -t) unlimited
    max user processes              (-u) 63250
    virtual memory          (kbytes, -v) unlimited
    file locks                      (-x) unlimited
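I changed ulimit -n to 65535: no effect. -n sets the maximum number of open files, so it probably has nothing to do with the number of ports. Another long stretch with nothing to show for it.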
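Then it suddenly occurred to me that my own machine might be running out of ports too. Sure enough, netstat on my machine showed a huge number of ports that had not been released, more than 15,000!
So it wasn't a Nacos pitfall after all, but a problem on my client side: my local machine (Win10) had run out of ephemeral ports!! No wonder the Nacos logs showed no errors while my local machine had plenty: the requests were never even sent!!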
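By default Linux provides a fairly large ephemeral port range; Windows does not.
Windows is reportedly limited to ports 1025-5000, but that turned out to be wrong for my machine: that range applies to older Windows versions. Since Windows Vista, including my Win10, the default dynamic port range is 49152-65535, i.e. 16,384 ports, which is in line with the 15,000+ unreleased ports I saw.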
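Widen the range on the local machine (from an administrator command prompt):

    netsh int ipv4 set dynamicport tcp start=20000 num=40000

Then verify:

    netsh int ipv4 show dynamicport tcp

    Protocol tcp Dynamic Port Range
    ---------------------------------
    Start Port      : 20000
    Number of Ports : 40000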
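Why widening the range helps can be estimated with a back-of-the-envelope bound: if every connection the client closes keeps its local port in TIME_WAIT for T seconds, a range of N ports can sustain at most about N / T new connections per second before connections start failing with errors like the BindException in the log. The sketch only does that arithmetic. PortBudget is a hypothetical name, and T = 120 s is an assumed value, not a measured one (on Windows the TIME_WAIT duration is governed by the TcpTimedWaitDelay registry value, and I have not verified its Win10 default).

```java
public class PortBudget {

    // Last port of a dynamic range given its start port and port count, as netsh reports them.
    static int lastPort(int start, int num) {
        return start + num - 1;
    }

    // Rough upper bound on new client-closed connections per second before the range is
    // exhausted, assuming each closed connection holds its port for timeWaitSeconds.
    static double maxNewConnectionsPerSecond(int num, double timeWaitSeconds) {
        return num / timeWaitSeconds;
    }

    public static void main(String[] args) {
        double assumedTimeWait = 120; // ASSUMPTION: illustrative TIME_WAIT duration in seconds
        System.out.printf("default  49152-%d: ~%.0f new conn/s%n",
                lastPort(49152, 16384), maxNewConnectionsPerSecond(16384, assumedTimeWait));
        System.out.printf("widened  20000-%d: ~%.0f new conn/s%n",
                lastPort(20000, 40000), maxNewConnectionsPerSecond(40000, assumedTimeWait));
    }
}
```

With the assumed 120 s this works out to roughly 137 new connections per second for the default range versus roughly 333 after the change. The absolute numbers depend on T; the point is that the budget scales linearly with the range, which is why the change made the errors go away without fixing the connection churn itself.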
Done!
That said, Nacos really is involved: why does this thing need to open so many ports??? A Nacos pitfall after all!
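As for why so many ports, a plausible explanation (my reading of the snippet above, not confirmed against the full Nacos sources) is the finally block of HttpClient#request. Per the HttpURLConnection Javadoc, disconnect() may close the underlying socket when a persistent connection is otherwise idle. So each high-frequency callServer call would open a brand-new TCP connection on a new local ephemeral port, and since the client closes first, that port then sits in TIME_WAIT, matching the tcp6 ... :8848 TIME_WAIT lines from netstat. The sketch below uses the JDK's built-in com.sun.net.httpserver.HttpServer as a stand-in server and counts how many distinct client ports it sees; KeepAliveDemo, the /ping path and distinctClientPorts are made-up names.

```java
import com.sun.net.httpserver.HttpServer;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeepAliveDemo {

    // One GET request. The body is always drained and closed; with disconnect=true we also
    // call disconnect(), like the finally block in HttpClient#request.
    static void get(String url, boolean disconnect) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        try (InputStream in = conn.getInputStream()) {
            byte[] buf = new byte[4096];
            while (in.read(buf) != -1) {
                // drain the body fully so the connection can return to the JDK keep-alive cache
            }
        } finally {
            if (disconnect) {
                conn.disconnect(); // may close the idle persistent socket: next call needs a new port
            }
        }
    }

    // Starts a local server that records each client's source port, sends n requests,
    // and returns how many distinct client ports the server saw.
    static int distinctClientPorts(int n, boolean disconnect) throws IOException {
        Set<Integer> ports = ConcurrentHashMap.newKeySet();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/ping", exchange -> {
            ports.add(exchange.getRemoteAddress().getPort());
            byte[] body = "ok".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length); // fixed Content-Length allows keep-alive
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        try {
            String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/ping";
            for (int i = 0; i < n; i++) {
                get(url, disconnect);
            }
        } finally {
            server.stop(0);
        }
        return ports.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("with disconnect(): " + distinctClientPorts(20, true) + " client ports");
        System.out.println("drain + close only: " + distinctClientPorts(20, false) + " client ports");
    }
}
```

Draining and closing the response stream while skipping disconnect() lets the JDK reuse one connection for all requests; calling disconnect() after every request forces a new connection, and a new local port, each time.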
This problem shows up easily on Windows; on Linux it generally doesn't:
    [root@localhost nacos]# sysctl -a | grep file-max
    fs.file-max = 1604751
    sysctl: reading key "net.ipv6.conf.all.stable_secret"
    sysctl: reading key "net.ipv6.conf.default.stable_secret"
    sysctl: reading key "net.ipv6.conf.lo.stable_secret"
    sysctl: reading key "net.ipv6.conf.p4p1.stable_secret"
    [root@localhost nacos]#
    [root@localhost nacos]# sysctl -a | grep ipv4.ip_local_port_range
    net.ipv4.ip_local_port_range = 32768 60999
    sysctl: reading key "net.ipv6.conf.all.stable_secret"
    sysctl: reading key "net.ipv6.conf.default.stable_secret"
    sysctl: reading key "net.ipv6.conf.lo.stable_secret"
    sysctl: reading key "net.ipv6.conf.p4p1.stable_secret"
    [root@localhost nacos]# ulimit -u
    63250
    [root@localhost nacos]# netstat -an | wc -l
    22136
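To check the same ephemeral range from Java instead of sysctl, a minimal Linux-only sketch can read /proc/sys/net/ipv4/ip_local_port_range, the procfs file behind net.ipv4.ip_local_port_range. LinuxPortRange is a hypothetical helper name. Note that this is a different limit from ulimit -n and fs.file-max, which cap file descriptors rather than ports; for the 32768 60999 shown above it gives 28232 ports.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LinuxPortRange {

    // Parses the "low high" pair (tab or space separated) and returns the number of ports in it.
    static int portCount(String rangeText) {
        String[] parts = rangeText.trim().split("\\s+");
        int low = Integer.parseInt(parts[0]);
        int high = Integer.parseInt(parts[1]);
        return high - low + 1;
    }

    public static void main(String[] args) throws IOException {
        // Linux only: this procfs file does not exist on Windows or macOS.
        Path path = Paths.get("/proc/sys/net/ipv4/ip_local_port_range");
        String text = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        System.out.println("ephemeral ports available: " + portCount(text));
    }
}
```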
References