"">

nginx startup error: [emerg] host not found in upstream ""

  1. The problem

The nginx error log contains:

2019/12/27 22:59:38 [error] 195#195: *360 no resolver defined to resolve , client: 127.0.0.1, server: , request: "GET /api/v1/_health HTTP/1.1", host: "localhost:8080"

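In other words, the host cannot be resolved and is therefore unreachable. Yet we can actually ping this address, which shows that the address itself is valid.

Example nginx configuration: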

  upstream myserver {
    server myhost:8080;
  }

  location / {
    proxy_pass http://myserver;
  }
  

A Google search turned up the following explanation.
Source: https://stackoverflow.com/questions/17685674/nginx-proxy-pass-with-remote-addr

If the proxy_pass statement has no variables in it, then it will use the "gethostbyaddr" system call during start-up or reload and will cache that value permanently.

if there are any variables, such as using either of the following:

set $originaddr http://origin.example.com;
proxy_pass $originaddr;
# or even
proxy_pass http://origin.example.com$request_uri;

Then nginx will use a built-in resolver, and the "resolver" directive must be present. "resolver" is probably a misnomer; think of it as "what DNS server will the built-in resolver use". Since nginx 1.1.9 the built-in resolver will honour DNS TTL values. Before then it used a fixed value of 5 minutes.

This is exactly my situation.

  1. nginx runs inside a docker swarm container.
  2. The nginx configuration contains:
        location ~ ^/service/.+ {
            # $1 = service name (e.g. service1), $2 = remainder of the path
            rewrite ^/service/(service\d+)/(.*)$ /$2 break;
            proxy_pass http://$1:8080;
        }

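We route each request to the actual swarm service according to the service ID in the path, and the configuration uses the variable $1 to hold that service ID. For example, a request to nginx at http://:/service/service1/api/v1/hello is forwarded to the swarm service at http://service1:8080/api/v1/hello.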

  3. If we remove the variable from the proxy_pass statement and hard-code a swarm service ID instead, the "no resolver defined" problem does not occur.

  1. Cause

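At startup nginx resolves the hostnames listed in upstream blocks, and if that resolution fails nginx refuses to start (this may be a long-standing nginx limitation).

What can be done? The simplest workaround is to replace the hostname with an IP address. That sidesteps the problem, but IP addresses are rarely used in practice; hostnames or domain names are the norm.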
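
Another commonly used way around the start-up failure is to put the hostname in a variable, so that nginx resolves it per request rather than while loading the configuration. The following is only a sketch: it reuses myhost:8080 from the example above, the variable name $backend is made up for illustration, and the resolver address 127.0.0.11 assumes nginx runs in a Docker container attached to a user-defined or swarm network.

        # Assumption: Docker's embedded DNS is reachable at 127.0.0.11.
        resolver 127.0.0.11;

        location / {
            # Hypothetical variable. Because proxy_pass now contains a
            # variable, myhost is looked up at request time through the
            # resolver above instead of at start-up.
            set $backend http://myhost:8080;
            proxy_pass $backend;
        }

With this form nginx can start even while myhost is not yet resolvable; requests that arrive before the name resolves fail individually instead of preventing nginx from starting.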

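  1. Conclusion

If the proxy_pass URL contains a variable, nginx uses its built-in resolver. That resolver does not consult the system resolver configuration, so without a resolver directive pointing it at a DNS server it cannot resolve docker swarm service names.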

  1. Solution

Use the resolver directive and point it at docker swarm's built-in DNS address, so that swarm service names can be resolved.

        location ~ ^/service/.+ {
            # $1 = service name (e.g. service1), $2 = remainder of the path
            rewrite ^/service/(service\d+)/(.*)$ /$2 break;
            # Docker's embedded DNS server
            resolver 127.0.0.11;
            proxy_pass http://$1:8080;
        }

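127.0.0.11 is Docker's embedded DNS server, which can resolve swarm service names.
It is also generally advisable to set valid=30s. By default nginx caches each answer for the TTL carried in the DNS response (versions before 1.1.9 used a fixed 5 minutes), which suits ordinary static DNS. Docker swarm services, however, may be restarted at unpredictable times, so to reduce the time a service appears down the valid value can be shortened: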

resolver 127.0.0.11 valid=30s;
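
Putting the pieces together, the whole setup might look roughly like the sketch below. The listen port is taken from the log line above. Placing resolver at server level (so that every location inherits it) and the serviceN naming pattern in the regex are assumptions, not part of the original configuration.

        server {
            listen 8080;  # assumption: the port seen in the error log

            # Docker's embedded DNS; cache each lookup for 30 seconds
            resolver 127.0.0.11 valid=30s;

            location ~ ^/service/.+ {
                # $1 = service name, $2 = path forwarded to the service
                rewrite ^/service/(service\d+)/(.*)$ /$2 break;
                proxy_pass http://$1:8080;
            }
        }

After editing, nginx -t checks the configuration and nginx -s reload applies it.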
