nginx的upstream异常

异常

upstream server temporarily disabled while connecting to upstream

no live upstreams while connecting to upstream

max_fails与fail_timeout

max_fails默认值为1,fail_timeout默认值为10秒。

nginx可以通过设置max_fails(最大尝试失败次数)和fail_timeout(失效时间,在到达最大尝试失败次数后,在fail_timeout的时间范围内节点被置为失效,除非所有节点都失效,否则该时间内,节点不进行恢复)对节点失败的尝试次数和失效时间进行设置,当超过最大尝试次数或失效时间未超过配置失效时间,则nginx会对节点状会置为失效状态,nginx不对该后端进行连接,直到超过失效时间或者所有节点都失效后,该节点重新置为有效,重新探测.

upstream backend {
    server backend1.example.com weight=5;
    server 127.0.0.1:8080       max_fails=3 fail_timeout=30s;
    server unix:/tmp/backend3;

    server backup1.example.com  backup;
}

fail的标准

比如

connect() failed (111: Connection refused) while connecting to upstream, client: 127.0.0.1, server: , request: "POST /demo HTTP/1.1", subrequest: "/capture/getstatus", upstream: "http://192.168.99.100:8080/api/demo/

比如

upstream timed out (110: Connection timed out) while reading response header from upstream

Nginx 默认判断失败节点状态以connect refuse和time out状态为准,不以HTTP错误状态进行判断失败,因为HTTP只要能返回状态说明该节点还可以正常连接,所以nginx判断其还是存活状态;除非添加了proxy_next_upstream指令设置对404、502、503、504、500和time out等错误进行转到备机处理,在next_upstream过程中,会对fails进行累加,如果备用机处理还是错误则直接返回错误信息(但404不进行记录到错误数,如果不配置错误状态也不对其进行错误状态记录),综述,nginx记录错误数量只记录timeout 、connect refuse、502、500、503、504这6种状态,timeout和connect refuse是永远被记录错误状态,而502、500、503、504只有在配置proxy_next_upstream后nginx才会记录这4种HTTP错误到fails中,当fails大于等于max_fails时,则该节点失效.

探测机制

如果探测所有节点均失效,备机也为失效时,那么nginx会对所有节点恢复为有效,重新尝试探测有效节点,如果探测到有效节点则返回正确节点内容,如果还是全部错误,那么继续探测下去,当没有正确信息时,节点失效时默认返回状态为502,但是下次访问节点时会继续探测正确节点,直到找到正确的为止。

实验log

upstream test_server{
        server 192.168.99.100:80801;
        server 192.168.99.100:80802;
        server 192.168.99.100:80803;
    }
##for capture
location /api/test/demo{
            proxy_pass http://test_server/api/demo;
}    
location /api/demo{
            default_type application/json;
            content_by_lua_file conf/lua/demo.lua;
}

lua

local cjson = require "cjson.safe"
testres = ngx.location.capture("/api/test/demo",{
    method= ngx.HTTP_POST,
    body = "arg1=xxxx&arg2=xxxxx"
})
ngx.log(ngx.ERR,"status"..testres.status)
local testbody = cjson.decode(testres.body)
ngx.log(ngx.ERR,testbody==nil)

请求192.168.99.100:8080/api/demo,里头的lua会发起一个capture,请求/api/test/demo

请求一次

2017/02/09 14:48:57 [error] 5#5: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80801/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [warn] 5#5: *1 upstream server temporarily disabled while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80801/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [error] 5#5: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80802/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [warn] 5#5: *1 upstream server temporarily disabled while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80802/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [error] 5#5: *1 connect() failed (111: Connection refused) while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80803/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [warn] 5#5: *1 upstream server temporarily disabled while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://192.168.99.100:80803/api/demo", host: "192.168.99.100:8080"
2017/02/09 14:48:57 [error] 5#5: *1 [lua] demo.lua:44: status502 while sending to client, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", host: "192.168.99.100:8080"

对upstream逐个请求,都失败,则capture的subrequest返回502,对client返回的status code取决于lua脚本

再请求一次

2017/02/09 15:09:34 [error] 6#6: *11 no live upstreams while connecting to upstream, client: 192.168.99.1, server: , request: "POST /api/demo HTTP/1.1", subrequest: "/api/test/demo", upstream: "http://test_server/api/demo", host: "192.168.99.100:8080"

该upstream下面的server都挂的情况下出现no live upstreams while connecting to upstream

doc

  • ngx_http_upstream_module

  • nginx中健康检查(health_check)机制深入分析

  • nginx upstream 容错机制 原创-胡志广

  • 线上nginx的一次“no live upstreams while connecting to upstream ”分析

你可能感兴趣的:(nginx)