Fault: the management node (m01) cannot manage the managed node 172.16.1.8
[root@m01 ansible]# ansible 172.16.1.8 -m command -a "w"
172.16.1.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: ",
"unreachable": true
}
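Step 1: troubleshoot. Run the ping module with -vvvv to see the full connection process. The output shows the connection to the managed node failing while reading a header: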
[root@m01 ansible]# ansible 172.16.1.8 -m ping -vvvv
Using /etc/ansible/ansible.cfg as config file
Loading callback plugin minimal of type stdout, v2.0 from /usr/lib/python2.6/site-packages/ansible/plugins/callback/__init__.pyc
META: ran handlers
Using module file /usr/lib/python2.6/site-packages/ansible/modules/system/ping.py
<172.16.1.8> ESTABLISH SSH CONNECTION FOR USER: None
<172.16.1.8> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/923ebeb605 172.16.1.8 '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''
<172.16.1.8> (255, '', 'OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_request_forwards: requesting forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 22508\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 12\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Control master terminated unexpectedly\r\n')
172.16.1.8 | UNREACHABLE! => {
"changed": false,
"msg": "Failed to connect to the host via ssh: OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_request_forwards: requesting forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 22508\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 12\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Control master terminated unexpectedly\r\n",
"unreachable": true
}
The key lines are "mux_client_read_packet: read header failed: Broken pipe" and "Control master terminated unexpectedly". Ansible runs ssh with ControlMaster=auto and ControlPersist=60s, so ssh first tries to reuse an existing master connection through the control socket /root/.ansible/cp/923ebeb605. Reading the reply header from that master failed with a broken pipe and the master went away, so ssh exited with status 255 without running the command.
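If this fault comes back, it can help to tell a broken multiplexed connection apart from an ordinary network or authentication failure. A minimal sketch, assuming bash and that the verbose output is piped in; is_stale_control_master is a hypothetical name, and it only matches the two log lines quoted above:

```shell
# Hypothetical helper: exit 0 if the ssh/ansible output on stdin contains the
# ControlMaster failure lines seen in this note, exit 1 otherwise.
is_stale_control_master() {
    grep -qE 'mux_client_read_packet: read header failed|Control master terminated unexpectedly'
}

# Usage (2>&1 in case the -vvvv log goes to stderr):
# ansible 172.16.1.8 -m ping -vvvv 2>&1 | is_stale_control_master && echo "stale ControlMaster"
```

Any other cause, such as "Connection refused" or a permission error, makes the function return 1.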
Step 2: check the SSH processes on host 172.16.1.8 (web01):
[root@web01 ssh]# ps -ef|grep ssh
root 21204 1 0 15:08 ? 00:00:00 sshd: root@pts/1
root 21272 1 0 15:14 ? 00:00:00 sshd: root@notty
root 21818 1 0 15:43 ? 00:00:00 /usr/sbin/sshd
root 21845 21206 0 15:46 pts/1 00:00:00 grep ssh
[root@web01 ssh]# kill 21272
[root@web01 ssh]# kill 21272
-bash: kill: (21272) - No such process
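Resolution: the process "root 21272 1 0 15:14 ? 00:00:00 sshd: root@notty" (a root SSH session with no terminal attached) was stuck and blocking new connection requests. Kill it, then go back to the management node and test management again. The first kill returns silently; a repeated kill reports "No such process", which confirms the process is gone.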
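To pick out such sessions without reading the whole listing, here is a sketch that filters ps -ef output for sshd sessions with no terminal. It assumes the standard ps -ef column layout shown above (UID PID PPID C STIME TTY TIME CMD); notty_sshd_pids is a hypothetical name. It only lists candidates: any other SSH session opened without a terminal, for example one running a remote command, also appears as root@notty, so check each PID before killing it.

```shell
# Hypothetical helper: read `ps -ef` output on stdin and print the PID (column 2)
# of every "sshd: USER@notty" process, i.e. an SSH session with no terminal.
notty_sshd_pids() {
    awk '$8 == "sshd:" && $9 ~ /@notty$/ { print $2 }'
}

# Usage: list the candidates, inspect them, then kill the stuck one by hand.
# ps -ef | notty_sshd_pids
```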
Step 3: re-test connectivity from the management node:
[root@m01 ansible]# ansible 172.16.1.8 -m ping
172.16.1.8 | SUCCESS => {
"changed": false,
"ping": "pong"
}
Note: "pong" means connectivity is normal.
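To run this re-test from a script instead of checking by eye, a minimal sketch that requires both the SUCCESS status and the pong reply in one host's ping output; ping_ok is a hypothetical name, and it assumes the output format shown above.

```shell
# Hypothetical helper: exit 0 only if the single-host `ansible HOST -m ping`
# output on stdin reports SUCCESS and contains the "pong" reply.
ping_ok() {
    local out
    out=$(cat)
    printf '%s\n' "$out" | grep -q ' | SUCCESS' &&
        printf '%s\n' "$out" | grep -q '"ping": "pong"'
}

# Usage:
# ansible 172.16.1.8 -m ping | ping_ok && echo "172.16.1.8 reachable"
```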
Step 4: run the original command again from the management node, this time against the whole oldboy host group:
[root@m01 ansible]# ansible oldboy -m command -a "w"
172.16.1.8 | SUCCESS | rc=0 >>
15:47:04 up 7:28, 3 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root tty1 - 31Aug17 8:45 0.00s 0.00s -bash
root pts/0 m01 15:47 0.00s 0.11s 0.00s /bin/sh -c /usr
root pts/1 10.0.0.253 31Aug17 23.00s 0.06s 0.06s -bash
172.16.1.31 | SUCCESS | rc=0 >>
15:47:05 up 3 days, 4:14, 2 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root pts/0 10.0.0.253 08:08 15:37 0.02s 0.02s -bash
root pts/2 m01 15:47 1.00s 0.09s 0.00s /bin/sh -c /usr
172.16.1.41 | SUCCESS | rc=0 >>
15:47:05 up 2 days, 22:58, 3 users, load average: 0.00, 0.00, 0.00
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
root tty1 - 09:21 6:24m 0.00s 0.00s -bash
root pts/0 10.0.0.253 09:23 10:11 0.02s 0.02s -bash
root pts/1 m01 15:47 1.00s 0.18s 0.00s /bin/sh -c /usr
Note: the management node now gets normal results from every host; the fault is resolved.
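With several hosts in the oldboy group, one quick way to confirm that none was left unreachable is to list every host whose result header is not SUCCESS. A sketch, assuming the default ad-hoc output format seen in this note ("HOST | STATUS ..."); failed_hosts is a hypothetical name, and a line of command output that happens to start with "word | " would be misread as a header.

```shell
# Hypothetical helper: read `ansible GROUP -m ...` output on stdin and print the
# host of every result header ("HOST | STATUS ...") whose status is not SUCCESS.
failed_hosts() {
    awk -F' [|] ' '/^[^ ]+ [|] / && $2 !~ /^SUCCESS/ { print $1 }'
}

# Usage: empty output means every host answered.
# ansible oldboy -m command -a "w" | failed_hosts
```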