Problem: the management node cannot connect to or manage the managed node at 172.16.1.8

[root@m01 ansible]# ansible 172.16.1.8 -m command -a "w"

172.16.1.8 | UNREACHABLE! => {

    "changed": false, 

    "msg": "Failed to connect to the host via ssh: ", 

    "unreachable": true

}

Step 1: Troubleshoot. The verbose connection log shows that reading the SSH header from the managed node fails.

[root@m01 ansible]# ansible 172.16.1.8 -m ping -vvvv

Using /etc/ansible/ansible.cfg as config file

Loading callback plugin minimal of type stdout, v2.0 from /usr/lib/python2.6/site-packages/ansible/plugins/callback/__init__.pyc

META: ran handlers

Using module file /usr/lib/python2.6/site-packages/ansible/modules/system/ping.py

<172.16.1.8> ESTABLISH SSH CONNECTION FOR USER: None

<172.16.1.8> SSH: EXEC ssh -vvv -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/923ebeb605 172.16.1.8 '/bin/sh -c '"'"'echo ~ && sleep 0'"'"''

<172.16.1.8> (255, '', 'OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_request_forwards: requesting forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 22508\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 12\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Control master terminated unexpectedly\r\n')

172.16.1.8 | UNREACHABLE! => {

    "changed": false, 

    "msg": "Failed to connect to the host via ssh: OpenSSH_5.3p1, OpenSSL 1.0.1e-fips 11 Feb 2013\ndebug1: Reading configuration data /etc/ssh/ssh_config\r\ndebug1: Applying options for *\r\ndebug1: auto-mux: Trying existing master\r\ndebug2: fd 3 setting O_NONBLOCK\r\ndebug2: mux_client_hello_exchange: master version 4\r\ndebug3: mux_client_request_forwards: requesting forwardings: 0 local, 0 remote\r\ndebug3: mux_client_request_session: entering\r\ndebug3: mux_client_request_alive: entering\r\ndebug3: mux_client_request_alive: done pid = 22508\r\ndebug3: mux_client_request_session: session request sent\r\ndebug1: mux_client_request_session: master session id: 12\r\ndebug3: mux_client_read_packet: read header failed: Broken pipe\r\ndebug2: Control master terminated unexpectedly\r\n", 

    "unreachable": true

}

Note: this error roughly means that the SSH client failed to read the packet header while connecting ("read header failed: Broken pipe"), after which the control master terminated unexpectedly.
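Because the log mentions an existing control master, it may also be worth inspecting the multiplexing socket on the management node. The sketch below is an assumption and was not part of the original troubleshooting: the helper name extract_control_path is hypothetical, and it only pulls the ControlPath value out of the verbose output shown above. The commented ssh -O check / ssh -O exit calls are standard OpenSSH control commands, but whether they would have cleared this particular fault was not tested here.

```shell
# Hypothetical helper: read "ansible ... -vvvv" output on stdin and print
# the first ControlPath value found on the "SSH: EXEC" line.
extract_control_path() {
    grep -o 'ControlPath=[^ ]*' | head -n 1 | cut -d= -f2-
}

# Possible usage on the management node (assumption, not run in this note):
#   cp=$(ansible 172.16.1.8 -m ping -vvvv | extract_control_path)
#   ssh -O check -o ControlPath="$cp" 172.16.1.8   # is the master still alive?
#   ssh -O exit  -o ControlPath="$cp" 172.16.1.8   # ask the master to exit
```

With the log above, the helper would print /root/.ansible/cp/923ebeb605.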


Step 2: Check the SSH processes on host 172.16.1.8:

[root@web01 ssh]# ps -ef|grep ssh

root      21204      1  0 15:08 ?        00:00:00 sshd: root@pts/1 

root      21272      1  0 15:14 ?        00:00:00 sshd: root@notty 

root      21818      1  0 15:43 ?        00:00:00 /usr/sbin/sshd

root      21845  21206  0 15:46 pts/1    00:00:00 grep ssh

[root@web01 ssh]# kill 21272 

[root@web01 ssh]# kill 21272 

-bash: kill: (21272) - No such process

[root@web01 ssh]# kill 21272 

-bash: kill: (21272) - No such process

[root@web01 ssh]# kill 21272 

-bash: kill: (21272) - No such process

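Explanation: the process "root      21272      1  0 15:14 ?        00:00:00 sshd: root@notty" was hung and was blocking the connection request. Kill this process, then go back to the management node and test the connection again.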
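Picking the hung session out of the ps listing by eye can be replaced with a small filter. This is a minimal sketch, assuming the standard ps -ef column layout shown above (PID in column 2, command starting in column 8). The function name list_notty_sshd is hypothetical. It only lists candidate PIDs; it cannot tell a hung session from a healthy non-interactive one, so review the list before killing anything.

```shell
# Hypothetical helper: read "ps -ef" output on stdin and print the PID of
# every sshd session that has no terminal attached ("sshd: user@notty").
list_notty_sshd() {
    awk '$8 == "sshd:" && $9 ~ /@notty$/ { print $2 }'
}

# Possible usage on the managed node (review the PIDs before killing):
#   ps -ef | list_notty_sshd
#   kill <pid>
```

Fed the ps output from this step, the filter prints only 21272.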

Step 3: Check connectivity again from the management node:

[root@m01 ansible]# ansible 172.16.1.8 -m ping

172.16.1.8 | SUCCESS => {

    "changed": false, 

    "ping": "pong"

}

Note: pong means the connection is working.

Step 4: Run a command test again from the management node:

[root@m01 ansible]# ansible oldboy -m command -a "w"

172.16.1.8 | SUCCESS | rc=0 >>

 15:47:04 up  7:28,  3 users,  load average: 0.00, 0.00, 0.00

USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT

root     tty1     -                31Aug17  8:45   0.00s  0.00s -bash

root     pts/0    m01              15:47    0.00s  0.11s  0.00s /bin/sh -c /usr

root     pts/1    10.0.0.253       31Aug17 23.00s  0.06s  0.06s -bash


172.16.1.31 | SUCCESS | rc=0 >>

 15:47:05 up 3 days,  4:14,  2 users,  load average: 0.00, 0.00, 0.00

USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT

root     pts/0    10.0.0.253       08:08   15:37   0.02s  0.02s -bash

root     pts/2    m01              15:47    1.00s  0.09s  0.00s /bin/sh -c /usr


172.16.1.41 | SUCCESS | rc=0 >>

 15:47:05 up 2 days, 22:58,  3 users,  load average: 0.00, 0.00, 0.00

USER     TTY      FROM              LOGIN@   IDLE   JCPU   PCPU WHAT

root     tty1     -                09:21    6:24m  0.00s  0.00s -bash

root     pts/0    10.0.0.253       09:23   10:11   0.02s  0.02s -bash

root     pts/1    m01              15:47    1.00s  0.18s  0.00s /bin/sh -c /usr

Note: the management node now reports normal results for all hosts, so the problem is resolved.
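To catch the same fault sooner next time, the ad-hoc output can be scanned for unreachable hosts. The following is a minimal sketch, assuming the "host | UNREACHABLE! => {" output format shown at the top of this note. The function name unreachable_hosts is hypothetical.

```shell
# Hypothetical helper: read ansible ad-hoc output on stdin and print the
# name of every host reported as UNREACHABLE.
unreachable_hosts() {
    awk -F' [|] ' '/[|] UNREACHABLE!/ { print $1 }'
}

# Possible usage on the management node:
#   ansible oldboy -m ping | unreachable_hosts
```

An empty result means no host in the group was reported as unreachable.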