An error caused by a salt-master IPC file

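   Recently our company needed to bring two salt-masters under unified management, so I turned one of them into a salt-syndic. Installing "salt-syndic" automatically upgraded salt-master and salt-minion, and the master and syndic were then restarted. Something unknown went wrong along the way and left the salt-master file "/var/run/salt/master/publish_pull.ipc" corrupted, so salt could not start or communicate normally, and the file could not be deleted. In the end I renamed the directory and restarted salt-master, and the service recovered.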



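Step 1

After an in-place upgrade of salt-master, I restarted it and found that the master process was running but could not open its ports, and clients behaved as if the master had not started.

Running commands produced errors: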

[root@ntest1 ~]# tail -f /var/log/salt/master
2015-11-16 18:53:10,166 [salt.client ][ERROR ][3103] Unable to connect to the salt master publisher at /var/run/salt/master
2015-11-16 18:54:02,895 [salt.client ][ERROR   ][3763] Unable to connect to the salt master publisher at /var/run/salt/master


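Step 2

Stopping the master and starting it again failed with "address already in use"; a check confirmed that the port was occupied.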

[root@ntest1 salt]# /etc/init.d/salt-master start
Starting salt-master daemon: WARNING: Unable to bind socket, error: [Errno 98] Address already in use
The ports are not available to bind
                                                           [FAILED]
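
One way to check whether something is really holding the master's ports is to try binding them directly. The following is a minimal Python sketch, not part of Salt: the helper name `port_is_free` is hypothetical, 4505 is the publisher port that appears in the debug log of this post, and 4506 is assumed to be the master's default return port.

```python
import errno
import socket


def port_is_free(port, host="0.0.0.0"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
    except OSError as exc:
        # EADDRINUSE is the same "[Errno 98] Address already in use" on Linux.
        if exc.errno == errno.EADDRINUSE:
            return False
        raise
    else:
        return True
    finally:
        sock.close()


# Assumed master ports: 4505 (publisher, from the log) and 4506 (assumption).
for port in (4505, 4506):
    state = "free" if port_is_free(port) else "in use"
    print(f"port {port}: {state}")
```

If a port is reported as in use while the master is supposed to be stopped, an old master process is probably still alive and should be stopped before the next start attempt.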


Step 3

Another restart attempt, and this time the master started normally:

[root@ntest1 salt]# /etc/init.d/salt-master start

Starting salt-master daemon:                               [  OK  ]


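Running a command still failed. At this point it was clear that salt-master was not healthy, and that the cause was the publisher socket under the "/var/run/salt/master" directory: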

[root@ntest1 salt]# salt '*' test.ping
[ERROR   ] Unable to connect to the salt master publisher at /var/run/salt/master
The salt master could not be contacted. Is master running?


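With debug logging enabled, the log showed the following, which confirmed the suspicion: the Publisher process dies right after opening publish_pull.ipc and is restarted about once per second.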

2015-11-16 19:02:59,379 [salt.utils.process                       ][DEBUG   ][6976] Started 'salt.master.<type 'type'>.Publisher' with pid 7627

2015-11-16 19:02:59,381 [salt.master                              ][INFO    ][7627] Starting the Salt Publisher on tcp://0.0.0.0:4505

2015-11-16 19:02:59,382 [salt.master                              ][INFO    ][7627] Starting the Salt Puller on ipc:///var/run/salt/master/publish_pull.ipc

2015-11-16 19:02:59,391 [salt.utils.process                       ][INFO    ][6976] Process <class 'salt.master.Publisher'> (7627) died with exit status None, restarting...

2015-11-16 19:03:00,395 [salt.utils.process                       ][DEBUG   ][6976] Started 'salt.master.<type 'type'>.Publisher' with pid 7630

2015-11-16 19:03:00,396 [salt.master                              ][INFO    ][7630] Starting the Salt Publisher on tcp://0.0.0.0:4505

2015-11-16 19:03:00,397 [salt.master                              ][INFO    ][7630] Starting the Salt Puller on ipc:///var/run/salt/master/publish_pull.ipc

2015-11-16 19:03:00,406 [salt.utils.process                       ][INFO    ][6976] Process <class 'salt.master.Publisher'> (7630) died with exit status None, restarting...

2015-11-16 19:03:01,409 [salt.utils.process                       ][DEBUG   ][6976] Started 'salt.master.<type 'type'>.Publisher' with pid 7633

2015-11-16 19:03:01,411 [salt.master                              ][INFO    ][7633] Starting the Salt Publisher on tcp://0.0.0.0:4505

2015-11-16 19:03:01,412 [salt.master                              ][INFO    ][7633] Starting the Salt Puller on ipc:///var/run/salt/master/publish_pull.ipc

2015-11-16 19:03:01,421 [salt.utils.process                       ][INFO    ][6976] Process <class 'salt.master.Publisher'> (7633) died with exit status None, restarting...

2015-11-16 19:03:02,424 [salt.utils.process                       ][DEBUG   ][6976] Started 'salt.master.<type 'type'>.Publisher' with pid 7636

2015-11-16 19:03:02,426 [salt.master                              ][INFO    ][7636] Starting the Salt Publisher on tcp://0.0.0.0:4505

2015-11-16 19:03:02,427 [salt.master                              ][INFO    ][7636] Starting the Salt Puller on ipc:///var/run/salt/master/publish_pull.ipc

2015-11-16 19:03:02,435 [salt.utils.process                       ][INFO    ][6976] Process <class 'salt.master.Publisher'> (7636) died with exit status None, restarting...

2015-11-16 19:03:03,439 [salt.utils.process                       ][DEBUG   ][6976] Started 'salt.master.<type 'type'>.Publisher' with pid 7639

2015-11-16 19:03:03,441 [salt.master                              ][INFO    ][7639] Starting the Salt Publisher on tcp://0.0.0.0:4505

2015-11-16 19:03:03,442 [salt.master                              ][INFO    ][7639] Starting the Salt Puller on ipc:///var/run/salt/master/publish_pull.ipc

2015-11-16 19:03:03,512 [salt.utils.process                       ][INFO    ][6976] Process <class 'salt.master.Publisher'> (7639) died with exit status None, restarting...

2015-11-16 19:03:04,516 [salt.utils.process                       ][DEBUG   ][6976] Started 'salt.master.<type 'type'>.Publisher' with pid 7642

2015-11-16 19:03:04,517 [salt.master                              ][INFO    ][7642] Starting the Salt Publisher on tcp://0.0.0.0:4505
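
A crash loop like the one in the log above is easy to miss when scrolling through debug output. The following Python sketch counts the "died ... restarting" events per process class; the function name `count_restarts` and the regular expression are illustrative assumptions based only on the log lines shown in this post, not on a documented Salt log format.

```python
import re
from collections import Counter

# Matches lines such as:
#   ... Process <class 'salt.master.Publisher'> (7627) died with exit status None, restarting...
DIED_RE = re.compile(
    r"Process <class '(?P<name>[\w.]+)'> \((?P<pid>\d+)\) died with exit status"
)


def count_restarts(lines):
    """Count 'died ... restarting' events per process class name."""
    counts = Counter()
    for line in lines:
        match = DIED_RE.search(line)
        if match:
            counts[match.group("name")] += 1
    return counts


# Two lines copied from the log above, used as sample input.
sample = [
    "2015-11-16 19:02:59,382 [salt.master][INFO    ][7627] Starting the Salt Puller on ipc:///var/run/salt/master/publish_pull.ipc",
    "2015-11-16 19:02:59,391 [salt.utils.process][INFO    ][6976] Process <class 'salt.master.Publisher'> (7627) died with exit status None, restarting...",
]
for name, count in count_restarts(sample).items():
    print(f"{name}: {count} restart(s)")
```

Passing an open log file instead of `sample` works the same way, since the function only iterates over lines.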


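Note

   Listing the files in the directory shows that "publish_pull.ipc" is indeed corrupted. These files are generated by salt at startup, so deleting them and restarting should restore normal operation.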


[root@ntest1 master]# ll

ls: cannot access publish_pull.ipc: Input/output error

total 0

srwxrwxrwx 1 root root 0 Nov 16 19:06 master_event_pub.ipc

srwxrwxrwx 1 root root 0 Nov 16 19:06 master_event_pull.ipc

s????????? ? ?    ?    ?            ? publish_pull.ipc

srwxrwxrwx 1 root root 0 Nov 16 19:06 workers.ipc

[root@ntest1 master]# ps -ef|grep salt
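
The symptom in the listing above (the directory returns the name, but the entry's metadata cannot be read) can also be detected from a script. This is a minimal Python sketch under that assumption; the helper name `find_unreadable_entries` is hypothetical.

```python
import os


def find_unreadable_entries(directory):
    """Return (name, error message) pairs for entries that cannot be stat'ed."""
    broken = []
    for name in sorted(os.listdir(directory)):
        try:
            # lstat does not follow symlinks, so it inspects the entry itself.
            os.lstat(os.path.join(directory, name))
        except OSError as exc:
            broken.append((name, exc.strerror))
    return broken
```

Calling `find_unreadable_entries("/var/run/salt/master")` on a healthy master should return an empty list; in the situation described here it would be expected to report `publish_pull.ipc` with an input/output error.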


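Step 4

I tried to delete the file, but no method could remove "publish_pull.ipc". As a fallback, I renamed the "master" directory instead,

and the salt-master service returned to normal (sometimes the file cannot be deleted at all, in which case a fifth step is needed).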

[root@ntest1 salt]# rm master -rf
rm: cannot remove `master/publish_pull.ipc': Input/output error

[root@ntest1 salt]# ls
master  minion

[root@ntest1 salt]# mv master 123
[root@ntest1 salt]# ll
total 8
drwxrwxrwx 2 root root 4096 Nov 16 19:57 123
drwxrwxrwx 2 root root 4096 Nov 16 19:07 minion

[root@ntest1 salt]# /etc/init.d/salt-master start
Starting salt-master daemon:              [  OK  ]
[root@ntest1 salt]# salt '*' test.ping
ntest1.dianjoy.com:
    True
ntest2.dianjoy.com:
    True
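
The rename works where deletion fails because renaming a directory within the same filesystem only changes the entry in its parent directory and never touches the damaged file inside. The workaround can be sketched in Python as follows; the helper name `move_aside` and the `.broken-<timestamp>` naming are hypothetical (the session above simply used `123`), and the sketch assumes salt-master is stopped first and recreates the directory on its next start, as it did here.

```python
import os
import time


def move_aside(directory, suffix=None):
    """Rename `directory` to a sibling name and return the new path."""
    directory = directory.rstrip("/")
    if suffix is None:
        suffix = time.strftime("%Y%m%d%H%M%S")
    target = f"{directory}.broken-{suffix}"
    if os.path.lexists(target):
        raise FileExistsError(target)
    # Only the parent directory entry changes; the contents are not read.
    os.rename(directory, target)
    return target
```

For example, `move_aside("/var/run/salt/master")` would leave the old sockets under a new name, where they can be examined or cleaned up later, for instance after a filesystem check.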

