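When starting the cloudera-scm-server service, you may find that it dies on its own after a while, and that a status check then returns "cloudera-scm-server dead but pid file exists".
This post covers the case where the root cause is that cloudera-scm-server-db did not start properly.
【Process】
cloudera-scm-server dies on its own some time after being started: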
[root@gyvm-4 data]# service cloudera-scm-server start
Starting cloudera-scm-server: [ OK ]
[root@gyvm-4 data]#
[root@gyvm-4 data]# service cloudera-scm-server status
cloudera-scm-server (pid 60761) is running...
[root@gyvm-4 data]# service cloudera-scm-server status
cloudera-scm-server (pid 60761) is running...
[root@gyvm-4 data]# service cloudera-scm-server status
cloudera-scm-server (pid 60761) is running...
[root@gyvm-4 data]# service cloudera-scm-server status
cloudera-scm-server dead but pid file exists
At this point I wanted to do a full restart of both cloudera-scm-server-db and cloudera-scm-server,
but found that cloudera-scm-server-db could not be restarted:
[root@gyvm-4 data]# service cloudera-scm-server-db stop
waiting for server to shut down............................................................... failed
pg_ctl: server does not shut down
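server-db cannot be stopped because a leftover pid file makes status report the wrong state. After deleting that file and checking status again, it turns out server-db had in fact already stopped.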
[root@gyvm-4 data]# cd /var/lib/cloudera-scm-server-db/data
[root@gyvm-4 data]# service cloudera-scm-server-db status
pg_ctl: server is running (PID: 17378)
/usr/bin/postgres "-D" "/var/lib/cloudera-scm-server-db/data"
[root@gyvm-4 data]# rm postmaster.pid
rm: remove regular file `postmaster.pid'? y
[root@gyvm-4 data]# service cloudera-scm-server-db status
pg_ctl: no server running
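Before deleting postmaster.pid by hand, it may be worth checking whether the PID it records still belongs to a live process. The following is a minimal sketch, not part of the original procedure: the function name pid_file_is_stale is hypothetical, and it relies only on the fact that the first line of postmaster.pid holds the postmaster's PID. It assumes it is run as root, as in the transcripts here, since kill -0 also fails for processes owned by another user.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# Exit status: 0 = PID file looks stale, 1 = recorded process is alive,
#              2 = PID file does not exist.
pid_file_is_stale() {
    pid_file=$1
    [ -f "$pid_file" ] || return 2

    # The first line of postmaster.pid is the postmaster's PID.
    pid=$(head -n 1 "$pid_file" | tr -d '[:space:]')

    case $pid in
        ''|*[!0-9]*) return 0 ;;   # empty or non-numeric: treat as stale
    esac

    # kill -0 delivers no signal; it only tests whether the process exists.
    if kill -0 "$pid" 2>/dev/null; then
        return 1
    fi
    return 0
}
```

For example, `pid_file_is_stale /var/lib/cloudera-scm-server-db/data/postmaster.pid && echo stale` would print "stale" only when no process with the recorded PID exists. A live PID does not prove that the process is postgres, since PIDs can be reused, so treat the result as a hint.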
Starting server-db at this point fails:
[root@gyvm-4 data]# service cloudera-scm-server-db start
DB initialization done.
waiting for server to start...............................................................could not start server
The log shows that TCP/IP port 7432 is already in use:
[root@gyvm-4 cloudera-scm-server]# tail db.log
LOG: could not bind IPv4 socket: Address already in use
HINT: Is another postmaster already running on port 7432? If not, wait a few seconds and retry.
LOG: could not bind IPv6 socket: Address already in use
HINT: Is another postmaster already running on port 7432? If not, wait a few seconds and retry.
WARNING: could not create listen socket for "*"
FATAL: could not create any TCP/IP sockets
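The port number can also be pulled out of the log automatically. This is a small sketch under the assumption that the log contains HINT lines worded exactly like the ones above; the function name ports_in_use_from_log is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# Prints each distinct port mentioned in "Is another postmaster already
# running on port N?" hints found in the given log file.
ports_in_use_from_log() {
    sed -n 's/.*Is another postmaster already running on port \([0-9][0-9]*\)?.*/\1/p' "$1" |
        sort -u
}
```

Run against the db.log excerpt shown above, it would print the single port named in the hints.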
Find the leftover postgres processes still holding port 7432 and kill the process occupying the port:
[root@gyvm-4 cloudera-scm-server]# netstat -ntp | grep 7432
tcp 0 0 192.168.1.17:7432 192.168.1.17:49784 ESTABLISHED 37118/postgres
tcp 0 0 192.168.1.17:7432 192.168.1.8:35818 ESTABLISHED 36807/postgres
tcp 0 0 192.168.1.17:7432 192.168.1.17:49779 ESTABLISHED 37060/postgres
tcp 0 0 192.168.1.17:49783 192.168.1.17:7432 ESTABLISHED 36306/java
tcp 0 0 192.168.1.17:7432 192.168.1.8:35813 ESTABLISHED 36778/postgres
tcp 0 0 192.168.1.17:49779 192.168.1.17:7432 ESTABLISHED 36306/java
tcp 0 0 192.168.1.17:49784 192.168.1.17:7432 ESTABLISHED 36306/java
tcp 0 0 192.168.1.17:49778 192.168.1.17:7432 ESTABLISHED 36306/java
tcp 0 0 192.168.1.17:7432 192.168.1.17:49778 ESTABLISHED 37059/postgres
tcp 0 0 192.168.1.17:7432 192.168.1.8:35814 ESTABLISHED 36779/postgres
tcp 0 0 192.168.1.17:7432 192.168.1.8:35817 ESTABLISHED 36804/postgres
tcp 0 0 192.168.1.17:7432 192.168.1.17:49783 ESTABLISHED 37117/postgres
[root@gyvm-4 cloudera-scm-server]# kill -9 37118
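The netstat listing above mixes server-side postgres processes with the java client connections. As an illustration only, the sketch below filters such output down to the PIDs whose local address is on the given port. The function name pids_on_local_port is hypothetical, and the field positions assume the `netstat -ntp` column layout shown above (local address in column 4, PID/program in column 7).

```shell
#!/bin/sh
# Hypothetical helper (not from the original post).
# Reads `netstat -ntp` style lines on stdin and prints the unique PIDs of
# processes whose LOCAL address uses the given port.
pids_on_local_port() {
    port=$1
    awk -v port="$port" '
        $1 ~ /^tcp/ {
            n = split($4, addr, ":")        # port follows the last colon
            if (addr[n] == port) {
                split($7, prog, "/")        # column 7 is PID/program
                if (prog[1] ~ /^[0-9]+$/) print prog[1]
            }
        }' | sort -un
}
```

A possible use is `netstat -ntp | pids_on_local_port 7432`, which lists the candidates to inspect before killing anything. Note that `netstat -ntp` shows only established connections; adding the `-l` flag (`netstat -lntp`) shows the listening socket, which identifies the process that actually owns the port.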
Start server-db again, which now succeeds, then start the server, which also succeeds:
[root@gyvm-4 data]# service cloudera-scm-server-db start
DB initialization done.
waiting for server to start.... done
server started
[root@gyvm-4 data]# service cloudera-scm-server start
Starting cloudera-scm-server: [ OK ]
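The Cloudera management UI is now reachable again.
【Conclusion】
The root cause was that cloudera-scm-server-db had not started properly but had left behind its pid file, postmaster.pid.
As a result, checking the status of cloudera-scm-server-db gave a misleading answer: it reported the service as running.
With the database in that state, every attempt to start cloudera-scm-server failed.
cloudera-scm-server-db in turn failed to start because the port it needs (7432) was already in use.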