OS distribution: CentOS 7
RabbitMQ version: 3.6.11
Server host plan:
10.168.17.102 mq07.mq-cluster.mall.lt.com
10.168.17.98 mq08.mq-cluster.mall.lt.com
10.168.17.64 mq09.mq-cluster.mall.lt.com
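1. On each of the three servers, edit /etc/rabbitmq/rabbitmq-env.conf and set NODENAME for that host.
On mq07:
vim /etc/rabbitmq/rabbitmq-env.conf
NODENAME=rabbit@mq07-mq-cluster
On mq08:
vim /etc/rabbitmq/rabbitmq-env.conf
NODENAME=rabbit@mq08-mq-cluster
On mq09:
vim /etc/rabbitmq/rabbitmq-env.conf
NODENAME=rabbit@mq09-mq-cluster
It is best to set NODENAME explicitly here, so the node name does not depend on whatever hostname the machine happens to have.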
2. Add name resolution entries to /etc/hosts on every server:
10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster
10.168.17.98 mq08.mq-cluster.mall.lt.com mq08-mq-cluster
10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-mq-cluster
Note: the short host name at the end of each hosts entry must match exactly the string after "@" in the NODENAME variable above.
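The consistency rule above can be checked mechanically. The following is a minimal sketch, assuming the env file contains a line of the form NODENAME=rabbit@<host>. The function name check_nodename_in_hosts is made up for illustration, and both file paths are passed as arguments so it can be tried on copies first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: verify that the host part of NODENAME (the text after "@")
# appears as a name or alias in a hosts-format file.
check_nodename_in_hosts() {
    local env_file="$1" hosts_file="$2" node_host
    # Take the last NODENAME= line and strip everything up to and including "@".
    node_host="$(sed -n 's/^NODENAME=.*@//p' "$env_file" | tail -n 1)"
    [ -n "$node_host" ] || { echo "NODENAME not found in $env_file" >&2; return 2; }
    # Skip comment lines; compare every name field (column 2 onward) exactly.
    awk -v h="$node_host" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) found = 1 }
        END { exit(found ? 0 : 1) }
    ' "$hosts_file"
}
```

On a cluster node it would be called as: check_nodename_in_hosts /etc/rabbitmq/rabbitmq-env.conf /etc/hosts && echo OK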
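3. /usr/lib/systemd/system/rabbitmq-server.service
Important: on CentOS 7, the service files for RabbitMQ, Elasticsearch and similar services must set the two parameters LimitNOFILE and LimitNPROC shown below. Without them, the values in /etc/security/limits.conf are not applied to a service started by systemd, so they have no effect. Once a RabbitMQ disk node runs out of resources, the whole cluster can go down.
We hit exactly this problem in production today: the open file descriptors were exhausted and the RabbitMQ cluster went down, then went down again immediately after each restart, because traffic was heavy enough to use up the 1024 descriptors right away.
Explanation: with a default installation started as-is, RabbitMQ runs with a file descriptor limit of 1024 and a process limit of 1024. This remains true even after raising the values to 65536 in /etc/security/limits.conf and in the files under /etc/security/limits.d/, so keep it in mind.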
[Unit]
Description=RabbitMQ broker
After=syslog.target network.target
[Service]
Type=notify
User=rabbitmq
Group=rabbitmq
LimitNOFILE=65536
LimitNPROC=65535
WorkingDirectory=/var/lib/rabbitmq
ExecStart=/usr/sbin/rabbitmq-server
ExecStop=/usr/sbin/rabbitmqctl stop
ExecStop=/bin/sh -c "while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done"
NotifyAccess=all
TimeoutStartSec=3600
[Install]
WantedBy=multi-user.target
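After the unit file is edited, systemd has to reload it (systemctl daemon-reload) and the service has to be restarted before the new limits apply. To confirm what the running process actually received, the kernel exposes per-process limits in /proc/<pid>/limits. A minimal sketch follows; the function name soft_nofile is hypothetical, and it takes the limits file as an argument so it can be tested on a saved copy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the soft "Max open files" limit from a file
# in /proc/<pid>/limits format (columns: name, soft limit, hard limit, units).
soft_nofile() {
    awk '/^Max open files/ { print $4 }' "$1"
}
```

A possible way to call it for the broker is soft_nofile "/proc/$(systemctl show -p MainPID rabbitmq-server.service | cut -d= -f2)/limits", which should print 65536 once the unit above is in effect.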
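4. Configuration file
The default vm_memory_high_watermark is 0.4; here it is raised to 0.8. The machine has 64 GB of memory.
Create or edit the configuration file: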
/etc/rabbitmq/rabbitmq.config
[
{rabbit,
[
{vm_memory_high_watermark, 0.8}
%% {vm_memory_high_watermark, {absolute, "40G"}}
]
}
].
Note: the file must end with a period (".").
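A relative watermark is a fraction of the RAM the node detects, so the effect of the change on a 64 GB host can be worked out directly. The helper below is only an arithmetic sketch; watermark_gb is a made-up name, and it does not query RabbitMQ.

```shell
#!/usr/bin/env bash
# Hypothetical helper: memory threshold in GB for a given RAM size (GB)
# and vm_memory_high_watermark fraction.
watermark_gb() {
    awk -v ram="$1" -v frac="$2" 'BEGIN { printf "%.1f\n", ram * frac }'
}

watermark_gb 64 0.4   # default fraction: prints 25.6
watermark_gb 64 0.8   # value used in this setup: prints 51.2
```

With 0.8 the node starts blocking publishers at roughly 51.2 GB instead of 25.6 GB, which leaves about 12.8 GB for the operating system and everything else on the host.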
5. Problem:
[root@mq08 ~]# journalctl -xe
Oct 19 19:48:04 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: main process exited, code=exited, status=1/FAILURE
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: Error: Failed to initialize erlang distribution: {{shutdown,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {failed_to_start_child,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: auth,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {"Cookie file ./.erlang.cookie must be accessible by owner only",
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{auth,init_cookie,0,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,286}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {auth,init,1,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"auth.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,140}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,2,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,365}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {gen_server,init_it,6,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: "gen_server.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,333}]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {proc_lib,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: init_p_do_apply,3,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [{file,"proc_lib.erl"},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {line,247}]}]}}},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {child,undefined,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: net_sup_dynamic,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: {erl_distribution,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: start_link,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [['rabbitmq-cli-27',
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: shortnames],
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: false]},
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: permanent,1000,supervisor,
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com rabbitmqctl[4516]: [erl_distribution]}}.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service: control process exited, code=exited status=75
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.
-- Subject: Unit rabbitmq-server.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
--
-- Unit rabbitmq-server.service has failed.
--
-- The result is failed.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.
Oct 19 19:48:05 mq08.mq-cluster.mall.lt.com polkitd[1055]: Unregistered Authentication Agent for unix-process:4237:24929114 (system bus name :1.6179, object path /org/freedesktop/PolicyKit1/AuthenticationAgen
Solution:
chown rabbitmq:rabbitmq /var/lib/rabbitmq/.erlang.cookie
chmod 600 /var/lib/rabbitmq/.erlang.cookie
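The error comes from Erlang refusing a cookie file that group or other users can access. A quick check of the mode before starting the service might look like the sketch below. The function name cookie_mode_ok is hypothetical, and stat -c is the GNU coreutils form available on CentOS 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if the file grants no permissions to
# group or others (for example mode 600 or 400).
cookie_mode_ok() {
    local mode
    mode="$(stat -c '%a' "$1")" || return 2   # GNU stat, as shipped on CentOS 7
    case "$mode" in
        [0-7]00) return 0 ;;
        *)       return 1 ;;
    esac
}
```

For example: cookie_mode_ok /var/lib/rabbitmq/.erlang.cookie || echo "cookie permissions too open"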
6. Enable the management plugin and create an account
rabbitmq-plugins enable rabbitmq_management
rabbitmqctl add_user limu 123456
rabbitmqctl set_user_tags limu administrator
rabbitmqctl set_permissions -p / limu ".*" ".*" ".*"
7. Problem
[root@mq07 ~]# systemctl status rabbitmq-server.service
● rabbitmq-server.service - RabbitMQ broker
Loaded: loaded (/usr/lib/systemd/system/rabbitmq-server.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Fri 2018-10-19 20:02:17 CST; 9s ago
Process: 20821 ExecStop=/bin/sh -c while ps -p $MAINPID >/dev/null 2>&1; do sleep 1; done (code=exited, status=0/SUCCESS)
Process: 20481 ExecStop=/usr/sbin/rabbitmqctl stop (code=exited, status=0/SUCCESS)
Process: 20202 ExecStart=/usr/sbin/rabbitmq-server (code=exited, status=1/FAILURE)
Main PID: 20202 (code=exited, status=1/FAILURE)
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: attempted to contact: ['rabbit@mq07-mq-cluster']
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: rabbit@mq07-mq-cluster:
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: * unable to connect to epmd (port 4369) on mq07-mq-cluster: address (cannot connect to host/port)
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: current node details:
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - node name: 'rabbitmq-cli-79@mq07'
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - home dir: .
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com rabbitmqctl[20481]: - cookie hash: 5lJVl9Km+lOXAsr8i4xIVA==
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Failed to start RabbitMQ broker.
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: Unit rabbitmq-server.service entered failed state.
Oct 19 20:02:17 mq07.mq-cluster.mall.lt.com systemd[1]: rabbitmq-server.service failed.
Root cause:
This error means that the host name mq07-mq-cluster cannot be resolved, or that it resolves to an IP address that does not belong to this machine.
Solutions:
1. Scenario 1: the machine's IP is 10.168.17.102, but /etc/hosts was mistakenly configured as 10.168.17.10 mq07.mq-cluster.mall.lt.com mq07-mq-cluster.
Correct the IP: 10.168.17.102 mq07.mq-cluster.mall.lt.com mq07-mq-cluster
2. Scenario 2: /etc/rabbitmq/rabbitmq-env.conf has NODENAME=rabbit@mq09-mq-cluster, but /etc/hosts contains
10.168.17.64 mq09.mq-cluster.mall.lt.com mq09-cluster
Fix: change mq09-cluster to mq09-mq-cluster in /etc/hosts.
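Both scenarios come down to what /etc/hosts returns for the node's short name. The sketch below prints the address a hosts-format file maps to a name, so it can be compared with the machine's real address. The function name hosts_ip_for is made up for illustration, and it reads only the file it is given (it does not consult DNS).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IP address that a hosts-format file maps
# to the given name (first matching entry wins, comment lines are skipped).
hosts_ip_for() {
    awk -v h="$1" '
        /^[[:space:]]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == h) { print $1; exit } }
    ' "$2"
}
```

An empty result corresponds to scenario 2 (the name is missing or misspelled). A result that is not among the addresses printed by hostname -I corresponds to scenario 1 (the name points at the wrong IP).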
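8. Add a mirrored-queue policy
Policies are defined per vhost, so this command has to be run again for every vhost that is added. The pattern "^" matches all queue names in the vhost.
rabbitmqctl set_policy -p /admin "ha-allqueue" "^" '{"ha-mode":"all","ha-sync-mode":"automatic"}'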
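Because the policy has to be repeated per vhost, it can help to generate the commands from a list of vhost names. The following is a dry-run sketch that only prints the commands, assuming one vhost name per line on stdin and names without spaces. The function name policy_commands is hypothetical, and the "^" pattern is the match-everything pattern assumed here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) one set_policy command per vhost
# name read from stdin. Blank lines are ignored.
policy_commands() {
    local definition='{"ha-mode":"all","ha-sync-mode":"automatic"}' vhost
    while IFS= read -r vhost; do
        [ -n "$vhost" ] || continue
        printf "rabbitmqctl set_policy -p %s ha-allqueue \"^\" '%s'\n" "$vhost" "$definition"
    done
}
```

One way to feed it is rabbitmqctl -q list_vhosts | policy_commands, on the assumption that the quiet listing prints just one vhost per line on this version. Review the printed commands before running them.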