Erlang: v22.0 (Windows), v21.3.8.3 (Linux)
RabbitMQ: v3.7.15
Windows
http://erlang.org/download/otp_win64_22.0.exe
https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.15/rabbitmq-server-3.7.15.exe
Linux (CentOS 7)
wget https://github.com/rabbitmq/erlang-rpm/releases/download/v21.3.8.3/erlang-21.3.8.3-1.el7.x86_64.rpm
wget https://github.com/rabbitmq/rabbitmq-server/releases/download/v3.7.15/rabbitmq-server-3.7.15-1.el7.noarch.rpm
yum install -y erlang-21.3.8.3-1.el7.x86_64.rpm
yum install -y rabbitmq-server-3.7.15-1.el7.noarch.rpm
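node1 (seed node): installed and started on Windows
node2: installed and started on CentOS 7
Every machine needs hosts entries for rabbit1 and rabbit2 (hostname-to-IP mappings); cluster nodes use these names to find each other
Two config file formats are supported: .conf (key = value) and .config (Erlang terms, [].)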
Generic UNIX: $RABBITMQ_HOME/etc/rabbitmq/
Debian: /etc/rabbitmq/
RPM: /etc/rabbitmq/
Mac OS (Homebrew): ${install_prefix}/etc/rabbitmq/, the Homebrew cellar prefix is usually /usr/local
Windows: %APPDATA%\RabbitMQ\
Override the config file location with this environment variable:
RABBITMQ_CONFIG_FILE
rabbitmq-plugins enable rabbitmq_management
Management UI: http://127.0.0.1:15672
username: guest
password: guest
By default the guest user can only connect from localhost (127.0.0.1). To lift this restriction, set loopback_users.guest = false in the config file
firewall-cmd --zone=public --add-port=15672/tcp --permanent
firewall-cmd --reload
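Copy the Erlang cookie file from one node to the other so that both are identical. Otherwise joining the cluster fails with an error saying the connection succeeded but authentication failed.
Windows: C:\Windows\System32\config\systemprofile\.erlang.cookie
Linux: /var/lib/rabbitmq/.erlang.cookie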
# on rabbit2
rabbitmqctl stop_app
# => Stopping node rabbit@rabbit2 ...done.
rabbitmqctl reset
# => Resetting node rabbit@rabbit2 ...
rabbitmqctl join_cluster rabbit@rabbit1
# => Clustering node rabbit@rabbit2 with [rabbit@rabbit1] ...done.
rabbitmqctl start_app
# => Starting node rabbit@rabbit2 ...done.
# on rabbit1
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit1 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
# => {running_nodes,[rabbit@rabbit2,rabbit@rabbit1]}]
# => ...done.
# on rabbit2
rabbitmqctl cluster_status
# => Cluster status of node rabbit@rabbit2 ...
# => [{nodes,[{disc,[rabbit@rabbit1,rabbit@rabbit2]}]},
# => {running_nodes,[rabbit@rabbit1,rabbit@rabbit2]}]
# => ...done.
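The plain (non-mirrored) cluster is now set up. Metadata such as users, permissions, vhosts, exchanges, bindings, and queue definitions is replicated automatically to every node in the cluster; the messages inside a queue are not.
When the node that owns a queue is unavailable, that queue is unavailable on the other nodes as well.
This is because, in a plain cluster, any node that receives a publish forwards it to the queue's home node (the node where the queue was declared), and all operations (delivery, consumption, declaration, and so on) happen there. The other nodes hold only the queue's metadata and cannot store or deliver its messages.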
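The green queue (in the figure) has an HA policy applied, so it gets master-mirror replication plus automatic master election. It follows the CP side of the CAP trade-off: strong consistency.
The red queue is a plain queue. Only its owner (the node that created the queue, also called the home node) can accept, deliver, and persist its messages. Messages routed to a non-owner node are still relocated to the owner. Once the owner goes down, the queue becomes unusable from the non-owner nodes too (no publishing or consuming).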
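All RabbitMQ server nodes are peers; master and mirror are per-queue concepts.
A policy can mirror selected queues to get master-mirror replication, together with automatic master election: when the master becomes unavailable, a mirror on another cluster node is promoted to master.
This solves the plain-cluster queue problem described above.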
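The difference between a plain queue and a mirrored queue can be sketched with a small in-memory model. This is an illustrative simplification, not RabbitMQ code: the class `QueueFailoverSketch` and its methods are hypothetical, and promotion is modelled as "the next replica in the list becomes master" without considering synchronisation state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a queue is a list of hosting nodes, master first.
// A plain queue has exactly one entry (its home node).
class QueueFailoverSketch {
    private final List<String> replicas;

    QueueFailoverSketch(List<String> replicas) {
        this.replicas = new ArrayList<>(replicas);
    }

    // Current master, or null when no node hosts the queue any more.
    String master() {
        return replicas.isEmpty() ? null : replicas.get(0);
    }

    // Removing the head of the list promotes the next mirror, if any.
    void nodeDown(String node) {
        replicas.remove(node);
    }

    public static void main(String[] args) {
        QueueFailoverSketch plain =
                new QueueFailoverSketch(Arrays.asList("rabbit@rabbit1"));
        QueueFailoverSketch mirrored =
                new QueueFailoverSketch(Arrays.asList("rabbit@rabbit1", "rabbit@rabbit2"));

        plain.nodeDown("rabbit@rabbit1");
        mirrored.nodeDown("rabbit@rabbit1");

        System.out.println("plain queue master: " + plain.master());
        System.out.println("mirrored queue master: " + mirrored.master());
    }
}
```

In this model the plain queue is left without a master once its home node is gone, while the mirrored queue continues on rabbit@rabbit2.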
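Nodes are disc nodes by default; the node type can be changed manually from the command line (rabbitmqctl change_cluster_node_type).
RAM nodes keep their metadata only in memory. Because RAM nodes do not have to write to disc as much as disc nodes, they can perform better. Note, however, that persistent queue data is always stored on disc, so the improvement affects only resource management (for example adding or removing queues, exchanges, or vhosts), not publishing or consuming speed.
RAM nodes are an advanced use case; do not use them when setting up your first cluster. Have enough disc nodes to cover your redundancy requirements, then add RAM nodes for scale if necessary.
A cluster containing only RAM nodes is fragile; if the cluster stops, you will not be able to start it again and will lose all data. RabbitMQ prevents the creation of a RAM-node-only cluster in many situations, but it cannot prevent it absolutely.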
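The cluster nodes are spread across two networks: network A (node1, node2, node3) and network B (node4, node5, node6).
A and B are connected through a switch. If the link between them fails, RabbitMQ's peer failure detection (60 s by default) causes the two subnets to split into two clusters, each electing its own queue masters. This is the split-brain problem.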
A node decides that a peer is down when it cannot contact that peer for a period of time, 60 seconds by default, and removes the peer from its set of running nodes. If two nodes come back into contact after each has concluded that the other is down, they determine that a partition has occurred. In other words, one cluster has split into several, and mirrored queues elect a master on each side; this is split-brain.
Recovering From a Split-Brain
To recover from a split-brain, first choose the one partition you trust the most. This choice matters: that partition becomes the authority for the state of the system (schema, messages), and any changes that occurred on the other partitions will be lost.
Stop all nodes in the other partitions, then start them all up again. When they rejoin the cluster they will restore state from the trusted partition.
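Partition handling strategy
Assume a 3:3 cluster, that is, a split into two sides of equal size: A(a,b,c), B(d,e,f).
With pause_minority the whole cluster becomes unavailable, because neither side holds a strict majority of the nodes, so every node pauses itself.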
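The 3:3 case can be checked with a few lines of arithmetic. The sketch below is a hypothetical model of the pause_minority rule (a node keeps running only when it sees more than half of the cluster), not RabbitMQ source code; the class and method names are made up for illustration.

```java
// Hypothetical model of the pause_minority decision; not RabbitMQ code.
class PauseMinoritySketch {

    // A node keeps running only if the nodes it can reach (itself included)
    // form a strict majority of the whole cluster.
    static boolean staysUp(int reachableIncludingSelf, int clusterSize) {
        return reachableIncludingSelf * 2 > clusterSize;
    }

    public static void main(String[] args) {
        int clusterSize = 6;
        // 3:3 split: neither side has more than half of the nodes.
        System.out.println("3:3 split, side A stays up: " + staysUp(3, clusterSize));
        System.out.println("3:3 split, side B stays up: " + staysUp(3, clusterSize));
        // 4:2 split: only the larger side keeps running.
        System.out.println("4:2 split, larger side stays up: " + staysUp(4, clusterSize));
        System.out.println("4:2 split, smaller side stays up: " + staysUp(2, clusterSize));
    }
}
```

An even split is therefore the worst case for pause_minority, which is why an odd number of nodes is the usual choice when this mode is used.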
Recommended configuration:
loopback_users.guest = false
cluster_partition_handling = pause_if_all_down
## Recovery strategy. Can be either 'autoheal' or 'ignore'
cluster_partition_handling.pause_if_all_down.recover = ignore
## Node names to check: pick one representative per partition; a node pauses itself only if all of them are unreachable
cluster_partition_handling.pause_if_all_down.nodes.1 = rabbit@DEV-b
cluster_partition_handling.pause_if_all_down.nodes.2 = rabbit@DEV-d
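How this configuration behaves during the 3:3 split can be sketched as follows. This is a hypothetical model of the pause_if_all_down rule, not RabbitMQ code; the node names other than rabbit@DEV-b and rabbit@DEV-d are assumed for illustration, and the reachable set is assumed to include the node itself.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the pause_if_all_down check; not RabbitMQ code.
class PauseIfAllDownSketch {

    // A node pauses itself only when it can reach none of the listed nodes.
    static boolean shouldPause(List<String> listedNodes, Set<String> reachable) {
        for (String listed : listedNodes) {
            if (reachable.contains(listed)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> listed = Arrays.asList("rabbit@DEV-b", "rabbit@DEV-d");

        // What a node on each side of the 3:3 split can still reach.
        Set<String> sideA = new HashSet<>(
                Arrays.asList("rabbit@DEV-a", "rabbit@DEV-b", "rabbit@DEV-c"));
        Set<String> sideB = new HashSet<>(
                Arrays.asList("rabbit@DEV-d", "rabbit@DEV-e", "rabbit@DEV-f"));
        // A node that is cut off from everyone and is not itself listed.
        Set<String> isolated = new HashSet<>(Arrays.asList("rabbit@DEV-f"));

        System.out.println("side A pauses: " + shouldPause(listed, sideA));
        System.out.println("side B pauses: " + shouldPause(listed, sideB));
        System.out.println("isolated node pauses: " + shouldPause(listed, isolated));
    }
}
```

Because each side contains one listed node, both sides keep running after the split, and the recover setting (ignore or autoheal) then decides how the partition is handled once connectivity returns.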
amqp-client v4.0.3
ConnectionFactory factory = new ConnectionFactory();
factory.setUsername(userName);
factory.setPassword(password);
factory.setVirtualHost(virtualHost);
factory.setHost(hostName);
factory.setPort(portNumber);
factory.setAutomaticRecoveryEnabled(true);
// connection that will recover automatically
Connection conn = factory.newConnection();
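// Alternatively, pass a list of cluster node addresses; the client uses one that accepts the connection
ConnectionFactory factory = new ConnectionFactory();
Address[] addresses = {new Address("192.168.1.4"), new Address("192.168.1.5")};
Connection conn = factory.newConnection(addresses);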
Maven dependency: org.springframework.amqp:spring-rabbit:1.7.6.RELEASE
XML config schema: http://www.springframework.org/schema/rabbit/spring-rabbit-1.7.xsd
Maven dependency: org.springframework.boot:spring-boot-starter-amqp:1.5.10.RELEASE
application.properties
spring.rabbitmq.addresses=192.168.103.34:5672,192.168.73.129:5672
References:
https://github.com/spring-projects/spring-amqp/issues/1029
https://github.com/rabbitmq/rabbitmq-java-client/issues/138
https://github.com/rabbitmq/rabbitmq-java-client/issues/153
https://github.com/rabbitmq/rabbitmq-java-client/pull/169
https://github.com/rabbitmq/rabbitmq-dotnet-client/issues/195
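Idempotency
Every published message must carry an id that is independent of the business data (uniqueness guaranteed by a distributed id generation scheme).
Example: put a unique index on the column that stores this id. When an insert fails with a duplicate key error, acknowledge the message directly and skip the business logic.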
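The deduplication step can be sketched as below. This is a minimal in-memory illustration, not a production implementation: the set `processedIds` stands in for the database table with a unique index, `add` returning false plays the role of the duplicate key error, and the class and method names are hypothetical. In a real consumer the id insert and the business update would normally share one database transaction.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of an idempotent message handler.
class IdempotentConsumerSketch {

    // Stands in for a table with a unique index on the message id.
    private final Set<String> processedIds = ConcurrentHashMap.newKeySet();

    // Returns true if the business logic ran, false if the message was a
    // duplicate. In both cases the caller can acknowledge the message.
    boolean handle(String messageId, Runnable businessLogic) {
        // add() returns false when the id is already present,
        // which models the duplicate key error on insert.
        if (!processedIds.add(messageId)) {
            return false;
        }
        try {
            businessLogic.run();
            return true;
        } catch (RuntimeException e) {
            // Forget the id so that a redelivery can retry the work.
            processedIds.remove(messageId);
            throw e;
        }
    }

    public static void main(String[] args) {
        IdempotentConsumerSketch consumer = new IdempotentConsumerSketch();
        Runnable work = () -> System.out.println("business logic executed");

        System.out.println("first delivery ran: " + consumer.handle("msg-1", work));
        System.out.println("redelivery ran: " + consumer.handle("msg-1", work));
    }
}
```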