RabbitMQ: the "vhost may be inaccessible" error (virtual host cannot be accessed)

Contents

    • 1. RabbitMQ environment
    • 2. Problem description
    • 3. Troubleshooting
      • 3.1 Web console error
      • 3.2 Log errors
    • 4. Resolution
      • 4.1 recovery.dets
      • 4.2 Fixing the problem

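1. RabbitMQ environment

A two-node RabbitMQ cluster in a master/standby setup.

Node    Ports                                                                                Version
master  5672 (listener port) / 15672 (web port) / 25672 (cluster communication port)        rabbitmq-server-3.7.0 / erlang-19.3.6.4
slave   5672 (listener port) / 15672 (web port) / 25672 (cluster communication port)        rabbitmq-server-3.7.0 / erlang-19.3.6.4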

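2. Problem description

1. The disk on the master node's machine filled up and the host crashed. After the disk was cleaned up and expanded, the host was started again, but the RabbitMQ service was stopped.
2. After the RabbitMQ service was restarted, it was still not healthy, and connections to RabbitMQ failed with errors.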

3. Troubleshooting

3.1 Web console error

The home page of the RabbitMQ web console shows the error "Virtual host xxxx experienced an error on node rabbit@rbtnode1 and may be inaccessible". Several vhosts cannot be accessed on the first RabbitMQ node.

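3.2 Log errors

The RabbitMQ log shows two distinct kinds of errors. The first is client connections being refused because the vhost is down. The second is "Unable to recover vhost", meaning the vhost's data could not be recovered on startup. The log also points directly at the cause: the file A6B2YT8CK302DQET37CULUA99/recovery.dets under the RabbitMQ data path. Checking that directory shows that recovery.dets is 0 kB in size, so the file is corrupted.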

[error] <0.1173.0> Unable to recover vhost <<"prod_XXXX">> data. Reason {badmatch,{error,{{{badmatch,{error,{not_a_dets_file,"/XXX/XXX/XXX/rabbitmq/mnesia/rabbit@rbtnode1/msg_stores/vhosts/A6B2YT8CK302DQET37CULUA99/recovery.dets"}}},[{rabbit_recovery_terms,open_table,1,[{file,"src/rabbit_recovery_terms.erl"},{line,191}]},{rabbit_recovery_terms,init,1,[{file,"src/rabbit_recovery_terms.erl"},{line,171}]},{gen_server,init_it,6,[{file,"gen_server.erl"},{line,328}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,247}]}]},{child,undefined,rabbit_recovery_terms,{rabbit_recovery_terms,start_link,[<<"prod_bk_nodeman">>]},transient,30000,worker,[rabbit_recovery_terms]}}}}

Error on AMQP connection <0.2915.0> (86.5.XX.XX:44194 -> 86.5.XX.XX:5672, vhost: 'none', 
user: 'bk_XXX', state: opening), channel 0:
 {handshake_error,opening,
                 {amqp_error,internal_error,
                             "access to vhost 'prod_XXX' refused for user 'bk_XXX': vhost 'prod_XXXX' is down",
                             'connection.open'}}
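
When several vhosts are affected, the damaged files can be collected from the log instead of being read off by eye. The following is a minimal sketch, assuming the log lines have the same shape as the excerpt above; the function name `corrupted_dets_paths` is hypothetical and not part of RabbitMQ.

```python
import re

# Matches the file path that follows not_a_dets_file in an error line
# shaped like the "Unable to recover vhost" excerpt above.
NOT_A_DETS = re.compile(r'not_a_dets_file,"([^"]+)"')


def corrupted_dets_paths(log_text):
    """Return the unique recovery.dets paths reported in the log, in first-seen order."""
    paths = []
    for path in NOT_A_DETS.findall(log_text):
        if path not in paths:
            paths.append(path)
    return paths
```

To use it, read the node's log file into a string (its location depends on the installation) and pass the string to `corrupted_dets_paths`. Each returned path names one vhost directory that needs attention.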

4. Resolution

4.1 recovery.dets

In RabbitMQ versions starting with 3.7.0 all messages data is combined in the msg_stores/vhosts directory and stored in a subdirectory per vhost. Each vhost directory is named with a hash and contains a .vhost file with the vhost name, so a specific vhost's message set can be backed up separately.
In RabbitMQ versions prior to 3.7.0 messages are stored in several directories under the node data directory: queues, msg_store_persistent and msg_store_transient. Also there is a recovery.dets file which contains recovery metadata if the node was stopped gracefully.

https://www.rabbitmq.com/backup.html
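
Because each vhost directory is named with a hash, it helps to map directory names back to vhost names before touching anything. The sketch below does this by reading the .vhost file in each subdirectory. It assumes the .vhost file holds the vhost name as plain UTF-8 text; the function name `vhost_directories` is hypothetical.

```python
from pathlib import Path


def vhost_directories(vhosts_root):
    """Map each vhost name to its hashed directory under msg_stores/vhosts."""
    mapping = {}
    for entry in sorted(Path(vhosts_root).iterdir()):
        marker = entry / ".vhost"
        # Skip anything that is not a vhost directory with a .vhost marker file.
        if entry.is_dir() and marker.is_file():
            # Assumption: the file contains only the vhost name as text.
            name = marker.read_text(encoding="utf-8").strip()
            mapping[name] = entry
    return mapping
```

Called with the msg_stores/vhosts path from the log above, this would show which vhost a directory such as A6B2YT8CK302DQET37CULUA99 belongs to.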

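4.2 Fixing the problem

According to the official description, recovery.dets normally holds RabbitMQ's recovery metadata, written when the node is stopped gracefully. Here the node's host crashed unexpectedly because the disk was full, so the file was not written properly and ended up corrupted.

1) Use the log to identify every vhost with a corrupted file, go to the RabbitMQ data path, and delete the recovery.dets file or move it away with mv.
  Then restart the RabbitMQ service. Connections to the vhost are restored, the problem is resolved, and recovery.dets is rewritten with fresh metadata.

2) Alternatively, delete the affected vhost and create it again. Deleting a vhost also removes its queues and messages.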
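
Option 1 can be sketched as a small script that looks for empty recovery.dets files and renames them rather than deleting them. This is only a sketch under assumptions: it treats a zero-byte file as corrupted, which matches this incident but would miss a damaged file that is not empty, and the function name `set_aside_empty_dets` and the `.bak` suffix are hypothetical choices.

```python
from pathlib import Path


def set_aside_empty_dets(vhosts_root, dry_run=True):
    """Rename zero-byte recovery.dets files to recovery.dets.bak.

    Returns the list of affected files. With dry_run=True nothing is renamed.
    """
    affected = []
    for dets in sorted(Path(vhosts_root).glob("*/recovery.dets")):
        # Assumption: size 0 means corrupted, as observed in this incident.
        if dets.is_file() and dets.stat().st_size == 0:
            affected.append(dets)
            if not dry_run:
                dets.rename(dets.with_name(dets.name + ".bak"))
    return affected
```

Running it first with the default `dry_run=True` only lists the candidates. Running it again with `dry_run=False` moves them aside, after which the RabbitMQ service still needs to be restarted as described in step 1.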
