RabbitMQ server process suddenly crashes

Related errors:
Error log in /var/log/rabbitmq/startup_err

Crash dump was written to: erl_crash.dump
eheap_alloc: Cannot allocate 762886488 bytes of memory (of type "heap").
Aborted (core dumped)

celery-rank error log

[2019-03-13 07:13:57,717: WARNING/MainProcess] consumer: Connection to broker lost. Trying to re-establish the connection...
Traceback (most recent call last):
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 279, in start
    blueprint.start(self)
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/celery/bootsteps.py", line 123, in start
    step.start(parent)
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/celery/worker/consumer.py", line 838, in start
    c.loop(*c.loop_args())
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/celery/worker/loops.py", line 76, in asynloop
    next(loop)
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/kombu/async/hub.py", line 340, in create_loop
    cb(*cbargs)
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/kombu/transport/base.py", line 164, in on_readable
    reader(loop)
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/kombu/transport/base.py", line 146, in _read
    drain_events(timeout=0)
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/amqp/connection.py", line 303, in drain_events
    chanmap, None, timeout=timeout,
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/amqp/connection.py", line 366, in _wait_multiple
    channel, method_sig, args, content = read_timeout(timeout)
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/amqp/connection.py", line 337, in read_timeout
    return self.method_reader.read_method()
  File "/home/deploy/virtualenv/mid-system/local/lib/python2.7/site-packages/amqp/method_framing.py", line 189, in read_method
    raise m
error: [Errno 104] Connection reset by peer
[2019-03-13 07:13:57,772: ERROR/MainProcess] consumer: Cannot connect to amqp://guest:**@172.16.2.10:5672//: [Errno 111] Connection refused.
Trying again in 2.00 seconds...
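The "Connection refused" in the log means that nothing was listening on the broker port once the RabbitMQ process had died. A minimal sketch for checking this from the worker host, assuming Python 3 and only the standard library (the helper name broker_reachable is made up for illustration and is not part of Celery or kombu):

```python
import socket


def broker_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        conn = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Covers "connection refused" (errno 111), timeouts and unreachable hosts.
        return False
    conn.close()
    return True
```

With the address from the log, broker_reachable("172.16.2.10", 5672) would return False while the broker is down. This only tests the TCP port; it does not prove that RabbitMQ is healthy.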

Fix:
Restart the RabbitMQ server.

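Error analysis:
The message eheap_alloc: Cannot allocate 762886488 bytes of memory (of type "heap"). suggests that the operating system could no longer hand Erlang enough memory. The request is only a little over 700 MB; could the server really not supply that much? It appears that it could not.
Check the server's current memory usage: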
[Figure: current memory usage on the server]
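The server has about 8 GB of memory in total and a little over 800 MB currently available, which is only just enough for the allocation above. Swap has not been taken into account here.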
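The comparison above can be sketched in code. This is a rough illustration, assuming a Linux host and Python 3; the function names are invented here, and the sample numbers in the usage note are illustrative values chosen to match "about 8 GB total, a little over 800 MB available", not readings from the original server.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: number}.

    Most fields are in kB; a few (such as HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def can_satisfy(request_bytes, meminfo):
    """Rough check: does MemAvailable (or MemFree as a fallback) cover the request?"""
    available_kb = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return available_kb * 1024 >= request_bytes
```

For example, parse_meminfo("MemTotal: 8167848 kB\nMemAvailable: 820000 kB\n") followed by can_satisfy(762886488, info) returns True, but only by a small margin. On a live host the text would come from open("/proc/meminfo").read(). The check ignores swap and fragmentation, so it is an estimate at best.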
Next, look at the crash dump. Running locate erl_crash.dump shows that the file is in /var/lib/rabbitmq.
Then analyze erl_crash.dump with the script available at https://github.com/ferd/recon/blob/master/script/erl_crashdump_analyzer.sh.
The output is:

analyzing origin_erl_crash.dump, generated on: Wed Mar 13 07:13:57 2019

Slogan: eheap_alloc: Cannot allocate 762886488 bytes of memory (of type "heap").

Memory:
===
  processes: 764 Mb
  processes_used: 764 Mb
  system: 641 Mb
  atom: 1 Mb
  atom_used: 1 Mb
  binary: 475 Mb
  code: 15 Mb
  ets: 101 Mb
  ---
  total: 1406 Mb

Different message queue lengths (5 largest different):
===
  2245 0

Error logger queue length:
===
  0

File descriptors open:
===
  UDP: 0
  TCP: 196
  Files: 43
  ---
  Total: 239

Number of processes:
===
  2245

Processes Heap+Stack memory sizes (words) used in the VM (5 largest different):
===
  1 95360811
  3 28690
  2 17731
  2 10958
  49 6772

Processes OldHeap memory sizes (words) used in the VM (5 largest different):
===
  1 28690
  10 17731
  19 10958
  83 6772
  28 4185

Process States when crashing (sum):
===
  1 Garbing
  1 Running
  2243 Waiting

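This output also points to a memory allocation failure. The deeper cause is hard to pin down from it, but either way the server ran out of memory.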
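One detail in the analyzer output is worth a quick calculation. The heap sizes are reported in words, so they have to be converted to bytes before they can be compared with the slogan. The sketch below assumes a 64-bit Erlang VM with 8-byte words, which the dump excerpt does not state explicitly.

```python
WORD_SIZE_BYTES = 8  # assumption: 64-bit Erlang VM


def words_to_bytes(words, word_size=WORD_SIZE_BYTES):
    """Convert a size in VM words to bytes."""
    return words * word_size


largest_heap_words = 95360811   # largest entry in the Heap+Stack listing
failed_alloc_bytes = 762886488  # size from the crash slogan

heap_bytes = words_to_bytes(largest_heap_words)
print(heap_bytes, heap_bytes == failed_alloc_bytes)  # prints: 762886488 True
```

Under that assumption the failed request is exactly the size of the single largest process heap. One plausible reading is that the one process in the Garbing state was being garbage collected and needed a new heap of that size, although the output alone does not identify which process it was.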
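Possible remedies going forward:
1. Give the server more resources.
2. Reduce the memory load on the server, for example by moving unrelated processes to other machines.
3. Enable swap, although relying on swap is not ideal and may have side effects.
4. Upgrade RabbitMQ.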
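For the second item, it helps to know which processes actually hold the most memory. A minimal sketch, assuming a Linux host with /proc mounted and Python 3; the function names are made up for illustration.

```python
import os


def parse_vm_rss_kb(status_text):
    """Return VmRSS in kB from /proc/<pid>/status text, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def parse_name(status_text):
    """Return the process name from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Name:"):
            return line.split(":", 1)[1].strip()
    return ""


def top_rss(limit=5, proc_root="/proc"):
    """List (rss_kb, pid, name) for the processes with the largest resident set."""
    rows = []
    if not os.path.isdir(proc_root):
        return rows
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        path = os.path.join(proc_root, entry, "status")
        try:
            with open(path, errors="replace") as fh:
                text = fh.read()
        except OSError:
            continue  # the process exited or is not readable
        rss = parse_vm_rss_kb(text)
        if rss is not None:  # kernel threads have no VmRSS line
            rows.append((rss, int(entry), parse_name(text)))
    rows.sort(reverse=True)
    return rows[:limit]
```

Calling top_rss(5) on the RabbitMQ host lists the five largest processes by resident memory, which gives a starting point for deciding what could be moved elsewhere.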
For a more detailed analysis, see http://ju.outofmemory.cn/entry/186612
