Overview
Clients communicate with RabbitMQ over the network. All protocols supported by the broker are TCP-based. Both RabbitMQ and the operating system provide a number of knobs that can be tweaked. Some of them are directly related to TCP and IP operations, others have to do with application-level protocols such as TLS. This guide covers multiple topics related to networking in the context of RabbitMQ. This guide is not meant to be an extensive reference but rather an overview. Some tuneable parameters discussed are OS-specific. This guide focuses on Linux when covering OS-specific subjects, as it is the most common platform RabbitMQ is deployed on.
There are several areas which can be configured or tuned. Each has a section in this guide:
Interfaces the node listens on for client connections
IP version preferences: dual stack, IPv6-only and IPv4-only
Ports used by clients, inter-node traffic in clusters and CLI tools
IPv6 support for inter-node traffic
TLS for client connections
Hostname resolution-related topics such as reverse DNS lookups
TCP buffer size (affects throughput and how much memory is used per connection)
The interface and port used by epmd
Other TCP socket settings
Proxy protocol support for client connections
Kernel TCP settings and limits (e.g. TCP keepalives and open file handle limit)
Except for OS kernel parameters and DNS, all RabbitMQ settings are configured via RabbitMQ configuration file(s).
Networking is a broad topic. There are many configuration options that can have a positive or negative effect on certain workloads. As such, this guide does not try to be a complete reference but rather offers an index of key tunable parameters and serves as a starting point.
In addition, this guide touches on a few topics closely related to networking, such as
Hostnames, hostname resolution and DNS
Connection lifecycle logging
Heartbeats (a.k.a. keepalives)
Proxies and load balancers
High connection churn scenarios and resource exhaustion
and more.
A methodology for troubleshooting networking-related issues is covered in a separate guide.
Network Interfaces for Client Connections
For RabbitMQ to accept client connections, it needs to bind to one or more interfaces and listen on (protocol-specific) ports. One such interface/port pair is called a listener in RabbitMQ parlance. Listeners are configured using the listeners.tcp.* configuration option(s).
TCP listeners configure both an interface and a port. The following example demonstrates how to configure an AMQP 0-9-1 and AMQP 1.0 listener to use a specific IP address and the standard port:
listeners.tcp.1 = 192.168.1.99:5672
Or, using the classic config format:
[
{rabbit, [
{tcp_listeners, [{"192.168.1.99", 5672}]}
]}
].
By default, RabbitMQ will listen on port 5672 on all available interfaces. It is possible to limit client connections to a subset of the interfaces or even just one, for example, IPv6-only interfaces. The following few sections demonstrate how to do it.
Listening on Dual Stack (Both IPv4 and IPv6) Interfaces
The following example demonstrates how to configure RabbitMQ to listen on localhost only for both IPv4 and IPv6:
listeners.tcp.1 = 127.0.0.1:5672
listeners.tcp.2 = ::1:5672
Or, in the classic config format:
[
{rabbit, [
{tcp_listeners, [{"127.0.0.1", 5672},
{"::1", 5672}]}
]}
].
With modern Linux kernels and Windows releases, when a port is specified and RabbitMQ is configured to listen on all IPv6 addresses but IPv4 is not explicitly disabled, IPv4 addresses will be included as well, so
listeners.tcp.1 = :::5672
is equivalent to
listeners.tcp.1 = 0.0.0.0:5672
listeners.tcp.2 = :::5672
Listening on IPv6 Interfaces Only
In this example RabbitMQ will listen on an IPv6 interface only:
listeners.tcp.1 = fe80::2acf:e9ff:fe17:f97b:5672
In IPv6-only environments the node must also be configured to use IPv6 for inter-node communication and CLI tool connections.
Listening on IPv4 Interfaces Only
In this example RabbitMQ will listen on an IPv4 interface only:
listeners.tcp.1 = 192.168.1.99:5672
Port Access
RabbitMQ nodes bind to ports (open server TCP sockets) in order to accept client and CLI tool connections. Other processes and tools such as SELinux may prevent RabbitMQ from binding to a port. When that happens, the node will fail to start.
CLI tools, client libraries and RabbitMQ nodes also open connections (client TCP sockets). Firewalls can prevent nodes and CLI tools from communicating with each other. Make sure the following ports are accessible:
4369: epmd, a peer discovery service used by RabbitMQ nodes and CLI tools
5672, 5671: used by AMQP 0-9-1 and 1.0 clients without and with TLS
25672: used for inter-node and CLI tools communication (Erlang distribution server port) and is allocated from a dynamic range (limited to a single port by default, computed as AMQP port + 20000). Unless external connections on these ports are really necessary (e.g. the cluster uses federation or CLI tools are used on machines outside the subnet), these ports should not be publicly exposed. See networking guide for details.
35672-35682: used by CLI tools (Erlang distribution client ports) for communication with nodes and is allocated from a dynamic range (computed as server distribution port + 10000 through server distribution port + 10010). See networking guide for details.
15672: HTTP API clients, management UI and rabbitmqadmin (only if the management plugin is enabled)
61613, 61614: STOMP clients without and with TLS (only if the STOMP plugin is enabled)
1883, 8883: MQTT clients without and with TLS (only if the MQTT plugin is enabled)
15674: STOMP-over-WebSockets clients (only if the Web STOMP plugin is enabled)
15675: MQTT-over-WebSockets clients (only if the Web MQTT plugin is enabled)
15692: Prometheus metrics (only if the Prometheus plugin is enabled)
It is possible to configure RabbitMQ to use different ports and specific network interfaces.
EPMD and Inter-node Communication
What is EPMD and How is It Used?
epmd (for Erlang Port Mapping Daemon) is a small additional daemon that runs alongside every RabbitMQ node and is used by the runtime to discover what port a particular node listens on. The port is then used by peer nodes and CLI tools.
When a node or CLI tool needs to contact node rabbit@hostname2 it will do the following:
Resolve hostname2 to an IPv4 or IPv6 address using the standard OS resolver or a custom one specified in the inetrc file
Contact epmd running on hostname2 using the above address
Ask epmd for the port used by node rabbit on it
Contact the node using the resolved IP address and discovered port
Proceed with communication
EPMD Interface
epmd will listen on all interfaces by default. It can be limited to a number of interfaces using the ERL_EPMD_ADDRESS environment variable:
export ERL_EPMD_ADDRESS="::1"
When ERL_EPMD_ADDRESS is changed, both RabbitMQ node and epmd on the host must be stopped. For epmd, use
epmd -kill
to terminate it. The service will be started by the local RabbitMQ node automatically on boot.
The loopback interface will be implicitly added to that list (in other words, epmd will always bind to the loopback interface).
EPMD Port
The default epmd port is 4369, but this can be changed using the ERL_EPMD_PORT environment variable:
export ERL_EPMD_PORT="4369"
All hosts in a cluster must use the same port.
When ERL_EPMD_PORT is changed, both RabbitMQ node and epmd on the host must be stopped. For epmd, use
epmd -kill
to terminate it. The service will be started by the local RabbitMQ node automatically on boot.
Inter-node Communication Port Range
RabbitMQ nodes will use a port from a certain range known as the inter-node communication port range. The same port is used by CLI tools when they need to contact the node. The range can be modified.
RabbitMQ nodes communicate with CLI tools and other nodes using a port known as the distribution port. It is dynamically allocated from a range of values. For RabbitMQ, the default range is limited to a single value computed as RABBITMQ_NODE_PORT (AMQP 0-9-1 and AMQP 1.0 port) + 20000, which results in using port 25672. This single port can be configured using the RABBITMQ_DIST_PORT environment variable.
RabbitMQ command line tools also use a range of ports. The default range is computed by taking the RabbitMQ distribution port value and adding 10000 to it. The next 10 ports are also part of this range. Thus, by default, this range is 35672 through 35682. This range can be configured using the RABBITMQ_CTL_DIST_PORT_MIN and RABBITMQ_CTL_DIST_PORT_MAX environment variables. Note that limiting the range to a single port will prevent more than one CLI tool from running concurrently on the same host and may affect CLI commands that require parallel connections to multiple cluster nodes. A port range of 10 is therefore a recommended value.
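The port arithmetic described above can be sketched as follows (plain Python, illustrative only):

```python
# Sketch of how the inter-node and CLI tool ports are derived from the
# AMQP 0-9-1/1.0 port, per the rules described above.

AMQP_PORT = 5672

# Inter-node (Erlang distribution) port: AMQP port + 20000
dist_port = AMQP_PORT + 20000        # 25672 by default

# CLI tool port range: distribution port + 10000 through + 10010
cli_port_min = dist_port + 10000     # 35672
cli_port_max = cli_port_min + 10     # 35682

print(dist_port, cli_port_min, cli_port_max)
```

Changing RABBITMQ_NODE_PORT shifts all of these derived values accordingly unless they are overridden explicitly.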
When configuring firewall rules it is highly recommended to allow remote connections on the inter-node communication port from every cluster member and every host where CLI tools might be used. epmd port must be open for CLI tools and clustering to function.
The range used by RabbitMQ can also be controlled via two configuration keys:
kernel.inet_dist_listen_min in the classic config format only
kernel.inet_dist_listen_max in the classic config format only
They define the range’s lower and upper bounds, inclusive.
The example below uses a range with a single port but a value different from default:
[
{kernel, [
{inet_dist_listen_min, 33672},
{inet_dist_listen_max, 33672}
]},
{rabbit, [
…
]}
].
To verify what port is used by a node for inter-node and CLI tool communication, run
epmd -names
on that node’s host. It will produce output that looks like this:
epmd: up and running on port 4369 with data:
name rabbit at port 25672
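The output format lends itself to simple scripting. The helper below is hypothetical and not part of RabbitMQ; it simply extracts node names and ports from `epmd -names` output:

```python
# Hypothetical helper (not part of RabbitMQ): parse `epmd -names` output
# and return a mapping of registered node name -> distribution port.

def parse_epmd_names(output: str) -> dict:
    ports = {}
    for line in output.splitlines():
        parts = line.split()
        # Relevant lines look like: "name rabbit at port 25672"
        if len(parts) == 5 and parts[0] == "name" and parts[2] == "at":
            ports[parts[1]] = int(parts[4])
    return ports

sample = """epmd: up and running on port 4369 with data:
name rabbit at port 25672"""
print(parse_epmd_names(sample))  # {'rabbit': 25672}
```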
Inter-node Communication Buffer Size Limit
Inter-node connections use a buffer for data pending to be sent. Temporary throttling of inter-node traffic is applied when the buffer is at maximum allowed capacity. The limit is controlled via the RABBITMQ_DISTRIBUTION_BUFFER_SIZE environment variable, in kilobytes. The default value is 128 MB (128000 kB).
In clusters with heavy inter-node traffic increasing this value may have a positive effect on throughput. Values lower than 64 MB are not recommended.
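For example, to raise the limit to 192 MB via rabbitmq-env.conf (note the value is given in kilobytes):

```ini
# rabbitmq-env.conf
RABBITMQ_DISTRIBUTION_BUFFER_SIZE=192000
```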
Using IPv6 for Inter-node Communication (and CLI Tools)
In addition to exclusive IPv6 use for client connections, a node can also be configured to use IPv6 exclusively for inter-node and CLI tool connectivity.
This involves configuration in a few places:
Inter-node communication protocol setting in the runtime
Configuring IPv6 to be used by CLI tools
epmd, a service involved in inter-node communication (discovery)
It is possible to use IPv6 for inter-node and CLI tool communication but use IPv4 for client connections or vice versa. Such configurations can be hard to troubleshoot and reason about, so using the same IP version (e.g. IPv6) across the board or a dual stack setup is recommended.
Inter-node Communication Protocol
To instruct the runtime to use IPv6 for inter-node communication and related tasks, use the RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS environment variable to pass a couple of flags:
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="-kernel inetrc '/etc/rabbitmq/erl_inetrc' -proto_dist inet6_tcp"
RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS above uses two closely related flags:
-kernel inetrc to configure a path to an inetrc file that controls hostname resolution
-proto_dist inet6_tcp to tell the node to use IPv6 when connecting to peer nodes and listening for CLI tool connections
The erl_inetrc file at /etc/rabbitmq/erl_inetrc will control hostname resolution settings. For IPv6-only environments, it must include the following line:
%% Tells DNS client on RabbitMQ nodes and CLI tools to resolve hostnames to IPv6 addresses.
%% The trailing dot is not optional.
{inet6,true}.
CLI Tools
With CLI tools, use the same runtime flag as used for RabbitMQ nodes above but provide it using a different environment variable, RABBITMQ_CTL_ERL_ARGS:
RABBITMQ_CTL_ERL_ARGS="-proto_dist inet6_tcp"
Note that once instructed to use IPv6, CLI tools won’t be able to connect to nodes that do not use IPv6 for inter-node communication. This involves the epmd service running on the same host as target RabbitMQ node.
epmd
epmd is a small helper daemon that runs next to a RabbitMQ node and lets its peers and CLI tools discover what port they should use to communicate to it. It can be configured to bind to a specific interface, much like RabbitMQ listeners. This is done using the ERL_EPMD_ADDRESS environment variable:
export ERL_EPMD_ADDRESS="::1"
By default RabbitMQ nodes will use an IPv4 interface when connecting to epmd. Nodes that are configured to use IPv6 for inter-node communication (see above) will also use IPv6 to connect to epmd.
When epmd is configured to use IPv6 exclusively but RabbitMQ nodes are not, RabbitMQ will log an error message similar to this:
Protocol 'inet_tcp': register/listen error: econnrefused
systemd Unit File
On distributions that use systemd, the epmd.socket service controls network settings of epmd. It is possible to configure epmd to only listen on IPv6 interfaces:
ListenStream=[::1]:4369
The service will need reloading after its unit file has been updated:
systemctl daemon-reload
systemctl restart epmd.socket epmd.service
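As a sketch, a systemd drop-in override could look like the following (the path below is an assumption; it varies by distribution):

```ini
# /etc/systemd/system/epmd.socket.d/override.conf (hypothetical path)
[Socket]
# An empty assignment clears the listener(s) defined by the packaged unit
ListenStream=
# Then bind only to the IPv6 loopback interface
ListenStream=[::1]:4369
```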
Intermediaries: Proxies and Load Balancers
Proxies and load balancers are fairly commonly used to distribute client connections between cluster nodes. Proxies can also be useful to make it possible for clients to access RabbitMQ nodes without exposing them publicly. Intermediaries can also have side effects on connections.
Proxy Effects
Proxies and load balancers introduce an extra network hop (or even multiple ones) between client and its target node. Intermediaries also can become a network contention point: their throughput will then become a limiting factor for the entire system. Network bandwidth overprovisioning and throughput monitoring for proxies and load balancers are therefore very important.
Intermediaries may also terminate "idle" TCP connections when there's no activity on them for a certain period of time. Most of the time this is not desirable. Such events will result in abrupt connection closure log messages on the server end and I/O exceptions on the client end.
When heartbeats are enabled on a connection, it results in periodic light network traffic. Therefore heartbeats have a side effect of guarding client connections that can go idle for periods of time against premature closure by proxies and load balancers.
Heartbeat timeouts from 10 to 30 seconds will produce periodic network traffic often enough (roughly every 5 to 15 seconds) to satisfy the defaults of most proxy tools and load balancers. Values that are too low will produce false positives.
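The rule of thumb behind those numbers: heartbeat frames are sent at roughly half the negotiated timeout interval. A small illustrative sketch:

```python
# Heartbeat frames are sent at roughly half the negotiated timeout,
# which is how a 10-30 second timeout yields traffic every ~5-15 seconds.

def heartbeat_send_interval(timeout_seconds: int) -> float:
    return timeout_seconds / 2

print(heartbeat_send_interval(10), heartbeat_send_interval(30))  # 5.0 15.0
```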
Proxy Protocol
RabbitMQ supports Proxy protocol versions 1 (text header format) and 2 (binary header format).
The protocol makes servers such as RabbitMQ aware of the actual client IP address when connections go over a proxy (e.g. HAproxy or AWS ELB). This makes it easier for the operator to inspect connection origins in the management UI or CLI tools.
Because the protocol spec dictates that, for security reasons, it must be applied either to all connections or none of them, this feature is disabled by default and needs to be enabled for individual protocols supported by RabbitMQ. To enable it for AMQP 0-9-1 and AMQP 1.0 clients:
proxy_protocol = true
When proxy protocol is enabled, clients won’t be able to connect to RabbitMQ directly unless they themselves support the protocol. Therefore, when this option is enabled, all client connections must go through a proxy that also supports the protocol and is configured to send a Proxy protocol header. HAproxy and AWS ELB documentation explains how to do it.
When proxy protocol is enabled and connections go through a compatible proxy, no action or modifications are required from client libraries. The communication is entirely transparent to them.
STOMP and MQTT, as well as Web STOMP and Web MQTT have their own settings that enable support for the proxy protocol.
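For example, the plugin settings follow the same naming pattern (shown here as a sketch; consult each plugin's guide for the authoritative key names):

```ini
# rabbitmq.conf: enable Proxy protocol support per protocol plugin
mqtt.proxy_protocol      = true
stomp.proxy_protocol     = true
web_mqtt.proxy_protocol  = true
web_stomp.proxy_protocol = true
```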
TLS (SSL) Support
It is possible to encrypt connections using TLS with RabbitMQ. Authentication using peer certificates is also possible. Please refer to the TLS/SSL guide for more information.
Tuning for Throughput
Tuning for throughput is a common goal. Improvements can be achieved by:
Increasing TCP buffer sizes
Ensuring Nagle's algorithm is disabled
Enabling optional TCP features and extensions
For the latter two, see the OS-level tuning section below.
Note that tuning for throughput will involve trade-offs. For example, increasing TCP buffer sizes will increase the amount of RAM used by every connection, which can be a significant total server RAM use increase.
TCP Buffer Size
This is one of the key tunable parameters. Every TCP connection has buffers allocated for it. Generally speaking, the larger these buffers are, the more RAM is used per connection and the better the throughput. On Linux, the OS will automatically tune TCP buffer size by default, typically settling on a value between 80 and 120 KB.
For maximum throughput, it is possible to increase buffer size using a group of config options:
tcp_listen_options for AMQP 0-9-1 and AMQP 1.0
mqtt.tcp_listen_options for MQTT
stomp.tcp_listen_options for STOMP
Note that increasing TCP buffer size will increase how much RAM the node uses for every client connection.
The following example sets TCP buffers for AMQP 0-9-1 connections to 192 KiB:
tcp_listen_options.backlog = 128
tcp_listen_options.nodelay = true
tcp_listen_options.linger.on = true
tcp_listen_options.linger.timeout = 0
tcp_listen_options.sndbuf = 196608
tcp_listen_options.recbuf = 196608
The same example for MQTT:
mqtt.tcp_listen_options.backlog = 128
mqtt.tcp_listen_options.nodelay = true
mqtt.tcp_listen_options.linger.on = true
mqtt.tcp_listen_options.linger.timeout = 0
mqtt.tcp_listen_options.sndbuf = 196608
mqtt.tcp_listen_options.recbuf = 196608
and STOMP:
stomp.tcp_listen_options.backlog = 128
stomp.tcp_listen_options.nodelay = true
stomp.tcp_listen_options.linger.on = true
stomp.tcp_listen_options.linger.timeout = 0
stomp.tcp_listen_options.sndbuf = 196608
stomp.tcp_listen_options.recbuf = 196608
Note that setting send and receive buffer sizes to different values is dangerous and not recommended.
Erlang VM I/O Thread Pool
Erlang runtime uses a pool of threads for performing I/O operations asynchronously. The size of the pool is configured via the RABBITMQ_IO_THREAD_POOL_SIZE environment variable. The variable is a shortcut to setting the +A VM command line flag, e.g. +A 128.
RABBITMQ_IO_THREAD_POOL_SIZE=32
To set the flag directly, use the RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS environment variable:
RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+A 128"
The default value in recent RabbitMQ releases is 128 (previously 30). Nodes that have 8 or more cores available are recommended to use values higher than 96, that is, 12 or more I/O threads for every core available. Note that higher values do not necessarily mean better throughput or lower CPU burn due to waiting on I/O.
Tuning for a Large Number of Connections
Some workloads, often referred to as “the Internet of Things”, assume a large number of client connections per node, and a relatively low volume of traffic from each node. One such workload is sensor networks: there can be hundreds of thousands or millions of sensors deployed, each emitting data every several minutes. Optimising for the maximum number of concurrent clients can be more important than for total throughput.
Several factors can limit how many concurrent connections a single node can support:
Maximum number of open file handles (including sockets) as well as other kernel-enforced resource limits
Amount of RAM used by each connection
Amount of CPU resources used by each connection
Maximum number of Erlang processes the VM is configured to allow
Open File Handle Limit
Most operating systems limit the number of file handles that can be opened at the same time. When an OS process (such as RabbitMQ’s Erlang VM) reaches the limit, it won’t be able to open any new files or accept any more TCP connections.
How the limit is configured varies from OS to OS and distribution to distribution, e.g. depending on whether systemd is used. For Linux, see Controlling System Limits on Linux in the Debian and RPM installation guides. Linux kernel limit management is covered by many resources on the Web, including the open file handle limit.
With Docker, the Docker daemon configuration file on the host controls the limits.
macOS uses a similar system.
On Windows, the limit for the Erlang runtime is controlled using the ERL_MAX_PORTS environment variable.
When optimising for the number of concurrent connections, make sure your system has enough file descriptors to support not only client connections but also the files the node may use. To calculate a ballpark limit, multiply the number of connections per node by 1.5. For example, to support 100,000 connections, set the limit to 150,000.
Increasing the limit slightly increases the amount of RAM an idle machine uses, but this is a reasonable trade-off.
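The ballpark calculation above can be expressed as a one-liner (illustrative Python):

```python
# Ballpark open file handle limit: 1.5x the expected number of client
# connections, leaving headroom for files the node itself needs.

def recommended_fd_limit(connections: int) -> int:
    return int(connections * 1.5)

print(recommended_fd_limit(100_000))  # 150000
```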
Per Connection Memory Consumption: TCP Buffer Size
See the section above for an overview.
For maximum number of concurrent client connections, it is possible to decrease TCP buffer size using a group of config options:
tcp_listen_options for AMQP 0-9-1 and AMQP 1.0
mqtt.tcp_listen_options for MQTT
stomp.tcp_listen_options for STOMP
Decreasing TCP buffer size will decrease how much RAM the node uses for every client connection.
This is often necessary in environments where the number of concurrent connections sustained per node is more important than throughput.
The following example sets TCP buffers for AMQP 0-9-1 connections to 32 KiB:
tcp_listen_options.backlog = 128
tcp_listen_options.nodelay = true
tcp_listen_options.linger.on = true
tcp_listen_options.linger.timeout = 0
tcp_listen_options.sndbuf = 32768
tcp_listen_options.recbuf = 32768
The same example for MQTT:
mqtt.tcp_listen_options.backlog = 128
mqtt.tcp_listen_options.nodelay = true
mqtt.tcp_listen_options.linger.on = true
mqtt.tcp_listen_options.linger.timeout = 0
mqtt.tcp_listen_options.sndbuf = 32768
mqtt.tcp_listen_options.recbuf = 32768
and for STOMP:
stomp.tcp_listen_options.backlog = 128
stomp.tcp_listen_options.nodelay = true
stomp.tcp_listen_options.linger.on = true
stomp.tcp_listen_options.linger.timeout = 0
stomp.tcp_listen_options.sndbuf = 32768
stomp.tcp_listen_options.recbuf = 32768
Note that lowering TCP buffer sizes will result in a proportional throughput drop, so an optimal value between throughput and per-connection RAM use needs to be found for every workload.
Setting send and receive buffer sizes to different values is dangerous and is not recommended. Values lower than 8 KiB are not recommended.
Limiting Number of Channels on a Connection
Channels also consume RAM. By optimising how many channels applications use, that amount can be decreased. It is possible to cap the max number of channels on a connection using the channel_max configuration setting:
channel_max = 16
Note that some libraries and tools that build on top of RabbitMQ clients may implicitly require a certain number of channels. Values above 200 are rarely necessary. Finding an optimal value is usually a matter of trial and error.
Nagle’s Algorithm (“nodelay”)
Disabling Nagle’s algorithm is primarily useful for reducing latency but can also improve throughput.
kernel.inet_default_connect_options and kernel.inet_default_listen_options must include {nodelay, true} to disable Nagle’s algorithm for inter-node connections.
When configuring sockets that serve client connections, tcp_listen_options must include the same option. This is the default.
The following example demonstrates that. First, rabbitmq.conf:
tcp_listen_options.backlog = 4096
tcp_listen_options.nodelay = true
which should be used together with the following bits in the advanced config file:
[
{kernel, [
{inet_default_connect_options, [{nodelay, true}]},
{inet_default_listen_options, [{nodelay, true}]}
]}].
When using the classic config format, everything is configured in a single file:
[
{kernel, [
{inet_default_connect_options, [{nodelay, true}]},
{inet_default_listen_options, [{nodelay, true}]}
]},
{rabbit, [
{tcp_listen_options, [
{backlog, 4096},
{nodelay, true},
{linger, {true,0}},
{exit_on_close, false}
]}
]}
].
Erlang VM I/O Thread Pool Tuning
Adequate Erlang VM I/O thread pool size is also important when tuning for a large number of concurrent connections. See the section above.
Connection Backlog
With a low number of clients, new connection rate is very unevenly distributed but is also small enough to not make much difference. When the number reaches tens of thousands or more, it is important to make sure that the server can accept inbound connections. Unaccepted TCP connections are put into a queue with bounded length. This length has to be sufficient to account for peak load hours and possible spikes, for instance, when many clients disconnect due to a network interruption or choose to reconnect. This is configured using the tcp_listen_options.backlog option:
tcp_listen_options.backlog = 4096
tcp_listen_options.nodelay = true
In the classic config format:
[
{rabbit, [
{tcp_listen_options, [
{backlog, 4096},
{nodelay, true},
{linger, {true, 0}},
{exit_on_close, false}
]}
]}
].
Default value is 128. When pending connection queue length grows beyond this value, connections will be rejected by the operating system. See also net.core.somaxconn in the kernel tuning section.
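Note that the effective queue length is the smaller of tcp_listen_options.backlog and the kernel's net.core.somaxconn limit, so raising one without the other has no effect. A sketch of the matching kernel change (the path is the conventional one; distributions may use /etc/sysctl.d/ instead):

```ini
# /etc/sysctl.conf: allow a listen backlog of 4096 to actually take effect
net.core.somaxconn = 4096
```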
Dealing with High Connection Churn
Why is High Connection Churn Problematic?
Workloads with high connection churn (a high rate of connections being opened and closed) will require TCP setting tuning to avoid exhaustion of certain resources: max number of file handles, Erlang processes on RabbitMQ nodes, kernel’s ephemeral port range (for hosts that open a lot of connections, including Federation links and Shovel connections), and others. Nodes that are exhausted of those resources won’t be able to accept new connections, which will negatively affect overall system availability.
Due to a combination of certain TCP features and defaults of most modern Linux distributions, closed connections can be detected after a prolonged period of time. This is covered in the heartbeats guide. This can be one contributing factor to connection build-up. Another is the TIME_WAIT TCP connection state. The state primarily exists to make sure that retransmitted segments from closed connections won’t “reappear” on a different (newer) connection with the same client host and port. Depending on the OS and TCP stack configuration connections can spend minutes in this state, which on a busy system is guaranteed to lead to a connection build-up.
See Coping with the TCP TIME_WAIT connections on busy servers for details.
TCP stack configuration can reduce the peak number of connections in closing states and avoid resource exhaustion, in turn allowing nodes to accept new connections at all times.
High connection churn can also mean developer mistakes or incorrect assumptions about how the messaging protocols supported by RabbitMQ are meant to be used. All supported protocols assume long lived connections. Applications that open and almost immediately close connections unnecessarily waste resources (network bandwidth, CPU, RAM) and contribute to the problem described in this section.
Inspecting Connections and Gathering Evidence
If a node fails to accept connections, it is important to first gather data (metrics, evidence) to determine the state of the system and the limiting factor (exhausted resource). Tools such as netstat, ss, and lsof can be used to inspect TCP connections of a node. See Troubleshooting Networking for examples.
While heartbeats are sufficient for detecting defunct connections, they are not going to be sufficient in high connection churn scenarios. In those cases heartbeats should be combined with TCP keepalives to speed up disconnected client detection.
Reducing Amount of Time Spent in TIME_WAIT
TCP stack tuning can also reduce the amount of time connections spend in the TIME_WAIT state. The net.ipv4.tcp_fin_timeout setting specifically can help here:
net.ipv4.tcp_fin_timeout = 30
Note that like other settings prefixed with net.ipv4., this one applies to both IPv4 and IPv6 connections despite the name.
If inbound connections (from clients, plugins, CLI tools and so on) do not rely on NAT, net.ipv4.tcp_tw_reuse can be set to 1 (enabled) to allow the kernel to reuse sockets in the TIME_WAIT state for outgoing connections. This setting can be applied on client hosts or intermediaries such as proxies and load balancers. Note that if NAT is used the setting is not safe and can lead to hard to track down issues.
The settings above generally should be combined with reduced TCP keepalive values, for example:
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time=30
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=4
net.ipv4.tcp_tw_reuse = 1
OS Level Tuning
Operating system settings can affect operation of RabbitMQ. Some are directly related to networking (e.g. TCP settings), others affect TCP sockets as well as other things (e.g. open file handles limit).
Understanding these limits is important, as they may change depending on the workload.
A few important configurable kernel options include (note that despite option names they are effective for both IPv4 and IPv6 connections):
Kernel setting: Description
fs.file-max: Max number of files the kernel will allocate. Limits and current value can be inspected using /proc/sys/fs/file-nr.
net.ipv4.ip_local_port_range: Local IP port range, defined as a pair of values. The range must provide enough entries for the peak number of concurrent connections.
net.ipv4.tcp_tw_reuse: When enabled, allows the kernel to reuse sockets in the TIME_WAIT state when it is safe to do so. See Dealing with High Connection Churn. This option is dangerous when clients and peers connect using NAT.
net.ipv4.tcp_fin_timeout: Lowering this timeout to a value in the 15-30 second range reduces the amount of time closed connections will stay in the TIME_WAIT state. See Dealing with High Connection Churn.
net.core.somaxconn: Size of the listen queue (how many connections are in the process of being established at the same time). Default is 128. Increase to 4096 or higher to support inbound connection bursts, e.g. when clients reconnect en masse.
net.ipv4.tcp_max_syn_backlog: Maximum number of remembered connection requests that have not yet received an acknowledgment from the connecting client. Default is 128, max value is 65535. 4096 and 8192 are recommended starting values when optimising for throughput.
net.ipv4.tcp_keepalive_*: net.ipv4.tcp_keepalive_time, net.ipv4.tcp_keepalive_intvl, and net.ipv4.tcp_keepalive_probes configure TCP keepalives. AMQP 0-9-1 and STOMP have heartbeats, which partially make up for the main shortcoming of TCP keepalive defaults, namely that it can take minutes to detect an unresponsive peer, e.g. in case of a hardware or power failure. MQTT also has its own keepalive mechanism, which is the same idea under a different name. When enabling TCP keepalives with default settings, we recommend setting the heartbeat timeout to 8-20 seconds. Also see a note on TCP keepalives later in this guide.
net.ipv4.conf.default.rp_filter: Enables reverse path filtering. If IP address spoofing is not a concern for your system, disable it.
Note that default values for these vary between Linux kernel releases and distributions. Using a recent kernel (3.9 or later) is recommended.
Kernel parameter tuning differs from OS to OS. This guide focuses on Linux. To configure a kernel parameter interactively, use sysctl -w (requires superuser privileges), for example:
sysctl -w fs.file-max=200000
To make the changes permanent (so that they survive reboots), add them to /etc/sysctl.conf. See sysctl(8) and sysctl.conf(5) for more details.
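For example, a /etc/sysctl.conf fragment collecting several of the parameters above might look like the following. The values shown are illustrative starting points taken from the recommendations in this guide, not universal recommendations; tune them for your workload:

```
# /etc/sysctl.conf (illustrative values; apply with `sysctl -p`)
fs.file-max = 200000
net.core.somaxconn = 4096
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.ip_local_port_range = 10000 65535
```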
TCP stack tuning is a broad topic that is covered in much detail elsewhere:
Enabling High Performance Data Transfers
Network Tuning Guide
TCP Socket Options
Common Options
Kernel setting Description
tcp_listen_options.nodelay When set to true, disables Nagle’s algorithm. Default is true. Highly recommended for most users.
tcp_listen_options.sndbuf See the TCP buffers discussion earlier in this guide. The default value is automatically tuned by the OS, typically in the 88 KiB to 128 KiB range on modern Linux versions. Increasing buffer size improves consumer throughput but also increases the amount of RAM used by every connection. Decreasing it has the opposite effect.
tcp_listen_options.recbuf See the TCP buffers discussion earlier in this guide. Its effects are similar to those of tcp_listen_options.sndbuf, but for publishers and protocol operations in general.
tcp_listen_options.backlog Maximum size of the unaccepted TCP connections queue. When this size is reached, new connections will be rejected. Set to 4096 or higher for environments with thousands of concurrent connections and possible bulk client reconnections.
tcp_listen_options.keepalive When set to true, enables TCP keepalives (see above). Default is false. Makes sense for environments where connections can go idle for a long time (at least 10 minutes), although using heartbeats is still recommended over this option.
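For reference, here is a hypothetical rabbitmq.conf fragment that sets the options above. The buffer sizes are examples only; see the TCP buffers discussion earlier in this guide before changing them:

```
# new-style rabbitmq.conf format; values are illustrative, not recommendations
tcp_listen_options.backlog = 4096
tcp_listen_options.nodelay = true
tcp_listen_options.keepalive = false
tcp_listen_options.sndbuf = 196608
tcp_listen_options.recbuf = 196608
```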
Defaults
Below is the default TCP socket option configuration used by RabbitMQ:
TCP connection backlog is limited to 128 connections
Nagle's algorithm is disabled
Server socket lingering is enabled with a timeout of 0
Heartbeats
Some protocols supported by RabbitMQ, including AMQP 0-9-1, support heartbeats, a way to detect dead TCP peers quicker. Please refer to the Heartbeats guide for more information.
Net Tick Time
Heartbeats are used to detect peer or connection failure between clients and RabbitMQ nodes. net_ticktime serves the same purpose but for cluster node communication. Values lower than 5 (seconds) may result in false positives and are not recommended.
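net_ticktime is a runtime (kernel application) setting, so in the classic config format it is adjusted under the kernel section rather than rabbit. The value below (120 seconds) is purely illustrative; the runtime default is 60:

```
[
  {kernel, [
    %% net tick time in seconds
    {net_ticktime, 120}
  ]}
].
```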
TCP Keepalives
TCP contains a mechanism similar in purpose to the heartbeat (a.k.a. keepalive) mechanism in messaging protocols and the net tick timeout covered above: TCP keepalives. Due to inadequate defaults, TCP keepalives often don't work the way they are supposed to: it takes a very long time (say, an hour or more) to detect a dead peer. However, with tuning they can serve the same purpose as heartbeats and clean up stale TCP connections, e.g. with clients that opted not to use heartbeats, intentionally or not.
Below is an example sysctl configuration for TCP keepalives that considers TCP connections dead or unreachable after 70 seconds (4 attempts every 10 seconds after connection idle for 30 seconds):
net.ipv4.tcp_keepalive_time=30
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=4
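The 70 second figure follows directly from the settings above: the connection must sit idle before the first probe, then every probe must fail. A quick shell sanity check of that arithmetic:

```shell
# worst-case dead peer detection time =
#   idle time before first probe + (probe count * probe interval)
idle=30      # net.ipv4.tcp_keepalive_time
interval=10  # net.ipv4.tcp_keepalive_intvl
probes=4     # net.ipv4.tcp_keepalive_probes
echo $(( idle + probes * interval ))
# prints 70
```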
TCP keepalives can be a useful additional defense mechanism in environments where the RabbitMQ operator has no control over application settings or the client libraries used.
Connection Handshake Timeout
RabbitMQ has a timeout for connection handshake, 10 seconds by default. When clients run in heavily constrained environments, it may be necessary to increase the timeout. This can be done via the handshake_timeout configuration key (in milliseconds):
handshake_timeout = 20000
Using the classic config format:
[
{rabbit, [
%% 20 seconds
{handshake_timeout, 20000}
]}
].
It should be pointed out that this is only necessary with very constrained clients and networks. Handshake timeouts in other circumstances indicate a problem elsewhere.
TLS (SSL) Handshake
If TLS/SSL is enabled, it may also be necessary to increase the TLS/SSL handshake timeout. This can be done via the ssl_handshake_timeout configuration key (in milliseconds):
ssl_handshake_timeout = 10000
Using the classic config format:
[
{rabbit, [
%% 10 seconds
{ssl_handshake_timeout, 10000}
]}
].
Hostname Resolution and DNS
In many cases, RabbitMQ relies on the Erlang runtime for inter-node communication (including tools such as rabbitmqctl, rabbitmq-plugins, etc). Client libraries also perform hostname resolution when connecting to RabbitMQ nodes. This section briefly covers most common issues associated with that.
Performed by Client Libraries
If a client library is configured to connect to a hostname, it performs hostname resolution. Depending on DNS and local resolver (/etc/hosts and similar) configuration, this can take some time. Incorrect configuration may lead to resolution timeouts, e.g. when trying to resolve a local hostname such as my-dev-machine over DNS. As a result, client connections can take a long time to be established (from tens of seconds to a few minutes).
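One way to spot a slow resolver is to time a lookup of the hostname clients connect to. getent goes through the same resolution mechanisms (nsswitch, /etc/hosts, DNS) that most client libraries end up using. The hostname below is a placeholder; substitute the one your clients actually use:

```shell
# time a hostname lookup; "localhost" stands in for the RabbitMQ node's hostname
time getent hosts localhost
```

If this command takes more than a fraction of a second, resolver configuration is a likely contributor to slow client connections.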
Short and Fully-qualified RabbitMQ Node Names
RabbitMQ relies on the Erlang runtime for inter-node communication. Erlang nodes include a hostname, either short (rmq1) or fully-qualified (rmq1.dev.megacorp.local). Mixing short and fully-qualified hostnames is not allowed by the runtime. Every node in a cluster must be able to resolve every other node’s hostname, short or fully-qualified.
By default RabbitMQ will use short hostnames. Set the RABBITMQ_USE_LONGNAME environment variable to make RabbitMQ nodes use fully-qualified names, e.g. rmq1.dev.megacorp.local.
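Assuming a standard installation, the same setting can also be placed in rabbitmq-env.conf, where the RABBITMQ_ prefix is dropped:

```
# rabbitmq-env.conf
USE_LONGNAME=true
```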
Reverse DNS Lookups
If the reverse_dns_lookups configuration option is set to true, RabbitMQ will perform reverse DNS lookups for client IP addresses and list hostnames in connection information (e.g. in the Management UI).
Reverse DNS lookups can potentially take a long time if the node's hostname resolution is not optimally configured. This can increase latency when accepting client connections.
To explicitly enable reverse DNS lookups:
reverse_dns_lookups = true
To disable reverse DNS lookups:
reverse_dns_lookups = false
Using the classic config format:
[
{rabbit, [
{reverse_dns_lookups, false}
]}
].
Connection Event Logging
See Connection Lifecycle Events in the logging guide.
Troubleshooting Network Connectivity
A methodology for troubleshooting of networking-related issues is covered in a separate guide.
Getting Help and Providing Feedback
If you have questions about the contents of this guide or any other topic related to RabbitMQ, don’t hesitate to ask them on the RabbitMQ mailing list.
Help Us Improve the Docs ❤️
If you’d like to contribute an improvement to the site, its source is available on GitHub. Simply fork the repository and submit a pull request. Thank you!
In This Section
Server Documentation
Configuration
File Locations
Logging
Persistence
Networking
Parameters and Policies
Management UI
Monitoring
Production Checklist
TLS Support
Feature Flags
Distributed RabbitMQ
Clustering
Reliable Delivery
Backup and restore
Alarms
Memory Use
Networking
Troubleshooting Networking
Virtual Hosts
High Availability (pacemaker)
Access Control (Authorisation)
Authentication Mechanisms
LDAP
Lazy Queues
Internal Event Exchange
Firehose (Message Tracing)
Manual Pages
Windows Quirks
Client Documentation
Plugins
News
Protocol
Our Extensions
Building
Previous Releases
License
Copyright © 2007-2020 VMware, Inc. or its affiliates. All rights reserved. Terms of Use, Privacy and Trademark Guidelines