Describe the bug
We have a three-node RabbitMQ cluster in our production environment in which the rabbitmq-server service crashes at random. Systemd restarts it automatically, but the crashes occur anywhere from once to several times per day. Our lab cluster, which has identical settings but very light usage, does not experience this issue. I first opened rabbitmq/rabbitmq-server#13223 for this.
Journal:
Feb 13 11:33:17 rabbit2a systemd[1]: Started RabbitMQ broker.
Feb 14 02:06:10 rabbit2a rabbitmq-server[1703247]: beam/erl_term.h:1527:tag_val_def() Assertion failed: tag_val_def error
Feb 14 02:06:12 rabbit2a rabbitmq-server[1703286]: [os_mon] memory supervisor port (memsup): Erlang has closed
Feb 14 02:06:12 rabbit2a rabbitmq-server[1703287]: [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
Feb 14 02:06:12 rabbit2a systemd[1]: rabbitmq-server.service: Main process exited, code=dumped, status=6/ABRT
Feb 14 02:06:12 rabbit2a systemd[1]: rabbitmq-server.service: Failed with result 'core-dump'.
Feb 14 02:06:12 rabbit2a systemd[1]: rabbitmq-server.service: Consumed 3h 49min 20.844s CPU time.
Feb 14 02:06:22 rabbit2a systemd[1]: rabbitmq-server.service: Scheduled restart job, restart counter is at 1.
Feb 14 02:06:22 rabbit2a systemd[1]: Stopped RabbitMQ broker.
Feb 14 02:06:22 rabbit2a systemd[1]: rabbitmq-server.service: Consumed 3h 49min 20.844s CPU time.
Feb 14 02:06:22 rabbit2a systemd[1]: Starting RabbitMQ broker..
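To put a number on "once to several times per day", the core-dump exits can be tallied per host and day from the journal. This is a minimal sketch, assuming the journal is exported in the default short format shown above (which carries no year); `crashes_per_day` is a hypothetical helper, not part of RabbitMQ or systemd.

```python
import re
from collections import Counter

# Matches the systemd line written when the broker aborts and dumps core, e.g.
# "Feb 14 02:06:12 rabbit2a systemd[1]: rabbitmq-server.service: Main process exited, code=dumped, status=6/ABRT"
CRASH_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2}) +(?P<day>\d{1,2}) \S+ (?P<host>\S+) systemd\[1\]: "
    r"rabbitmq-server\.service: Main process exited, code=dumped"
)


def crashes_per_day(journal_text):
    """Count core-dump exits per (host, "Mon DD") in short-format journal text.

    The short format has no year, so the same day in different years is merged.
    """
    counts = Counter()
    for line in journal_text.splitlines():
        match = CRASH_RE.match(line.strip())
        if match:
            # Normalize "Feb  4" and "Feb 04" to the same key.
            day = f"{match['month']} {int(match['day']):02d}"
            counts[(match["host"], day)] += 1
    return counts
```

For example, export the log with `journalctl -u rabbitmq-server --no-pager > journal.txt` on each node, then call `crashes_per_day(open("journal.txt").read())`.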
Cluster status:
$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2a ...
Basics
Cluster name: rabbit2
Total CPU cores available cluster-wide: 12
Cluster Tags
(none)
Disk Nodes
rabbit@rabbit2a
rabbit@rabbit2b
rabbit@rabbit2c
Running Nodes
rabbit@rabbit2a
rabbit@rabbit2b
rabbit@rabbit2c
Versions
rabbit@rabbit2a: RabbitMQ 4.0.5 on Erlang 27.2.2
rabbit@rabbit2b: RabbitMQ 4.0.5 on Erlang 27.2.2
rabbit@rabbit2c: RabbitMQ 4.0.5 on Erlang 27.2.2
CPU Cores
Node: rabbit@rabbit2a, available CPU cores: 4
Node: rabbit@rabbit2b, available CPU cores: 4
Node: rabbit@rabbit2c, available CPU cores: 4
Maintenance status
Node: rabbit@rabbit2a, status: not under maintenance
Node: rabbit@rabbit2b, status: not under maintenance
Node: rabbit@rabbit2c, status: not under maintenance
Alarms
(none)
Network Partitions
(none)
Listeners
Node: rabbit@rabbit2a, interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@rabbit2a, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@rabbit2a, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rabbit2a, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rabbit2a, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Node: rabbit@rabbit2b, interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@rabbit2b, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@rabbit2b, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rabbit2b, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rabbit2b, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Node: rabbit@rabbit2c, interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@rabbit2c, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@rabbit2c, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rabbit2c, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rabbit2c, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Feature flags
Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: enabled
Flag: detailed_queues_endpoint, state: enabled
Flag: direct_exchange_routing_v2, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: khepri_db, state: disabled
Flag: listener_records_in_ets, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: message_containers, state: enabled
Flag: message_containers_deaths_v2, state: enabled
Flag: quorum_queue, state: enabled
Flag: quorum_queue_non_voters, state: enabled
Flag: rabbit_exchange_type_local_random, state: enabled
Flag: rabbitmq_4.0.0, state: enabled
Flag: restart_streams, state: enabled
Flag: stream_filtering, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_sac_coordinator_unblock_group, state: enabled
Flag: stream_single_active_consumer, state: enabled
Flag: stream_update_config_command, state: enabled
Flag: tracking_records_in_ets, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled
$ free -m
              total        used        free      shared  buff/cache   available
Mem:           7530        1223        2417          82        4269        6306
Swap:          2047           0        2047
$
To Reproduce
- Install a Red Hat-based OS
- Install RabbitMQ server from https://yum2.rabbitmq.com/rabbitmq/el/$releasever/$basearch or https://yum1.rabbitmq.com/rabbitmq/el/$releasever/$basearch, and Erlang from https://yum1.rabbitmq.com/erlang/el/$releasever/$basearch or https://yum2.rabbitmq.com/erlang/el/$releasever/$basearch
- Apply the following settings in /etc/rabbitmq/rabbitmq.conf:
listeners.ssl.default = 5671
loopback_users.guest = false
ssl_options.verify = verify_peer
ssl_options.fail_if_no_peer_cert = false
ssl_options.cacertfile = /etc/pki/ca-trust/source/anchors/EXAMPLEORGCA.crt
ssl_options.certfile = /etc/pki/tls/certs/rabbit2.example.org-chained.crt
ssl_options.keyfile = /etc/pki/tls/private/rabbit2.example.org.key
ssl_options.versions.1 = tlsv1.3
ssl_options.versions.2 = tlsv1.2
cluster_name = rabbit2
management.ssl.port = 15671
management.ssl.cacertfile = /etc/pki/ca-trust/source/anchors/EXAMPLEORGCA.crt
management.ssl.certfile = /etc/pki/tls/certs/rabbit2.example.org-chained.crt
management.ssl.keyfile = /etc/pki/tls/private/rabbit2.example.org.key
auth_ldap.ssl_options.fail_if_no_peer_cert = true
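Since the lab cluster is described as having identical settings, a key-by-key diff of the two rabbitmq.conf files can confirm that (host-specific values such as certificate paths or cluster_name may legitimately differ). This is a minimal sketch, assuming one `key = value` per line and full-line `#` comments; `parse_rabbitmq_conf` and `diff_confs` are hypothetical helpers, not a RabbitMQ tool.

```python
import sys


def parse_rabbitmq_conf(text):
    """Parse sysctl-style rabbitmq.conf text into a {key: value} dict.

    Assumes one "key = value" per line and full-line "#" comments;
    any other line is skipped by this sketch.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def diff_confs(prod_text, lab_text):
    """Return {key: (prod_value, lab_value)} for keys that differ or exist on one side only."""
    prod, lab = parse_rabbitmq_conf(prod_text), parse_rabbitmq_conf(lab_text)
    return {
        key: (prod.get(key), lab.get(key))
        for key in sorted(prod.keys() | lab.keys())
        if prod.get(key) != lab.get(key)
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python conf_diff.py prod-rabbitmq.conf lab-rabbitmq.conf
    with open(sys.argv[1]) as f_prod, open(sys.argv[2]) as f_lab:
        for key, (prod_val, lab_val) in diff_confs(f_prod.read(), f_lab.read()).items():
            print(f"{key}: prod={prod_val!r} lab={lab_val!r}")
```

An empty result means the two files set the same keys to the same values.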
Expected behavior
rabbitmq-server.service keeps running without crashing.
Affected versions
$ erl
Erlang/OTP 27 [erts-15.2.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit:ns]
Eshell V15.2.2 (press Ctrl+G to abort, type help(). for help)
1>
Additional context
I have the beam.smp binary and a core dump, which is 62 MB compressed. I am not sure how to share them.
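A text backtrace of all threads is usually far smaller than the core itself and can be pasted into the issue. This is a minimal sketch, assuming gdb is installed on the host and the paths to beam.smp and the core are passed in (they are hypothetical here); if systemd-coredump captured the crash, `coredumpctl list` can show which dumps it stored. The helper names are illustrative only.

```python
import shutil
import subprocess
import sys


def gdb_backtrace_cmd(beam_path, core_path):
    """Build a non-interactive gdb invocation that prints backtraces of all threads."""
    return [
        "gdb",
        "-batch",                      # run the -ex commands, then exit
        "-ex", "thread apply all bt",  # backtrace for every thread in the core
        beam_path,
        core_path,
    ]


def write_backtrace(beam_path, core_path, out_path="backtrace.txt"):
    """Run gdb on the core and save its stdout and stderr to out_path; return gdb's exit code."""
    if shutil.which("gdb") is None:
        raise RuntimeError("gdb not found on PATH")
    result = subprocess.run(
        gdb_backtrace_cmd(beam_path, core_path),
        capture_output=True,
        text=True,
        check=False,
    )
    with open(out_path, "w") as fh:
        fh.write(result.stdout)
        fh.write(result.stderr)
    return result.returncode


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python backtrace.py /path/to/beam.smp /path/to/core
    sys.exit(write_backtrace(sys.argv[1], sys.argv[2]))
```

The beam.smp passed to gdb should be the exact binary that produced the core, otherwise the symbols will not line up.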