Erlang runtime error RabbitMQ #9437

Open
@perkons

Description

Describe the bug
We have a three-node RabbitMQ cluster in our production environment where the rabbitmq-server service stops at random. systemd restarts the service automatically, but the crashes occur anywhere from once to several times per day. Our lab cluster, which has identical settings but very light usage, does not experience the issue. I first reported this as rabbitmq/rabbitmq-server#13223.

Journal:

Feb 13 11:33:17 rabbit2a systemd[1]: Started RabbitMQ broker.
Feb 14 02:06:10 rabbit2a rabbitmq-server[1703247]: beam/erl_term.h:1527:tag_val_def() Assertion failed: tag_val_def error
Feb 14 02:06:12 rabbit2a rabbitmq-server[1703286]: [os_mon] memory supervisor port (memsup): Erlang has closed
Feb 14 02:06:12 rabbit2a rabbitmq-server[1703287]: [os_mon] cpu supervisor port (cpu_sup): Erlang has closed
Feb 14 02:06:12 rabbit2a systemd[1]: rabbitmq-server.service: Main process exited, code=dumped, status=6/ABRT
Feb 14 02:06:12 rabbit2a systemd[1]: rabbitmq-server.service: Failed with result 'core-dump'.
Feb 14 02:06:12 rabbit2a systemd[1]: rabbitmq-server.service: Consumed 3h 49min 20.844s CPU time.
Feb 14 02:06:22 rabbit2a systemd[1]: rabbitmq-server.service: Scheduled restart job, restart counter is at 1.
Feb 14 02:06:22 rabbit2a systemd[1]: Stopped RabbitMQ broker.
Feb 14 02:06:22 rabbit2a systemd[1]: rabbitmq-server.service: Consumed 3h 49min 20.844s CPU time.
Feb 14 02:06:22 rabbit2a systemd[1]: Starting RabbitMQ broker..
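
To put a number on "once to multiple times per day", the aborts can be tallied from the journal. This is a minimal sketch, not something that was run as part of the report: the function name `count_aborts_per_day` is hypothetical, and it assumes journalctl's default short timestamp format as shown in the excerpt above.

```shell
# Count Erlang VM assertion aborts per day from journal text read on stdin.
# Assumption: lines start with "Mon DD HH:MM:SS host ..." (journalctl default).
count_aborts_per_day() {
  awk '/Assertion failed/ { n[$1 " " $2]++ }
       END { for (d in n) print d, n[d] }' | sort
}
```

It could be fed with `journalctl -u rabbitmq-server.service --no-pager | count_aborts_per_day`, which prints one line per day with the month, the day, and the number of aborts. The days are sorted as text, not chronologically.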

Cluster status:

$ rabbitmqctl cluster_status
Cluster status of node rabbit@rabbit2a ...
Basics

Cluster name: rabbit2
Total CPU cores available cluster-wide: 12

Cluster Tags

(none)

Disk Nodes

rabbit@rabbit2a
rabbit@rabbit2b
rabbit@rabbit2c

Running Nodes

rabbit@rabbit2a
rabbit@rabbit2b
rabbit@rabbit2c

Versions

rabbit@rabbit2a: RabbitMQ 4.0.5 on Erlang 27.2.2
rabbit@rabbit2b: RabbitMQ 4.0.5 on Erlang 27.2.2
rabbit@rabbit2c: RabbitMQ 4.0.5 on Erlang 27.2.2

CPU Cores

Node: rabbit@rabbit2a, available CPU cores: 4
Node: rabbit@rabbit2b, available CPU cores: 4
Node: rabbit@rabbit2c, available CPU cores: 4

Maintenance status

Node: rabbit@rabbit2a, status: not under maintenance
Node: rabbit@rabbit2b, status: not under maintenance
Node: rabbit@rabbit2c, status: not under maintenance

Alarms

(none)

Network Partitions

(none)

Listeners

Node: rabbit@rabbit2a, interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@rabbit2a, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@rabbit2a, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rabbit2a, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rabbit2a, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Node: rabbit@rabbit2b, interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@rabbit2b, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@rabbit2b, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rabbit2b, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rabbit2b, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS
Node: rabbit@rabbit2c, interface: [::], port: 15671, protocol: https, purpose: HTTP API over TLS (HTTPS)
Node: rabbit@rabbit2c, interface: [::], port: 15692, protocol: http/prometheus, purpose: Prometheus exporter API over HTTP
Node: rabbit@rabbit2c, interface: [::], port: 25672, protocol: clustering, purpose: inter-node and CLI tool communication
Node: rabbit@rabbit2c, interface: [::], port: 5672, protocol: amqp, purpose: AMQP 0-9-1 and AMQP 1.0
Node: rabbit@rabbit2c, interface: [::], port: 5671, protocol: amqp/ssl, purpose: AMQP 0-9-1 and AMQP 1.0 over TLS

Feature flags

Flag: classic_mirrored_queue_version, state: enabled
Flag: classic_queue_type_delivery_support, state: enabled
Flag: detailed_queues_endpoint, state: enabled
Flag: direct_exchange_routing_v2, state: enabled
Flag: drop_unroutable_metric, state: enabled
Flag: empty_basic_get_metric, state: enabled
Flag: feature_flags_v2, state: enabled
Flag: implicit_default_bindings, state: enabled
Flag: khepri_db, state: disabled
Flag: listener_records_in_ets, state: enabled
Flag: maintenance_mode_status, state: enabled
Flag: message_containers, state: enabled
Flag: message_containers_deaths_v2, state: enabled
Flag: quorum_queue, state: enabled
Flag: quorum_queue_non_voters, state: enabled
Flag: rabbit_exchange_type_local_random, state: enabled
Flag: rabbitmq_4.0.0, state: enabled
Flag: restart_streams, state: enabled
Flag: stream_filtering, state: enabled
Flag: stream_queue, state: enabled
Flag: stream_sac_coordinator_unblock_group, state: enabled
Flag: stream_single_active_consumer, state: enabled
Flag: stream_update_config_command, state: enabled
Flag: tracking_records_in_ets, state: enabled
Flag: user_limits, state: enabled
Flag: virtual_host_metadata, state: enabled

Memory:

$ free -m
               total        used        free      shared  buff/cache   available
Mem:            7530        1223        2417          82        4269        6306
Swap:           2047           0        2047

To Reproduce

  1. Install a Red Hat-based OS.
  2. Install RabbitMQ server from https://yum2.rabbitmq.com/rabbitmq/el/$releasever/$basearch or https://yum1.rabbitmq.com/rabbitmq/el/$releasever/$basearch, and Erlang from https://yum1.rabbitmq.com/erlang/el/$releasever/$basearch or https://yum2.rabbitmq.com/erlang/el/$releasever/$basearch.
  3. Use the following settings in /etc/rabbitmq/rabbitmq.conf:
listeners.ssl.default = 5671
loopback_users.guest = false
ssl_options.verify = verify_peer
ssl_options.fail_if_no_peer_cert = false
ssl_options.cacertfile = /etc/pki/ca-trust/source/anchors/EXAMPLEORGCA.crt
ssl_options.certfile = /etc/pki/tls/certs/rabbit2.example.org-chained.crt
ssl_options.keyfile = /etc/pki/tls/private/rabbit2.example.org.key
ssl_options.versions.1 = tlsv1.3
ssl_options.versions.2 = tlsv1.2
cluster_name = rabbit2
management.ssl.port = 15671
management.ssl.cacertfile = /etc/pki/ca-trust/source/anchors/EXAMPLEORGCA.crt
management.ssl.certfile = /etc/pki/tls/certs/rabbit2.example.org-chained.crt
management.ssl.keyfile = /etc/pki/tls/private/rabbit2.example.org.key
auth_ldap.ssl_options.fail_if_no_peer_cert = true

Expected behavior
rabbitmq-server.service keeps running without crashing.

Affected versions

$ erl
Erlang/OTP 27 [erts-15.2.2] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [jit:ns]

Eshell V15.2.2 (press Ctrl+G to abort, type help(). for help)
1>

Additional context
I have the beam.smp binary and a core dump (62 MB compressed), but I am not sure how to share them.
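
If uploading the full core dump is impractical, a text backtrace taken from it may be a useful first step. The following is a minimal sketch under stated assumptions, not a confirmed procedure from the OTP team: it assumes gdb is installed, the function name `core_backtrace` is hypothetical, and both paths are supplied by the caller (the actual locations of beam.smp and the core file are not stated here).

```shell
# Print a backtrace of every thread from a core file, using the matching
# beam.smp binary for symbols. Assumption: gdb is available on the host.
core_backtrace() {
  beam="$1"   # path to the beam.smp binary that produced the core
  core="$2"   # path to the uncompressed core file
  [ -r "$beam" ] && [ -r "$core" ] || {
    echo "usage: core_backtrace BEAM_SMP CORE_FILE" >&2
    return 2
  }
  gdb -batch -ex 'thread apply all bt' "$beam" "$core"
}
```

Redirecting the output to a file, for example `core_backtrace BEAM_SMP CORE_FILE > backtrace.txt`, gives a small text attachment. The backtrace is only meaningful when the binary is the exact build that crashed.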

Labels

bug, team:VM