Skip to content
This repository was archived by the owner on Jan 20, 2025. It is now read-only.
This repository was archived by the owner on Jan 20, 2025. It is now read-only.

Plugin re-activation can fail #57

@akissa

Description

@akissa

When enabling the plugin on the first node in a cluster the following error is generated.

[root@db3 ~]# rabbitmq-plugins enable autocluster
The following plugins have been enabled:
  rabbitmq_aws
  autocluster

Applying plugin configuration to [email protected]... failed.
Error: {{badmatch,false},
        [{autocluster_periodic,start_delayed,3,
                               [{file,"src/autocluster_periodic.erl"},
                                {line,47}]},
         {autocluster_consul,register,0,
                             [{file,"src/autocluster_consul.erl"},{line,135}]},
         {autocluster,register_with_backend,1,
                      [{file,"src/autocluster.erl"},{line,307}]},
         {autocluster,run_steps,1,[{file,"src/autocluster.erl"},{line,131}]},
         {rabbit_boot_steps,'-run_step/2-lc$^1/1-1-',1,
                            [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
         {rabbit_boot_steps,run_step,2,
                            [{file,"src/rabbit_boot_steps.erl"},{line,49}]},
         {rabbit_boot_steps,'-run_boot_steps/1-lc$^0/1-0-',1,
                            [{file,"src/rabbit_boot_steps.erl"},{line,26}]},
         {rabbit_boot_steps,run_boot_steps,1,
                            [{file,"src/rabbit_boot_steps.erl"},{line,26}]}]}

Investigating this seems to point to a race condition in the logic, The plugin runs through the steps initially and acquires a lock and possibly inserts into ets, then on determining that it is the only node goes through the steps again while still holding the initial lock.

Retaining the initial lock itself could be a problem, after the initial lock is released the process then proceeds to register with consul, after registration the issue then arises when setting up the delayed task.

It seems the initial run may have already created this, thus a false is returned from ets:insert_new whose documentation[1] says a false is returned if keys are already exist.

In this state, if you then disable the plugin, it keeps running and trying to recreate the node in consul. I suspect the delayed task is not removed.

A spiral loop is then entered because consul returns 500 result code when the a request is made to check the state of a service that does not exist.

https://github.com/rabbitmq/rabbitmq-autocluster/blob/stable/src/autocluster_consul.erl#L194
https://github.com/rabbitmq/rabbitmq-autocluster/blob/stable/src/autocluster_consul.erl#L212

The only way to get out of this is to restart the rabbitmq-server process.

I do not have much time at the moment to dig through this so i have created a workaround[2] to stop the enable error until i have time later. If someone else is able to look it to this it would be great.

[1] http://erlang.org/doc/man/ets.html#insert_new-2
[2] akissa@39e0cb4

Metadata

Metadata

Labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions