Skip to content

Backport storage-alerts #544

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

mrForza
Copy link

@mrForza mrForza commented Apr 18, 2025

Closes #493

Serpentian and others added 7 commits April 18, 2025 11:13
During upgrade from Tarantool 2.X to 3.X instance doesn't have name
for some time (until box.schema.upgrade() is run). During this time
master throws the alert UNREACHABLE_REPLICA, even though it's fully
functional.

This happens due to the fact, that named identification is used and
vshard supposes, that box.info.name is accessible. It uses it in
order to identify the current instance and fails.

Let's use box.info.id in order to identify the current instance.

Closes tarantool#13

NO_DOC=bugfix
When name is not set, UNREACHABLE_REPLICA alert is shown as "Replica
cdata<void *>: NULL isn't active", which is not very informative.

Let's fallback to box.info.id, when named identification is used, but
instance name is not set yet.

Follow-up tarantool#13

NO_DOC=bugfix
In test `test_named_hot_reload` there is no reconfiguration to
uuid_as_key identification_mode in the end of the test. This
test negatively influences other tests which expect uuid_as_key
identification_mode.

Needed for tarantool#493

NO_DOC=test
The function `assert_server_no_alerts` is moved from
`persistent_names_test` into `test/luatest_helpers/asserts.lua`.
This change has been made in order to use this function in subsequent
commit in `test_alerts_for_non_vshard_config_template`. Also some usages
of this function are changed in `persistent_names_test` with `asserts`
module.

Needed for tarantool#493

NO_DOC=test
This test expected alert for non-vshard replica to be shown, which is
incorrect behavior and will be fixed in the subsequent commit.

Needed for tarantool#493

NO_DOC=test
Before this patch vshard throwed alerts for replicas, which are not in
its config. E.g. disconnected CDC replica caused `UNREACHABLE_REPLICA`
alert. This patch fixes it by skipping unknown replicas.

Closes tarantool#493

NO_DOC=bugfix
This test checks scenario when replica disconnects from replicaset
in named vshard config. In this situation two alerts should be thrown:
`UNREACHABLE_REPLICA` and `UNREACHABLE_REPLICASET`. It is necessary to
add this test in order to reduce untested places in the codebase. Also
a function `info_assert_alert` is removed from `router_2_2_test` since
there is a more appropriate function in `asserts` module.

Follow-up tarantool#493

NO_DOC=test
Copy link
Collaborator

@Gerold103 Gerold103 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, thank you for fixing that!

@@ -40,4 +40,10 @@ function asserts:wait_fullmesh(servers, wait_time)
end)
end

function asserts:assert_server_no_alerts(server)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This entire file seems dead in vshard. I propose to

  • Drop test/luatest_helpers/asserts.lua.
  • Your new assertion better be added to test/luatest_helpers/server.lua. As a server's method. You even take the server as an argument here. So it makes sense to make it a server's method.

Comment on lines +144 to +167
test_group.test_alerts_for_named_replica = function(g)
t.run_only_if(vutil.feature.persistent_names)

local named_replica = server:new({
alias = 'named_replica_with_name_identification',
box_cfg = {
replication = g.replica_1_a.net_box_uri,
instance_name = 'named_replica'
}
})

named_replica:start()
named_replica:wait_for_vclock_of(g.replica_1_a)
asserts:assert_server_no_alerts(g.replica_1_a)
local instance_id = named_replica:instance_id()

named_replica:stop()
asserts:assert_server_no_alerts(g.replica_1_a)

named_replica:drop()
g.replica_1_a:exec(function(id)
box.space._cluster:delete(id)
end, {instance_id})
end
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this test identical to the ones added in storage_1_test.lua?

@@ -46,4 +46,13 @@ function asserts:assert_server_no_alerts(server)
end)
end

function asserts:info_assert_alert(alerts, alert_name)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are not using anything in this asserts object. So it doesn't need to be a method of it. I suggest to either just duplicate this function in the needed tests, or put it into vtest. This is where we store all our vshard testing helpers.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vshard shows alerts for replica, which are not in the vshard's config
3 participants