-
Notifications
You must be signed in to change notification settings - Fork 33
Backport storage-alerts #544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
During upgrade from Tarantool 2.X to 3.X instance doesn't have name for some time (until box.schema.upgrade() is run). During this time master throws the alert UNREACHABLE_REPLICA, even though it's fully functional. This happens due to the fact, that named identification is used and vshard supposes, that box.info.name is accessible. It uses it in order to identify the current instance and fails. Let's use box.info.id in order to identify the current instance. Closes tarantool#13 NO_DOC=bugfix
When name is not set, UNREACHABLE_REPLICA alert is shown as "Replica cdata<void *>: NULL isn't active", which is not very informative. Let's fallback to box.info.id, when named identification is used, but instance name is not set yet. Follow-up tarantool#13 NO_DOC=bugfix
In test `test_named_hot_reload` there is no reconfiguration to uuid_as_key identification_mode in the end of the test. This test negatively influences other tests which expect uuid_as_key identification_mode. Needed for tarantool#493 NO_DOC=test
The function `assert_server_no_alerts` is moved from `persistent_names_test` into `test/luatest_helpers/asserts.lua`. This change has been made in order to use this function in subsequent commit in `test_alerts_for_non_vshard_config_template`. Also some usages of this function are changed in `persistent_names_test` with `asserts` module. Needed for tarantool#493 NO_DOC=test
This test expected alert for non-vshard replica to be shown, which is incorrect behavior and will be fixed in the subsequent commit. Needed for tarantool#493 NO_DOC=test
Before this patch vshard throwed alerts for replicas, which are not in its config. E.g. disconnected CDC replica caused `UNREACHABLE_REPLICA` alert. This patch fixes it by skipping unknown replicas. Closes tarantool#493 NO_DOC=bugfix
This test checks scenario when replica disconnects from replicaset in named vshard config. In this situation two alerts should be thrown: `UNREACHABLE_REPLICA` and `UNREACHABLE_REPLICASET`. It is necessary to add this test in order to reduce untested places in the codebase. Also a function `info_assert_alert` is removed from `router_2_2_test` since there is a more appropriate function in `asserts` module. Follow-up tarantool#493 NO_DOC=test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thank you for fixing that!
@@ -40,4 +40,10 @@ function asserts:wait_fullmesh(servers, wait_time) | |||
end) | |||
end | |||
|
|||
function asserts:assert_server_no_alerts(server) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This entire file seems dead in vshard. I propose to
- Drop
test/luatest_helpers/asserts.lua
. - Your new assertion better be added to
test/luatest_helpers/server.lua
. As a server's method. You even take the server as an argument here. So it makes sense to make it a server's method.
test_group.test_alerts_for_named_replica = function(g) | ||
t.run_only_if(vutil.feature.persistent_names) | ||
|
||
local named_replica = server:new({ | ||
alias = 'named_replica_with_name_identification', | ||
box_cfg = { | ||
replication = g.replica_1_a.net_box_uri, | ||
instance_name = 'named_replica' | ||
} | ||
}) | ||
|
||
named_replica:start() | ||
named_replica:wait_for_vclock_of(g.replica_1_a) | ||
asserts:assert_server_no_alerts(g.replica_1_a) | ||
local instance_id = named_replica:instance_id() | ||
|
||
named_replica:stop() | ||
asserts:assert_server_no_alerts(g.replica_1_a) | ||
|
||
named_replica:drop() | ||
g.replica_1_a:exec(function(id) | ||
box.space._cluster:delete(id) | ||
end, {instance_id}) | ||
end |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't this test identical to the ones added in storage_1_test.lua
?
@@ -46,4 +46,13 @@ function asserts:assert_server_no_alerts(server) | |||
end) | |||
end | |||
|
|||
function asserts:info_assert_alert(alerts, alert_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not using anything in this asserts
object. So it doesn't need to be a method of it. I suggest to either just duplicate this function in the needed tests, or put it into vtest
. This is where we store all our vshard testing helpers.
Closes #493