-
Notifications
You must be signed in to change notification settings - Fork 460
Fixes race condition in deleting namespaces #5443
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Namespace deletion checked that no tables existed in a namespace in the Accumulo client. No checks were done on the servers side. This has race conditions where the set of tables in a namespace changes after the client side check. Added a server side check that is done after the namespace lock is acquired. The client side check should be removed, however this would not be safe to do in a bug fix release. If the client side check is removed in 2.1.4 and a 2.1.4 client runs with a 2.1.3 server then nothing would do the check. To test this change the client side check was commented out and NamespaceIT.deleteNonEmptyNamespace() was run. This shows the change works now, but regressions in this new code will not be detected. In main/4.0 the client side check can be removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should modifying tables create write locks on the namespace? It seems like they should, since creating a table is modifying the namespace contents (just like you need write access on a directory to create a file in the directory in Linux). With the #4698, it seems like this check will be easier. We could just read the mapping directly from ZooKeeper.
The string-based check makes me uncomfortable. It seems too fragile.
} catch (NamespaceExistsException e) { | ||
// should not happen | ||
throw new AssertionError(e); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of relying on the special String to detect this situation, you could instead use this existing NamespaceExistsException as a special case with the semantics of "namespace still exists (because it wasn't empty and couldn't be deleted)".
Namespaces.getTableIds(environment.getContext(), namespaceId); | ||
if (!tableIdsInNamespace.isEmpty()) { | ||
throw new AcceptableThriftTableOperationException(null, null, TableOperation.DELETE, | ||
TableOperationExceptionType.OTHER, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we add a different TableOperationExceptionType instead of relying on OTHER?
Since operations get table write locks they are operating on different tables in the namespace. This avoids all problems I can think of. If ops like create table did get a namespace write lock, then would not be able to run concurrent create table ops in the same namespace.
Guess that string is kinda sketchy. The motivation for that design was to minimize code change by leveraging as much existing code as possible w/o changing it since its a bug fix change for an obscure edge case. Maybe took that a bit too far. Will look at improving it. |
It seems like it might be best to wait until after #4698 to fix this... the table mapping in the namespace can be directly read inside the namespace operation after the write lock is obtained to ensure no tables exist. It doesn't fix it for earlier versions, but I think it's also probably fine as-is for earlier versions like 2.1. |
SGTM. Its hard to properly fix this in 2.1 w/o possibly disruptive changes. In main, after #4698 is merged, this can be fixed properly w/o adding tech debt. |
Namespace deletion checked that no tables existed in a namespace in the Accumulo client. No checks were done on the server side. This has a race condition where the set of tables in a namespace changes after the client side check.
Added a server side check that is done after the namespace lock is acquired. The client side check should be removed, however this would not be safe to do in a bug fix release. If the client side check is removed in 2.1.4 and a 2.1.4 client runs with a 2.1.3 server then nothing would do the check.
To test this change the client side check was commented out and NamespaceIT.deleteNonEmptyNamespace() was run. This shows the change works now, but regressions in this new code will not be detected.
In main/4.0 the client side check can be removed.