Skip to content

Fixes race condition in deleting namespaces #5443

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

keith-turner
Copy link
Contributor

Namespace deletion checked that no tables existed in a namespace in the Accumulo client. No checks were done on the server side. This has a race condition where the set of tables in a namespace changes after the client side check.

Added a server side check that is done after the namespace lock is acquired. The client side check should be removed, however this would not be safe to do in a bug fix release. If the client side check is removed in 2.1.4 and a 2.1.4 client runs with a 2.1.3 server then nothing would do the check.

To test this change the client side check was commented out and NamespaceIT.deleteNonEmptyNamespace() was run. This shows the change works now, but regressions in this new code will not be detected.

In main/4.0 the client side check can be removed.

Namespace deletion checked that no tables existed in a namespace in the
Accumulo client.  No checks were done on the servers side.  This has
race conditions where the set of tables in a namespace changes after the
client side check.

Added a server side check that is done after the namespace lock is
acquired.  The client side check should be removed, however this would
not be safe to do in a bug fix release.  If the client side check is
removed in 2.1.4 and a 2.1.4 client runs with a 2.1.3 server then
nothing would do the check.

To test this change the client side check was commented out and
NamespaceIT.deleteNonEmptyNamespace() was run.  This shows the change
works now, but regressions in this new code will not be detected.

In main/4.0 the client side check can be removed.
@keith-turner keith-turner added this to the 2.1.4 milestone Apr 1, 2025
@keith-turner keith-turner linked an issue Apr 1, 2025 that may be closed by this pull request
Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should modifying tables create write locks on the namespace? It seems like they should, since creating a table is modifying the namespace contents (just like you need write access on a directory to create a file in the directory in Linux). With the #4698, it seems like this check will be easier. We could just read the mapping directly from ZooKeeper.

The string-based check makes me uncomfortable. It seems too fragile.

Comment on lines 172 to 174
} catch (NamespaceExistsException e) {
// should not happen
throw new AssertionError(e);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of relying on the special String to detect this situation, you could instead use this existing NamespaceExistsException as a special case with the semantics of "namespace still exists (because it wasn't empty and couldn't be deleted)".

Namespaces.getTableIds(environment.getContext(), namespaceId);
if (!tableIdsInNamespace.isEmpty()) {
throw new AcceptableThriftTableOperationException(null, null, TableOperation.DELETE,
TableOperationExceptionType.OTHER,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a different TableOperationExceptionType instead of relying on OTHER?

@keith-turner
Copy link
Contributor Author

Should modifying tables create write locks on the namespace? It seems like they should, since creating a table is modifying the namespace contents (just like you need write access on a directory to create a file in the directory in Linux). With the #4698, it seems like this check will be easier. We could just read the mapping directly from ZooKeeper.

Since operations get table write locks they are operating on different tables in the namespace. This avoids all problems I can think of. If ops like create table did get a namespace write lock, then would not be able to run concurrent create table ops in the same namespace.

The string-based check makes me uncomfortable. It seems too fragile.

Guess that string is kinda sketchy. The motivation for that design was to minimize code change by leveraging as much existing code as possible w/o changing it since its a bug fix change for an obscure edge case. Maybe took that a bit too far. Will look at improving it.

@ctubbsii
Copy link
Member

ctubbsii commented Apr 4, 2025

It seems like it might be best to wait until after #4698 to fix this... the table mapping in the namespace can be directly read inside the namespace operation after the write lock is obtained to ensure no tables exist. It doesn't fix it for earlier versions, but I think it's also probably fine as-is for earlier versions like 2.1.

@keith-turner
Copy link
Contributor Author

It seems like it might be best to wait until after #4698 to fix this... the table mapping in the namespace can be directly read inside the namespace operation after the write lock is obtained to ensure no tables exist. It doesn't fix it for earlier versions, but I think it's also probably fine as-is for earlier versions like 2.1.

SGTM. Its hard to properly fix this in 2.1 w/o possibly disruptive changes. In main, after #4698 is merged, this can be fixed properly w/o adding tech debt.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race condition between table creation and namespace deletion
2 participants