Skip to content

Comprehensive table operations IT #5474

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

kevinrr888
Copy link
Member

@kevinrr888 kevinrr888 commented Apr 14, 2025

This PR:

Creates a comprehensive IT of all table operations against both system and user tables while trying to avoid duplicating existing testing. For example, when there was more extensive testing of a table operation elsewhere, these new tests just ensure that that testing still exists. This IT does not test for edge cases, but rather tests for basic expected functionality of all table operations against user tables and all system tables (ROOT, METADATA, FATE, SCAN_REF).

Motivation:

  • No single test testing all table operations making it hard to know what is or isn't tested
  • Lots of operations that aren't tested on all system tables and some operations that are completely untested. Good to ensure each table op works as expected on the different system tables as well as user tables. For example, some ops can't be performed on the metadata tables (ROOT, METADATA) but can be performed on FATE and SCAN_REF, some ops aren't allowed on any system tables, etc.
  • Has made identifying potential issues related to what ops are allowed on what system tables easier. For example, have discovered that offlineing the FATE and SCAN_REF tables are allowed through the table operations API when they shouldn't be (prevent offline all system tables #5433)
  • Ensures end-to-end functionality of table operations always work as expected
  • Ensures FATE, as it's used in the Manager, is functioning as expected for both META (FATE ops against system tables) and USER (FATE ops against user tables) FATE ops. This is important since they use different resources and different stores. Lots of individual FATE tests, but this verifies FATE is working as expected in the Manager.
  • Will force any newly added table operations to have testing here.
  • Will force any new system tables to be considered in these tests.

closes #5427

Creates a comprehensive IT of all table operations against both system
and user tables while trying to avoid duplicating existing testing. For
example, when there was more extensive testing of a table operation
elsewhere, these new tests just ensure that that testing still
exists. This IT does not test for edge cases, but rather tests for basic
expected functionality of all table operations against user tables and
all system tables (ROOT, METADATA, FATE, SCAN_REF).
@kevinrr888 kevinrr888 self-assigned this Apr 14, 2025
@kevinrr888 kevinrr888 added this to the 4.0.0 milestone Apr 14, 2025
Comment on lines 422 to 423
for (var sysTable : List.of(AccumuloTable.ROOT, AccumuloTable.METADATA,
AccumuloTable.SCAN_REF, AccumuloTable.FATE)) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: Test does not work. Potential issue with compacting system tables but more likely issue with testing approach.

This test is not working as expected. Only passes when compacting the FATE table is done last. The root cause of the issue is the call to createFateTableRow(userTable). This initiates a very slow compaction on the userTable that will never complete until cancelled. This is done to create some data in the FATE table and flush the table before we compact. Compacting the FATE table with this compaction FATE op in the FATE table works as expected (compaction completes on FATE table as expected). However, compacting subsequent system tables do not work as expected. One way to see this is putting FATE before SCAN_REF in the list. You can see that the compaction on SCAN_REF hangs indefinitely. It gets stuck on the CompactionDriver.isReady() step. Considered that the slow compaction which would still be running on the userTable is somehow causing the issue, but even if the compaction is cancelled and we wait for it to be completely cancelled and cleaned up from the FATE table, the issue still occurs on subsequent system table compactions. Further, the compaction on the userTable should have no effect on compactions on the system tables since compacting the userTable won't lock the accumulo namespace or anything like that. I have not been able to figure out this issue, if someone can spot an issue with my testing approach or look at the Manager log during the described failure case and figure out what is wrong, that would be much appreciated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run this test case on its own it passes. But does time out when I run all tests in this class.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it will pass on its own but not when all tests are run because some other tests call createFateTableRow(userTable) which is the root cause of the problem.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can take a look into this. If this is not figured out before this PR is merged then we should open an issue for this.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be great.

Comment on lines 829 to 830
assertThrows(IllegalArgumentException.class, () -> ops
.setTabletAvailability(sysTable.tableName(), new Range(), TabletAvailability.UNHOSTED));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should maybe be an AccumuloException to keep consistency with the other table operations methods that throw an AccumuloException when performed on an unacceptable table?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This unreleased code, so now is the time to change exceptions. Makes sense to me to change it to AccumuloException. One important thing is making sure the expected exceptions are noted in the javadoc, that helps developers and users know what is expected going forward.

Copy link
Member Author

@kevinrr888 kevinrr888 Apr 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Opened #5510

A lot of the table operations don't have javadoc for what the thrown exceptions mean (including setTabletAvailability in this case), so maybe this can be it's own ticket

Comment on lines +363 to +364
assertThrows(Exception.class, () -> client.tableOperations().clone(sysTable.tableName(),
cloneTableName, CloneConfiguration.empty()));
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment about some throwing AccumuloException and some throwing AccumuloSecurityException

Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking at this. Posting comments I have so far.

- Updated tests based on recent changes in main to `offline` and
  `getDiskUsage`
- Address review comments
- Trivial misc cleanup
Comment on lines 422 to 423
for (var sysTable : List.of(AccumuloTable.ROOT, AccumuloTable.METADATA,
AccumuloTable.SCAN_REF, AccumuloTable.FATE)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run this test case on its own it passes. But does time out when I run all tests in this class.

- Compare sets of files instead of number of files in flush
- Extra check to test_setProperty_modifyProperties_removeProperty_getProperties_getTableProperties_getConfiguration
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add test of all table operations against system and user tables
4 participants