Skip to content

Modified IteratorEnvironment to no longer throw UOE #5490

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Apr 24, 2025

Conversation

dlmarion
Copy link
Contributor

@dlmarion dlmarion commented Apr 18, 2025

Modified the IteratorEnvironment interface such that the methods no longer have default implementation that throw UnsupportedOperationException. Created the class ClientIteratorEnvironment to serve as the default implementation for the client objects that defined their own implementation and for use in the tests. Removed all test implementations of IteratorEnvironment.

Closes #4810

Modified the IteratorEnvironment interface such that the
methods no longer have default implementation that throw
UnsupportedOperationException. Created the class
ClientIteratorEnvironment to serve as the default implementation
for the client objects that defined their own implementation
and for use in the tests. Removed all test implementations
of IteratorEnvironment.

Closes apache#4810
@dlmarion dlmarion added this to the 2.1.4 milestone Apr 18, 2025
@dlmarion dlmarion self-assigned this Apr 18, 2025
@dlmarion
Copy link
Contributor Author

The replacement of the IteratorEnvironment implementations in ClientSideIteratorScanner, RFileScanner, OfflineIterator with ClientIteratorEnvironment need to be scrutinized. I know that in one case one of the implementations handled the registerSideChannel method, but I could not find it being used anywhere so I just deleted it.

Removed ServiceEnvironment from ClientSideIteratorScanner, use
TabletIteratorEnvironment in server test.
Copy link
Member

@kevinrr888 kevinrr888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have not finished reviewing everything; want to first ensure this is a PR we want to move forward with. Overall, I think these changes are a good idea, however, this refactoring can very easily have bugs that are hard to spot. I'm not sure if this is the best for a 2.1 change, risking potentially introducing more bugs than the original PR set out to fix.

Some additional comments:

Comment on lines 303 to 306
@Override
public Configuration getConfiguration() {
return new ConfigurationImpl(new ConfigurationCopy(DefaultConfiguration.getInstance()));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #4816, had this method throwing an exception. Don't remember the specifics, but should make sure this functions as we expect (should it throw an exception, or return something)

@keith-turner
Copy link
Contributor

Overall, I think these changes are a good idea, however, this refactoring can very easily have bugs that are hard to spot. I'm not sure if this is the best for a 2.1 change, risking potentially introducing more bugs than the original PR set out to fix.

@kevinrr888 I also like these changes and was a bit nervous about making the changes in 2.1. Personally I would like to try to get these changes in (because they make things more consistent which will reduce bugs long term, also the current inconsistency feels really buggy) and review our test coverage around this code. While reviewing this today I found a place I would like to add a unit test for RFileScanner. More generally we may want to eventually have something like #5474 that covers the iteratorEnv in all the different scanner types (like RFileScanner,OfflineScanner,ScannerImpl,BatchScanner,ClientSideIteratorScanner). We could try to as much as possible to run the same tests across all the different scanner types. For this PR I will try to run all the unit test w/ code coverage and see what the coverage looks like for the new code.

@kevinrr888
Copy link
Member

kevinrr888 commented Apr 23, 2025

While reviewing this today I found a place I would like to add a unit test for RFileScanner. More generally we may want to eventually have something like #5474 that covers the iteratorEnv in all the different scanner types (like RFileScanner,OfflineScanner,ScannerImpl,BatchScanner,ClientSideIteratorScanner). We could try to as much as possible to run the same tests across all the different scanner types. For this PR I will try to run all the unit test w/ code coverage and see what the coverage looks like for the new code.

@keith-turner - In #4816, I changed the IteratorEnvIT to test across different scanner types. Could use that code and expand on that as needed for full test coverage. Could potentially move those changes into this PR, or change that PR to only have the changes not accomplished by this PR and merge them both in.

@keith-turner
Copy link
Contributor

@keith-turner - In #4816, I changed the IteratorEnvIT to test across different scanner types. Could use that code and expand on that as needed for full test coverage. Could potentially move those changes into this PR, or change that PR to only have the changes not accomplished by this PR and merge them both in.

After this PR is merged, seems like it would good to circle back to #4816 and pull the useful bits into another PR.

@dlmarion
Copy link
Contributor Author

I'm not sure about the change I made in 1c75b6c. With the builder pattern should we allow the user to call whatever methods they want in which-ever order they want?

@@ -79,6 +80,30 @@ public TabletIteratorEnvironment(ServerContext context, IteratorScope scope,
this.topLevelIterators = new ArrayList<>();
}

public TabletIteratorEnvironment(ServerContext context, IteratorScope scope,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is going on w/ this new constructor? Its only used in a test, what is preventing the test from using the new builder?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, I think it's because I was trying to get all of the server side code to use server-side iterator environment objects vs having all the server side tests using the new ClientIteratorEnvironment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I experimented w/ using the new code in InMemoryMapTest to create an iterenv and it worked really nicely and the test passed. The new class is a good general purpose builder for an IteratorEnv, seems fine to me to use it in the server code and remove this new constructor.

      IteratorEnvironment iterEnv =
          new ClientIteratorEnvironment.Builder().withClient(getServerContext())
              .withScope(IteratorScope.scan).withTableId(TableId.of("foo")).build();
      IteratorEnvironment iterEnvSC =
          new ClientIteratorEnvironment.Builder().withClient(getServerContext())
              .withScope(IteratorScope.scan).withTableId(TableId.of("foo")).withSamplingEnabled()
              .withSamplerConfiguration(sampleConfig.toSamplerConfiguration()).build();
      SortedKeyValueIterator<Key,Value> iter0dc1 = iter0.deepCopy(iterEnv);
      SortedKeyValueIterator<Key,Value> iter0dc2 = iter0.deepCopy(iterEnvSC);
      SortedKeyValueIterator<Key,Value> iter1dc1 = iter1.deepCopy(iterEnv);
      SortedKeyValueIterator<Key,Value> iter1dc2 = iter1.deepCopy(iterEnvSC);
      SortedKeyValueIterator<Key,Value> iter2dc1 = iter2.deepCopy(iterEnv);
      SortedKeyValueIterator<Key,Value> iter2dc2 = iter2.deepCopy(iterEnvSC);

Not a change for this PR, but maybe in the public API I we cold eventually add a static builder() method to IteratorEnvironment and use this new code as the impl. Then could write the above code like the following.

      IteratorEnvironment iterEnv =
          IteratorEnvironment.builder().withClient(getServerContext())
              .withScope(IteratorScope.scan).withTableId(TableId.of("foo")).build();

In this PR would could add a static builder() method to ClientIteratorEnvironment. Would make the code that uses the builder slightly shorter.

@keith-turner
Copy link
Contributor

I'm not sure about the change I made in 1c75b6c. With the builder pattern should we allow the user to call whatever methods they want in which-ever order they want?

That change is nice for runtime validation. Could add more runtime checks making sure things are empty when set in the builder w/ the assumption that if the same builder method is called twice it probably is not what was intended. I like the way it currently is, the optionals provide a lot of runtime validation (like for the case attempting to use something not set in the builder).

@dlmarion
Copy link
Contributor Author

That change is nice for runtime validation. Could add more runtime checks making sure things are empty when set in the builder w/ the assumption that if the same builder method is called twice it probably is not what was intended. I like the way it currently is, the optionals provide a lot of runtime validation (like for the case attempting to use something not set in the builder).

Ok, I can add more validation on the Builder

Copy link
Member

@kevinrr888 kevinrr888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Posting what I have now. Will finish review tomorrow

Comment on lines +119 to +120
private static final String TABLE_NAME = "rfileScanner";
private static final TableId TABLE_ID = TableId.of(TABLE_NAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I discussed in more detail here but I'm not sure about a dummy table id/name here. Both null and a dummy id have their drawbacks though, so may be up to preference.

In my opinion, a null table id makes sense since there is no table in this context while still allowing for env.getPluginEnv().getConfiguration(env.getTableId()) to work (getConfiguration just ensures the passed in value is null)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could spin this off in a separate issue since it was a preexisting pattern in the code.

private static final byte[] EMPTY_BYTES = new byte[0];
private static final Range EMPTY_RANGE = new Range();
private static final String TABLE_NAME = "rfileScanner";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be deleted if getTableName is removed

Comment on lines +96 to +101
@Override
public String getTableName(TableId tableId) throws TableNotFoundException {
Preconditions.checkArgument(tableId == TABLE_ID, "Expected " + TABLE_ID + " obtained"
+ " from IteratorEnvironment.getTableId(), but got: " + tableId);
return TABLE_NAME;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think getTableName should be supported here. There is no need for a user to call this. Should probably just throw an exception

Copy link
Member

@ctubbsii ctubbsii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favor of this change, overall. There is some small risk of doing this in a bugfix release, but I don't see any public API changes, or even SPI changes, that would really impact users. As long as all the ITs pass, I'm in favor of this overall. Aside from that, I don't have anything to add that others haven't already said or have already been fixed in the implementation.

@dlmarion
Copy link
Contributor Author

Full IT build passed

Copy link
Member

@kevinrr888 kevinrr888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes look good and are a lot nicer than the previous impl. Some follow ons for this may be:

  • Modified IteratorEnvironment to no longer throw UOE #5490 (comment) - I did not understand this constructor and why the builder couldn't be used, but since it's just used in tests and those pass, not too important. If the constructor is kept, could potentially add VisibleForTesting tag
  • IteratorEnvironment fixes #4816 - Should extract the useful bits from this PR and merge in separately (I can do this). Most importantly, want the IteratorEnvIT changes which will test the changes done here.
  • Look at potentially replacing the RFileScanner table id with null or something better if possible.

Comment on lines -44 to -50
@Deprecated(since = "2.0.0")
@Override
public SortedKeyValueIterator<Key,Value> reserveMapFileReader(String mapFileName)
throws IOException {
FileSystem fs = FileSystem.get(hadoopConf);
return new MapFileIterator(fs, mapFileName, hadoopConf);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something, but seems like the replacement public static final IteratorEnvironment DEFAULT = new Builder().build(); for this class will throw new UnsupportedOperationException("Feature not supported"); for this method. If DEFAULT is only used in testing this probably doesn't matter so long as everything still passes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ITs passed with this change, so I think we are safe. IIRC, Keith said above that he could not find any uses of it.

@dlmarion
Copy link
Contributor Author

@keith-turner @kevinrr888 - Thanks for all your reviews. I'm going to work on merging this in and up to main, then I'll create the follow-on issues.

@dlmarion dlmarion merged commit ec5954e into apache:2.1 Apr 24, 2025
8 checks passed
@dlmarion dlmarion deleted the fix-iter-env-iface branch April 24, 2025 17:41
ddanielr pushed a commit to keith-turner/accumulo that referenced this pull request Apr 24, 2025
Modified the IteratorEnvironment interface such that the
methods no longer have default implementation that throw
UnsupportedOperationException. Created the class
ClientIteratorEnvironment to serve as the default implementation
for the client objects that defined their own implementation
and for use in the tests. Removed all test implementations
of IteratorEnvironment.

Closes apache#4810


Co-authored-by: Kevin Rathbun <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFileScanner's IteratorEnvironment throws UnsupportedOperationException for getConfig() and getServiceEnv()
4 participants