Skip to content

IteratorEnvironment fixes #4816

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 6 commits into from
Closed

Conversation

kevinrr888
Copy link
Member

@kevinrr888 kevinrr888 commented Aug 21, 2024

closes #4810

  • Fixed some inconsistencies and bugs in (non-test) implementations of IteratorEnvironment
    • Fixed RFileScanners unexpected UnsupportedOperationException when calling env.getServiceEnv() or env.getConfig() (both deprecated) or env.getPluginEnv()
    • Some impls returned a new ServiceEnvironment for each call to getServiceEnv()/getPluginEnv(). This did not align as well with the IteratorEnvIT which would assert that these two calls return the same object. Memoize a supplier to the ServiceEnvironment to avoid creating the object unless needed and to return the same object each time.
    • Seems like isUserCompaction() and isFullMajorCompaction() are supposed to be throwing an IllegalStateException if it's not a major compaction. Added missing javadoc to isUserCompaction() and changed impls that were not adhering to these docs.
  • Added tests for these environments to IteratorEnvIT

Questions:

  1. Need confirmation, but it was my interpretation that a TableId doesn't make sense in the context of RFileScanner. Is this correct? ANSWERED
  2. I have getConfiguration() throw an error in RFileScanners IteratorEnvironment since I'm not sure how we would return the system config in this context and I don't think it would even make sense here. Does this seem right? ANSWERED
  3. Made the IteratorEnvironment for RFileScanner return null and not throw an error on env.getTableId() so something like: env.getPluginEnv().getConfiguration(env.getTableId()) can be done. Thoughts on this? ANSWERED

closes apache#4810
- Fixed some inconsistencies and bugs in (non-test) implementations of `IteratorEnvironment`
	- Fixed `RFileScanner`s unexpected `UnsupportedOperationException` when calling `env.getServiceEnv()` or `env.getConfig()` (both deprecated) or `env.getPluginEnv()`
	- Changed the impls to return the same object for calls to `getServiceEnv()`/`getPluginEnv()` (expected by `IteratorEnvIT`)
	- Fixed inconsistencies in functionality of `isUserCompaction()` and `isFullMajorCompaction()` when it is not a major compaction. Added missing javadoc to `isUserCompaction()`
- Added tests for these environments to `IteratorEnvIT`
@kevinrr888 kevinrr888 self-assigned this Aug 21, 2024
Copy link
Contributor

@keith-turner keith-turner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still looking at this, posting comments I have so far. Will circle back to this later.

@ctubbsii ctubbsii added this to the 2.1.4 milestone Aug 21, 2024
- Updated RFile javadocs
	- Added description of new supported functionality for `RFile.ScannerOptions.withTableProperties()`
	- Moved `@see` tag in javadoc for `RFile.{ScannerOptions/SummaryOptions}.withTableProperties(Map<String,String>)` which was incorrect
- Changed `RFileScanner.IterEnv.getTableId()` to return a dummy table id to be used for calling `RFileScanner.IterEnv.getConfiguration(TableId)`
- Added functionality for `RFileScanner.IterEnv.instantiate()`
- Added error msg to invalid uses of `RFileScanner.IterEnv.getServiceEnv()/getPluginEnv()` methods
- Minor RFileScanner changes
- Refactored IteratorEnvIT to ensure expected assertions are executed
Comment on lines 143 to 144
* True if compaction was user initiated. Will throw IllegalStateException if
* {@link #getIteratorScope()} != {@link IteratorScope#majc}.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* True if compaction was user initiated. Will throw IllegalStateException if
* {@link #getIteratorScope()} != {@link IteratorScope#majc}.
* @return True if compaction was user initiated, false otherwise.
* @throws IllegalStateException if {@link #getIteratorScope()} != {@link IteratorScope#majc}

Comment on lines 144 to 145
public PluginEnvironment getPluginEnv() {
return new ClientServiceEnvironmentImpl(context.get());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the method that should be implemented. The getServiceEnv() method only existed for backwards compatibility. The code should not rely on the interface implementation of getPluginEnv() calling the getServiceEnv() here, because that behavior is removed when the deprecated code is dropped. Instead, it's best to keep both methods, as before your change, so that when the deprecated one is dropped in subsequent versions, the correct implementation is kept.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this was something I was unsure about when implementing this. Some IteratorEnvironments had both getServiceEnv() and getPluginEnv() (e.g., OfflineIterator) while some had just getServiceEnv() (e.g., TabletIteratorEnvironment), and I wasn't sure which to go with. I'll fix the impls to use both.

@@ -311,14 +319,27 @@ public void updateScanIteratorOption(String iteratorName, String key, String val
}

private class IterEnv implements IteratorEnvironment {
private final Supplier<ServiceEnvironment> serviceEnvironment;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What prevents us from just using the implementation from super.getPluginEnv() and super.getServiceEnv()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IteratorEnvironment doesn't have any implementation for getServiceEnv() or getPluginEnv(). Or are you suggesting changing IteratorEnvironment?

Comment on lines +63 to +64
* @return true if the compaction is a full major compaction; false otherwise
* @throws IllegalStateException if {@link #getIteratorScope()} != {@link IteratorScope#majc}.
Copy link
Member

@ctubbsii ctubbsii Sep 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would have preferred fixing the documentation to drop the exception case, rather than forcing an exception to be thrown in all the scan cases. false otherwise should be sufficient. Have you considered that as an alternative?

Copy link
Member Author

@kevinrr888 kevinrr888 Sep 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't looked into why an exception was thrown, I had just kept the functionality the same and consistent across impls. I assumed there was probably a reason for wanting all impls to throw an exception and didn't look any further.

Are you suggesting to drop the documentation about the exception case and revert the changes I made to adhere to the exception case, or are you suggesting changing the functionality to no longer throw the exception, or something else?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting just having the boolean methods return true or false... rather than throw an exception at all in some subset of "false" cases.

@ctubbsii
Copy link
Member

After a conversation with @kevinrr888, I added commit 5924b02 to incorporate some of the changes we discussed, that he intends to build on and polish up. The main points included were:

  • Push a reasonable getConfig() implementation up into IteratorEnvironment's default method (depends on the ability to have getTableId() return a null value)
  • Include concrete implementations of getPluginEnv() instead of just getServiceEnv() to make it easier to just drop the deprecated one when merging to newer branches
  • Consider using a Proxy instance of TableId instead of a real instance, so it isn't possible to be confused with any similar user-constructed TableId (may not work)
  • Update javadocs in RFile to make it clearer how properties set on the scanner builder (such as iterators) will be available to client-side iterators from the plugin environment

The main thing left unresolved after our conversation was how to fix the mismatch between the implementations and the javadoc for all the methods that inconsistently return null or throw IllegalStateException or UnsupportedOperationException, when calling a method that doesn't apply to a given situation (like asking for a table ID or table name when operating directly on RFiles or asking if it's a user-initiated compaction when it's a scan). Nearly every caller seems to have a unique set a requirements/implementation, and it's a bit hard to home in on the best implementation, especially for the default methods, and make sure the javadoc matches the behavior.

- Changed ClientSideIteratorScanner.getConfig() to use the safer getPluginEnv()
- Added getPluginEnv() impl to all IteratorEnvironments (besides test envs)
- Improved RFile.ScannerOptions.withTableProperties() javadoc to better explain how iterators will have access to these properties
- Opted for RFileScanner.IterEnv.getTableId() to return null instead of a dummy id
- Removed incorrect impl of RFileScanner.IterEnv.get(Service/Plugin)Env().getTableName() to instead just throw UnsupOpExc since the method doesn't make sense anyways.

Co-authored-by: Christopher Tubbs <[email protected]>
@kevinrr888
Copy link
Member Author

kevinrr888 commented Sep 12, 2024

Some comments about decisions in a68ec11

Comment about IteratorEnvironment.getConfig():

TLDR: opted for not having a default impl for IteratorEnvironment.getConfig() (keep default as throwing UOE)

It might be strange to have a default impl for this method but not others... If we do want a default impl for this that isnt just throwing a UOE, we should probably change the others too... But that should be done in a follow on if that is desired. Keeping the default as throwing a UOE keeps it consistent with the other default impls/more inline with the original approach (throw UOE by default, concrete impls should have an actual impl). The only IteratorEnvironment which would be using the default getConfig() would be ClientSideIteratorScanner. But we can just add the impl there (done in latest commit). It is also hard to know what would be a good default impl since its not exactly clear how getConfig() was originally working.

Comment about RFileScanner.IterEnv.getTableId()

TLDR: Opted for having RFileScanner.IterEnv.getTableId() return null

Proxy can only be used to create proxies for interfaces, not concrete classes, so can't be used for TableId. Opting to just return null with added javadoc.

Background: we want to support getConfiguration(TableId) since this is the method defined as returning the props set on the table and getConfiguration() should not be returning table props, so we cant substitute that method. So, something needs to be passed, but it doesn't matter what since it wont be used. The way getConfiguration(TableId) will be called is env.getPluginEnv().getConfiguration(env.getTableId()) (from javadocs for IteratorEnvironment.getPluginEnv()) So, we should have env.getTableId() returning something.

No solutions seem great, but Proxy is a no-go, a real dummy id may be confusing (e.g., a user may think that it is a real table id), null may lead to NPE. I figure null makes the most sense here. A NPE would only occur if the user is trying to do something with the table id which they shouldnt be doing anyways. It is also now noted in the javadocs that null will be returned so if a NPE does occur, it should be easy to see why.

Comment about UOE/ISE

Did not change anything further than I had already changed in my earlier commits relating to when/how UOE or ISE are thrown, how its documented etc. Some future considerations would be

  • IteratorEnvironment.getAuthorizations() currently documented as throwing UnsupportedOperationException if scope != scan (should probably be ISE if anything).
  • IteratorEnvironment.isFullMajorCompaction() and isUserCompaction() should maybe throw ISE in the default if the scope is majc
    • Maybe these shouldn't be throwing exceptions at all and only returning false... This would probably be the cleanest if possible. Would have to look more into why these are throwing exceptions in the first place.

Comment about RFileScanners createServiceEnv()

@ctubbsii suggested having this be an inner class instead of anon class. Leaving this comment as a reminder. Will do once the other more important changes are looked at; want to avoid confusing diff between commits for now.

@kevinrr888
Copy link
Member Author

@ctubbsii - Could you give this another look over when you get the chance? Would be good to get this merged in.

@ctubbsii
Copy link
Member

ctubbsii commented Mar 5, 2025

@ctubbsii - Could you give this another look over when you get the chance? Would be good to get this merged in.

It's been long enough that I'm probably going to have to look at this from scratch... I don't remember all the nuances that led to my comments on my earlier reviews. I will add this to my list of items to set aside and review ASAP.

@kevinrr888
Copy link
Member Author

@ctubbsii
Just re-pinging in case this fell off your radar. This isn't an urgent PR, but may be good to get in before 2.1.4

@ddanielr ddanielr added the blocker This issue blocks any release version labeled on it. label Apr 4, 2025
@dlmarion
Copy link
Contributor

I'm coming late to this conversation, and I have not read all of the prior comments. I don't understand the design of the IteratorEnvironment interface, such that all of the methods have default implementations that throw an UnsupportedOperationException. If this has been addressed elsewhere, please point me to it and I'm happy to read through it.

This type of interface design pattern enables implementations to not have to handle all of the methods, which might make it easier on the developer. But it makes it easier to run into a runtime exception instead of a compilation error. I think I'd rather see the fix for this remove the default exception throwing method implementations on the interface. I'm happy to start a different PR for this if that's out of scope for this.

@kevinrr888
Copy link
Member Author

@dlmarion - This was preexisting functionality added in #955, and may be explained there. I think I went through this PR a while ago, but don't remember anything about it. Skimming through some of the old comments in this PR, it looks like I address some of the difficulties with this current impl, as you mentioned.

Comment about IteratorEnvironment.getConfig():
TLDR: opted for not having a default impl for IteratorEnvironment.getConfig() (keep default as throwing UOE)
It might be strange to have a default impl for this method but not others... If we do want a default impl for this that isnt just throwing a UOE, we should probably change the others too... But that should be done in a follow on if that is desired. Keeping the default as throwing a UOE keeps it consistent with the other default impls/more inline with the original approach (throw UOE by default, concrete impls should have an actual impl). The only IteratorEnvironment which would be using the default getConfig() would be ClientSideIteratorScanner. But we can just add the impl there (done in latest commit). It is also hard to know what would be a good default impl since its not exactly clear how getConfig() was originally working.

So seems like changing IteratorEnvironment completely in this PR might be a scope creep (at least that's I thought 7 months ago)

@kevinrr888
Copy link
Member Author

@dlmarion - While I don't remember much about this, your suggestion sounds like a good change to me, as the current design doesn't seem great to me either.

If we want an UOE for some of the ops, we can define that in the concrete impls. This forces the developer to consider all of the methods individually and prevents the dev from forgetting to add certain functionality.

@dlmarion
Copy link
Contributor

@dlmarion - While I don't remember much about this, your suggestion sounds like a good change to me, as the current design doesn't seem great to me either.

If we want an UOE for some of the ops, we can define that in the concrete impls. This forces the developer to consider all of the methods individually and prevents the dev from forgetting to add certain functionality.

I have started looking at this change locally to see what it looks like.

@dlmarion
Copy link
Contributor

I have created #5490 as an alternate solution for #4810.


@Override
public Configuration getConfiguration(TableId tableId) {
Preconditions.checkArgument(tableId == getTableId(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getTableId above only returns null.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the intention was that the only valid use of this method is:
env.getPluginEnv().getConfiguration(env.getTableId())
(as described in javadocs for getPluginEnv())
env.getPluginEnv().getConfiguration(null)
would also work, and this still makes sense in the context of RFileScanner

Just wanted to avoid accepting something like
env.getPluginEnv().getConfiguration(someRandomTableId)

@kevinrr888
Copy link
Member Author

#5490 closed #4810
Closing this PR; will open a new PR with changes we still want from this PR

@kevinrr888 kevinrr888 closed this Apr 25, 2025
@ctubbsii ctubbsii removed this from the 2.1.4 milestone May 14, 2025
@kevinrr888 kevinrr888 deleted the 2.1-feature-4810 branch May 20, 2025 14:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
blocker This issue blocks any release version labeled on it.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFileScanner's IteratorEnvironment throws UnsupportedOperationException for getConfig() and getServiceEnv()
5 participants