Skip to content

Removed table name and id from RFileScanner. #5512

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 28, 2025

Conversation

dlmarion
Copy link
Contributor

Closes #5505

@dlmarion dlmarion added this to the 2.1.4 milestone Apr 28, 2025
@dlmarion dlmarion self-assigned this Apr 28, 2025
Comment on lines 312 to 306
.withScope(IteratorScope.scan).withTableId(TABLE_ID);
.withScope(IteratorScope.scan);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would still need the withTableId here for the calls env.getPluginEnv().getConfiguration(env.getTableId()) (from javadocs for IteratorEnvironment.getPluginEnv()) so env.getTableId() doesn't throw an exception

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RFileScannerEnvironmentImpl overrides ClientServiceEnvironmentImpl.getConfiguration(TableId) and returns the tableConf regardless of the TableId.

@dlmarion
Copy link
Contributor Author

Looking at the 2.1.3 version of RFileScanner, it had an internal class IterEnv which did not override IteratorEnvironment.getTableId, so if it was called it would throw an UnsupportedOperationException.

@kevinrr888
Copy link
Member

Yes, that was the reason for #4810 and which was fixed with #5490

@kevinrr888
Copy link
Member

kevinrr888 commented Apr 28, 2025

So, previously the iter env for rfilescanner did not have getServiceEnv(), getPluginEnv(), or getTableId() implemented. We need all three of these methods to support the uses:

(quoting from my comment #4810 (comment))
The equivalent to
env.getServiceEnv() <-- deprecated in 2.x, but still need in 2.x
would now be
env.getPluginEnv()

and the equivalent to
env.getConfig() <-- deprecated in 2.x, but still need in 2.x
would now be
env.getPluginEnv().getConfiguration(env.getTableId())

@dlmarion
Copy link
Contributor Author

I modified ClientIteratorEnvironment in f857cb0 to support setting tableId to null. This should avoid throwing an exception and allow RFileScannerEnvironmentImpl.getConfiguration(TableId) to return the table conf.

return tableConf;
}

}

private static final byte[] EMPTY_BYTES = new byte[0];
private static final Range EMPTY_RANGE = new Range();
private static final String TABLE_NAME = "rfileScanner";
private static final TableId TABLE_ID = TableId.of(TABLE_NAME);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could have TABLE_ID = null and check that argument for getConfiguration and getTableName is null/TABLE_ID to avoid accepting random values. Argument should be env.getTableId()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RFileScanner is client utility where the user supplies the set of input files, configuration, columns they want to fetch, etc. I don't know that there is utility in validating the table id when this is being used outside of a table context.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is a weird case where we need to pretend there is a table id when there isn't. I'm fine also not validating it, my idea was just that we should expect env.getTableId() as the way of getting the table id when it is needed instead of accepting any table id

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does validating the table id and throwing an exception change the behavior of the code between the 2.1.3 and 2.1.4 releases? Are we going to introduce an error into the clients use of our utility by upgrading?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Answering my own question - Looking at the 2.1.3 RFileScanner code, IterEnv doesn't override getServiceEnv, getPluginEnv or getTableId so any usage of those methods would throw an UnsupportedOperationException. We could add the validation, or not, it doesn't matter as the methods aren't called in RFileScanner.

@dlmarion dlmarion merged commit 67b4e4f into apache:2.1 Apr 28, 2025
8 checks passed
@dlmarion dlmarion deleted the 5505-remove-table-rfile-scanner branch April 28, 2025 18:25
@dlmarion dlmarion linked an issue Apr 28, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RFileScanner tableId and tableName
2 participants