[#6487] improvement(hadoop-catalog): Close FileSystem instance manually. #6488
base: main
Conversation
@@ -257,11 +257,10 @@ public Fileset createFileset(
            ? new Path(storageLocation)
            : new Path(schemaPath, ident.name());

-    try {
+    try (FileSystem fs = getFileSystem(filesetPath, conf)) {
       // formalize the path to avoid path without scheme, uri, authority, etc.
       filesetPath = formalizePath(filesetPath, conf);
should we move this line before the try block?
No need; L262 changes the value of filesetPath after the FileSystem is initialized.
-      return path.makeQualified(defaultFs.getUri(), defaultFs.getWorkingDirectory());
+      try (FileSystem defaultFs = getFileSystem(path, configuration)) {
+        return path.makeQualified(defaultFs.getUri(), defaultFs.getWorkingDirectory());
+      }
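For context on the formalizePath call being wrapped here: Path.makeQualified fills in a missing scheme and authority from the FileSystem's URI and resolves relative paths against its working directory. A minimal runnable sketch against the local filesystem (the path is made up for illustration):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MakeQualifiedSketch {
  public static void main(String[] args) throws Exception {
    Path raw = new Path("data/fileset1"); // relative path, no scheme or authority
    // With a default Configuration this resolves to the local filesystem.
    try (FileSystem fs = raw.getFileSystem(new Configuration())) {
      // Fills in scheme/authority from the FileSystem URI and resolves the
      // relative path against the working directory.
      Path qualified = raw.makeQualified(fs.getUri(), fs.getWorkingDirectory());
      System.out.println(qualified); // e.g. file:/current/working/dir/data/fileset1
    }
  }
}
```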
Why wrap this in a try block?
I just want to use the try-with-resources mechanism to close the FileSystem automatically.
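For readers unfamiliar with the idiom: org.apache.hadoop.fs.FileSystem implements java.io.Closeable, so try-with-resources calls close() when the block exits, including on exceptions. A minimal sketch (the exists helper is made up for illustration):

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TryWithResourcesSketch {
  // close() runs automatically when the block exits, whether the return
  // statement completes normally or fs.exists() throws.
  static boolean exists(Path path, Configuration conf) throws IOException {
    try (FileSystem fs = path.getFileSystem(conf)) {
      return fs.exists(path);
    }
  }
}
```

One caveat: when Hadoop's FileSystem cache is enabled, Path.getFileSystem and FileSystem.get may return an instance shared with other callers, and closing it closes it for them too, which relates to the concern below.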
My concern is that creating and closing the fs each time will introduce unnecessary overhead. As I recall: 1. Hadoop FS has a cache mechanism. 2. If we don't leverage Hadoop's cache, shall we maintain our own, so that we can close the instances when the catalog is closed? Then there is no need to close the fs each time. What do you think?
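For reference, a sketch of how Hadoop's built-in cache behaves, using the local filesystem so it runs anywhere (the per-scheme disable key follows the fs.&lt;scheme&gt;.impl.disable.cache pattern):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class FsCacheSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    URI uri = URI.create("file:///");

    // FileSystem.get() serves instances from an internal cache keyed by
    // scheme, authority, and the current user, so both calls return the
    // same object.
    FileSystem a = FileSystem.get(uri, conf);
    FileSystem b = FileSystem.get(uri, conf);
    System.out.println(a == b); // true

    // The cache can be disabled per scheme; each get() then creates a fresh
    // instance that the caller owns and must close.
    conf.setBoolean("fs.file.impl.disable.cache", true);
    FileSystem c = FileSystem.get(uri, conf);
    System.out.println(a == c); // false
    c.close();

    // FileSystem.newInstance() always bypasses the cache, regardless of config.
    try (FileSystem d = FileSystem.newInstance(uri, conf)) {
      System.out.println(a == d); // false
    }
  }
}
```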
Since we are going to support FileSystem credentials, each fileset (name identifier) will have a different FileSystem instance.
In GVFS, we do have a cache to manage FileSystem instances. On the Gravitino server side there is no cache, so I'm inclined to close the instance to free up memory quickly.
I was thinking of building a similar cache on the server side to cache the fileset/FS mapping. Creating/closing the FS each time seems very costly; can we:
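A minimal sketch of what such a server-side cache could look like, here using Guava's CacheBuilder with a removal listener that closes evicted instances; the String key, the sizes, and the loader are placeholders, not Gravitino's actual design:

```java
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.fs.FileSystem;

// Keep one FileSystem per fileset and close instances on eviction or when the
// catalog shuts down, instead of creating and closing one per call.
public class FileSystemCacheSketch implements AutoCloseable {
  private final Cache<String, FileSystem> cache =
      CacheBuilder.newBuilder()
          .maximumSize(1000)
          .expireAfterAccess(1, TimeUnit.HOURS)
          .<String, FileSystem>removalListener(
              notification -> {
                FileSystem fs = notification.getValue();
                if (fs != null) {
                  try {
                    fs.close(); // close when the entry is evicted
                  } catch (IOException ignored) {
                    // best-effort close
                  }
                }
              })
          .build();

  // filesetKey would be the fileset name identifier; loader creates the FS.
  public FileSystem get(String filesetKey, Callable<FileSystem> loader) throws Exception {
    return cache.get(filesetKey, loader);
  }

  @Override
  public void close() {
    cache.invalidateAll(); // the removal listener closes each cached instance
    cache.cleanUp();
  }
}
```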
Sure, let me run some tests and then figure out how to optimize it later.
What changes were proposed in this pull request?
Close the Hadoop FileSystem instance manually to free memory.
Why are the changes needed?
According to performance testing, a possible memory leak exists for Hadoop FileSystem instances.
Fix: #6487
Does this PR introduce any user-facing change?
N/A.
How was this patch tested?
Existing UTs cover this change.