Strange behavior of presto-hive's temporary directories #26293

@skadilover

Context

By design, temporaryRoot = targetPath + temporaryPrefix, and the default value of temporaryPrefix is "/tmp/presto-${USER}":

    // create a temporary directory on the same filesystem
    Path temporaryRoot = new Path(targetPath, temporaryPrefix);

However, temporaryRoot always resolves to temporaryPrefix when temporaryPrefix starts with "/", as the default value above does. See the path resolution in java.net.URI:

    String cp = (child.path == null) ? "" : child.path;
    if (!cp.isEmpty() && cp.charAt(0) == '/') {
        // 5.2 (5): Child path is absolute
        ru.path = child.path;
    } else {
        // 5.2 (6): Resolve relative path
        ru.path = resolvePath(base.path, cp, base.isAbsolute());
    }

The path of targetPath is not used here.
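
The resolution rule quoted above can be reproduced with java.net.URI alone. This is a minimal sketch, not Presto code: the class name TemporaryRootDemo, the helper resolve, the namenode "nn", the table location, and the user "alice" are all made up for illustration, and appending a trailing "/" to the parent is an assumption about how a parent/child path constructor treats the parent as a directory.

```java
import java.net.URI;

final class TemporaryRootDemo {
    // Hypothetical helper: resolves child against parent using java.net.URI.resolve.
    static String resolve(String parent, String child) {
        // Treat the parent as a directory so a relative child is appended to it.
        String dir = parent.endsWith("/") ? parent : parent + "/";
        return URI.create(dir).resolve(URI.create(child)).toString();
    }

    public static void main(String[] args) {
        String target = "hdfs://nn/warehouse/db/t"; // made-up table location

        // Absolute child: the parent's path is discarded, only scheme and authority survive.
        System.out.println(resolve(target, "/tmp/presto-alice")); // hdfs://nn/tmp/presto-alice

        // Relative child: the result lands under the parent path.
        System.out.println(resolve(target, "tmp/presto-alice"));  // hdfs://nn/warehouse/db/t/tmp/presto-alice
    }
}
```

With an absolute prefix, only the scheme and authority of the target are carried over, so the temporary root stays on the same filesystem but not under the table directory.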

While trying to fix this, I found that in the CTAS (CREATE TABLE AS SELECT) scenario for non-partitioned tables, the temporary directory must not be placed under the target table path; otherwise the newly written data is deleted when the directory is renamed. See:

            if (table.getPartitionColumns().isEmpty() && currentPath.isPresent() && !targetPath.equals(currentPath.get())) {
                // CREATE TABLE AS SELECT unpartitioned table with staging directory
                renameDirectory(
                        context,
                        hdfsEnvironment,
                        currentPath.get(),
                        targetPath,
                        () -> cleanUpTasksForAbort.add(new DirectoryCleanUpTask(context, targetPath, true)));
            }

currentPath must not be a child of targetPath here.
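
One way to make this constraint explicit would be a guard that rejects a staging path nested under the target. The following is only a sketch under assumptions: StagingPathCheck and isUnder are hypothetical names that do not exist in Presto, and the example locations are made up.

```java
import java.net.URI;
import java.util.Objects;

final class StagingPathCheck {
    // Hypothetical guard: true if candidate lies strictly below ancestor
    // on the same filesystem (same scheme and authority).
    static boolean isUnder(String ancestor, String candidate) {
        URI a = URI.create(ancestor).normalize();
        URI c = URI.create(candidate).normalize();
        if (!Objects.equals(a.getScheme(), c.getScheme())
                || !Objects.equals(a.getAuthority(), c.getAuthority())) {
            return false;
        }
        // Compare against "ancestor/" so that /warehouse/t does not match /warehouse/t2.
        String prefix = a.getPath().endsWith("/") ? a.getPath() : a.getPath() + "/";
        String path = c.getPath();
        return path.startsWith(prefix) && path.length() > prefix.length();
    }

    public static void main(String[] args) {
        String target = "hdfs://nn/warehouse/db/t"; // made-up table location
        System.out.println(isUnder(target, "hdfs://nn/warehouse/db/t/tmp/presto-alice")); // true
        System.out.println(isUnder(target, "hdfs://nn/tmp/presto-alice"));                // false
    }
}
```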

So the current behavior appears to be two mistakes that cancel each other out and happen to produce the right result. ^_^
