Description
Context
By design, temporaryRoot = targetPath + temporaryPrefix, and the default value of temporaryPrefix is "/tmp/presto-${USER}":
public final class HiveWriteUtils
// create a temporary directory on the same filesystem
Path temporaryRoot = new Path(targetPath, temporaryPrefix);
However, temporaryRoot always ends up equal to temporaryPrefix when temporaryPrefix starts with "/", as the default value above does. See:
java.net.URI
String cp = (child.path == null) ? "" : child.path;
if (!cp.isEmpty() && cp.charAt(0) == '/') {
    // 5.2 (5): Child path is absolute
    ru.path = child.path;
} else {
    // 5.2 (6): Resolve relative path
    ru.path = resolvePath(base.path, cp, base.isAbsolute());
}
So the path component of targetPath is not used here; only its scheme and authority carry over.
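The resolution behavior can be reproduced with java.net.URI alone. The following is a minimal sketch, not Presto code: the class name ResolveDemo, the helper resolve, and the hdfs://nn:8020/warehouse/db/t/ location are made up for illustration, and "presto-user" stands in for the expanded "presto-${USER}".

```java
import java.net.URI;

class ResolveDemo {
    // Resolve a child path against a base URI using java.net.URI,
    // which applies the absolute/relative rule quoted above.
    static String resolve(String base, String child) {
        return URI.create(base).resolve(child).toString();
    }

    public static void main(String[] args) {
        // Hypothetical table location, used only for illustration.
        String target = "hdfs://nn:8020/warehouse/db/t/";

        // Absolute child: the base path is dropped, scheme and authority are kept.
        System.out.println(resolve(target, "/tmp/presto-user"));

        // Relative child: the result stays under the base path.
        System.out.println(resolve(target, "tmp/presto-user"));
    }
}
```

With the leading "/", the result is hdfs://nn:8020/tmp/presto-user; without it, the result is hdfs://nn:8020/warehouse/db/t/tmp/presto-user.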
When I tried to fix this problem, I found that in the CTAS (CREATE TABLE AS SELECT) scenario for non-partitioned tables, the temporary directory must not be placed under the target table path. Otherwise the newly written data is deleted when the directory is renamed. See:
Line 119 in 393797c
public class SemiTransactionalHiveMetastore
if (table.getPartitionColumns().isEmpty() && currentPath.isPresent() && !targetPath.equals(currentPath.get())) {
    // CREATE TABLE AS SELECT unpartitioned table with staging directory
    renameDirectory(
            context,
            hdfsEnvironment,
            currentPath.get(),
            targetPath,
            () -> cleanUpTasksForAbort.add(new DirectoryCleanUpTask(context, targetPath, true)));
}
currentPath must not be a child of targetPath here.
So this seems to be a case of two mistakes cancelling each other out and producing the right result. ^_^
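One way to make that constraint explicit would be a guard that rejects a staging directory nested under the target. This is only a sketch under assumptions: StagingPathCheck and isSameOrNested are hypothetical names, and it uses java.nio.file paths on plain path strings rather than Hadoop's Path, so it ignores scheme and authority.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class StagingPathCheck {
    // Returns true when `current` is the target directory itself
    // or lies somewhere inside it (compared element by element).
    static boolean isSameOrNested(String target, String current) {
        Path t = Paths.get(target).normalize();
        Path c = Paths.get(current).normalize();
        return c.startsWith(t);
    }

    public static void main(String[] args) {
        // Hypothetical locations, used only for illustration.
        String target = "/warehouse/db/t";

        // Staging directory under the target: unsafe for the rename above.
        System.out.println(isSameOrNested(target, "/warehouse/db/t/tmp/presto-user/q1"));

        // Staging directory outside the target: safe.
        System.out.println(isSameOrNested(target, "/tmp/presto-user/q1"));
    }
}
```

The comparison works on whole path elements, so a sibling such as /warehouse/db/t2 is not treated as nested under /warehouse/db/t.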