HiveStructurePaths provides utilities for working with Hive-style partitioned file hierarchies, where data is organized using key=value directory structures.
When managing datasets partitioned across multiple dimensions (e.g., criterion=depth/partition=1/k=10/data.arrow), HiveStructurePaths helps you:
- Parse paths to extract partition metadata
- Build paths with consistent hierarchical ordering
- Find all files matching a specific schema
Each HiveSchema defines one target filename and the hierarchical structure of its enclosing directories.
using HiveStructurePaths
# Define the schema
schema = HiveSchema(
    parsers = Dict{String, Function}(
        "criterion" => identity,
        "partition" => x -> parse(Int, x),
        "k" => x -> parse(Int, x)
    ),
    order = ["criterion", "partition", "k"],
    filename = "data.arrow"
)
# Build paths
path = build_hive_path(schema, "results"; criterion="depth", partition=2, k=5)
# → "results/criterion=depth/partition=2/k=5/data.arrow"
# Parse paths
parsed = parse_hive_path(schema, path; required_keys=["criterion", "partition"])
# → (criterion="depth", partition=2, k=5)
# Find all matching files
files = find_hive_files(schema, "results"; validate_keys=["criterion"])
# → ["results/criterion=depth/partition=1/k=3/data.arrow",
#    "results/criterion=depth/partition=2/k=5/data.arrow", ...]

See the docstrings for detailed API documentation.
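
For intuition about the key=value convention itself, the following is a minimal sketch in plain Julia that does not use HiveStructurePaths at all. The helper `split_hive_segments` is hypothetical and is not part of the package's API. It assumes forward-slash separators and simply illustrates how partition metadata can be read out of directory names and then converted with per-key parsers like those in the schema above.

```julia
# Hypothetical helper, NOT part of HiveStructurePaths: collect the key=value
# directory segments of a path, in the order they appear.
function split_hive_segments(path::AbstractString)
    found = Pair{String,String}[]
    for segment in split(path, '/')
        # Only segments of the form key=value carry partition metadata;
        # the root directory and the filename are skipped.
        parts = split(segment, '='; limit=2)
        length(parts) == 2 || continue
        push!(found, String(parts[1]) => String(parts[2]))
    end
    return found
end

segments = split_hive_segments("results/criterion=depth/partition=2/k=5/data.arrow")

# Values are still strings at this point; per-key parsers produce typed values.
parsers = Dict{String, Function}(
    "criterion" => identity,
    "partition" => x -> parse(Int, x),
    "k" => x -> parse(Int, x)
)
typed = [key => parsers[key](value) for (key, value) in segments]
```

Here `segments` holds the three string pairs for `criterion`, `partition`, and `k`, and `typed` holds the same keys with `partition` and `k` converted to integers. The package's own functions may handle ordering, validation, and path separators differently, so treat this only as an illustration of the directory layout.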