okatsn/HiveStructurePaths.jl

HiveStructurePaths

HiveStructurePaths provides utilities for working with Hive-style partitioned file hierarchies, where data is organized using key=value directory structures.

Purpose

When managing datasets partitioned across multiple dimensions (e.g., criterion=depth/partition=1/k=10/data.arrow), HiveStructurePaths helps you:

  • Parse paths to extract partition metadata
  • Build paths with consistent hierarchical ordering
  • Find all files matching a specific schema

Each HiveSchema defines one target filename and the hierarchical structure of its enclosing directories.
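
To make the key=value layout concrete, here is a minimal plain-Julia sketch of how such a path can be split into partition metadata and reassembled in a fixed order. The helpers `sketch_parse_partitions` and `sketch_build_path` are hypothetical names used only for illustration; they are not part of HiveStructurePaths and do not reflect how the package is implemented.

```julia
# Illustrative only: plain-Julia helpers, not part of HiveStructurePaths.

# Split a path into components and collect every "key=value" directory segment.
function sketch_parse_partitions(path::AbstractString)
    found = Pair{String,String}[]
    for segment in splitpath(path)
        parts = split(segment, '='; limit = 2)
        length(parts) == 2 || continue   # skip the root directory and the filename
        push!(found, String(parts[1]) => String(parts[2]))
    end
    return found
end

# Join key=value segments in a fixed order, then append the target filename.
function sketch_build_path(root, order, vals::AbstractDict, filename)
    segments = ["$(key)=$(vals[key])" for key in order]
    return joinpath(root, segments..., filename)
end

order = ["criterion", "partition", "k"]
vals = Dict("criterion" => "depth", "partition" => 1, "k" => 10)

path = sketch_build_path("results", order, vals, "data.arrow")
parsed = sketch_parse_partitions(path)
```

In this sketch every parsed value comes back as a string. Converting values such as `partition` or `k` to integers is the job of the per-key parsers that a HiveSchema carries.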

Example

using HiveStructurePaths

# Define the schema
schema = HiveSchema(
    parsers = Dict{String, Function}(
        "criterion" => identity,
        "partition" => x -> parse(Int, x),
        "k"         => x -> parse(Int, x)
    ),
    order = ["criterion", "partition", "k"],
    filename = "data.arrow"
)

# Build paths
path = build_hive_path(schema, "results"; criterion="depth", partition=2, k=5)
# → "results/criterion=depth/partition=2/k=5/data.arrow"

# Parse paths
parsed = parse_hive_path(schema, path; required_keys=["criterion", "partition"])
# → (criterion="depth", partition=2, k=5)

# Find all matching files
files = find_hive_files(schema, "results"; validate_keys=["criterion"])
# → ["results/criterion=depth/partition=1/k=3/data.arrow",
#    "results/criterion=depth/partition=2/k=5/data.arrow", ...]
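
For intuition about the file search step, the following sketch walks a directory tree with Julia's standard `walkdir` and collects every file with the target filename. The function `sketch_find_files` is a hypothetical stand-in written for this example; it is not the package's `find_hive_files` and performs no key validation.

```julia
# Illustrative only: not the package's find_hive_files implementation.

# Recursively collect every file under `root` whose name equals `filename`.
function sketch_find_files(root::AbstractString, filename::AbstractString)
    matches = String[]
    for (dir, _, files) in walkdir(root)
        filename in files && push!(matches, joinpath(dir, filename))
    end
    return sort(matches)
end

# Build a small throwaway hierarchy to search.
root = mktempdir()
for (partition, k) in [(1, 3), (2, 5)]
    dir = joinpath(root, "criterion=depth", "partition=$partition", "k=$k")
    mkpath(dir)
    touch(joinpath(dir, "data.arrow"))
end
touch(joinpath(root, "notes.txt"))   # ignored: wrong filename

found = sketch_find_files(root, "data.arrow")
```

A schema-aware search would additionally check that the directories above each match follow the expected key order, which this sketch leaves out.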

See the docstrings of HiveSchema, build_hive_path, parse_hive_path, and find_hive_files for detailed API documentation.
