@bwalsh bwalsh commented Oct 9, 2025

Pull Request Overview

This PR adds a new PathAggregation feature that provides file count and size statistics grouped by directory paths from DocumentReference data.

  • Introduces a new path aggregation module with utilities for normalizing paths and generating directory prefixes
  • Adds a path_aggregation method to the dataframer that aggregates DocumentReference files by path
  • Extends the CLI to support the new PathAggregation data type
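The aggregation described in the bullets above can be illustrated with a minimal sketch. It is not the PR's implementation: the function names `normalize_path`, `directory_prefixes`, and `aggregate_by_path` are hypothetical, and the input is assumed to be `(path, size)` pairs already extracted from DocumentReference resources.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def normalize_path(path: str) -> str:
    """Use forward slashes and drop empty and '.' segments (hypothetical helper)."""
    cleaned = path.replace("\\", "/")
    parts = [p for p in cleaned.split("/") if p not in ("", ".")]
    return "/".join(parts)


def directory_prefixes(path: str) -> List[str]:
    """Return every ancestor directory of a file path, shortest first."""
    parts = normalize_path(path).split("/")[:-1]  # drop the file name
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]


def aggregate_by_path(files: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Sum file counts and sizes for each directory prefix.

    `files` is assumed to be (path, size_in_bytes) pairs.
    """
    totals = defaultdict(lambda: {"file_count": 0, "total_size": 0})
    for path, size in files:
        for prefix in directory_prefixes(path):
            totals[prefix]["file_count"] += 1
            totals[prefix]["total_size"] += size
    return dict(totals)


if __name__ == "__main__":
    sample = [("data/a/x.txt", 10), ("data/a/y.txt", 5), ("data/b/z.txt", 7)]
    for directory, stats in sorted(aggregate_by_path(sample).items()):
        print(directory, stats["file_count"], stats["total_size"])
```

In this sketch each file is counted once for every ancestor directory, so a parent directory's totals include those of its subdirectories. Files at the repository root have no directory prefix and are not counted. Whether the PR makes the same choices is not stated in the description.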

Reviewed Changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| gen3_tracker/meta/path_aggregation.py | New module with path normalization utilities and global aggregator state |
| gen3_tracker/meta/dataframer.py | Adds path_aggregation method and registers PathAggregation data type |
| gen3_tracker/meta/cli.py | Extends CLI choices to include PathAggregation option |
| tests/unit/dataframer/test_dataframer.py | Adds comprehensive test for path aggregation functionality |

Motivation and Context

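Provides a directory-level view of the path hierarchy: file counts and total sizes per directory, derived from DocumentReference data.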

How Has This Been Tested?

See the unit test added in tests/unit/dataframer/test_dataframer.py.

Types of Changes

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have updated the documentation accordingly (link here).
  • I have tested this feature locally.
  • I have added tests to cover my changes.
  • All new and existing tests passed.
  • Reviewer has tested this feature locally

@bwalsh bwalsh requested a review from Copilot October 9, 2025 18:47
