feat(uses): add glob pattern support for repository matching#1559
feat(uses): add glob pattern support for repository matching#1559ryandelano wants to merge 7 commits intozizmorcore:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for glob pattern matching in repository uses: clauses, enabling more flexible policy configuration for audits like unpinned-uses and forbidden-uses. The implementation supports single-wildcard patterns (e.g., foo-*, *-bar, foo-*-bar) in both repository and subpath segments, with proper specificity ordering to ensure exact patterns take precedence over globs.
Changes:
- Introduced
SegmentandParsedSegmentenums to handle exact matches and single-wildcard glob patterns - Updated
RepositoryUsesPatternto support glob patterns in repo and subpath fields - Implemented case-insensitive glob matching optimized to avoid string allocations
- Added comprehensive test coverage for glob pattern parsing, matching, and ordering
- Updated documentation with detailed examples of supported glob patterns
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
docs/configuration.md |
Documents glob pattern syntax and provides examples for all supported pattern types |
crates/zizmor/src/models/uses.rs |
Core implementation of glob pattern matching with Segment/ParsedSegment enums, parsing logic, and matching algorithms |
crates/zizmor/tests/integration/test-data/unpinned-uses/configs/invalid-policy-syntax-4.yml |
Updated test case to verify multi-wildcard rejection (changed from b*r to b*r*) |
crates/zizmor/tests/integration/test-data/forbidden-uses/configs/allow-glob.yml |
New test config demonstrating glob patterns in allow lists |
crates/zizmor/tests/integration/test-data/forbidden-uses/configs/deny-glob.yml |
New test config demonstrating glob patterns in deny lists |
crates/zizmor/tests/integration/audit/unpinned_uses.rs |
Updated error message test for invalid multi-wildcard patterns |
crates/zizmor/tests/integration/audit/forbidden_uses.rs |
Added integration tests for glob patterns in allow/deny lists |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
woodruffw
left a comment
There was a problem hiding this comment.
Thanks @ryandelano! This looks pretty good to me overall, I've left mostly questions and nitpicks in this round of review 🙂
crates/zizmor/src/models/uses.rs
Outdated
| /// A segment that can be either an exact match or a glob pattern. | ||
| /// | ||
| /// This is used for repo and subpath matching in [`RepositoryUsesPattern`]. | ||
| #[derive(Clone, Debug, Eq, PartialEq, Hash)] | ||
| pub(crate) enum Segment { | ||
| /// An exact literal match (e.g., "checkout", "foo/bar") | ||
| Exact(String), | ||
| /// A glob pattern with a single `*` (e.g., "foo-*", "*-bar") | ||
| Glob { | ||
| /// The literal text before the `*` | ||
| prefix: String, | ||
| /// The literal text after the `*` | ||
| suffix: String, | ||
| }, | ||
| } | ||
|
|
||
| /// Result of parsing a segment string, including the special `*` case. | ||
| /// | ||
| /// This is used during pattern parsing to distinguish between: | ||
| /// - `Star`: the full wildcard `*` (used for `owner/*` or `owner/repo/*`) | ||
| /// - `Segment`: an exact match or glob pattern | ||
| /// - Parse failure (multiple wildcards) | ||
| #[derive(Debug, Clone, PartialEq)] | ||
| pub(crate) enum ParsedSegment { | ||
| /// Just `*` - matches anything in this position | ||
| Star, | ||
| /// A concrete segment (exact or glob) | ||
| Segment(Segment), | ||
| } |
There was a problem hiding this comment.
Q: can we unify these types? i.e. just one enum Segment with {Star, Glob, Exact}? I think that would be slightly easier to follow here.
There was a problem hiding this comment.
We could, my reasoning was that Star acts purely a signal for which variant we're constructing given that it's never actually stored in a RepositoryUsesPattern. Unifying them would mean handling Star in the matching phase, which we could pretty easily do with an assertion that Segment::Star never appears after the construction phase. I didn't put it in the newest batch of changes, but does that approach sound better?
crates/zizmor/src/models/uses.rs
Outdated
| // Create a dummy segment for comparison purposes | ||
| let no_segment = Segment::Exact(String::new()); |
There was a problem hiding this comment.
Making sure I understand: is the idea here that not all patterns have a relevant segment, so we're putting a dummy one in to make Ord and PartialOrd work below?
If so, I think we could maybe make this clearer by having this return (u8, Option<Segment>, Option<Segment>) and handling the optional segment twice.
docs/configuration.md
Outdated
| `*` matches `#!yaml uses: actions/checkout` and | ||
| `#!yaml uses: pypa/gh-action-pypi-publish@release/v1`. | ||
|
|
||
| #### Glob patterns |
There was a problem hiding this comment.
Two nits here:
- I think we should probably call these something other than "glob patterns" -- that term implies (IMO) full compatibility with
fnmatch(2)syntax, when really what we're offering here is a very constrained wildcard match. Maybe we don't call them anything specific at all, and just describe them inline with the rest of the patterns? I'm not sure. - I think it'd be nice to unify this section more fully with the "Repository patterns" section right above it -- right now a user who reads this is going to go through the "Repository patterns" section, and then have to correct/adjust their understanding of how patterns work when they reach this lower section.
There was a problem hiding this comment.
Makes sense. They're inline with the rest of the patterns now, so you can see how you like the flow.
I also went ahead and renamed everything related to Glob in the code/comments to Wildcard to reduce confusion and make sure things match between docs and code.
crates/zizmor/src/models/uses.rs
Outdated
| /// Returns the "specificity" of this segment for ordering purposes. | ||
| /// Lower values are more specific. | ||
| /// Exact matches are more specific than globs. | ||
| /// For globs, longer prefix+suffix means more specific. | ||
| fn specificity(&self) -> (u8, usize) { |
There was a problem hiding this comment.
Q: can we put this inline in the Ord implementation? Unless I'm missing something, I don't think having it in its own function does a ton for us.
(Now that I think about it I guess there's a benefit to being able to declare the ordering "key" distinct from Ord/without referencing two instances, so if that's the rationale that seems fine to me 🙂)
There was a problem hiding this comment.
That benefit is exactly what I was going for haha, just debugging/separation of concerns. Can inline as well if you'd prefer
crates/zizmor/src/models/uses.rs
Outdated
| // Exact is most specific (0), with no length consideration | ||
| Segment::Exact(_) => (0, 0), | ||
| // Glob is less specific (1), but longer literals are more specific (inverted) | ||
| Segment::Glob { prefix, suffix } => (1, usize::MAX - (prefix.len() + suffix.len())), |
There was a problem hiding this comment.
Can't this be prefix.len() + suffix.len()? Why do we have to begin at size:::MAX and subtract from there?
There was a problem hiding this comment.
The goal was to make sure that longer prefix/suffix literals should be considered "more specific" (and therefore be sorted earlier), but looking back over this, using Reverse for that makes that goal much clearer, so I refactored it.
|
Responded to everything and resolved the comments changes, keep me posted on anything else :) edit: thought I re-requested the review when I commented this and apparently didn't, sorry!! |
Thanks, and no worries! I'm going to try and set aside some time to review this over the weekend. |
|
Sorry for the delay here, I'm doing another review tonight. |
Add support for glob wildcards (e.g., `foo-*`, `*-bar`, `foo-*-bar`) in repository and subpath pattern matching for `uses:` clauses. This allows more flexible policy configuration for the `unpinned-uses` audit. Changes: - Add `Segment` enum to represent exact or glob pattern matches - Add `ParsedSegment` enum for parsing patterns with `*` wildcards - Update `RepositoryUsesPattern` to use `Segment` for repo/subpath fields - Implement specificity ordering so exact patterns take precedence over globs - Optimize case-insensitive matching to avoid string allocations - Add comprehensive tests for glob pattern parsing and matching - Document glob patterns in configuration.md
27448c4 to
30dcc65
Compare
Signed-off-by: William Woodruff <william@yossarian.net>
Pre-submission checks
Please check these boxes:
Mandatory: This PR corresponds to an issue (if not, please create
one first) - discussed with @woodruffw on discord
I hereby disclose the use of an LLM or other AI coding assistant in the
creation of this PR. PRs will not be rejected for using AI tools, but
will be rejected for undisclosed use. - code review, slight optimization regarding string allocation
Summary
Add support for glob wildcards (e.g.,
foo-*,*-bar,foo-*-bar) in repository and subpath pattern matching foruses:clauses. This allows more flexible policy configuration for any audit that uses repository patterns, such asunpinned-usesandforbidden-uses. Used intuition for each of the possible cases and how they should behave, so I'm happy to adjust based on other perspectives!Changes:
Segmentenum to represent exact or glob pattern matchesParsedSegmentenum for parsing patterns with*wildcardsRepositoryUsesPatternto useSegmentfor repo/subpath fieldsNew Patterns Supported
owner/prefix-*- match repos starting withprefix-owner/*-suffix- match repos ending with-suffixowner/prefix-*-suffix- match repos with both prefix and suffixowner/prefix-*/*- match repos with prefix, any subpathowner/repo/subpath-*- match subpaths with globowner/prefix-*/subpath-*Example Configuration
unpinned-uses:
forbidden-uses:
Implementation
Segmentenum handles exact vs glob matchingTest Plan
SegmentandParsedSegmentparsingunpinned-useswith invalid multi-wildcard patterns andforbidden-useswith glob patterns in allow/deny lists