feat(uses): add glob pattern support for repository matching #1559

Open
ryandelano wants to merge 7 commits into zizmorcore:main from ryandelano:feat/wildcard-repo-patterns

Conversation

@ryandelano

Pre-submission checks

Please check these boxes:

  • Mandatory: This PR corresponds to an issue (if not, please create
    one first) - discussed with @woodruffw on discord

  • I hereby disclose the use of an LLM or other AI coding assistant in the
    creation of this PR. PRs will not be rejected for using AI tools, but
    will be rejected for undisclosed use. - code review, slight optimization regarding string allocation

Summary

Add support for glob wildcards (e.g., foo-*, *-bar, foo-*-bar) in repository and subpath pattern matching for uses: clauses. This allows more flexible policy configuration for any audit that uses repository patterns, such as unpinned-uses and forbidden-uses. I relied on intuition for how each of the possible cases should behave, so I'm happy to adjust based on other perspectives!

Changes:

  • Add Segment enum to represent exact or glob pattern matches
  • Add ParsedSegment enum for parsing patterns with * wildcards
  • Update RepositoryUsesPattern to use Segment for repo/subpath fields
  • Implement specificity ordering so exact patterns take precedence over globs
  • Optimize case-insensitive matching to avoid string allocations
  • Add comprehensive tests for glob pattern parsing and matching
  • Document glob patterns in configuration.md

New Patterns Supported

  • owner/prefix-* - match repos starting with prefix-
  • owner/*-suffix - match repos ending with -suffix
  • owner/prefix-*-suffix - match repos with both prefix and suffix
  • owner/prefix-*/* - match repos with prefix, any subpath
  • owner/repo/subpath-* - match subpaths with glob
  • Combined: owner/prefix-*/subpath-*
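
The single-wildcard rule behind these patterns can be sketched as a small parser. This is a minimal illustration, not the PR's code: `ParsedSketch` and `parse_segment` are hypothetical names, and returning `None` for a rejected pattern is an assumption (the integration tests mentioned in this PR suggest the real implementation reports an error message instead).

```rust
/// Hypothetical stand-in for the PR's parsing result.
#[derive(Debug, PartialEq, Eq)]
enum ParsedSketch {
    /// Just `*`: matches anything in this position.
    Star,
    /// No wildcard at all.
    Exact(String),
    /// Exactly one `*`, with literal text before and/or after it.
    Glob { prefix: String, suffix: String },
}

/// Parses one path segment; returns `None` if it contains more than one `*`.
fn parse_segment(s: &str) -> Option<ParsedSketch> {
    match s.split_once('*') {
        // No `*` anywhere: an exact literal.
        None => Some(ParsedSketch::Exact(s.to_string())),
        // The whole segment is `*`.
        Some(("", "")) => Some(ParsedSketch::Star),
        // A second `*` after the first one: rejected.
        Some((_, suffix)) if suffix.contains('*') => None,
        // A single `*` surrounded by literal text.
        Some((prefix, suffix)) => Some(ParsedSketch::Glob {
            prefix: prefix.to_string(),
            suffix: suffix.to_string(),
        }),
    }
}
```

Under these assumptions, `prefix-*-suffix` parses to a `Glob` with both literals set, while a pattern such as `b*r*` is rejected.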

Example Configuration

unpinned-uses:

rules:
  unpinned-uses:
    config:
      policies:
        "github/codeql-*/*": "hash-pin"  # All codeql actions
        "myorg/action-*": "ref-pin"       # All myorg action-* repos

forbidden-uses:

rules:
  forbidden-uses:
    config:
      allow:
        - "actions/*"           # Allow all actions org repos
        - "myorg/trusted-*"     # Allow myorg repos starting with "trusted-"
      # OR
      deny:
        - "untrusted-org/*-exploit"  # Deny repos ending with "-exploit"

Implementation

  • New Segment enum handles exact vs glob matching
  • Glob patterns with a single * are supported; multiple wildcards per segment are rejected
  • Specificity ordering ensures exact patterns take precedence over globs
  • Case-insensitive matching optimized to avoid heap allocations
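
One way to get case-insensitive matching without heap allocations is to compare byte slices in place rather than lowercasing both strings first. The sketch below is an assumption about the approach, not the PR's code: `wildcard_matches` is a hypothetical helper, it only folds ASCII case, and it lets `*` match the empty string, which the PR may or may not do.

```rust
/// Returns true if `candidate` starts with `prefix` and ends with `suffix`,
/// ignoring ASCII case, without allocating.
fn wildcard_matches(prefix: &str, suffix: &str, candidate: &str) -> bool {
    let (p, s, c) = (prefix.as_bytes(), suffix.as_bytes(), candidate.as_bytes());

    // The candidate must be long enough that prefix and suffix cannot overlap.
    if c.len() < p.len() + s.len() {
        return false;
    }

    // Compare the leading and trailing bytes in place; working on bytes
    // avoids slicing a `str` at a non-character boundary.
    c[..p.len()].eq_ignore_ascii_case(p) && c[c.len() - s.len()..].eq_ignore_ascii_case(s)
}
```

The length check matters for patterns like `foo-*-bar`: without it, a candidate such as `foo-bar` would satisfy both the prefix and the suffix test by reusing the same `-`.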

Test Plan

  • Added unit tests for Segment and ParsedSegment parsing
  • Added pattern matching tests covering all glob variants
  • Added ordering tests to verify specificity
  • Added integration tests for unpinned-uses with invalid multi-wildcard patterns and forbidden-uses with glob patterns in allow/deny lists
  • All existing tests continue to pass

Copilot AI review requested due to automatic review settings January 22, 2026 20:09

Copilot AI left a comment

Pull request overview

This PR adds support for glob pattern matching in repository uses: clauses, enabling more flexible policy configuration for audits like unpinned-uses and forbidden-uses. The implementation supports single-wildcard patterns (e.g., foo-*, *-bar, foo-*-bar) in both repository and subpath segments, with proper specificity ordering to ensure exact patterns take precedence over globs.

Changes:

  • Introduced Segment and ParsedSegment enums to handle exact matches and single-wildcard glob patterns
  • Updated RepositoryUsesPattern to support glob patterns in repo and subpath fields
  • Implemented case-insensitive glob matching optimized to avoid string allocations
  • Added comprehensive test coverage for glob pattern parsing, matching, and ordering
  • Updated documentation with detailed examples of supported glob patterns

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| docs/configuration.md | Documents glob pattern syntax and provides examples for all supported pattern types |
| crates/zizmor/src/models/uses.rs | Core implementation of glob pattern matching with Segment/ParsedSegment enums, parsing logic, and matching algorithms |
| crates/zizmor/tests/integration/test-data/unpinned-uses/configs/invalid-policy-syntax-4.yml | Updated test case to verify multi-wildcard rejection (changed from b*r to b*r*) |
| crates/zizmor/tests/integration/test-data/forbidden-uses/configs/allow-glob.yml | New test config demonstrating glob patterns in allow lists |
| crates/zizmor/tests/integration/test-data/forbidden-uses/configs/deny-glob.yml | New test config demonstrating glob patterns in deny lists |
| crates/zizmor/tests/integration/audit/unpinned_uses.rs | Updated error message test for invalid multi-wildcard patterns |
| crates/zizmor/tests/integration/audit/forbidden_uses.rs | Added integration tests for glob patterns in allow/deny lists |

Member

@woodruffw woodruffw left a comment

Thanks @ryandelano! This looks pretty good to me overall, I've left mostly questions and nitpicks in this round of review 🙂

Comment on lines 46 to 74
/// A segment that can be either an exact match or a glob pattern.
///
/// This is used for repo and subpath matching in [`RepositoryUsesPattern`].
#[derive(Clone, Debug, Eq, PartialEq, Hash)]
pub(crate) enum Segment {
    /// An exact literal match (e.g., "checkout", "foo/bar")
    Exact(String),
    /// A glob pattern with a single `*` (e.g., "foo-*", "*-bar")
    Glob {
        /// The literal text before the `*`
        prefix: String,
        /// The literal text after the `*`
        suffix: String,
    },
}

/// Result of parsing a segment string, including the special `*` case.
///
/// This is used during pattern parsing to distinguish between:
/// - `Star`: the full wildcard `*` (used for `owner/*` or `owner/repo/*`)
/// - `Segment`: an exact match or glob pattern
/// - Parse failure (multiple wildcards)
#[derive(Debug, Clone, PartialEq)]
pub(crate) enum ParsedSegment {
    /// Just `*` - matches anything in this position
    Star,
    /// A concrete segment (exact or glob)
    Segment(Segment),
}
Member

Q: can we unify these types? i.e. just one enum Segment with {Star, Glob, Exact}? I think that would be slightly easier to follow here.

Author

We could. My reasoning was that Star acts purely as a signal for which variant we're constructing, given that it's never actually stored in a RepositoryUsesPattern. Unifying them would mean handling Star in the matching phase, which we could pretty easily do with an assertion that Segment::Star never appears after the construction phase. I didn't put it in the newest batch of changes, but does that approach sound better?
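
For reference, the unified shape under discussion might look roughly like the sketch below. `UnifiedSegment` is a hypothetical name, and treating `Star` as "match anything" during matching is only one option; the alternative raised above is to assert that `Star` never reaches the matching phase.

```rust
/// Hypothetical unified type; the PR as submitted keeps two separate enums.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
enum UnifiedSegment {
    /// Just `*`.
    Star,
    /// A single `*` with literal text around it.
    Glob { prefix: String, suffix: String },
    /// An exact literal.
    Exact(String),
}

impl UnifiedSegment {
    /// ASCII case-insensitive match against one candidate segment.
    fn matches(&self, candidate: &str) -> bool {
        match self {
            // Assumption: a stored `Star` simply matches everything.
            UnifiedSegment::Star => true,
            UnifiedSegment::Exact(literal) => literal.eq_ignore_ascii_case(candidate),
            UnifiedSegment::Glob { prefix, suffix } => {
                let c = candidate.as_bytes();
                c.len() >= prefix.len() + suffix.len()
                    && c[..prefix.len()].eq_ignore_ascii_case(prefix.as_bytes())
                    && c[c.len() - suffix.len()..].eq_ignore_ascii_case(suffix.as_bytes())
            }
        }
    }
}
```

The trade-off is that a single enum is easier to read, while the two-enum design makes "a stored pattern never contains a bare `*`" a property the type system enforces.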

Comment on lines 275 to 276
// Create a dummy segment for comparison purposes
let no_segment = Segment::Exact(String::new());
Member

Making sure I understand: is the idea here that not all patterns have a relevant segment, so we're putting a dummy one in to make Ord and PartialOrd work below?

If so, I think we could maybe make this clearer by having this return (u8, Option<Segment>, Option<Segment>) and handling the optional segment twice.

Author

Yes, you're right. Added!
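
A rough sketch of the `(u8, Option<Segment>, Option<Segment>)` idea follows. Everything here is simplified and hypothetical: `PatternSketch` drops the owner field, the rank numbers are made up, and the derived `Ord` on `SegmentSketch` merely stands in for the PR's specificity ordering.

```rust
/// Stand-in for the PR's `Segment`; derived ordering puts `Exact` before `Glob`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum SegmentSketch {
    Exact(String),
    Glob { prefix: String, suffix: String },
}

/// Simplified, hypothetical pattern type (owner omitted for brevity).
#[derive(Debug)]
enum PatternSketch {
    ExactPath { repo: SegmentSketch, subpath: SegmentSketch },
    ExactRepo { repo: SegmentSketch },
    InOwner,
    Any,
}

impl PatternSketch {
    /// Ordering key: lower sorts first (more specific). Patterns without a
    /// repo or subpath segment use `None` instead of a dummy empty segment.
    fn sort_key(&self) -> (u8, Option<&SegmentSketch>, Option<&SegmentSketch>) {
        match self {
            PatternSketch::ExactPath { repo, subpath } => (0, Some(repo), Some(subpath)),
            PatternSketch::ExactRepo { repo } => (1, Some(repo), None),
            PatternSketch::InOwner => (2, None, None),
            PatternSketch::Any => (3, None, None),
        }
    }
}
```

Because the leading rank already separates the variants, the `None` entries are only ever compared against other `None` entries, so no placeholder value can leak into the ordering.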

`*` matches `#!yaml uses: actions/checkout` and
`#!yaml uses: pypa/gh-action-pypi-publish@release/v1`.

#### Glob patterns
Member

Two nits here:

  1. I think we should probably call these something other than "glob patterns" -- that term implies (IMO) full compatibility with fnmatch(3) syntax, when really what we're offering here is a very constrained wildcard match. Maybe we don't call them anything specific at all, and just describe them inline with the rest of the patterns? I'm not sure.
  2. I think it'd be nice to unify this section more fully with the "Repository patterns" section right above it -- right now a user who reads this is going to go through the "Repository patterns" section, and then have to correct/adjust their understanding of how patterns work when they reach this lower section.

Author

Makes sense. They're inline with the rest of the patterns now, so you can see how you like the flow.

I also went ahead and renamed everything related to Glob in the code/comments to Wildcard to reduce confusion and make sure things match between docs and code.

Comment on lines 143 to 147
/// Returns the "specificity" of this segment for ordering purposes.
/// Lower values are more specific.
/// Exact matches are more specific than globs.
/// For globs, longer prefix+suffix means more specific.
fn specificity(&self) -> (u8, usize) {
Member

Q: can we put this inline in the Ord implementation? Unless I'm missing something, I don't think having it in its own function does a ton for us.

(Now that I think about it I guess there's a benefit to being able to declare the ordering "key" distinct from Ord/without referencing two instances, so if that's the rationale that seems fine to me 🙂)

Author

That benefit is exactly what I was going for haha, just debugging/separation of concerns. Can inline as well if you'd prefer

// Exact is most specific (0), with no length consideration
Segment::Exact(_) => (0, 0),
// Glob is less specific (1), but longer literals are more specific (inverted)
Segment::Glob { prefix, suffix } => (1, usize::MAX - (prefix.len() + suffix.len())),
Member

Can't this be prefix.len() + suffix.len()? Why do we have to begin at usize::MAX and subtract from there?

Author

The goal was for longer prefix/suffix literals to be considered "more specific" (and therefore sorted earlier), but looking back over this, using Reverse makes that goal much clearer, so I refactored it.
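
For readers following along, a `Reverse`-based key might look something like this. It is a sketch of the idea rather than the refactored code itself; `SpecSegment` is a hypothetical stand-in for the PR's `Segment`.

```rust
use std::cmp::Reverse;

/// Hypothetical stand-in for the PR's `Segment`.
#[derive(Debug, Clone, PartialEq, Eq)]
enum SpecSegment {
    Exact(String),
    Glob { prefix: String, suffix: String },
}

impl SpecSegment {
    /// Ordering key where lower values are more specific.
    /// `Reverse` flips the length comparison so that wildcards with longer
    /// literal text sort before wildcards with shorter literal text.
    fn specificity(&self) -> (u8, Reverse<usize>) {
        match self {
            // Exact is most specific; its length plays no role.
            SpecSegment::Exact(_) => (0, Reverse(0)),
            SpecSegment::Glob { prefix, suffix } => (1, Reverse(prefix.len() + suffix.len())),
        }
    }
}
```

Sorting a list with `sort_by_key(|s| s.specificity())` would then place exact segments first, followed by wildcards from longest to shortest literal, with no `usize::MAX` arithmetic.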

@ryandelano
Author

ryandelano commented Jan 26, 2026

Responded to everything and resolved the comments; keep me posted on anything else :)

edit: thought I re-requested the review when I commented this and apparently didn't, sorry!!

@ryandelano ryandelano requested a review from woodruffw January 28, 2026 20:57
@woodruffw
Member

> Responded to everything and resolved the comments; keep me posted on anything else :)
>
> edit: thought I re-requested the review when I commented this and apparently didn't, sorry!!

Thanks, and no worries! I'm going to try and set aside some time to review this over the weekend.

@woodruffw woodruffw added the enhancement (New feature or request) and config (Configuration functionality) labels Jan 30, 2026
@woodruffw
Member

Sorry for the delay here, I'm doing another review tonight.

Add support for glob wildcards (e.g., `foo-*`, `*-bar`, `foo-*-bar`) in
repository and subpath pattern matching for `uses:` clauses. This allows
more flexible policy configuration for the `unpinned-uses` audit.

Changes:
- Add `Segment` enum to represent exact or glob pattern matches
- Add `ParsedSegment` enum for parsing patterns with `*` wildcards
- Update `RepositoryUsesPattern` to use `Segment` for repo/subpath fields
- Implement specificity ordering so exact patterns take precedence over globs
- Optimize case-insensitive matching to avoid string allocations
- Add comprehensive tests for glob pattern parsing and matching
- Document glob patterns in configuration.md
@woodruffw woodruffw force-pushed the feat/wildcard-repo-patterns branch from 27448c4 to 30dcc65 February 11, 2026 02:47
Signed-off-by: William Woodruff <william@yossarian.net>
