Webhook receiver filter on includePaths #5591

@kwoodson

Description

Checklist

  • I've searched the issue queue to verify this is not a duplicate feature request.
  • I've pasted the output of kargo version, if applicable.
  • I've pasted logs, if applicable.
kargo version
Client Version: v1.8.4
Server Version: v1.8.1

Proposed Feature

Add Warehouse filtering on includePaths to webhook receivers for ClusterConfig webhooks.

Motivation

We store our application configuration in a directory called apps/ in a monorepo. We plan to migrate more applications to this repository and may end up with 150+ applications. Currently we have two Warehouses configured:

  • ECR Warehouse (not relevant to this discussion)
  • Git warehouse
spec:
  subscriptions:
    - git:
        repoURL: https://github.com/<owner>/<repo>.git
        commitSelectionStrategy: NewestFromBranch
        includePaths:
          - apps/foo

According to the documentation here,

A webhook receiver's only job is to extract a repository URL from the webhook request's payload, query for all Warehouse resources across all Projects having subscriptions to that repository, and request each to execute their artifact discovery process.

This means that ALL Warehouses subscribed to this repository will get triggered to perform an update.

I would like to see filtering added to the Webhook receiver to notify only specific Warehouses that match the correct includePaths from the payload.

I checked that the GitHub push event includes the changed files:

type PushEvent struct {
    // ... other fields
    HeadCommit *HeadCommit   `json:"head_commit,omitempty"`
    Commits    []*HeadCommit `json:"commits,omitempty"`
}

type HeadCommit struct {
    Added    []string `json:"added,omitempty"`
    Removed  []string `json:"removed,omitempty"`
    Modified []string `json:"modified,omitempty"`
    // ... other fields
}

Work would entail:

  1. Extract file paths from the webhook payload (github.go:215-225)
case *gh.PushEvent:
    repoURLs = []string{urls.NormalizeGit(e.GetRepo().GetCloneURL())}
    ref := e.GetRef()

    // NEW: Extract file paths from HeadCommit
    var filePaths []string
    if e.HeadCommit != nil {
        filePaths = append(filePaths, e.HeadCommit.Added...)
        filePaths = append(filePaths, e.HeadCommit.Modified...)
        filePaths = append(filePaths, e.HeadCommit.Removed...)
    }
  2. Update shouldRefresh (refresh.go:148-179):
    • add file path matching
    • check whether the modified files match includePaths
    • respect excludePaths
  3. Pass file paths through the call chain:
    • update the refreshWarehouses signature to accept file paths
    • update shouldRefresh to evaluate path patterns
  4. Add documentation
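The path check in the steps above could look roughly like the following sketch. The function names (`underPath`, `matchesAny`, `pathsRelevant`) are hypothetical and are not Kargo's actual code. The sketch assumes that each includePaths or excludePaths entry is a plain file or directory path, so any glob or regular-expression selector forms that Kargo supports are deliberately left out. It also assumes that a push with no file information should still trigger a refresh, which keeps today's behavior as the fallback.

```go
package main

import (
	"fmt"
	"strings"
)

// underPath reports whether file equals dir or lies inside it.
// Matching is done on whole path segments, so "apps/foo" does not
// match "apps/foobar/values.yaml".
func underPath(file, dir string) bool {
	dir = strings.TrimSuffix(dir, "/")
	return file == dir || strings.HasPrefix(file, dir+"/")
}

// matchesAny reports whether file is selected by at least one entry.
func matchesAny(file string, selectors []string) bool {
	for _, s := range selectors {
		if underPath(file, s) {
			return true
		}
	}
	return false
}

// pathsRelevant reports whether at least one changed file is selected by
// includePaths (or includePaths is empty) and is not selected by excludePaths.
func pathsRelevant(changed, includePaths, excludePaths []string) bool {
	if len(changed) == 0 {
		// Assumption: without file information, fall back to refreshing.
		return true
	}
	for _, f := range changed {
		if len(includePaths) > 0 && !matchesAny(f, includePaths) {
			continue
		}
		if matchesAny(f, excludePaths) {
			continue
		}
		return true
	}
	return false
}

func main() {
	changed := []string{"apps/bar/values.yaml", "README.md"}
	fmt.Println(pathsRelevant(changed, []string{"apps/foo"}, nil))
	fmt.Println(pathsRelevant(changed, []string{"apps/bar"}, nil))
}
```

With a check like this, a Warehouse subscribed with includePaths of apps/foo would be skipped for a push that only touches apps/bar, while a Warehouse with no path filters would still be refreshed on every push.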

This would greatly reduce the number of unnecessary Warehouse refreshes.

The only other approach I can think of is to cache the GitHub repository and share that cache across multiple Warehouses, so that GitHub API queries are reduced and Warehouses that share a repository can detect changes instantly.
