✨ github/gitlab: discover CloudFormation, Dockerfile, Bicep, Helm, and Kustomize IaC #8314

Open
tas50 wants to merge 1 commit into main from github-gitlab-iac-discovery

@tas50 tas50 commented Jun 10, 2026

What

Extends the github provider's infrastructure-as-code discovery beyond Terraform and Kubernetes manifests to also detect CloudFormation templates, Dockerfiles, Bicep files, Helm charts, and Kustomize configurations in repositories. Each match becomes a child asset that is cloned from git and scanned by the relevant provider.

Files discovered

Discovery uses the GitHub code-search API (paginated, 100/page) and skips hidden paths (anything with a .-prefixed segment, e.g. .github/).
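
The hidden-path rule can be illustrated with a minimal sketch. The helper name `isHiddenPath` is hypothetical and not taken from this PR; it only assumes repo-relative paths with `/` separators, as returned by the GitHub API.

```go
package main

import (
	"fmt"
	"strings"
)

// isHiddenPath reports whether any segment of a repo-relative path starts
// with a dot, for example ".github/workflows/ci.yaml".
// Hypothetical helper for illustration; the provider's own check may differ.
func isHiddenPath(p string) bool {
	for _, seg := range strings.Split(p, "/") {
		if strings.HasPrefix(seg, ".") {
			return true
		}
	}
	return false
}

func main() {
	paths := []string{
		".github/workflows/ci.yaml",
		"deploy/.cache/app.yaml",
		"deploy/app.yaml",
	}
	for _, p := range paths {
		fmt.Printf("%-28s hidden=%v\n", p, isHiddenPath(p))
	}
}
```

Checking every segment, not just the first, means a dot-directory nested deeper in the tree is skipped as well.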

| Type | Target flag | Matched files | Asset granularity |
| --- | --- | --- | --- |
| Terraform (existing) | terraform | repos with the HCL language | one per repo |
| Kubernetes manifests (existing) | k8s-manifests | `*.yaml` / `*.yml` (excl. `*mql.yaml`/`*mql.yml`) | one per repo |
| CloudFormation (new) | cloudformation | .yaml/.yml/.json/.template containing the AWSTemplateFormatVersion marker (avoids false positives against k8s/other YAML) | one per template |
| Dockerfile (new) | dockerfiles | Dockerfile, `Dockerfile.*` (e.g. Dockerfile.prod), `*.Dockerfile`, `*.dockerfile` | one per file |
| Bicep (new) | bicep | `*.bicep` | one per repo |
| Helm (new) | helm | Chart.yaml | one per chart directory |
| Kustomize (new) | kustomize | kustomization.yaml / kustomization.yml / Kustomization | one per kustomization directory (base + each overlay) |
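
Two of the less obvious rows, the Dockerfile name patterns and the CloudFormation content marker, might look roughly like the sketch below. The function names `isDockerfile` and `looksLikeCloudFormation` are assumptions for illustration; the PR's actual detectors are not shown on this page and may match differently (for example, by querying the search API rather than inspecting file content locally).

```go
package main

import (
	"bytes"
	"fmt"
	"path"
	"strings"
)

// isDockerfile applies the name patterns from the table to the base name:
// Dockerfile, Dockerfile.*, *.Dockerfile, *.dockerfile.
// Hypothetical helper, not the provider's code.
func isDockerfile(p string) bool {
	base := path.Base(p)
	return base == "Dockerfile" ||
		strings.HasPrefix(base, "Dockerfile.") ||
		strings.HasSuffix(base, ".Dockerfile") ||
		strings.HasSuffix(base, ".dockerfile")
}

// looksLikeCloudFormation is a content heuristic: the file must have one of
// the listed extensions and contain the AWSTemplateFormatVersion marker.
// Hypothetical helper, not the provider's code.
func looksLikeCloudFormation(p string, content []byte) bool {
	switch path.Ext(p) {
	case ".yaml", ".yml", ".json", ".template":
		return bytes.Contains(content, []byte("AWSTemplateFormatVersion"))
	}
	return false
}

func main() {
	for _, p := range []string{"docker/Dockerfile.prod", "api.Dockerfile", "docs/dockerfile.md"} {
		fmt.Printf("%-24s dockerfile=%v\n", p, isDockerfile(p))
	}
	tmpl := []byte("AWSTemplateFormatVersion: '2010-09-09'\nResources: {}\n")
	fmt.Println("cloudformation:", looksLikeCloudFormation("s3_bucket.yaml", tmpl))
}
```

The marker check is what keeps an ordinary Kubernetes manifest with a `.yaml` extension from being classified as a CloudFormation template.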

Like the existing Terraform/k8s detectors, the new ones run only on explicit --discover all or --discover <type> (not --discover auto), so there's no surprise cloning.

Git-clone support added to downstream providers

None of bicep, cloudformation, helm, kustomize, or the os dockerfile connection could clone from a git URL before — the dockerfile connection only read ssh-url for naming. Each now:

  • clones the repo on http-url (shallow, via the shared plugin.NewGitClone),
  • resolves the target within the checkout (CloudFormation/Dockerfile point at a repo-relative file; Helm/Kustomize point at a repo-relative directory; Bicep scans the checkout directory),
  • cleans up the temporary clone directory via Close(), including on every post-clone error path (deferred cleanup, disarmed once the connection takes ownership).
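
The "deferred cleanup, disarmed once the connection takes ownership" pattern from the last bullet can be sketched as follows. This is a minimal illustration under stated assumptions: `conn`, `newConn`, `errTargetMissing`, and `pathExists` are hypothetical names, and `os.MkdirTemp` stands in for the real shallow clone done through `plugin.NewGitClone`, whose signature is not shown here.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// errTargetMissing simulates a post-clone failure (hypothetical).
var errTargetMissing = errors.New("target not found in checkout")

// conn is a hypothetical connection that owns a temporary checkout.
type conn struct {
	dir string
}

// Close removes the temporary checkout directory.
func (c *conn) Close() error {
	return os.RemoveAll(c.dir)
}

// pathExists is a small helper used to observe the cleanup.
func pathExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

// newConn creates a temp directory (standing in for the clone), then calls
// resolve to locate the target inside it. Any error after the "clone"
// removes the directory; on success the connection takes ownership.
func newConn(resolve func(dir string) error) (*conn, error) {
	dir, err := os.MkdirTemp("", "example-clone-")
	if err != nil {
		return nil, err
	}

	// Arm the guard right after the clone succeeds.
	cleanup := true
	defer func() {
		if cleanup {
			os.RemoveAll(dir)
		}
	}()

	if err := resolve(dir); err != nil {
		return nil, err // guard still armed: checkout is removed
	}

	c := &conn{dir: dir}
	cleanup = false // disarm: Close() is now responsible for the directory
	return c, nil
}

func main() {
	_, err := newConn(func(dir string) error { return errTargetMissing })
	fmt.Println("failed open:", err)

	c, err := newConn(func(dir string) error { return nil })
	if err != nil {
		fmt.Println("unexpected error:", err)
		return
	}
	defer c.Close()
	fmt.Println("checkout exists while open:", pathExists(c.dir))
}
```

A single guard covers every early return added later between the clone and the hand-off, which is harder to get wrong than calling the cleanup explicitly on each error branch.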

Trade-off: CloudFormation, Dockerfile, Helm, and Kustomize emit one asset per matched file/directory, so a repo with N of them performs N shallow clones. This keeps each connection's existing single-target model and avoids a shared clone cache; documented in the discoverers.

Note on overlap: Helm templates/*.yaml and kustomization.yaml also match the existing k8s *.yaml detector, so a chart/kustomize repo can surface under --discover k8s-manifests and --discover helm/kustomize. That's intentional — each is a distinct analysis lens — but worth being aware of when running --discover all.

Repo-based naming & stable identity

  • k8s, bicep, cloudformation, helm, and kustomize now name discovered assets from the git repo (org/repo[/path]) like Terraform/Dockerfile, instead of the temporary clone directory. For example, "K8s Manifest directory mql-git-clone3841…" becomes "K8s Manifest tas50/iac_tests".
  • bicep, cloudformation, helm, and kustomize platform IDs are now derived from the repo (and target path) rather than a hash of the temp clone path — which previously changed on every scan, giving a discovered asset a new identity each run.
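
To show why a repo-derived identifier is stable across scans while a hash of the temp clone path is not, here is a rough sketch. The `org/repo[/path]` name format comes from the description above; the platform ID scheme (`//platformid.example.invalid/...`) and the function names are invented for illustration and are not the IDs these providers emit.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// repoSlug extracts the host and "org/repo" from an HTTP clone URL such as
// "https://github.com/tas50/iac_tests.git". Hypothetical helper.
func repoSlug(httpURL string) (host, slug string, err error) {
	u, err := url.Parse(httpURL)
	if err != nil {
		return "", "", err
	}
	slug = strings.TrimSuffix(strings.Trim(u.Path, "/"), ".git")
	if u.Host == "" || slug == "" {
		return "", "", fmt.Errorf("not a repository URL: %q", httpURL)
	}
	return u.Host, slug, nil
}

// assetName builds "org/repo[/path]" from the slug and an optional
// repo-relative target.
func assetName(slug, target string) string {
	target = strings.Trim(target, "/")
	if target == "" {
		return slug
	}
	return slug + "/" + target
}

// platformID derives an identifier from inputs that do not change between
// scans. The ID layout here is an assumption, not the providers' format.
func platformID(kind, httpURL, target string) (string, error) {
	host, slug, err := repoSlug(httpURL)
	if err != nil {
		return "", err
	}
	return "//platformid.example.invalid/" + kind + "/" + host + "/" + assetName(slug, target), nil
}

func main() {
	id, err := platformID("helm", "https://github.com/tas50/iac_tests.git", "helm/web")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id)
}
```

Because none of the inputs involve the clone directory, two scans of the same repository and target yield the same identifier.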

Verification

Verified end-to-end against tas50/iac_tests. Discovery, naming, and content queries all resolve:

| Asset | Platform | Content checked |
| --- | --- | --- |
| K8s Manifest tas50/iac_tests | k8s-manifest | n/a (renamed) |
| CloudFormation template …/cloudformation/s3_bucket.yaml | cloudformation | 2 S3 resources parsed |
| Dockerfile …/docker/Dockerfile.secure | dockerfile | stages parsed (FROM ubuntu) |
| Dockerfile …/docker/Dockerfile.insecure | dockerfile | stages parsed |
| Bicep file tas50/iac_tests | bicep | 2 .bicep files, 1 resource each |
| Helm Chart …/helm/web | helm | chart web 0.1.0 parsed |
| Kustomize file …/kustomize/base | kustomize | 1 resource parsed |
| Kustomize file …/kustomize/overlays/prod | kustomize | 1 resource parsed |

Files changed

  • providers/github/{config,connection,resources} — five discovery targets + detectors + paginated searchCode helper + gitCredentials dedupe
  • providers/bicep/{provider,connection}, providers/cloudformation/{provider,connection}, providers/helm/{provider,connection}, providers/kustomize/{provider,connection} — git clone, Close(), repo-based name/platform ID
  • providers/os/connection/docker/docker_file_connection.go — git clone, Close(), repo-relative platform ID
  • providers/k8s/connection/manifest/connection.go — repo-based asset name

@mondoo-code-review mondoo-code-review Bot left a comment

GitHub/GitLab repo scanning now discovers CloudFormation, Dockerfile, Bicep, Helm, and Kustomize IaC assets.

Comment thread providers/gitlab/provider/discovery.go
Comment thread providers/github/resources/discovery.go
Comment thread providers/bicep/connection/connection.go Outdated
@tas50 tas50 force-pushed the github-gitlab-iac-discovery branch from 7c5ca77 to f93539d Compare June 10, 2026 06:16

@mondoo-code-review mondoo-code-review Bot left a comment

IaC discovery from GitHub/GitLab repos will work for new IaC types, but GitHub DiscoveryAll on large orgs may hit the Search API rate limit (30 req/min) with up to 7 search calls per repo.
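
To put the rate-limit concern in numbers, a back-of-the-envelope sketch: with the 7 calls per repo and 30 requests per minute quoted above, the search calls alone set a floor on scan time. The function `minMinutes` and the 100-repo org size are illustrative assumptions; the estimate ignores pagination and retries, so real scans could take longer.

```go
package main

import "fmt"

// minMinutes returns a lower bound, in whole minutes, on how long it takes
// to issue repos*callsPerRepo requests under a per-minute request limit.
func minMinutes(repos, callsPerRepo, limitPerMin int) int {
	total := repos * callsPerRepo
	return (total + limitPerMin - 1) / limitPerMin // ceiling division
}

func main() {
	// Figures from the review comment: 7 search calls per repo, 30 req/min.
	// The org size of 100 repos is an assumed example.
	fmt.Println("minutes for 100 repos:", minMinutes(100, 7, 30))
}
```

At these rates only about four repositories fit into each minute of search quota, so the limit dominates on any sizable organization.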

Comment thread providers/github/resources/discovery.go Outdated
Comment thread providers/gitlab/provider/discovery.go
Comment thread providers/github/resources/discovery.go Outdated
…d Kustomize IaC

Extend the GitHub and GitLab providers' infrastructure-as-code discovery
beyond Terraform and Kubernetes manifests to also detect CloudFormation
templates, Dockerfiles, Bicep files, Helm charts, and Kustomize
configurations in repositories.

Discovery (github + gitlab):
- New targets: cloudformation, dockerfiles, bicep, helm, kustomize —
  wired into org()/repo(), the "all" expansion, and the CLI config.
- Both providers classify the filename/extension-based types (k8s,
  dockerfiles, bicep, helm, kustomize) from a single recursive git-tree
  walk per repo rather than one search call per type. On GitHub the tree
  endpoint is on the generous core rate limit, so `--discover all` no
  longer risks the Code Search limit (30 req/min) on multi-repo orgs; the
  walk also catches *.Dockerfile names the old filename: prefix search
  missed.
- CloudFormation is detected via a content heuristic (the
  AWSTemplateFormatVersion marker) since templates share .yaml/.json with
  many other files — one Code Search (GitHub) / project blob search
  (GitLab), one child asset per template. GitLab skips it gracefully on
  instances without advanced search.
- Dockerfiles: one child asset per Dockerfile / Dockerfile.* / *.Dockerfile.
- Bicep: one child asset per repo (the connection walks the checkout).
- Helm: one asset per chart directory (Chart.yaml).
- Kustomize: one asset per kustomization directory (kustomization.yaml/.yml
  / Kustomization), so base and overlays each become their own asset.
- Deduped the credential-cloning into a gitCredentials helper.

Git-clone support (downstream providers):
- None of bicep, cloudformation, helm, kustomize, or the os dockerfile
  connection could clone before. Each now clones on http-url, resolves a
  repo-relative path/dir within the checkout where applicable, and cleans
  up its temp dir via Close(). A deferred cleanup guard is armed right
  after the clone and disarmed once the connection owns the closer, so
  every error path removes the checkout.

Repo-based naming & stable identity:
- k8s, bicep, cloudformation, helm, and kustomize now name discovered
  assets from the git repo (org/repo[/path]) like Terraform/Dockerfile,
  instead of the temporary clone directory.
- Platform IDs and asset.Id are derived from the repo (and template/dir
  path) rather than a hash of the temp clone path, which previously
  changed on every scan. The bicep connection no longer overwrites
  cc.Options["path"] with the non-deterministic clone directory.
- Asset names dropped the verbose "Static Analysis" verbiage (e.g.
  "CloudFormation template tas50/iac_tests/...", "Helm Chart tas50/...").

Note: Chart.yaml and kustomization.yaml match their own discovery cases
ahead of the k8s YAML branch, so a repo whose only YAML files are
Helm/Kustomize entry points no longer also registers as a k8s manifest
asset. This is intentional — those files aren't k8s manifests — and now
behaves identically on both providers.
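
The precedence described in the note can be sketched as an ordered switch over each path from the tree walk. This is an illustration only: `kind`, `classify`, and the constant names are hypothetical, and the sketch covers just the Helm, Kustomize, and k8s cases relevant to the note.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// kind is a hypothetical label for a discovery target.
type kind string

const (
	kindNone      kind = ""
	kindHelm      kind = "helm"
	kindKustomize kind = "kustomize"
	kindK8s       kind = "k8s-manifests"
)

// classify checks the Helm and Kustomize entry points before the generic
// YAML branch, so those files never count as k8s manifests.
func classify(p string) kind {
	base := path.Base(p)
	switch {
	case base == "Chart.yaml":
		return kindHelm
	case base == "kustomization.yaml", base == "kustomization.yml", base == "Kustomization":
		return kindKustomize
	case strings.HasSuffix(base, "mql.yaml"), strings.HasSuffix(base, "mql.yml"):
		return kindNone // excluded from the k8s manifest match
	case strings.HasSuffix(base, ".yaml"), strings.HasSuffix(base, ".yml"):
		return kindK8s
	}
	return kindNone
}

func main() {
	tree := []string{
		"helm/web/Chart.yaml",
		"kustomize/base/kustomization.yaml",
		"k8s/deployment.yaml",
		"README.md",
	}
	for _, p := range tree {
		fmt.Printf("%-36s %q\n", p, classify(p))
	}
}
```

Because the first matching case wins, a repo whose only YAML files are `Chart.yaml` or `kustomization.yaml` produces no k8s manifest match, while other YAML files in the same repo still do.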

Verified end-to-end against tas50/iac_tests: discovery, naming, and
content queries resolve for all IaC types.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@tas50 tas50 force-pushed the github-gitlab-iac-discovery branch from f93539d to c548377 Compare June 10, 2026 06:29
@github-actions

Test Results

9 864 tests   9 858 ✅  3m 6s ⏱️
  543 suites      6 💤
   40 files        0 ❌

Results for commit c548377.
