
[ingestion/data-quality issue] guaccollect image in v1.0.0 Fails to Ingest SLSA Provenance from OCI Image Index #2749

@june5079

[Bug] guaccollect image in v1.0.0 Fails to Ingest SLSA Provenance from OCI Image Index

Description

When the guaccollect collect image command in GUAC v1.0.0 is used to ingest an image whose OCI Image Index contains an embedded SLSA provenance attestation, the collector fails to discover the attestation. It exits gracefully, reporting "completed ingesting 0 documents".

The image was built using docker buildx build --provenance "mode=max". Direct inspection of the registry with curl and the appropriate Accept headers confirms that the registry serves the OCI Image Index and that the index references the attestation manifest.

This strongly suggests a bug in the OCI client implementation of guaccollect image, where it does not correctly parse the OCI Image Index to find linked attestation manifests.
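
For reference, buildx stores attestations in the image index as additional manifest descriptors. As far as I understand the buildx attestation storage format, such a descriptor has platform unknown/unknown and carries the annotations vnd.docker.reference.type: attestation-manifest and vnd.docker.reference.digest: <digest of the image manifest it describes>. The following is a minimal Go sketch (standard library only; all type and function names are hypothetical and this is not GUAC's collector code) of the discovery step that appears to be missing: decode the index and separate platform image manifests from attestation manifests.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Descriptor is the subset of an OCI content descriptor used here.
type Descriptor struct {
	MediaType   string            `json:"mediaType"`
	Digest      string            `json:"digest"`
	Size        int64             `json:"size"`
	Platform    *Platform         `json:"platform,omitempty"`
	Annotations map[string]string `json:"annotations,omitempty"`
}

// Platform is the subset of the OCI platform object used here.
type Platform struct {
	Architecture string `json:"architecture"`
	OS           string `json:"os"`
}

// Index is the subset of an OCI image index used here.
type Index struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	Manifests     []Descriptor `json:"manifests"`
}

// Annotation keys/values that buildx sets on attestation manifest descriptors
// (assumed from the buildx attestation storage format).
const (
	refTypeKey       = "vnd.docker.reference.type"
	refDigestKey     = "vnd.docker.reference.digest"
	attestationValue = "attestation-manifest"
)

// Attestation pairs an attestation manifest with the image manifest it describes.
type Attestation struct {
	ManifestDigest string // digest of the attestation manifest itself
	SubjectDigest  string // digest of the image manifest it refers to
}

// splitIndex decodes an image index and separates platform image manifests
// from buildx attestation manifests, using the annotations rather than the
// unknown/unknown platform as the marker.
func splitIndex(raw []byte) ([]Descriptor, []Attestation, error) {
	var idx Index
	if err := json.Unmarshal(raw, &idx); err != nil {
		return nil, nil, fmt.Errorf("decode image index: %w", err)
	}
	var images []Descriptor
	var atts []Attestation
	for _, d := range idx.Manifests {
		// Reading a nil map returns "", so descriptors without annotations
		// fall through to the image list.
		if d.Annotations[refTypeKey] == attestationValue {
			atts = append(atts, Attestation{
				ManifestDigest: d.Digest,
				SubjectDigest:  d.Annotations[refDigestKey],
			})
			continue
		}
		images = append(images, d)
	}
	return images, atts, nil
}
```

Against the index returned in step 3, a correct collector would be expected to find one entry in each list: the linux/amd64 image and its attestation manifest.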

Steps to Reproduce

  1. Create a Sample Application and Dockerfile:
    Prepare a simple application and a Dockerfile.

    # Dockerfile
    FROM python:3.9-slim
    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY app.py .
    CMD ["python", "app.py"]
  2. Build and Push Image with SLSA Provenance:
    Use docker buildx to build an image with SLSA provenance in max mode and push it to an OCI-compliant registry (e.g., Harbor).

    export IMAGE_TAG=<your-registry>/<project>/my-app:1.0
    
    docker buildx build \
      --platform linux/amd64 \
      --tag $IMAGE_TAG \
      --provenance "mode=max" \
      --push \
      .
  3. Verify OCI Image Index Existence in Registry (via cURL):
    Confirm that the registry is serving the OCI Image Index correctly using curl.

    TOKEN=$(...) # Your registry auth token
    
    curl -v \
      -H "Authorization: Basic $TOKEN" \
      -H "Accept: application/vnd.oci.image.index.v1+json, application/vnd.docker.distribution.manifest.list.v2+json" \
      https://<your-registry>/v2/<project>/my-app/manifests/1.0

    Result: This returns HTTP 200 OK with a valid OCI Image Index JSON whose manifests list contains descriptors for both the amd64 image and the attestation. This indicates that the registry serves the index correctly and is not the source of the problem.

  4. Run the guaccollect image Command:
    Use the GUAC v1.0.0 binary (guacone or guaccollect) to ingest the image pushed in step 2. This was tested in a local Docker Compose environment.

    ./guaccollect collect image <your-registry>/<project>/my-app:1.0

Expected Behavior

The guaccollect tool should successfully authenticate with the registry, parse the OCI Image Index, and discover the embedded SLSA Provenance attestation. It should then publish the attestation to NATS, and the logs should show messages similar to the following:

{"level":"info",...,"msg":"processing document...","type":"slsa", "format":"in-toto"}
{"level":"info",...,"msg":"document published"}
{"level":"info",...,"msg":"completed ingesting 1 documents"}

Subsequently, querying the GraphQL API for the artifact's slsaAttestations field should return the provenance data.

Actual Behavior

The guaccollect tool runs without any connection or authentication errors but fails to find the attestation. It reports that it has ingested 0 documents and exits gracefully, leaving no data in the GUAC database.

{"level":"info","ts":1756465620.462883,"caller":"logging/logger.go:79","msg":"Logging at info level","guac-version":"v1.0.0"}
{"level":"info","ts":1756465621.236005,"caller":"cmd/oci.go:125","msg":"collector ended gracefully","guac-version":"v1.0.0"}
{"level":"info","ts":1756465621.236041,"caller":"cmd/oci.go:138","msg":"completed ingesting 0 documents","guac-version":"v1.0.0"}

Environment

  • GUAC Version: v1.0.0 (from binary and ghcr.io/guacsec/guac:v1.0.0 Docker image)
  • Docker Version: Latest Docker Desktop with buildx support
  • Registry: Harbor (v2.x)
  • Execution Environment: Docker Compose and Kubernetes

Additional Context

This issue appears to be specific to how the guaccollect image command handles OCI Image Indexes. In contrast, ingesting an SBOM generated by syft with the guaccollect file command succeeds: the logs show documents being parsed and nodes being assembled (assembling Package..., etc.). This indicates that the back end of the GUAC pipeline (Ingestor, GraphQL server) is functioning correctly.

The core of the problem seems to be the failure of guaccollect image to discover attestations embedded within an OCI Image Index.

Metadata

Assignees: No one assigned

Labels: bug (Something isn't working), data-quality (Things related to data quality and document ingestion), data-sources