Models can undergo transformation processes that create new model artifacts from existing ones. Common scenarios include:
- Fine-tuning: Adapting a foundation model to a specific domain or task using additional training data
- Quantization: Reducing model precision (e.g., from FP32 to INT8) to optimize for inference performance
- Distillation: Training a smaller "student" model to mimic a larger "teacher" model
- Compression: Removing weights or layers to reduce model size
- Merging: Combining multiple models or LoRA adapters into a single artifact
In each case, we create new model files—a child artifact derived from one or more parent artifacts.
How should supply chain security metadata artifacts relate across the lineage boundary?
OMS Signatures
How should the OMS (OpenSSF Model Signing) signature of the child relate to the OMS signature of the parent?
The child model consists of entirely new files with new content. A quantized INT8 model bears little binary resemblance to its FP32 parent, even though they perform the same function. A fine-tuned model has different weights than its foundation model base.
Position: The child's OMS signature should be completely independent of the parent's OMS signature.
A robust build process may verify the parent's OMS signature at the beginning of the pipeline—confirming authenticity and integrity before beginning the transformation. You want to ensure you're starting from a trusted artifact. However, once that verification passes and the transformation completes, the child artifact is a new, independent entity. The child's OMS signature should cryptographically bind to the child's files, not reference or include the parent's signature.
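This pipeline shape can be sketched in Python. The helper names below are hypothetical (the real OpenSSF model_signing library manages manifests and signatures itself); the sketch only illustrates the position that the child's signature material is derived solely from the child's files:

```python
import hashlib
from pathlib import Path

def file_manifest(model_dir: Path) -> dict[str, str]:
    """Hash every file under the model directory, as an OMS manifest does."""
    return {
        str(p.relative_to(model_dir)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(model_dir.rglob("*"))
        if p.is_file()
    }

def transform_pipeline(parent_dir: Path, child_dir: Path, transform) -> dict[str, str]:
    # 1. Check the parent BEFORE transforming (stand-in for verifying
    #    the parent's OMS signature against its manifest).
    parent_manifest = file_manifest(parent_dir)
    assert parent_manifest, "parent model must contain files to verify"

    # 2. Transform: writes entirely new files into child_dir
    #    (e.g. quantization, fine-tuning).
    transform(parent_dir, child_dir)

    # 3. The child's manifest covers the child's OWN files only; the
    #    parent's manifest and signature are neither referenced nor embedded.
    return file_manifest(child_dir)
```

The returned child manifest is what a child OMS signature would bind to; nothing in it depends on the parent's signature.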
See also #586.
AI SBOMs
How should the AI SBOM of the child relate to the AI SBOM of the parent?
An AI SBOM inventories the components that comprise a model: training datasets, foundation models, dependency libraries, data preprocessing tools, etc. When you create a child model from a parent, the parent becomes a component of the child's supply chain.
Position: The child's AI SBOM should reference the parent model as a component, using the DESCENDANT_OF / ANCESTOR_OF relationship types from SPDX or the ancestors / descendants fields of the pedigree element in CycloneDX.
For example, for a fine-tuned model, the AI SBOM should reflect:
- The foundation model as a component
- The fine-tuning dataset as a component
- Any new dependencies introduced during fine-tuning
For a quantized model, the AI SBOM might be nearly identical to the parent's SBOM, with metadata noting the quantization process.
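As a sketch, the parent relationship can be encoded either as an SPDX 2.3 relationship or via the CycloneDX pedigree element. All identifiers, names, and bom-refs below are illustrative, not taken from a real SBOM:

```python
# SPDX 2.3 style: a relationship entry linking child to parent.
spdx_relationship = {
    "spdxElementId": "SPDXRef-finetuned-model",        # illustrative ID
    "relationshipType": "DESCENDANT_OF",
    "relatedSpdxElement": "SPDXRef-foundation-model",  # illustrative ID
}

# CycloneDX style: the child component carries a pedigree whose
# ancestors list contains the parent model.
cyclonedx_component = {
    "type": "machine-learning-model",
    "bom-ref": "finetuned-model",      # illustrative bom-ref
    "name": "example-finetuned",       # illustrative name
    "pedigree": {
        "ancestors": [
            {"type": "machine-learning-model", "name": "example-foundation"}
        ]
    },
}
```

Either form lets downstream consumers walk the lineage from child to foundation model without coupling the signatures or provenance records themselves.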
SLSA Provenance
How should the SLSA provenance of the child relate to the parent?
SLSA treats ML transformations as builds; when you fine-tune a model, that's a build process. The foundation model is a dependency of that build, just as a Python library would be.
SLSA is not transitive, at least not yet; the new dependency track draft defers verification of dependency provenance to a future iteration of the document.
Position: The child’s SLSA provenance is completely independent of the parent’s SLSA provenance. The child’s build process may choose to verify the provenance of its parent model, but this practice does not correspond with any requirements in SLSA and in any case, the provenance records should remain decoupled.
Note: slsa-framework/slsa#978 makes the case that the SLSA provenance attestation for the child model should reference the parent model as an external dependency (one of resolvedDependencies) of the build process. In our view, however, the right place for a parent model reference is the SBOM of the child model, not the SLSA build provenance.
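For reference, this is a skeleton of a SLSA v1.0 provenance statement showing where issue #978 would place the parent model (under resolvedDependencies); the position above is to record that relationship in the child's SBOM instead. The buildType, URIs, and digest placeholders are illustrative:

```python
# Skeleton of an in-toto statement with a SLSA v1.0 provenance predicate.
# All example.com URIs and <...> digests are made up for illustration.
provenance = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "model.int8.safetensors",
         "digest": {"sha256": "<child-file-digest>"}}
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {
            "buildType": "https://example.com/quantization/v1",
            "externalParameters": {"precision": "int8"},
            # Issue #978 would record the parent model here:
            "resolvedDependencies": [
                {"uri": "https://example.com/models/parent-fp32",
                 "digest": {"sha256": "<parent-file-digest>"}}
            ],
        },
        "runDetails": {"builder": {"id": "https://example.com/builder"}},
    },
}
```

Even in this encoding, the child's provenance stands alone: verifying it does not require resolving or verifying the parent's provenance record.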
Conclusion
The AI SBOM of a child model is the critical artifact for representing the foundation models and contributing datasets that were used in the long chain of builds and transformations that resulted in a final model.
The SLSA provenance of a child model describes how the artifact was built, but it does not describe the transitive foundation model and dataset dependencies that contributed to the production of the model.
The OMS signature is critical for determining whether model files have been tampered with after signing.
See also #587 for an illustration of how to encode AI SBOMs and provenance attestations in a model repository.