feat: record plugin package provenance on resource creation #1002

Open
michael-johnston wants to merge 3 commits into main from maj_plugin_provenance

@michael-johnston
Member

When a DiscoverySpaceResource, OperationResource, or ActuatorConfigurationResource is written to the metastore, the exact PyPI distribution name and version of every plugin involved are now captured and stored alongside it. This makes it straightforward to identify, after the fact, which package versions were used to create a given resource, supporting both replication and debugging.

Previously, this could be partly deduced for some operators, because the identifier contains the operator plugin version, but only as long as the operator name and package name were similar. This breaks down if a package provides multiple operators. For experiments and actuator configurations there was no record at all.

Changes

  • Add a model for package provenance
  • Add an optional provenance field to the resource schemas (space, operation, actuator configuration)
  • Add functions to the plugin catalogs (actuator registry, operator collections) that return the relevant package data
  • Before a resource of each type is created, look up the package data and attach it to the resource

Examples:

space - one entry per actuator and custom_experiment used

...
created: '2026-06-04T18:17:16.323098Z'
customExperimentProvenance:
  nevergrad_opt_3d_test_func:
    distributionName: optimization_test_functions
    distributionVersion: 1.8.1.dev46+437b3291.dirty
identifier: space-7b7ae1-default
status:
- event: created
  recorded_at: '2026-06-04T18:17:16.323104Z'
- event: added
  recorded_at: '2026-06-04T18:17:16.324510Z'

operation

...
kind: operation
metadata:
 entities_submitted: 40
 experiments_requested: 40
operationType: search
operatorIdentifier: ray_tune-1.8.1.dev43+g08402ddd3.d20260603144611
operatorProvenance:
 distributionName: ado-ray-tune
 distributionVersion: 1.8.1.dev43+g08402ddd3.d20260603144611
 ...

@AlessandroPomponio
Member

Wouldn't it be better to have a model called ProvenanceInfo which we can then include in all AdoResources?

Something like:

class ProvenanceInfo:
  actuators: dict[str, PackageProvenance] | None
  custom_experiments: dict[str, PackageProvenance] | None
  operators: dict[str, PackageProvenance] | None

This would give us a consistent pattern + extensibility for the future.

As a side note, I think we should stop using camelCase even in Pydantic models, so I would use snake_case everywhere.

@michael-johnston
Member Author

Wouldn't it be better to have a model called ProvenanceInfo which we can then include in all AdoResources?

Only certain resources have this information though.

It seems strange to have these fields on datacontainer or samplestore; they would also appear in the full dump of objects and in the schema.

Maybe the object should just be

class ProvenanceInfo:
    # no fields
    # add a validator that every field value is a dict of name: PackageProvenance
    pass

Each resource subclass can then define its own subclass of this (OperationProvenanceInfo, etc.) and add whatever fields are relevant to it.
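
A rough sketch of that idea, for discussion only. The thread is about Pydantic models, but this version uses standard-library dataclasses purely to stay self-contained; a Pydantic model validator would play the same role as `__post_init__` here. The `PackageProvenance` fields and the `operators` field on `OperationProvenanceInfo` are assumptions borrowed from the earlier comments, not the PR's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class PackageProvenance:
    """Hypothetical provenance record: one distribution name and version."""

    distribution_name: str
    distribution_version: str


@dataclass
class ProvenanceInfo:
    """Base class with no fields; validates whatever subclasses declare."""

    def __post_init__(self) -> None:
        # Every declared field must be None or a dict of name -> PackageProvenance.
        for f in fields(self):
            value = getattr(self, f.name)
            if value is None:
                continue
            if not isinstance(value, dict) or not all(
                isinstance(k, str) and isinstance(v, PackageProvenance)
                for k, v in value.items()
            ):
                raise TypeError(
                    f"{f.name} must map names to PackageProvenance instances"
                )


@dataclass
class OperationProvenanceInfo(ProvenanceInfo):
    """Only the fields relevant to operations (field name is an assumption)."""

    operators: dict[str, PackageProvenance] | None = None
```

With this shape, resources such as datacontainer or samplestore carry no provenance fields at all, while each resource type that does have plugin provenance declares only its own fields and inherits the shared validation.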
