Consolidate configuration of scan storages #8721

Open
@mnonnenmacher


The scanner supports storing multiple types of data in storages for reuse in subsequent runs or by other tools:

  • Scan results
    • Package based
    • Provenance based
  • Provenance resolution results
    • For packages
    • For nested provenances
  • File archives
  • File lists

Currently the storage backends can be configured separately for each of those four data types. While this is very flexible, in practice it provides little value. For example, if scan results are stored in a Postgres database, there is little reason to store provenance results in a different place. Or if file archives are stored in S3, there is little reason to store file lists in a different place.

This flexibility makes the configuration complex: the default settings store all data in a local directory, which is usually not desired in a production setup, so storing the data remotely requires four storage backend configurations. This often confuses users and can cause performance issues for users who do not know how the scanner works internally. For example, forgetting to configure a provenance storage leads to unnecessary repetition of the provenance resolution.

To simplify the configuration, the proposal is to consolidate the configuration to just two types of data:

  • Structured data
    • Scan results
    • Provenance resolution results
  • Binary data
    • File archives
    • File lists (at least for now; it could be more efficient to treat file lists as structured data as well)
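
The grouping above could be modeled roughly as follows. This is only a sketch to illustrate the idea: `DataKind`, `StorageCategory`, `StorageBackend` and `ConsolidatedStorageConfig` are hypothetical names that do not exist in the code base, and the real configuration classes would look different.

```kotlin
// Hypothetical names throughout; this is not the scanner's actual configuration model.

/** The four kinds of data the scanner currently stores. */
enum class DataKind { SCAN_RESULTS, PROVENANCE_RESOLUTION_RESULTS, FILE_ARCHIVES, FILE_LISTS }

/** The two storage categories proposed in this issue. */
enum class StorageCategory { STRUCTURED, BINARY }

/** Maps each kind of data to its proposed category. */
fun DataKind.category(): StorageCategory =
    when (this) {
        DataKind.SCAN_RESULTS, DataKind.PROVENANCE_RESOLUTION_RESULTS -> StorageCategory.STRUCTURED
        DataKind.FILE_ARCHIVES, DataKind.FILE_LISTS -> StorageCategory.BINARY
    }

/** A placeholder for a backend configuration, for example "postgres" or "s3". */
data class StorageBackend(val type: String, val options: Map<String, String> = emptyMap())

/** One backend per category instead of one backend per kind of data. */
data class ConsolidatedStorageConfig(val structured: StorageBackend, val binary: StorageBackend) {
    /** Returns the backend that would hold the given kind of data. */
    fun backendFor(kind: DataKind): StorageBackend =
        when (kind.category()) {
            StorageCategory.STRUCTURED -> structured
            StorageCategory.BINARY -> binary
        }
}
```

With such a model, a production setup would need two backend entries, and it would no longer be possible to configure a scan result storage while forgetting the provenance storage.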

The implementation proposal is:

  • Rename ScanStorage and all related classes to ScanResultStorage
    • When introduced, the name ScanStorage was chosen because ScanResultStorage was already taken, but this is not the case anymore.
  • Make two interfaces providing the functions for storing structured data and binary data
    • ScanStorage
    • BinaryStorage
  • Make implementations of those interfaces that reuse the existing implementations
    • For example, PostgresScanStorage uses ProvenanceBasedPostgresStorage, PostgresPackageProvenanceStorage and PostgresNestedProvenanceStorage.
  • Adapt the configuration to require only configuration for those two types of storages.
  • Make the new interfaces plugins (see Turn scan storages into plugins #6603)
    • This ensures that new implementations provide all required features. For example, a new MariaDbScanStorage should not only provide a way to store scan results, but also to store provenance resolution results.
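
To make the proposal more concrete, here is a minimal sketch of how the new interfaces could reuse existing implementations through delegation. All function signatures, the simplified `String` types and the class `CompositeScanStorage` are assumptions made for illustration; the existing storage interfaces have different, richer signatures.

```kotlin
// Illustrative assumptions only: the signatures below are simplified stand-ins,
// not the existing storage API.

/** Stand-in for the renamed scan result storage. */
interface ScanResultStorage {
    fun readScanResult(provenance: String): String?
    fun writeScanResult(provenance: String, result: String)
}

/** Stand-in for the storage of package provenance resolution results. */
interface PackageProvenanceStorage {
    fun readProvenance(packageId: String): String?
    fun writeProvenance(packageId: String, provenance: String)
}

/** Stand-in for the storage of nested provenance resolution results. */
interface NestedProvenanceStorage {
    fun readNestedProvenance(root: String): List<String>?
    fun writeNestedProvenance(root: String, nested: List<String>)
}

/** Proposed interface for all structured data. An implementation must support every kind. */
interface ScanStorage : ScanResultStorage, PackageProvenanceStorage, NestedProvenanceStorage

/** Proposed interface for binary data such as file archives and file lists. */
interface BinaryStorage {
    fun read(key: String): ByteArray?
    fun write(key: String, data: ByteArray)
}

/** Builds a ScanStorage from existing implementations by delegating to them. */
class CompositeScanStorage(
    scanResults: ScanResultStorage,
    packageProvenances: PackageProvenanceStorage,
    nestedProvenances: NestedProvenanceStorage
) : ScanStorage,
    ScanResultStorage by scanResults,
    PackageProvenanceStorage by packageProvenances,
    NestedProvenanceStorage by nestedProvenances
```

In this sketch, a Postgres-backed `ScanStorage` would just pass the three existing Postgres storages to `CompositeScanStorage`. Because `ScanStorage` extends all three interfaces, a new backend cannot compile without providing provenance storage as well.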

Labels: scanner (About the scanner tool)
