Description
The scanner supports storing multiple types of data in storages for reuse in subsequent runs or by other tools:
- Scan results
  - Package based
  - Provenance based
- Provenance resolution results
  - For packages
  - For nested provenances
- File archives
- File lists
Currently the storage backends can be configured separately for each of those four data types. While this is very flexible, in practice it provides little value. For example, if scan results are stored in a Postgres database, there is little reason to store provenance results in a different place. Or if file archives are stored in S3, there is little reason to store file lists in a different place.
This flexibility makes the configuration complex. The default settings store all data in a local directory, which is usually not desired in a production setup, so storing the data remotely requires four storage backend configurations. This often confuses users, and it can cause performance issues for users who do not know how the scanner works internally. For example, forgetting to configure a provenance storage leads to unnecessary repetition of the provenance resolution.
To simplify the configuration, the proposal is to consolidate the configuration to just two types of data:
- Structured data
  - Scan results
  - Provenance resolution results
- Binary data
  - File archives
  - File lists (at least currently, it could be more efficient to treat file lists as structured data as well)
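To illustrate the consolidation, here is a minimal sketch of how the two configuration shapes could relate. All class and property names (`BackendConfig`, `PerTypeStorageConfig`, `ConsolidatedStorageConfig`, `toPerTypeConfig`) and the backend type strings are hypothetical and only mirror the data types listed above; they are not the scanner's actual configuration classes.

```kotlin
// Hypothetical model of a single storage backend; "type" values are illustrative only.
data class BackendConfig(
    val type: String,
    val options: Map<String, String> = emptyMap()
)

// Simplified view of the current situation: one backend per data type.
data class PerTypeStorageConfig(
    val scanResults: BackendConfig,
    val provenanceResults: BackendConfig,
    val fileArchives: BackendConfig,
    val fileLists: BackendConfig
)

// Proposed shape: one backend for structured data, one for binary data.
data class ConsolidatedStorageConfig(
    val structuredData: BackendConfig,
    val binaryData: BackendConfig
)

// Expand the consolidated configuration into the four per-type backends.
fun ConsolidatedStorageConfig.toPerTypeConfig() = PerTypeStorageConfig(
    scanResults = structuredData,
    provenanceResults = structuredData,
    fileArchives = binaryData,
    fileLists = binaryData
)

fun main() {
    val config = ConsolidatedStorageConfig(
        structuredData = BackendConfig("postgres"),
        binaryData = BackendConfig("s3")
    )
    println(config.toPerTypeConfig())
}
```

With this shape, a user configures two backends instead of four, and it is no longer possible to configure a remote scan result storage while accidentally leaving the provenance storage at its local default.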
The implementation proposal is:
- Rename `ScanStorage` and all related classes to `ScanResultStorage`.
  - When introduced, the name `ScanStorage` was chosen because `ScanResultStorage` was already taken, but this is not the case anymore.
- Make two interfaces providing the functions for storing structured data and binary data:
  - `ScanStorage`
  - `BinaryStorage`
- Make implementations of those interfaces that reuse the existing implementations.
  - For example, `PostgresScanStorage` uses `ProvenanceBasedPostgresStorage`, `PostgresPackageProvenanceStorage` and `PostgresNestedProvenanceStorage`.
- Adapt the configuration to require only configuration for those two types of storages.
- Make the new interfaces plugins (see "Turn scan storages into plugins", #6603).
  - This ensures that new implementations provide all required features. For example, a new `MariaDbScanStorage` should not only provide a way to store scan results, but also to store provenance resolution results.
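
The interface part of the proposal could look roughly like the sketch below. It is not the actual scanner API: `Provenance`, `ScanResult`, every function signature, and the in-memory classes are simplified stand-ins invented for illustration, and nested provenance storage is left out for brevity. The point is only to show a consolidated `ScanStorage` that forwards to existing, narrower storages.

```kotlin
// Simplified stand-ins for the real model classes (assumptions, not the actual types).
data class Provenance(val url: String, val revision: String)
data class ScanResult(val provenance: Provenance, val scanner: String, val summary: String)

// Existing narrow storages, reduced to one read and one write function each.
interface ProvenanceBasedStorage {
    fun read(provenance: Provenance): List<ScanResult>
    fun write(result: ScanResult)
}

interface PackageProvenanceStorage {
    fun readProvenance(packageId: String): Provenance?
    fun writeProvenance(packageId: String, provenance: Provenance)
}

// Proposed consolidated interface for structured data (hypothetical signatures).
interface ScanStorage {
    fun readScanResults(provenance: Provenance): List<ScanResult>
    fun writeScanResult(result: ScanResult)
    fun readPackageProvenance(packageId: String): Provenance?
    fun writePackageProvenance(packageId: String, provenance: Provenance)
}

// Proposed interface for binary data such as file archives (hypothetical signatures).
interface BinaryStorage {
    fun read(key: String): ByteArray?
    fun write(key: String, data: ByteArray)
}

// In-memory stand-ins for the existing backend-specific implementations.
class InMemoryProvenanceBasedStorage : ProvenanceBasedStorage {
    private val results = mutableListOf<ScanResult>()
    override fun read(provenance: Provenance) = results.filter { it.provenance == provenance }
    override fun write(result: ScanResult) { results += result }
}

class InMemoryPackageProvenanceStorage : PackageProvenanceStorage {
    private val provenances = mutableMapOf<String, Provenance>()
    override fun readProvenance(packageId: String) = provenances[packageId]
    override fun writeProvenance(packageId: String, provenance: Provenance) {
        provenances[packageId] = provenance
    }
}

// The consolidated storage reuses the existing implementations by forwarding to them.
class CompositeScanStorage(
    private val results: ProvenanceBasedStorage,
    private val provenances: PackageProvenanceStorage
) : ScanStorage {
    override fun readScanResults(provenance: Provenance) = results.read(provenance)
    override fun writeScanResult(result: ScanResult) = results.write(result)
    override fun readPackageProvenance(packageId: String) = provenances.readProvenance(packageId)
    override fun writePackageProvenance(packageId: String, provenance: Provenance) =
        provenances.writeProvenance(packageId, provenance)
}

fun main() {
    val storage: ScanStorage =
        CompositeScanStorage(InMemoryProvenanceBasedStorage(), InMemoryPackageProvenanceStorage())
    val provenance = Provenance("https://example.com/repo.git", "abc123")
    storage.writePackageProvenance("Maven:com.example:lib:1.0", provenance)
    storage.writeScanResult(ScanResult(provenance, "ExampleScanner", "no findings"))
    println(storage.readPackageProvenance("Maven:com.example:lib:1.0"))
    println(storage.readScanResults(provenance))
}
```

Because a backend has to implement the whole `ScanStorage` interface, it cannot offer scan result storage without also offering provenance storage, which is the guarantee the plugin bullet above asks for.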