feat: Add support for multiple matching algorithm versions#1018
feat: Add support for multiple matching algorithm versions#1018adekoder wants to merge 1 commit intoNixOS:mainfrom
Conversation
c5b0020 to
8540a20
Compare
fricklerhandwerk
left a comment
There was a problem hiding this comment.
A simpler version config, import, and selection mechanism would substantially reduce the amount of testing required.
Then the next PRs could add the mechanism for dry-running the non-active versions and collecting metrics, respectively.
Side note: While this follows the design we discussed, I'm a bit uneasy about the algorithm versioning living in separate files. It makes diffing rather impractical. I don't have a better proposal than doing version bumps in multiple commits though: copy v1, change v1; review would look at commits separately.
| Controls which registered matching algorithm version is used when | ||
| linking CVEs to derivations. Must match a VERSION defined in | ||
| shared/listeners/algorithms/. Bump this setting to activate a new | ||
| algorithm version without changing code. |
There was a problem hiding this comment.
This doesn't work. Changing an environment variable for us is exactly equivalent to changing code, because there's no such thing as a deployment that doesn't touch the source.
That means, for our purposes at this point in development it would just as good (and a lot simpler) to hard-code it in e.g. the suggestion model like we have it for caching (and as discussed in #722 (comment)):
nix-security-tracker/src/shared/models/cached.py
Lines 21 to 23 in e3ead57
| def register(module: MatchingAlgorithm) -> None: | ||
| """Register an algorithm module under its VERSION.""" | ||
| _registry[module.VERSION] = module |
There was a problem hiding this comment.
Can't we just read the directory and map the v*.py filenames to versions automatically?
| ) | ||
|
|
||
|
|
||
| @override_settings(ACTIVE_MATCHING_ALGORITHM_VERSION=1) |
There was a problem hiding this comment.
Here is where this design via settings already falls apart. Once we remove v1 this test will need to change
| def _resolve(version: int) -> MatchingAlgorithm: | ||
| """Return a resigister alogirithm that match the version.""" | ||
| if not _registry: | ||
| raise RuntimeError("No matching algorithm registered.") | ||
| try: | ||
| return _registry[version] | ||
| except KeyError: | ||
| raise KeyError(f"No matching algorithm registered for version {version}.") |
There was a problem hiding this comment.
It would be perfectly enough to check in a test that selecting the active and candidate version works, simply to guard against typos and omissions. Since for now we're controlling the selection in code, all we need to guarantee is basic consistency.
Then the actual selection is simply _registry[version] right there at the call site.
| active_proposal = make_suggestion( | ||
| container=active_container, | ||
| drvs={drv: ProvenanceFlags.PACKAGE_NAME_MATCH}, | ||
| status=status, |
There was a problem hiding this comment.
This should be more explicit and independent of any particular value, something like:
algorithm_version=CVEDerivationClusterProposal.AlgorithmVersion
and then in the counterpart
algorithm_version=CVEDerivationClusterProposal.AlgorithmVersion+1
| make_drv: Callable[..., NixDerivation], | ||
| make_container: Callable[..., Container], | ||
| ) -> None: | ||
| """.objects.all() is unrestricted and returns every proposal regardless of version.""" |
There was a problem hiding this comment.
I don't think we need to test that.
|
|
||
|
|
||
| @override_settings(ACTIVE_MATCHING_ALGORITHM_VERSION=1) | ||
| def test_active_returns_only_matching_version_when_both_exist( |
There was a problem hiding this comment.
This is actually the same as the above, no? It just creates both versions simultaneously rather than in two different tests. Maybe let's only leave this one?
| make_drv: Callable[..., NixDerivation], | ||
| make_container: Callable[..., Container], | ||
| ) -> None: | ||
| """Changing ACTIVE_MATCHING_ALGORITHM_VERSION switches which proposals .active() returns.""" |
There was a problem hiding this comment.
I'm inclined to suggest to also fold this into the above test.
| def test_active_returns_empty_when_no_version_matches( | ||
| make_suggestion: Callable[..., CVEDerivationClusterProposal], | ||
| ) -> None: | ||
| """When no proposal has the active version, .active() is empty.""" |
There was a problem hiding this comment.
This could also be a parameter
| shared_drv = make_drv(pname="shared-pkg", attribute="shared-pkg") | ||
|
|
||
| active_container = make_container(cve_id="CVE-2025-2001") | ||
| active_proposal = make_suggestion( |
There was a problem hiding this comment.
| active_proposal = make_suggestion( | |
| active_proposal = make_cached_suggestion( |
Then no need to cache afterwards
Currently we look at each CVE exactly once. When we introduce a new matching algorithm we want to re-run it across all CVEs without overwriting or destroying suggestions that have already been accepted under the previous algorithm.
This PR lays the groundwork: each algorithm lives in its own versioned module (v1.py, v2.py, …) and self-registers into an in-memory registry on import.
Two new settings,
ACTIVE_MATCHING_ALGORITHM_VERSIONandCANDIDATE_MATCHING_ALGORITHM_VERSION, control which version is the live one and which is being evaluated.Proposals record the algorithm that generated them via a new
algorithm_versionfield, and all user-facing views, caching, and cache-regeneration are filtered to the active version only. This means a candidate algorithm can generate proposals in the background without them surfacing to triagers until the version is explicitly promoted.