feat: Add support for multiple matching algorithm versions #1018

Open
adekoder wants to merge 1 commit into NixOS:main from adekoder:infrastructure-support-for-multiple-matching-algorithm

@adekoder
Collaborator

Currently we look at each CVE exactly once. When we introduce a new matching algorithm we want to re-run it across all CVEs without overwriting or destroying suggestions that have already been accepted under the previous algorithm.

This PR lays the groundwork: each algorithm lives in its own versioned module (v1.py, v2.py, …) and self-registers into an in-memory registry on import.
Two new settings, ACTIVE_MATCHING_ALGORITHM_VERSION and CANDIDATE_MATCHING_ALGORITHM_VERSION, control which version is the live one and which is being evaluated.

Proposals record the algorithm that generated them via a new algorithm_version field, and all user-facing views, caching, and cache-regeneration are filtered to the active version only. This means a candidate algorithm can generate proposals in the background without them surfacing to triagers until the version is explicitly promoted.

Comment thread src/shared/listeners/algorithms/__init__.py Dismissed
Comment thread src/shared/listeners/automatic_linkage.py Dismissed
@adekoder force-pushed the infrastructure-support-for-multiple-matching-algorithm branch from c5b0020 to 8540a20 on April 29, 2026 12:51
Collaborator

@fricklerhandwerk left a comment

A simpler version config, import, and selection mechanism would substantially reduce the amount of testing required.

Then the next PRs could add the mechanism for dry-running the non-active versions and collecting metrics, respectively.

Side note: While this follows the design we discussed, I'm a bit uneasy about the algorithm versioning living in separate files. It makes diffing rather impractical. I don't have a better proposal than doing version bumps in multiple commits though: copy v1, change v1; review would look at commits separately.

Comment thread src/project/settings.py
Comment on lines +174 to +177
Controls which registered matching algorithm version is used when
linking CVEs to derivations. Must match a VERSION defined in
shared/listeners/algorithms/. Bump this setting to activate a new
algorithm version without changing code.
Collaborator

This doesn't work. Changing an environment variable for us is exactly equivalent to changing code, because there's no such thing as a deployment that doesn't touch the source.

That means, for our purposes at this point in development, it would be just as good (and a lot simpler) to hard-code it in e.g. the suggestion model like we have it for caching (and as discussed in #722 (comment)):

@classproperty
def CURRENT_SCHEMA_VERSION(cls) -> int:  # noqa: N802, N805
    return 2
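A runnable sketch of that hard-coded alternative applied to algorithm versions. The local `classproperty` is a minimal stand-in for Django's `django.utils.functional.classproperty` so the snippet runs without Django; the class is a hypothetical stand-in for the suggestion model, and the `ACTIVE_ALGORITHM_VERSION`/`CANDIDATE_ALGORITHM_VERSION` names are assumptions modeled on `CURRENT_SCHEMA_VERSION`.

```python
from typing import Any, Callable, Optional


class classproperty:  # noqa: N801  (lowercase to mirror Django's name)
    """Minimal stand-in for django.utils.functional.classproperty."""

    def __init__(self, method: Callable[[Any], Any]) -> None:
        self.fget = method

    def __get__(self, instance: Any, owner: Optional[type] = None) -> Any:
        # Python passes the class as `owner` for both Model.X and instance.X access.
        return self.fget(owner)


class CVEDerivationClusterProposal:
    """Hypothetical stand-in for the suggestion model; only the constants matter here."""

    @classproperty
    def ACTIVE_ALGORITHM_VERSION(cls) -> int:  # noqa: N802, N805
        return 1

    @classproperty
    def CANDIDATE_ALGORITHM_VERSION(cls) -> int:  # noqa: N802, N805
        # Assumption: the candidate is always the version after the active one.
        return cls.ACTIVE_ALGORITHM_VERSION + 1
```

Promoting the candidate then becomes a one-line code change that is reviewed like any other, which matches the point that changing an environment variable is no cheaper than changing code.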

Comment on lines +27 to +29
def register(module: MatchingAlgorithm) -> None:
    """Register an algorithm module under its VERSION."""
    _registry[module.VERSION] = module
Collaborator

Can't we just read the directory and map the v*.py filenames to versions automatically?
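One way the reviewer's idea could look, sketched with the standard library only: `pkgutil.iter_modules` lists the package's submodules, and only names of the form `v<N>` are imported and keyed by `N`. The function name `discover_algorithms` and the duplicate-version check are hypothetical; it assumes the algorithms package (e.g. `shared.listeners.algorithms`) is importable and contains no other `v<N>` modules.

```python
import importlib
import pkgutil
import re
from types import ModuleType
from typing import Dict

# Only modules named exactly v<digits> count as algorithm versions.
_VERSION_MODULE = re.compile(r"v(\d+)")


def discover_algorithms(package: ModuleType) -> Dict[int, ModuleType]:
    """Import every v<N> submodule of `package` and key it by N."""
    registry: Dict[int, ModuleType] = {}
    for info in pkgutil.iter_modules(package.__path__):
        m = _VERSION_MODULE.fullmatch(info.name)
        if m is None:
            continue  # skip helpers such as base.py
        version = int(m.group(1))
        if version in registry:
            # e.g. v1.py and v01.py would otherwise silently overwrite each other
            raise ValueError(f"duplicate algorithm version {version}: {info.name}")
        registry[version] = importlib.import_module(f"{package.__name__}.{info.name}")
    return registry
```

With this, neither an explicit `register()` call nor a `VERSION` constant is needed in each module; the filename is the single source of truth.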

)


@override_settings(ACTIVE_MATCHING_ALGORITHM_VERSION=1)
Collaborator

Here is where this design via settings already falls apart: once we remove v1, this test will need to change.

Comment on lines +32 to +39
def _resolve(version: int) -> MatchingAlgorithm:
    """Return the registered algorithm that matches the given version."""
    if not _registry:
        raise RuntimeError("No matching algorithm registered.")
    try:
        return _registry[version]
    except KeyError:
        raise KeyError(f"No matching algorithm registered for version {version}.") from None
Collaborator

It would be enough to check in a test that selecting the active and candidate versions works, simply to guard against typos and omissions. Since for now we're controlling the selection in code, all we need to guarantee is basic consistency.

Then the actual selection is simply _registry[version] right there at the call site.

active_proposal = make_suggestion(
    container=active_container,
    drvs={drv: ProvenanceFlags.PACKAGE_NAME_MATCH},
    status=status,
Collaborator

This should be more explicit and independent of any particular value, something like:

  algorithm_version=CVEDerivationClusterProposal.AlgorithmVersion

and then in the counterpart

  algorithm_version=CVEDerivationClusterProposal.AlgorithmVersion+1
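A self-contained sketch of what such a version-independent test could look like. `Proposal` is a hypothetical stand-in for `CVEDerivationClusterProposal`, its `AlgorithmVersion` class attribute stands in for the hard-coded active version the review refers to (the value 3 is arbitrary), and `active` stands in for the `.active()` filter.

```python
from dataclasses import dataclass
from typing import ClassVar, List


@dataclass
class Proposal:
    """Hypothetical stand-in for CVEDerivationClusterProposal."""

    cve_id: str
    algorithm_version: int

    # Stand-in for the hard-coded active version; not a dataclass field.
    AlgorithmVersion: ClassVar[int] = 3


def active(proposals: List[Proposal]) -> List[Proposal]:
    """Stand-in for .active(): keep only proposals from the active version."""
    return [p for p in proposals if p.algorithm_version == Proposal.AlgorithmVersion]


def test_active_excludes_next_version() -> None:
    # Versions are expressed relative to the active one, so bumping
    # AlgorithmVersion or deleting an old vN.py does not force edits here.
    current = Proposal("CVE-2025-2001", algorithm_version=Proposal.AlgorithmVersion)
    candidate = Proposal("CVE-2025-2002", algorithm_version=Proposal.AlgorithmVersion + 1)
    assert active([current, candidate]) == [current]
```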

    make_drv: Callable[..., NixDerivation],
    make_container: Callable[..., Container],
) -> None:
    """.objects.all() is unrestricted and returns every proposal regardless of version."""
Collaborator

I don't think we need to test that.



@override_settings(ACTIVE_MATCHING_ALGORITHM_VERSION=1)
def test_active_returns_only_matching_version_when_both_exist(
Collaborator

This is actually the same as the above, no? It just creates both versions simultaneously rather than in two different tests. Maybe we should keep only this one?

    make_drv: Callable[..., NixDerivation],
    make_container: Callable[..., Container],
) -> None:
    """Changing ACTIVE_MATCHING_ALGORITHM_VERSION switches which proposals .active() returns."""
Collaborator

I'm inclined to suggest to also fold this into the above test.

def test_active_returns_empty_when_no_version_matches(
    make_suggestion: Callable[..., CVEDerivationClusterProposal],
) -> None:
    """When no proposal has the active version, .active() is empty."""
Collaborator

This could also be a parameter.
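A sketch of how the three `.active()` tests could collapse into one table of cases. It uses a plain loop so it runs without pytest; in the real suite the same `CASES` list would be passed to `@pytest.mark.parametrize("stored, active_version, expected", CASES)` on `check_active`. `Proposal`, `active`, and passing the active version as an argument are hypothetical simplifications of the model, its `.active()` queryset method, and the version selection.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Proposal:
    """Hypothetical stand-in for CVEDerivationClusterProposal."""

    cve_id: str
    algorithm_version: int


def active(proposals: List[Proposal], active_version: int) -> List[Proposal]:
    """Stand-in for .active(): keep only proposals from the active version."""
    return [p for p in proposals if p.algorithm_version == active_version]


# One case per former test: (stored versions, active version, expected visible versions).
CASES: List[Tuple[List[int], int, List[int]]] = [
    ([1, 2], 1, [1]),  # both versions exist, v1 active
    ([1, 2], 2, [2]),  # switching the active version switches the result
    ([2], 1, []),      # no proposal has the active version
]


def check_active(stored: List[int], active_version: int, expected: List[int]) -> None:
    proposals = [Proposal(f"CVE-2025-200{i}", v) for i, v in enumerate(stored)]
    visible = active(proposals, active_version)
    assert [p.algorithm_version for p in visible] == expected


def test_active_filters_by_version() -> None:
    for stored, active_version, expected in CASES:
        check_active(stored, active_version, expected)
```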

shared_drv = make_drv(pname="shared-pkg", attribute="shared-pkg")

active_container = make_container(cve_id="CVE-2025-2001")
active_proposal = make_suggestion(
Collaborator

Suggested change
- active_proposal = make_suggestion(
+ active_proposal = make_cached_suggestion(

Then there's no need to cache afterwards.

Labels

None yet

Projects

None yet

3 participants