Skip to content

Conversation

@JimFuller-RedHat
Copy link
Contributor

@JimFuller-RedHat JimFuller-RedHat commented Dec 2, 2025

The 'latest' endpoint never got a proper user story - this ADR attempts to document what we have - so we can align current requirements.

link to ADR

Summary by Sourcery

Document the current behaviour and architecture of the 'latest' analysis endpoint, including how SBOM data is selected, loaded into the in-memory graph, and queried, and capture retroactive decisions and future requirements for evolving this functionality.

Enhancements:

  • Capture retroactive architectural decisions and proposed future changes for deprecating the generic 'latest' endpoint in favour of more specific APIs and expanding the SBOM test corpus.

Documentation:

  • Add a comprehensive ADR explaining the existing 'latest' component analysis implementation, including database ranking logic, in-memory graph querying, example API usages, and lifecycle testing requirements.

@JimFuller-RedHat JimFuller-RedHat self-assigned this Dec 2, 2025
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Dec 2, 2025

Reviewer's Guide

Adds a comprehensive, retroactive Architecture Decision Record documenting the current 'latest' analysis endpoint implementation, including how SBOMs are selected and loaded into the in‑memory graph, how graph queries are executed, and what future requirements and refactors are envisioned.

Sequence diagram for the latest analysis component query

sequenceDiagram
    actor Client
    participant ApiLatestComponentEndpoint
    participant AnalysisService
    participant AnalysisInner
    participant Database
    participant GraphCache
    participant PackageGraph
    participant GraphLoader
    participant Collector

    Client->>ApiLatestComponentEndpoint: HTTP GET /api/v2/analysis/latest/component
    ApiLatestComponentEndpoint->>AnalysisService: retrieve_latest(search, options, paginated, db)

    AnalysisService->>AnalysisInner: load_latest_graphs_query(connection, GraphQuery)
    AnalysisInner->>Database: execute ranking subquery
    Database-->>AnalysisInner: latest_sbom_ids
    AnalysisInner->>Database: load SBOM data for latest_sbom_ids
    Database-->>AnalysisInner: SBOM records
    AnalysisInner-->>AnalysisService: graphs Vec_Uuid_Arc_PackageGraph

    AnalysisService->>GraphCache: get or insert PackageGraph entries
    GraphCache-->>AnalysisService: cached graphs slice

    AnalysisService->>AnalysisService: run_graph_query(query, options, graphs, connection, graph_cache)
    loop for_each_matching_node
        AnalysisService->>PackageGraph: iterate node_indices and filter
        PackageGraph-->>AnalysisService: node_index, node

        AnalysisService->>GraphLoader: new AnalysisService_clone
        par collect_ancestors_and_descendants
            AnalysisService->>Collector: new Direction_Incoming
            Collector->>PackageGraph: traverse ancestors
            PackageGraph-->>Collector: ancestor_nodes
            Collector-->>AnalysisService: ancestors

            AnalysisService->>Collector: new Direction_Outgoing
            Collector->>PackageGraph: traverse descendants
            PackageGraph-->>Collector: descendant_nodes
            Collector-->>AnalysisService: descendants
        and
        end
    end

    AnalysisService-->>ApiLatestComponentEndpoint: PaginatedResults_Node_
    ApiLatestComponentEndpoint-->>Client: HTTP 200 JSON items
Loading

Updated class diagram for the latest analysis service and graph query pipeline

classDiagram
    class ApiLatestComponentEndpoint {
        +search_latest_component(service, db, search, options, paginated, read_sbom) Result_Responder_
    }

    class AnalysisService {
        +concurrency usize
        +inner AnalysisInner
        +retrieve_latest(query, options, paginated, connection) Result_PaginatedResults_Node__Error_
        +run_graph_query(query, options, graphs, connection, graph_cache) Result_Vec_Node__Error_
        +collect_graph(query, graphs, concurrency, create) Result_Vec_Node__Error_
        +filter(graph, query, node_index) bool
    }

    class AnalysisInner {
        +graph_cache GraphMap
        +load_latest_graphs_query(connection, query) Result_Vec_Uuid_Arc_PackageGraph___Error_
    }

    class GraphMap {
        <<type_alias>>
        +PackageGraph_cache
    }

    class PackageGraph {
        <<in_memory_graph>>
        +node_indices() Iterator_NodeIndex_
        +node_weight(index) Option_graph_Node_
    }

    class Graph_Node {
        <<enum_like>>
        +PackageNode
        +ExternalNode
        +OtherNode
        +sbom_id Uuid
        +node_id String
        +name String
    }

    class PackageNode {
        +purl Vec_Purl_
        +cpe Vec_Cpe_
        +version String
    }

    class ExternalNode {
        +external_document_reference String
        +external_node_id String
    }

    class Collector {
        +new(graph_cache, graphs, graph, node_index, direction, depth, relationships, connection, concurrency, loader) Collector
        +collect() Future_Result_Vec_Node__Error__
    }

    class GraphLoader {
        +new(analysis_service) GraphLoader
        +load_external_sbom(sbom_id, connection) Future_Result_PackageGraph_Error_
    }

    class DatabaseConnection {
        <<trait_ConnectionTrait>>
        +execute(statement) Result_
        +query(statement) Result_Rows_
    }

    class Paginated {
        +page usize
        +page_size usize
        +paginate_array(items) PaginatedResults_Node_
    }

    class PaginatedResults_Node_ {
        +items Vec_Node_
        +total usize
    }

    class QueryOptions {
        +ancestors usize
        +descendants usize
        +relationships RelationshipFilter
    }

    class GraphQuery {
        <<enum_like>>
        +ComponentId
        +ComponentName
        +ComponentPurl
        +ComponentCpe
        +QuerySearch
    }

    class ComponentReference {
        <<enum_like>>
        +Id
        +Name
        +Purl
        +Cpe
    }

    class Direction {
        <<enum_like>>
        +Incoming
        +Outgoing
    }

    class Node {
        +base Graph_Node
        +relationship Option_RelationshipType_
        +ancestors Vec_Node_
        +descendants Vec_Node_
    }

    class RelationshipType {
        <<enum_like>>
        +describes
        +generates
        +package
        +variant
        +dependency
    }

    ApiLatestComponentEndpoint --> AnalysisService : uses
    AnalysisService --> AnalysisInner : has
    AnalysisService --> GraphMap : uses
    AnalysisInner --> GraphMap : uses
    AnalysisInner --> DatabaseConnection : uses
    AnalysisService --> DatabaseConnection : uses
    AnalysisService --> PackageGraph : reads
    PackageGraph --> Graph_Node : contains
    Graph_Node --> PackageNode : variant
    Graph_Node --> ExternalNode : variant
    AnalysisService --> GraphLoader : uses
    GraphLoader --> DatabaseConnection : uses
    AnalysisService --> Collector : creates
    Collector --> PackageGraph : traverses
    Collector --> GraphMap : uses
    Collector --> GraphLoader : uses
    Collector --> Direction : configures
    Collector --> Node : builds
    AnalysisService --> GraphQuery : interprets
    GraphQuery --> ComponentReference : contains
    AnalysisService --> QueryOptions : uses
    AnalysisService --> Paginated : uses
    Paginated --> PaginatedResults_Node_ : creates
    PaginatedResults_Node_ --> Node : contains
    Node --> RelationshipType : uses
    Node --> Graph_Node : wraps
Loading

File-Level Changes

Change Details Files
Document current behaviour and architecture of the 'latest' analysis endpoint as an ADR.
  • Describe end-to-end request flow from REST /analysis/latest/component endpoint through AnalysisService into database and in-memory graph operations.
  • Explain how latest SBOM IDs are derived using SeaORM ranking subqueries and a wrapper query_all helper to select distinct rank=1 sbom_id values.
  • Detail how SBOMs (including externally linked SBOMs) are loaded and transformed into PackageGraph instances for in-memory querying.
  • Document the run_graph_query and collect_graph logic, including GraphQuery translation, node filtering, ancestor/descendant collection, and pagination of results.
  • Provide concrete example queries for CPE-based and name/purl-based latest component searches, including sample JSON responses and how CPE roots and product relationships are inferred.
  • Capture retroactive requirements, proposed deprecation and replacement of the generic /latest endpoint with more specific endpoints, test corpus expansion, and alternative design considerations.
docs/adrs/00011-latest-implementation.md

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes - here's some feedback:

  • There are multiple typos and minor formatting issues in the ADR (e.g., 'comprehansive', 'hiearchichal', 'unreleastic', 'Mmore', 'constrainted', inconsistent 'descendents' vs 'descendants', stray ): after the curl sentence, missing space in curl --compressed"...", and retrive_latest vs retrieve_latest) that should be cleaned up to keep the document clear and professional.
  • The ADR is very long and dense; consider adding a brief high-level summary section near the top (separate from Context) that lists the key behaviors and data flows of the 'latest' endpoint in a few bullet points to make it easier for newcomers to grasp before diving into the detailed walkthrough.
  • Where you reference specific code locations (e.g., modules/analysis/src/service/mod.rs#186), it might be clearer to either link to the exact symbols (function names) or quote the function signatures only once and then refer to them by name afterward, to reduce repetition and improve readability.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- There are multiple typos and minor formatting issues in the ADR (e.g., 'comprehansive', 'hiearchichal', 'unreleastic', 'Mmore', 'constrainted', inconsistent 'descendents' vs 'descendants', stray `):` after the curl sentence, missing space in `curl --compressed"..."`, and `retrive_latest` vs `retrieve_latest`) that should be cleaned up to keep the document clear and professional.
- The ADR is very long and dense; consider adding a brief high-level summary section near the top (separate from Context) that lists the key behaviors and data flows of the 'latest' endpoint in a few bullet points to make it easier for newcomers to grasp before diving into the detailed walkthrough.
- Where you reference specific code locations (e.g., `modules/analysis/src/service/mod.rs#186`), it might be clearer to either link to the exact symbols (function names) or quote the function signatures only once and then refer to them by name afterward, to reduce repetition and improve readability.

## Individual Comments

### Comment 1
<location> `docs/adrs/00011-latest-implementation.md:11` </location>
<code_context>
+
+## Context
+
+This is retroactive and comprehansive ADR describing current process for querying Trustify in memory graph for 'latest' 
+components to aid any poor souls who wander into this part of the codebase.
+
</code_context>

<issue_to_address>
**issue (typo):** Fix typo in "comprehansive".

```suggestion
This is retroactive and comprehensive ADR describing current process for querying Trustify in memory graph for 'latest' 
```
</issue_to_address>

### Comment 2
<location> `docs/adrs/00011-latest-implementation.md:39-40` </location>
<code_context>
+At first glance - this diagram does not reveal the complexity of the dance between the database (which contains all sbom data) and
+and in memory graph collection. 
+
+Relational databases representing hiearchichal relationships blow up under 'join explosion' making dynamic, direct 
+queries, against the database, for these relationships unreleastic. A more natural internal representation, an in-memory graph, means such queries are simplified (eg. find root, link external sboms, return ancestors/descendants) 
+and perform much better.
</code_context>

<issue_to_address>
**issue (typo):** Correct spelling of "hierarchical" and "unrealistic".

These misspellings occur in the sentence about relational databases and join explosion—please update both in that line/paragraph.

```suggestion
Relational databases representing hierarchical relationships blow up under 'join explosion' making dynamic, direct 
queries, against the database, for these relationships unrealistic. A more natural internal representation, an in-memory graph, means such queries are simplified (eg. find root, link external sboms, return ancestors/descendants) 
```
</issue_to_address>

### Comment 3
<location> `docs/adrs/00011-latest-implementation.md:49` </location>
<code_context>
+current _latest_ products. 
+
+That is we want to perform queries on the latest version of a product (which are often represented across a set of 
+sboms) - this much smaller subset then 'all the sboms' related to a product over time contributes to making
+graph queries faster to resolve.
+
</code_context>

<issue_to_address>
**issue (typo):** Use "than" instead of "then" in this comparison.

Because this is a comparison ("smaller subset … than all the sboms"), "than" is the correct word here.

```suggestion
sboms) - this much smaller subset than 'all the sboms' related to a product over time contributes to making
```
</issue_to_address>

### Comment 4
<location> `docs/adrs/00011-latest-implementation.md:421` </location>
<code_context>
+    }
+```
+
+This function's main responsibility is to collect matched nodes and depending on if requested will resolve the ancestor and descendents of each node.
+
+The `collect_graph` invoke performs the graph query itself (`modules/analysis/src/service/mod.rs#439`):
</code_context>

<issue_to_address>
**issue (typo):** Fix spelling of "descendants".

Use the spelling "descendants" here to match the rest of the document.

```suggestion
This function's main responsibility is to collect matched nodes and depending on if requested will resolve the ancestor and descendants of each node.
```
</issue_to_address>

### Comment 5
<location> `docs/adrs/00011-latest-implementation.md:1020-1022` </location>
<code_context>
+  "total": 1
+}
+```
+This listing has been summarised for brevity - The latest version is chosen of CPE and all 
+descendents returned from top level matched CPEs. In this example there is only a single SBOM component
+matching
+
</code_context>

<issue_to_address>
**issue (typo):** Consistent spelling of "descendants".

In the phrase “descendents returned from top level matched CPEs”, please change “descendents” to the correctly spelled “descendants” to match the rest of the document.

```suggestion
This listing has been summarised for brevity - The latest version is chosen of CPE and all 
descendants returned from top level matched CPEs. In this example there is only a single SBOM component
matching
```
</issue_to_address>

### Comment 6
<location> `docs/adrs/00011-latest-implementation.md:1027-1028` </location>
<code_context>
+
+Returning all 'latest' matched components (by name or purl) allows us to traverse to their root which defines the CPE, eg.
+which (released, unreleased) Products are affected by any single component. Performing a single query that returns
+hits across all possible products is much more efficient then having to run a loop of queries by **n** CPEs.
+
+Using current endpoint:
</code_context>

<issue_to_address>
**issue (typo):** Use "than" instead of "then" in this efficiency comparison.

This is another comparison, so "then" should be "than" here.

```suggestion
which (released, unreleased) Products are affected by any single component. Performing a single query that returns
hits across all possible products is much more efficient than having to run a loop of queries by **n** CPEs.
```
</issue_to_address>

### Comment 7
<location> `docs/adrs/00011-latest-implementation.md:532-534` </location>
<code_context>
+##### query by specific CPE(s) and return all matched 'latest' components
+
+Returning all latest components (with full descendant tree underneath) of a set of CPEs means we can build up a materialised
+view of a single (released) Product. That enables generation of a final release artifact of a aggregate SBOM
+representing a single Product.
+
</code_context>

<issue_to_address>
**nitpick (typo):** Use "an aggregate SBOM" instead of "a aggregate SBOM".

This should be "an aggregate SBOM" ("an" before a vowel sound) instead of "a aggregate SBOM".

```suggestion
Returning all latest components (with full descendant tree underneath) of a set of CPEs means we can build up a materialised
view of a single (released) Product. That enables generation of a final release artifact of an aggregate SBOM
representing a single Product.
```
</issue_to_address>

### Comment 8
<location> `docs/adrs/00011-latest-implementation.md:1623` </location>
<code_context>
+forest of SBOM graphs - but external linking seemed an important boundary to respect. Also this might dramatically increase
+each graph beyond reasonable limits forcing us to use a graph store.
+
+We could have used a Mmore formal graph store, which would imply usage of common graph query language - which is better then compositing up 
+a bunch of specific code primitives for querying. But our need for graph queries are limited to a constrainted set
+of queries (get root, get desc, get ancestors).
</code_context>

<issue_to_address>
**issue (typo):** Fix "Mmore" and consider "than" instead of "then".

"Mmore" should be "more", and "better then compositing up" should use "than" for the comparison.

```suggestion
We could have used a more formal graph store, which would imply usage of common graph query language - which is better than compositing up 
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@JimFuller-RedHat JimFuller-RedHat force-pushed the latest-adr branch 3 times, most recently from 4ae3320 to cb6c92a Compare December 2, 2025 14:42
@codecov
Copy link

codecov bot commented Dec 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.17%. Comparing base (8338c24) to head (f2c0273).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2159      +/-   ##
==========================================
- Coverage   68.17%   68.17%   -0.01%     
==========================================
  Files         375      375              
  Lines       21048    21048              
  Branches    21048    21048              
==========================================
- Hits        14350    14349       -1     
+ Misses       5836     5833       -3     
- Partials      862      866       +4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@PhilipCattanach
Copy link

@JimFuller-RedHat - Thanks for pulling together such a detailed document. I thought that these APIs were complex, but the actual level of complexity has shocked me.

The test data corpus is a great idea. And I 100% agree we need a separate collection of data for each ecosystem.
Ultimately I would like a collection of tests that prove the APIs work.

What would be extremely helpful would be if you could elaborate on this statement in the Decision section

Deprecate /latest/ endpoint and replace with 2-3 specific endpoints matching requirements using existing underlying implementation. Any refactor of the implementation could remove unneeded bits reducing overall 'footprint'.

So defining the new endpoints, their purpose, and the usage of /latest/ that they replace.

Finally, and not in scope for this document, we really need to accelerate the purging of stale data from the Atlas environments. We have reliable Delete endpoints now, but we really need an automated purge based upon retention criteria. WIP on that currently.

@JimFuller-RedHat
Copy link
Contributor Author

@JimFuller-RedHat - Thanks for pulling together such a detailed document. I thought that these APIs were complex, but the actual level of complexity has shocked me.

The test data corpus is a great idea. And I 100% agree we need a separate collection of data for each ecosystem. Ultimately I would like a collection of tests that prove the APIs work.

What would be extremely helpful would be if you could elaborate on this statement in the Decision section

Deprecate /latest/ endpoint and replace with 2-3 specific endpoints matching requirements using existing underlying implementation. Any refactor of the implementation could remove unneeded bits reducing overall 'footprint'.

So defining the new endpoints, their purpose, and the usage of /latest/ that they replace.

the three examples included in query section is what these 1-3 new endpoints would need to do - we would remove all other endpoint functionality - I intentionally left definition of those 3 endpoints out of this document - I do not want to impose this design here and I would propose @ctron does a first pass implementation/definition of them.

As for their purpose - the query section once again describes why (I hope).

Finally, and not in scope for this document, we really need to accelerate the purging of stale data from the Atlas environments. We have reliable Delete endpoints now, but we really need an automated purge based upon retention criteria. WIP on that currently.

that seems very important (and unrelated to this ADR).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants