-
Notifications
You must be signed in to change notification settings - Fork 35
latest adr: retroactive ADR that documents latest implementation #2159
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Reviewer's GuideAdds a comprehensive, retroactive Architecture Decision Record documenting the current 'latest' analysis endpoint implementation, including how SBOMs are selected and loaded into the in‑memory graph, how graph queries are executed, and what future requirements and refactors are envisioned. Sequence diagram for the latest analysis component querysequenceDiagram
actor Client
participant ApiLatestComponentEndpoint
participant AnalysisService
participant AnalysisInner
participant Database
participant GraphCache
participant PackageGraph
participant GraphLoader
participant Collector
Client->>ApiLatestComponentEndpoint: HTTP GET /api/v2/analysis/latest/component
ApiLatestComponentEndpoint->>AnalysisService: retrieve_latest(search, options, paginated, db)
AnalysisService->>AnalysisInner: load_latest_graphs_query(connection, GraphQuery)
AnalysisInner->>Database: execute ranking subquery
Database-->>AnalysisInner: latest_sbom_ids
AnalysisInner->>Database: load SBOM data for latest_sbom_ids
Database-->>AnalysisInner: SBOM records
AnalysisInner-->>AnalysisService: graphs Vec_Uuid_Arc_PackageGraph
AnalysisService->>GraphCache: get or insert PackageGraph entries
GraphCache-->>AnalysisService: cached graphs slice
AnalysisService->>AnalysisService: run_graph_query(query, options, graphs, connection, graph_cache)
loop for_each_matching_node
AnalysisService->>PackageGraph: iterate node_indices and filter
PackageGraph-->>AnalysisService: node_index, node
AnalysisService->>GraphLoader: new AnalysisService_clone
par collect_ancestors_and_descendants
AnalysisService->>Collector: new Direction_Incoming
Collector->>PackageGraph: traverse ancestors
PackageGraph-->>Collector: ancestor_nodes
Collector-->>AnalysisService: ancestors
AnalysisService->>Collector: new Direction_Outgoing
Collector->>PackageGraph: traverse descendants
PackageGraph-->>Collector: descendant_nodes
Collector-->>AnalysisService: descendants
and
end
end
AnalysisService-->>ApiLatestComponentEndpoint: PaginatedResults_Node_
ApiLatestComponentEndpoint-->>Client: HTTP 200 JSON items
Updated class diagram for the latest analysis service and graph query pipelineclassDiagram
class ApiLatestComponentEndpoint {
+search_latest_component(service, db, search, options, paginated, read_sbom) Result_Responder_
}
class AnalysisService {
+concurrency usize
+inner AnalysisInner
+retrieve_latest(query, options, paginated, connection) Result_PaginatedResults_Node__Error_
+run_graph_query(query, options, graphs, connection, graph_cache) Result_Vec_Node__Error_
+collect_graph(query, graphs, concurrency, create) Result_Vec_Node__Error_
+filter(graph, query, node_index) bool
}
class AnalysisInner {
+graph_cache GraphMap
+load_latest_graphs_query(connection, query) Result_Vec_Uuid_Arc_PackageGraph___Error_
}
class GraphMap {
<<type_alias>>
+PackageGraph_cache
}
class PackageGraph {
<<in_memory_graph>>
+node_indices() Iterator_NodeIndex_
+node_weight(index) Option_graph_Node_
}
class Graph_Node {
<<enum_like>>
+PackageNode
+ExternalNode
+OtherNode
+sbom_id Uuid
+node_id String
+name String
}
class PackageNode {
+purl Vec_Purl_
+cpe Vec_Cpe_
+version String
}
class ExternalNode {
+external_document_reference String
+external_node_id String
}
class Collector {
+new(graph_cache, graphs, graph, node_index, direction, depth, relationships, connection, concurrency, loader) Collector
+collect() Future_Result_Vec_Node__Error__
}
class GraphLoader {
+new(analysis_service) GraphLoader
+load_external_sbom(sbom_id, connection) Future_Result_PackageGraph_Error_
}
class DatabaseConnection {
<<trait_ConnectionTrait>>
+execute(statement) Result_
+query(statement) Result_Rows_
}
class Paginated {
+page usize
+page_size usize
+paginate_array(items) PaginatedResults_Node_
}
class PaginatedResults_Node_ {
+items Vec_Node_
+total usize
}
class QueryOptions {
+ancestors usize
+descendants usize
+relationships RelationshipFilter
}
class GraphQuery {
<<enum_like>>
+ComponentId
+ComponentName
+ComponentPurl
+ComponentCpe
+QuerySearch
}
class ComponentReference {
<<enum_like>>
+Id
+Name
+Purl
+Cpe
}
class Direction {
<<enum_like>>
+Incoming
+Outgoing
}
class Node {
+base Graph_Node
+relationship Option_RelationshipType_
+ancestors Vec_Node_
+descendants Vec_Node_
}
class RelationshipType {
<<enum_like>>
+describes
+generates
+package
+variant
+dependency
}
ApiLatestComponentEndpoint --> AnalysisService : uses
AnalysisService --> AnalysisInner : has
AnalysisService --> GraphMap : uses
AnalysisInner --> GraphMap : uses
AnalysisInner --> DatabaseConnection : uses
AnalysisService --> DatabaseConnection : uses
AnalysisService --> PackageGraph : reads
PackageGraph --> Graph_Node : contains
Graph_Node --> PackageNode : variant
Graph_Node --> ExternalNode : variant
AnalysisService --> GraphLoader : uses
GraphLoader --> DatabaseConnection : uses
AnalysisService --> Collector : creates
Collector --> PackageGraph : traverses
Collector --> GraphMap : uses
Collector --> GraphLoader : uses
Collector --> Direction : configures
Collector --> Node : builds
AnalysisService --> GraphQuery : interprets
GraphQuery --> ComponentReference : contains
AnalysisService --> QueryOptions : uses
AnalysisService --> Paginated : uses
Paginated --> PaginatedResults_Node_ : creates
PaginatedResults_Node_ --> Node : contains
Node --> RelationshipType : uses
Node --> Graph_Node : wraps
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey there - I've reviewed your changes - here's some feedback:
- There are multiple typos and minor formatting issues in the ADR (e.g., 'comprehansive', 'hiearchichal', 'unreleastic', 'Mmore', 'constrainted', inconsistent 'descendents' vs 'descendants', stray
):after the curl sentence, missing space incurl --compressed"...", andretrive_latestvsretrieve_latest) that should be cleaned up to keep the document clear and professional. - The ADR is very long and dense; consider adding a brief high-level summary section near the top (separate from Context) that lists the key behaviors and data flows of the 'latest' endpoint in a few bullet points to make it easier for newcomers to grasp before diving into the detailed walkthrough.
- Where you reference specific code locations (e.g.,
modules/analysis/src/service/mod.rs#186), it might be clearer to either link to the exact symbols (function names) or quote the function signatures only once and then refer to them by name afterward, to reduce repetition and improve readability.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- There are multiple typos and minor formatting issues in the ADR (e.g., 'comprehansive', 'hiearchichal', 'unreleastic', 'Mmore', 'constrainted', inconsistent 'descendents' vs 'descendants', stray `):` after the curl sentence, missing space in `curl --compressed"..."`, and `retrive_latest` vs `retrieve_latest`) that should be cleaned up to keep the document clear and professional.
- The ADR is very long and dense; consider adding a brief high-level summary section near the top (separate from Context) that lists the key behaviors and data flows of the 'latest' endpoint in a few bullet points to make it easier for newcomers to grasp before diving into the detailed walkthrough.
- Where you reference specific code locations (e.g., `modules/analysis/src/service/mod.rs#186`), it might be clearer to either link to the exact symbols (function names) or quote the function signatures only once and then refer to them by name afterward, to reduce repetition and improve readability.
## Individual Comments
### Comment 1
<location> `docs/adrs/00011-latest-implementation.md:11` </location>
<code_context>
+
+## Context
+
+This is retroactive and comprehansive ADR describing current process for querying Trustify in memory graph for 'latest'
+components to aid any poor souls who wander into this part of the codebase.
+
</code_context>
<issue_to_address>
**issue (typo):** Fix typo in "comprehansive".
```suggestion
This is retroactive and comprehensive ADR describing current process for querying Trustify in memory graph for 'latest'
```
</issue_to_address>
### Comment 2
<location> `docs/adrs/00011-latest-implementation.md:39-40` </location>
<code_context>
+At first glance - this diagram does not reveal the complexity of the dance between the database (which contains all sbom data) and
+and in memory graph collection.
+
+Relational databases representing hiearchichal relationships blow up under 'join explosion' making dynamic, direct
+queries, against the database, for these relationships unreleastic. A more natural internal representation, an in-memory graph, means such queries are simplified (eg. find root, link external sboms, return ancestors/descendants)
+and perform much better.
</code_context>
<issue_to_address>
**issue (typo):** Correct spelling of "hierarchical" and "unrealistic".
These misspellings occur in the sentence about relational databases and join explosion—please update both in that line/paragraph.
```suggestion
Relational databases representing hierarchical relationships blow up under 'join explosion' making dynamic, direct
queries, against the database, for these relationships unrealistic. A more natural internal representation, an in-memory graph, means such queries are simplified (eg. find root, link external sboms, return ancestors/descendants)
```
</issue_to_address>
### Comment 3
<location> `docs/adrs/00011-latest-implementation.md:49` </location>
<code_context>
+current _latest_ products.
+
+That is we want to perform queries on the latest version of a product (which are often represented across a set of
+sboms) - this much smaller subset then 'all the sboms' related to a product over time contributes to making
+graph queries faster to resolve.
+
</code_context>
<issue_to_address>
**issue (typo):** Use "than" instead of "then" in this comparison.
Because this is a comparison ("smaller subset … than all the sboms"), "than" is the correct word here.
```suggestion
sboms) - this much smaller subset than 'all the sboms' related to a product over time contributes to making
```
</issue_to_address>
### Comment 4
<location> `docs/adrs/00011-latest-implementation.md:421` </location>
<code_context>
+ }
+```
+
+This function's main responsibility is to collect matched nodes and depending on if requested will resolve the ancestor and descendents of each node.
+
+The `collect_graph` invoke performs the graph query itself (`modules/analysis/src/service/mod.rs#439`):
</code_context>
<issue_to_address>
**issue (typo):** Fix spelling of "descendants".
Use the spelling "descendants" here to match the rest of the document.
```suggestion
This function's main responsibility is to collect matched nodes and depending on if requested will resolve the ancestor and descendants of each node.
```
</issue_to_address>
### Comment 5
<location> `docs/adrs/00011-latest-implementation.md:1020-1022` </location>
<code_context>
+ "total": 1
+}
+```
+This listing has been summarised for brevity - The latest version is chosen of CPE and all
+descendents returned from top level matched CPEs. In this example there is only a single SBOM component
+matching
+
</code_context>
<issue_to_address>
**issue (typo):** Consistent spelling of "descendants".
In the phrase “descendents returned from top level matched CPEs”, please change “descendents” to the correctly spelled “descendants” to match the rest of the document.
```suggestion
This listing has been summarised for brevity - The latest version is chosen of CPE and all
descendants returned from top level matched CPEs. In this example there is only a single SBOM component
matching
```
</issue_to_address>
### Comment 6
<location> `docs/adrs/00011-latest-implementation.md:1027-1028` </location>
<code_context>
+
+Returning all 'latest' matched components (by name or purl) allows us to traverse to their root which defines the CPE, eg.
+which (released, unreleased) Products are affected by any single component. Performing a single query that returns
+hits across all possible products is much more efficient then having to run a loop of queries by **n** CPEs.
+
+Using current endpoint:
</code_context>
<issue_to_address>
**issue (typo):** Use "than" instead of "then" in this efficiency comparison.
This is another comparison, so "then" should be "than" here.
```suggestion
which (released, unreleased) Products are affected by any single component. Performing a single query that returns
hits across all possible products is much more efficient than having to run a loop of queries by **n** CPEs.
```
</issue_to_address>
### Comment 7
<location> `docs/adrs/00011-latest-implementation.md:532-534` </location>
<code_context>
+##### query by specific CPE(s) and return all matched 'latest' components
+
+Returning all latest components (with full descendant tree underneath) of a set of CPEs means we can build up a materialised
+view of a single (released) Product. That enables generation of a final release artifact of a aggregate SBOM
+representing a single Product.
+
</code_context>
<issue_to_address>
**nitpick (typo):** Use "an aggregate SBOM" instead of "a aggregate SBOM".
This should be "an aggregate SBOM" ("an" before a vowel sound) instead of "a aggregate SBOM".
```suggestion
Returning all latest components (with full descendant tree underneath) of a set of CPEs means we can build up a materialised
view of a single (released) Product. That enables generation of a final release artifact of an aggregate SBOM
representing a single Product.
```
</issue_to_address>
### Comment 8
<location> `docs/adrs/00011-latest-implementation.md:1623` </location>
<code_context>
+forest of SBOM graphs - but external linking seemed an important boundary to respect. Also this might dramatically increase
+each graph beyond reasonable limits forcing us to use a graph store.
+
+We could have used a Mmore formal graph store, which would imply usage of common graph query language - which is better then compositing up
+a bunch of specific code primitives for querying. But our need for graph queries are limited to a constrainted set
+of queries (get root, get desc, get ancestors).
</code_context>
<issue_to_address>
**issue (typo):** Fix "Mmore" and consider "than" instead of "then".
"Mmore" should be "more", and "better then compositing up" should use "than" for the comparison.
```suggestion
We could have used a more formal graph store, which would imply usage of common graph query language - which is better than compositing up
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
4ae3320 to
cb6c92a
Compare
cb6c92a to
f2c0273
Compare
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2159 +/- ##
==========================================
- Coverage 68.17% 68.17% -0.01%
==========================================
Files 375 375
Lines 21048 21048
Branches 21048 21048
==========================================
- Hits 14350 14349 -1
+ Misses 5836 5833 -3
- Partials 862 866 +4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@JimFuller-RedHat - Thanks for pulling together such a detailed document. I thought that these APIs were complex, but the actual level of complexity has shocked me. The test data corpus is a great idea. And I 100% agree we need a separate collection of data for each ecosystem. What would be extremely helpful would be if you could elaborate on this statement in the Decision section Deprecate /latest/ endpoint and replace with 2-3 specific endpoints matching requirements using existing underlying implementation. Any refactor of the implementation could remove unneeded bits reducing overall 'footprint'. So defining the new endpoints, their purpose, and the usage of /latest/ that they replace. Finally, and not in scope for this document, we really need to accelerate the purging of stale data from the Atlas environments. We have reliable Delete endpoints now, but we really need an automated purge based upon retention criteria. WIP on that currently. |
the three examples included in query section is what these 1-3 new endpoints would need to do - we would remove all other endpoint functionality - I intentionally left definition of those 3 endpoints out of this document - I do not want to impose this design here and I would propose @ctron does a first pass implementation/definition of them. As for their purpose - the query section once again describes why (I hope).
that seems very important (and unrelated to this ADR). |
The 'latest' endpoint never got a proper user story - this ADR attempts to document what we have - so we can align current requirements.
link to ADR
Summary by Sourcery
Document the current behaviour and architecture of the 'latest' analysis endpoint, including how SBOM data is selected, loaded into the in-memory graph, and queried, and capture retroactive decisions and future requirements for evolving this functionality.
Enhancements:
Documentation: