fix(migration): add missing FK indexes for query performance#2276

Open
imurphy-rh wants to merge 1 commit into guacsec:main from imurphy-rh:fix/perf-fk-indexes

Conversation

@imurphy-rh

@imurphy-rh imurphy-rh commented Mar 6, 2026

Summary

  • Adds 6 missing indexes on foreign key columns that cause sequential scans on cold tables
  • Most impactful for the analysis service's SBOM graph loading path (GET /api/v2/analysis/component/{key})
  • Follows the same pattern as m0001200 and m0001110

Problem

Several FK columns lack indexes. When tables are cold (not in PostgreSQL's buffer cache), queries that JOIN on these columns trigger full sequential scans. The analysis service's get_nodes() query (modules/analysis/src/service/load/mod.rs:138) is the worst offender — it runs for every SBOM graph load and includes:

LEFT JOIN product_version ON sbom.sbom_id = product_version.sbom_id  -- no index
LEFT JOIN product ON product_version.product_id = product.id          -- no index
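The effect is visible in the plan for this join (a schematic sketch; `$1` stands in for a bound SBOM id, and on a cold cache the unindexed side falls back to a Seq Scan):

```sql
-- Without an index on product_version.sbom_id, the planner has no way to
-- probe product_version by sbom_id and sequentially scans the whole table.
EXPLAIN (ANALYZE, BUFFERS)
SELECT product_version.product_id
FROM sbom
LEFT JOIN product_version ON sbom.sbom_id = product_version.sbom_id
WHERE sbom.sbom_id = $1;
```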

Indexes Added

Index — Column(s) — Query Path:

  • product_version_sbom_id_idx — product_version(sbom_id) — analysis graph load, every get_nodes() call
  • product_version_product_id_idx — product_version(product_id) — analysis graph load, product JOIN
  • package_relates_to_package_sbom_rel_idx — package_relates_to_package(sbom_id, relationship) — CPE context filter in product_advisory_info_sql()
  • purl_status_version_range_id_idx — purl_status(version_range_id) — vulnerability analysis JOINs
  • cpe_vendor_product_version_idx — cpe(vendor, product, version) — generalized CPE tuple lookup
  • advisory_issuer_id_idx — advisory(issuer_id) — advisory listing with organization JOIN
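In plain DDL, the six indexes correspond to roughly the following (a sketch; the actual migration builds them through SeaORM's `Index::create()`, with names and columns taken from the list above):

```sql
CREATE INDEX IF NOT EXISTS product_version_sbom_id_idx
    ON product_version (sbom_id);
CREATE INDEX IF NOT EXISTS product_version_product_id_idx
    ON product_version (product_id);
CREATE INDEX IF NOT EXISTS package_relates_to_package_sbom_rel_idx
    ON package_relates_to_package (sbom_id, relationship);
CREATE INDEX IF NOT EXISTS purl_status_version_range_id_idx
    ON purl_status (version_range_id);
CREATE INDEX IF NOT EXISTS cpe_vendor_product_version_idx
    ON cpe (vendor, product, version);
CREATE INDEX IF NOT EXISTS advisory_issuer_id_idx
    ON advisory (issuer_id);
```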

Why package_relates_to_package needs a separate index

The PK is (sbom_id, left_node_id, relationship, right_node_id). Queries filter WHERE sbom_id = $1 AND relationship = 13, but left_node_id sits between sbom_id and relationship in the composite key, so PostgreSQL can only use the leading sbom_id column and must scan all left_node_id values to find matching relationships.
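A sketch of the mismatch (schematic; `$1` stands in for a bound SBOM id, and 13 is the relationship value from the query above):

```sql
-- PK column order: (sbom_id, left_node_id, relationship, right_node_id).
-- The filter below skips left_node_id, so the PK index can only seek on
-- sbom_id and must walk every left_node_id entry within that SBOM:
EXPLAIN (ANALYZE, BUFFERS)
SELECT right_node_id
FROM package_relates_to_package
WHERE sbom_id = $1 AND relationship = 13;

-- The new composite index matches the filter's columns exactly:
CREATE INDEX IF NOT EXISTS package_relates_to_package_sbom_rel_idx
    ON package_relates_to_package (sbom_id, relationship);
```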

Test plan

  • Migration applies cleanly (cargo run -p trustify-migration -- up)
  • Migration rolls back cleanly (cargo run -p trustify-migration -- down)
  • EXPLAIN ANALYZE on the get_nodes() query shows Index Scan on product_version instead of Seq Scan
  • Existing tests pass (cargo test --all-features)

🤖 Generated with Claude Code

Summary by Sourcery

Enhancements:

  • Introduce migration m0002100 to add targeted indexes on frequently joined foreign key columns and CPE tuple fields to reduce sequential scans and speed up analysis-related queries.

Several foreign key columns lack indexes, causing sequential scans on
cold tables. The most impactful gap is product_version.sbom_id, which
is joined on every SBOM graph load in the analysis service (the
get_nodes() query in modules/analysis/src/service/load/mod.rs:188).

Indexes added:
- product_version(sbom_id) — eliminates seq scan on every graph load
- product_version(product_id) — completes the product JOIN chain
- package_relates_to_package(sbom_id, relationship) — the existing PK
  (sbom_id, left_node_id, relationship, right_node_id) defeats queries
  filtering on sbom_id + relationship because left_node_id sits between
  them in the composite key
- purl_status(version_range_id) — used in vulnerability analysis JOINs
- cpe(vendor, product, version) — used in generalized CPE tuple lookup
  within product_advisory_info_sql()
- advisory(issuer_id) — used in advisory listing LEFT JOIN to
  organization

Follows the same pattern as m0001200_source_document_fk_indexes and
m0001110_sbom_node_checksum_indexes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

sourcery-ai bot commented Mar 6, 2026

Reviewer's Guide

Adds a new migration m0002100 to create six performance-focused indexes on frequently joined foreign key and composite key columns, and wires the migration into the global migrator so it runs in sequence with prior migrations.

ER diagram for new FK index coverage on core tables

erDiagram
    Sbom {
        bigint id
    }

    Product {
        bigint id
    }

    ProductVersion {
        bigint sbom_id
        bigint product_id
    }

    PackageRelatesToPackage {
        bigint sbom_id
        int relationship
    }

    VersionRange {
        bigint id
    }

    PurlStatus {
        bigint version_range_id
    }

    Cpe {
        text vendor
        text product
        text version
    }

    Organization {
        bigint id
    }

    Advisory {
        bigint issuer_id
    }

    ProductVersion }o--|| Sbom : sbom_id_fk
    ProductVersion }o--|| Product : product_id_fk

    PackageRelatesToPackage }o--|| Sbom : sbom_id_fk

    PurlStatus }o--|| VersionRange : version_range_id_fk

    Advisory }o--|| Organization : issuer_id_fk

Class diagram for migration m0002100_perf_fk_indexes

classDiagram
    class Migration {
        +up(manager SchemaManager) Result
        +down(manager SchemaManager) Result
    }

    class SchemaManager
    class DbErr
    class Index

    class Indexes {
        <<enumeration>>
        ProductVersionSbomIdIdx
        ProductVersionProductIdIdx
        PackageRelatesToPackageSbomRelIdx
        PurlStatusVersionRangeIdIdx
        CpeVendorProductVersionIdx
        AdvisoryIssuerIdIdx
    }

    class ProductVersion {
        <<enumeration>>
        Table
        SbomId
        ProductId
    }

    class PackageRelatesToPackage {
        <<enumeration>>
        Table
        SbomId
        Relationship
    }

    class PurlStatus {
        <<enumeration>>
        Table
        VersionRangeId
    }

    class Cpe {
        <<enumeration>>
        Table
        Vendor
        Product
        Version
    }

    class Advisory {
        <<enumeration>>
        Table
        IssuerId
    }

    Migration ..> SchemaManager : uses
    Migration ..> DbErr : returns
    Migration ..> Index : creates_drops
    Migration ..> Indexes : names_indexes
    Migration ..> ProductVersion : refs_columns
    Migration ..> PackageRelatesToPackage : refs_columns
    Migration ..> PurlStatus : refs_columns
    Migration ..> Cpe : refs_columns
    Migration ..> Advisory : refs_columns

File-Level Changes

Change Details Files
Add new migration m0002100_perf_fk_indexes to create performance indexes on key relational tables
  • Introduce Migration struct implementing MigrationTrait with up/down methods for index creation and cleanup
  • Create single-column indexes on product_version.sbom_id and product_version.product_id to support analysis graph joins
  • Create composite index on package_relates_to_package (sbom_id, relationship) to match common WHERE filters that the PK cannot satisfy efficiently
  • Create single-column index on purl_status.version_range_id to speed up joins with version_range
  • Create composite index on cpe (vendor, product, version) for generalized CPE tuple lookups
  • Create single-column index on advisory.issuer_id for joins with organization
  • Add local DeriveIden enums for each affected table and an Indexes enum holding the index identifiers
migration/src/m0002100_perf_fk_indexes.rs
Register the new migration in the migrator pipeline so it is executed with other normal migrations
  • Declare m0002100_perf_fk_indexes module in the migration library
  • Append m0002100_perf_fk_indexes::Migration to the MigratorExt::build_migrations() chain as a normal migration
migration/src/lib.rs


@sourcery-ai sourcery-ai bot left a comment


Hey - I've found 1 issue, and left some high level feedback:

  • Using Indexes::...Idx.to_string() will produce CamelCase index names based on the enum variant; if your schema conventions expect snake_case names, consider implementing Display for Indexes or using with_name helpers to control the actual index name strings.
  • For the larger tables (e.g. product_version, package_relates_to_package), consider whether you need CREATE INDEX CONCURRENTLY semantics to avoid long exclusive locks during migration, and if so whether the migration framework supports that pattern.

Comment on migration/src/m0002100_perf_fk_indexes.rs, lines +15 to +23
    .create_index(
        Index::create()
            .if_not_exists()
            .table(ProductVersion::Table)
            .name(Indexes::ProductVersionSbomIdIdx.to_string())
            .col(ProductVersion::SbomId)
            .to_owned(),
    )
    .await?;


issue (performance): Consider the impact of non-concurrent index creation on large, hot tables.

Plain CREATE INDEX will take an exclusive lock on the table for the duration of the build, which can be disruptive on large, hot tables like product_version, package_relates_to_package, or cpe. If this runs on a live system, consider using CREATE INDEX CONCURRENTLY (via raw SQL or a special migration) or scheduling the migration during a maintenance window to avoid impacting production traffic.

@imurphy-rh
Author

Re: the Sourcery suggestion about CREATE INDEX CONCURRENTLY

Good point in general, but it doesn't apply here for a few reasons:

  1. No existing migration in the project uses CONCURRENTLY — all 53 migrations use standard Index::create() or raw SQL. This PR is consistent with the established pattern.
  2. SeaORM migrations run inside transactions, and CREATE INDEX CONCURRENTLY cannot run inside a transaction block (per the PostgreSQL docs). We'd need to override is_transactional() to return false, which no migration in this project does.
  3. Trustify migrations run at startup (PM mode) or as an explicit trustify-migration up step — not while the service is actively handling traffic.

If the project wants to adopt concurrent index creation as a pattern, that would be a broader conversation about migration infrastructure, not specific to this PR.
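For reference, the concurrent variant would look like this (a sketch; it must run outside a transaction block, which is why it would require a non-transactional migration):

```sql
-- Builds the index without blocking concurrent writes, at the cost of a
-- slower build and a required retry/cleanup path if the build fails.
CREATE INDEX CONCURRENTLY IF NOT EXISTS product_version_sbom_id_idx
    ON product_version (sbom_id);
```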

@codecov

codecov bot commented Mar 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 68.05%. Comparing base (50b9e82) to head (6678c5a).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2276      +/-   ##
==========================================
+ Coverage   68.03%   68.05%   +0.01%     
==========================================
  Files         423      424       +1     
  Lines       24828    24833       +5     
  Branches    24828    24833       +5     
==========================================
+ Hits        16892    16900       +8     
+ Misses       7017     7007      -10     
- Partials      919      926       +7     


@JimFuller-RedHat
Contributor

JimFuller-RedHat commented Mar 7, 2026

cool - though as always with data at scale and indexes there are caveats:

query plans (which choose indexes) are not static across all scales of data, or in isolation from each other - more info is needed, e.g.

Were you able to test these indexes with a representative data load and activity? An improvement at one scale of data might change dramatically at another (as the query plan decides to do something else). It's also runtime-dependent, e.g. the query planner chooses based on available resources ... and might come up with a different choice if 1000 concurrent users are actively spawning queries (versus 1) ... most db query planners are more 'magical black boxes' than llms ;)

There are other subtleties - for example - many of trustify's queries take advantage of pg parallel workers ... if any of these new indexes use them it may tie up that resource, robbing other critical queries of it (a pro/con balance needs to be made).

A good way to prove any new index actually improves things is to set up an env with the right scale of data load and runtime activity (ingesting data, etc) and, most importantly, provide EXPLAIN ANALYZE on any query this improves - to view the selected query plan, which shows the index being used (where it was not before) and a clear performance improvement.

Another useful thing we do is look at long running env and check index efficiency - with something like:

SELECT
    schemaname,
    relname AS table_name,
    pg_size_pretty(pg_total_relation_size(relid)) AS table_size,
    seq_scan AS full_scans,
    idx_scan AS index_scans,
    round(100.0 * idx_scan / (seq_scan + idx_scan + 1), 1) AS efficiency_pct
FROM pg_stat_user_tables
WHERE seq_scan > 100
LIMIT 10;
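A complementary check once the indexes have been live under real load for a while (a sketch using the standard pg_stat_user_indexes view) is per-index usage:

```sql
-- New indexes that still show idx_scan = 0 after a representative
-- workload are dead weight: they cost write amplification and disk
-- without ever being chosen by the planner.
SELECT
    relname AS table_name,
    indexrelname AS index_name,
    idx_scan,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE relname IN ('product_version', 'package_relates_to_package',
                  'purl_status', 'cpe', 'advisory')
ORDER BY idx_scan ASC;
```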

AI analysis is useful for identifying opportunities but often does not have the data or runtime load context to make a useful decision ... more like a good 'scout' for possible optimisation opportunities.

Will let the team chime in on their thoughts.
