Feature/updateinfo v2 implementation #67

rockythorn · 2025-11-08T00:19:32Z

Add v2 updateinfo API endpoint with product slug support

Summary

This PR introduces a new v2 updateinfo API endpoint that supports product slugs and aggregates advisories across minor versions. It also fixes a critical bug in module package source RPM matching that was causing module packages to be excluded from updateinfo.xml generation.

Changes

New v2 API Endpoint:

Added /api/updateinfo/{product}/{major_version}/{repo}/updateinfo.xml endpoint
Supports product slugs (e.g., "rocky-linux", "rocky-linux-sig-cloud") instead of exact product names
Aggregates all advisories for a major version across minor versions
Requires architecture parameter to prevent cross-architecture package contamination
Includes data integrity validation to prevent cross-product package contamination

Bug Fix:

Fixed module package source RPM matching bug where packages with "module." prefix were not matching their binary packages
Module source packages have format "module.package-name" while binary packages use "package-name"
Now correctly strips "module." prefix during matching to ensure module packages appear in updateinfo.xml

Code Refactoring:

Extracted source RPM mapping logic into reusable build_source_rpm_mapping() function
Created generate_updateinfo_xml() function to share XML generation logic between v1 and v2 endpoints
Added resolve_product_slug() for product name resolution

Testing:

Added comprehensive unit tests for product slug resolution
Tests cover case insensitivity, invalid inputs, and slug format validation
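
The slug behavior these tests cover (case insensitivity, invalid inputs, slug format validation) can be sketched roughly as follows. This is an illustration only: the function name `resolve_product_slug_sketch`, the slug table, the display names, and the regular expression are all assumptions, not the actual resolve_product_slug() implementation.

```python
import re
from typing import Optional

# Hypothetical slug table; the real mapping in Apollo may differ.
PRODUCT_SLUGS_SKETCH = {
    "rocky-linux": "Rocky Linux",
    "rocky-linux-sig-cloud": "Rocky Linux SIG Cloud",
}

# Assumed slug format: lowercase alphanumeric words joined by single hyphens.
SLUG_PATTERN_SKETCH = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def resolve_product_slug_sketch(slug: object) -> Optional[str]:
    """Return the product name for a slug, or None if the slug is invalid or unknown."""
    if not isinstance(slug, str):
        return None
    normalized = slug.strip().lower()  # case-insensitive lookup
    if not SLUG_PATTERN_SKETCH.match(normalized):
        return None  # rejects empty strings, underscores, spaces, etc.
    return PRODUCT_SLUGS_SKETCH.get(normalized)


print(resolve_product_slug_sketch("Rocky-Linux"))
print(resolve_product_slug_sketch("rocky_linux"))
```

Returning None for bad input keeps the sketch small; an API route would more likely translate that into a 4xx response.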

Technical Details

The v2 endpoint uses explicit supported_product_id filtering at multiple levels to ensure data integrity:

Filters affected_products by supported_product_id
Double-checks advisory packages match the same supported_product_id
Logs and skips packages with mismatched product IDs to prevent cross-contamination

This is critical for multi-tenant environments where a single database contains advisories for multiple products.
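
A minimal sketch of the "log and skip" check described above, assuming a simple in-memory package record. The `AdvisoryPackage` dataclass and `filter_packages_for_product` helper are hypothetical names for illustration; the real endpoint works against database models.

```python
import logging
from dataclasses import dataclass
from typing import List

logger = logging.getLogger(__name__)


@dataclass
class AdvisoryPackage:
    # Hypothetical stand-in for a package row attached to an advisory.
    nevra: str
    supported_product_id: int


def filter_packages_for_product(
    packages: List[AdvisoryPackage], supported_product_id: int
) -> List[AdvisoryPackage]:
    """Keep only packages that belong to the requested product."""
    kept = []
    for pkg in packages:
        if pkg.supported_product_id != supported_product_id:
            # Mismatched rows are logged and skipped rather than emitted.
            logger.warning(
                "Skipping %s: supported_product_id %s != %s",
                pkg.nevra, pkg.supported_product_id, supported_product_id,
            )
            continue
        kept.append(pkg)
    return kept
```

Skipping rather than raising means a single bad row cannot take down generation of the whole updateinfo.xml, at the cost of relying on the log to surface the inconsistency.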

This commit enhances the generate_rocky_config.py script with two key improvements:

1. Flexible version matching for RHEL 8/9/10+ compatibility:
   - Major-only filtering (e.g., --version 9): Matches any minor version within that major version (9.0, 9.1, 9.2, 9.6, etc.)
   - Full version filtering (e.g., --version 9.6): Requires exact match to the specified major.minor version

   This addresses differences in Red Hat's advisory format across RHEL versions:
   - RHEL 8 & 9: Advisories typically don't include minor versions
   - RHEL 10+: Advisories now include minor versions (e.g., "RHEL 10.2")

   The flexible matching ensures that repository configurations can be generated with appropriate version matching rules (NULL match_minor_version for RHEL 8/9, specific match_minor_version for RHEL 10+).

2. Custom mirror naming with --mirror-name-base option:
   - Allows specifying a custom base name for generated mirror configurations
   - Example: --mirror-name-base "Rocky Linux 9" generates "Rocky Linux 9 x86_64" instead of "Rocky Linux 9.6 x86_64"
   - Useful for creating legacy product entries or custom naming schemes
   - Works in combination with --name-suffix for additional flexibility

These changes improve Apollo's ability to generate configurations that align with Red Hat's advisory matching requirements across different major versions.
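
The two matching modes can be sketched as below. The helper names are hypothetical and the script's real argument handling is not shown in this PR; the sketch only captures the rule that a major-only filter matches any minor version while a major.minor filter requires an exact match.

```python
from typing import Optional, Tuple


def parse_version_filter(version: str) -> Tuple[int, Optional[int]]:
    """Split '9' into (9, None) and '9.6' into (9, 6)."""
    major, _, minor = version.partition(".")
    return int(major), (int(minor) if minor else None)


def version_matches(version_filter: str, major: int, minor: Optional[int]) -> bool:
    """Apply the major-only or exact major.minor rule described above."""
    want_major, want_minor = parse_version_filter(version_filter)
    if major != want_major:
        return False
    # A major-only filter (no minor given) accepts every minor version.
    return want_minor is None or minor == want_minor


print(version_matches("9", 9, 6))
print(version_matches("9.6", 9, 5))
```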

- Remove redundant None and empty string checks in mirror name building
- Consolidate version filtering logic into single condition block
- Eliminate unnecessary ternary operator in version parsing

Any advisory that addresses at least one CVE should be considered a Security Advisory and should be returned by the OSV API. Instead of filtering strictly on the advisory "kind" (e.g., Security, Bug Fix, Enhancement), we should filter on whether the advisory has any associated CVEs.
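
As a rough sketch of the rule above, assuming advisories are plain dictionaries with a `cves` list (the field names and advisory IDs here are made up for illustration, and the real API filters at the query level):

```python
from typing import Dict, List


def has_cves(advisory: Dict) -> bool:
    """An advisory qualifies for OSV output if it references at least one CVE."""
    return bool(advisory.get("cves"))


advisories: List[Dict] = [
    {"name": "ADV-1", "kind": "Security", "cves": ["CVE-0000-0001"]},
    {"name": "ADV-2", "kind": "Bug Fix", "cves": ["CVE-0000-0002"]},
    {"name": "ADV-3", "kind": "Enhancement", "cves": []},
]

# ADV-2 is kept even though its kind is not "Security".
osv_advisories = [a["name"] for a in advisories if has_cves(a)]
print(osv_advisories)
```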

Remove self-explanatory comments that restate what the code does:
- Removed obvious filter condition comments
- Removed type conversion comment
- Removed severity calculation comment

This commit refactors the Red Hat CSAF parser to fix two major issues:

1. Modular Package Extraction Bug
   - Old code failed to extract modular packages due to ::module:stream suffix
   - New code extracts NEVRA directly from product_tree product_id field
   - Strips ::module:stream suffix while preserving full NEVRA with epoch
   - Fixes 12+ affected advisories (e.g., RHSA-2025:12008 for redis:7)

2. EUS Advisory Filtering
   - Detects EUS/E4S/AUS/TUS products via CPE and product name
   - Filters out EUS-only advisories during ingestion
   - Reduces processed advisories by ~50%
   - Skips advisories where all products are EUS-related

Changes:
- apollo/rhcsaf/__init__.py:
  - Added _is_eus_product() helper for EUS detection
  - Added _extract_packages_from_product_tree() for product_tree parsing
  - Updated extract_rhel_affected_products_for_db() to filter EUS products
  - Updated red_hat_advisory_scraper() to use new extraction and skip EUS-only
- apollo/tests/test_rhcsaf.py:
  - Updated test data to include product_version entries
  - Added TestEUSDetection class (3 tests)
  - Added TestModularPackages class (1 test)
  - Added TestEUSAdvisoryFiltering class (1 test)

Validation:
- Standalone testing in temp/modular_package_fix/ confirmed:
  - 18 modular packages extracted (was 0)
  - Regular packages work identically (no regression)
  - EUS advisories correctly filtered
  - All data fields preserved (CVEs, Bugzillas, metadata)
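
The suffix stripping in item 1 can be illustrated with a small sketch. The example product_id values are invented and only model the `::module:stream` suffix mentioned above; real product_id strings may carry additional prefixes that this sketch does not handle.

```python
def strip_module_suffix(product_id: str) -> str:
    """Drop a trailing '::module:stream' part, keeping the NEVRA (with epoch) intact."""
    # Split once on the '::' separator; non-modular IDs pass through unchanged.
    return product_id.split("::", 1)[0]


# Hypothetical modular and non-modular identifiers.
print(strip_module_suffix("redis-0:7.2.10-1.el9.x86_64::redis:7"))
print(strip_module_suffix("bash-0:5.1.8-9.el9.x86_64"))
```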

The previous code incorrectly let releases.csv overwrite changes.csv timestamps. This caused the workflow to miss advisory updates, as changes.csv contains the most recent modification times while releases.csv contains original publication dates.

With this fix, when Red Hat updates advisories (like the mass update on 2025-11-07), the workflow will correctly detect and reprocess them.

Changes:
- Reversed merge order: {**releases, **changes} so changes.csv takes precedence
- Updated comment to clarify the intended behavior
- Ensures updated advisories are reprocessed to catch corrections/additions
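
The effect of the merge order is easy to see in isolation: when two dictionaries are unpacked into one literal, later keys win. The file name and timestamps below are placeholders, not real advisory data.

```python
# Placeholder entries keyed by advisory file, valued by timestamp.
releases = {"advisory-a.json": "2025-01-10T00:00:00+00:00"}  # original publication
changes = {"advisory-a.json": "2025-11-07T00:00:00+00:00"}   # latest modification

old_merge = {**changes, **releases}  # releases.csv overwrites the newer timestamp
new_merge = {**releases, **changes}  # changes.csv takes precedence

print(old_merge["advisory-a.json"])
print(new_merge["advisory-a.json"])
```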

Add admin interface to view and update the last_indexed_at timestamp that controls which CSAF advisories are processed by the Poll RHCSAF workflow.

Changes:
- Add DatabaseService methods for getting and updating last_indexed_at
- Add admin route handlers for timestamp management
- Add UI section with date picker and automatic ISO 8601 conversion
- Remove duplicate timestamp display from Poll RHCSAF section
- Fix preview results text readability
- Add comprehensive unit tests for DatabaseService
- Update BUILD.bazel and CI workflow to include new tests

This commit fixes multiple issues in test_csaf_processing.py that caused CI failures:

1. Missing unittest.main() call
   - Added 'if __name__ == "__main__": unittest.main()' block
   - Without this, Bazel's py_test runs the file as a script but never executes the tests, causing false positives
   - pytest doesn't need this (auto-discovers tests), but Bazel does

2. Fixed async test lifecycle methods
   - Changed 'async def tearDown' to 'async def asyncTearDown'
   - Removed incorrect @classmethod decorators from asyncSetUp/asyncTearDown
   - These must be instance methods in unittest.IsolatedAsyncioTestCase
   - Consolidated setUp logic into asyncSetUp
   - Added close_test_db() call to asyncTearDown for proper cleanup

3. Updated test CSAF data structure
   - Added product_version entries in product_tree (required by refactored parser)
   - Changed from EUS to MAIN product variant (EUS products are filtered out)
   - Added proper product_id, purl, and CPE format
   - The refactored CSAF parser (commit ccb297e) extracts packages from product_tree instead of vulnerabilities.product_status.fixed

4. Fixed test assertions
   - Changed minor_version expectation from 4 to None (CPE has no minor version)
   - Fixed test_no_fixed_packages to remove product_tree entries instead of just clearing the fixed array

Root cause analysis:
- Bazel tests were never actually running (missing unittest.main())
- GitHub Actions tests were running via pytest in Integration Tests step
- pytest auto-discovers unittest tests without needing __main__ block
- This is why CI showed failures while local Bazel tests appeared to pass

All tests now pass in both Bazel and pytest environments.

Extracted magic constants from _is_eus_product() function to improve maintainability and readability:
- EUS_CPE_PRODUCTS: CPE product identifiers for EUS variants
- EUS_PRODUCT_NAME_KEYWORDS: Keywords for identifying EUS products

Using frozenset for better performance on membership checks.
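
A sketch of what module-level frozenset constants of this kind might look like. The two constant names come from the commit message, but their contents here are assumptions (the commit does not list them), and `is_eus_product_sketch` is a simplified stand-in for _is_eus_product().

```python
# Assumed values; the real constants in apollo/rhcsaf may contain different entries.
EUS_CPE_PRODUCTS = frozenset({"rhel_eus", "rhel_e4s", "rhel_aus", "rhel_tus"})
EUS_PRODUCT_NAME_KEYWORDS = frozenset({
    "extended update support",
    "update services for sap solutions",
    "advanced update support",
    "telecommunications update service",
})


def is_eus_product_sketch(product_name: str, cpe: str) -> bool:
    """Return True if the CPE product field or the product name looks EUS-related."""
    # 'cpe:/o:redhat:rhel_eus:9.4' splits into ['cpe', '/o', 'redhat', 'rhel_eus', '9.4'].
    parts = cpe.split(":")
    if len(parts) > 3 and parts[3] in EUS_CPE_PRODUCTS:
        return True
    lowered = product_name.lower()
    return any(keyword in lowered for keyword in EUS_PRODUCT_NAME_KEYWORDS)
```

The frozenset makes the CPE check a constant-time membership test; the keyword check is still a linear scan because it matches substrings.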

- Move product_name and cpe declarations closer to usage
- Simplify modular package NEVRA extraction using split directly
- Remove redundant nevra variable and empty string check

Replace explicit length comparison with truthiness check for red_hat_affected_products set.

Removed comments that simply restated what the code clearly does. Kept only comments that provide non-obvious context such as:
- CPE format examples
- Product ID format variations
- Business logic explanations

- Remove redundant str() calls in f-strings
- Use 'raise ... from e' to preserve exception chain
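
Both points can be shown in a few lines. The exception class and function below are hypothetical and not taken from database_service; the sketch only demonstrates that f-strings already call str() on interpolated values and that `from e` records the original error as `__cause__`.

```python
from datetime import datetime


class DatabaseServiceError(Exception):
    """Hypothetical wrapper exception for this illustration."""


def parse_timestamp_sketch(raw: str) -> datetime:
    try:
        return datetime.fromisoformat(raw)
    except ValueError as e:
        # f"{e}" formats the exception with str() already; no explicit str(e) needed.
        # 'from e' keeps the original ValueError attached as __cause__.
        raise DatabaseServiceError(f"Invalid timestamp: {e}") from e
```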

Converted nested helper functions to standalone pure functions:
- _traverse_for_eus: Now takes and returns product_eus_map explicitly
- _extract_packages_from_branches: Now takes and returns packages explicitly

This makes the code more testable, readable, and eliminates hidden state mutations from closure variables.
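
The shape of such a pure traversal might look like the sketch below. It assumes the standard CSAF product_tree layout (nested `branches`, leaf `product` objects with `product_id` and `product_identification_helper.cpe`) and uses a deliberately simplified EUS test; it is not the actual _traverse_for_eus code.

```python
from typing import Dict, List


def traverse_for_eus_sketch(
    branches: List[dict], product_eus_map: Dict[str, bool]
) -> Dict[str, bool]:
    """Walk nested branches and return a new product_id -> is_eus mapping."""
    for branch in branches:
        product = branch.get("product")
        if product:
            cpe = product.get("product_identification_helper", {}).get("cpe", "")
            # Build a new dict instead of mutating the caller's mapping.
            product_eus_map = {**product_eus_map, product["product_id"]: "rhel_eus" in cpe}
        product_eus_map = traverse_for_eus_sketch(branch.get("branches", []), product_eus_map)
    return product_eus_map
```

Because the mapping is passed in and returned, a test can call the function with a small hand-written tree and compare the result directly, with no enclosing function to set up.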

Check if advisory only affects EUS products immediately after verifying vulnerabilities exist, before extracting packages, CVEs, and other data. This saves processing time for advisories that will be skipped anyway. Also cleaned up redundant product_full_name variable.

This commit addresses two validation issues that prevented importing configuration files exported from production:

1. Export serializer converting version numbers to floats:
   - The _json_serializer in admin_supported_products.py was converting all Decimal types to float, including version numbers
   - Version numbers (match_major_version, match_minor_version) should be integers, not floats
   - Updated serializer to check if Decimal is a whole number and convert to int, preserving proper type semantics

2. Name validation rejecting parentheses:
   - Production database contains legacy products with names like "Rocky Linux 8.5 x86_64 (Legacy)"
   - Validation pattern only allowed: letters, numbers, spaces, dots, hyphens, and underscores
   - Updated NAME_PATTERN to allow parentheses for legacy product naming
   - Updated error message to reflect allowed characters

These changes ensure that configurations exported from production can be successfully imported into development environments without manual data cleanup.
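
Both fixes can be approximated in a short sketch. The serializer below is an assumed reconstruction of the whole-number check, not the actual _json_serializer, and the regular expression is a guess at a pattern that allows the listed characters plus parentheses.

```python
import json
import re
from decimal import Decimal


def json_serializer_sketch(obj):
    """Serialize whole-number Decimals as int and other Decimals as float."""
    if isinstance(obj, Decimal):
        if obj.is_finite() and obj == obj.to_integral_value():
            return int(obj)
        return float(obj)
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


# Assumed pattern: letters, numbers, spaces, dots, hyphens, underscores, parentheses.
NAME_PATTERN_SKETCH = re.compile(r"^[A-Za-z0-9 ._()\-]+$")

print(json.dumps({"match_major_version": Decimal("9")}, default=json_serializer_sketch))
print(bool(NAME_PATTERN_SKETCH.match("Rocky Linux 8.5 x86_64 (Legacy)")))
```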

Update test expectations to match the new behavior where whole number Decimal values are serialized as integers instead of floats. This aligns with the change to _json_serializer that preserves integer types for version numbers and other integer values.

Add boolean active field to supported_products_rh_mirrors table to allow disabling mirrors without deleting them. This preserves historical data and mirror relationships while preventing the mirror from being used in new advisory processing.

Changes:
- Add active column with default true to supported_products_rh_mirrors
- Add database index on active field for query performance
- Add migration script for schema change
- Update DB model with active field
- Add active field to admin UI forms (create and edit)
- Update mirror filtering in workflow service to respect active flag
- Update configuration import/export to handle active field
- Add active field validation in form processing

The active checkbox wasn't saving properly when unchecked because HTML forms don't send unchecked checkbox values. This caused the field to always default to "true" in the backend.

Added hidden input with value "false" before each checkbox, so the form always sends a value. Backend now parses all "active" values and takes the last one (which will be "true" if checked, "false" if unchecked).

Changes:
- Add hidden input to mirror edit and new templates
- Update both POST endpoints to manually parse form data for active field
- Remove default="true" from Form parameters that was masking the issue

Implemented multi-level sorting and visual status indicators for mirrors in the admin UI to improve usability and organization.

Changes:
- Sort mirrors by active status (active first), then major version (desc), then name (asc) for logical grouping
- Add Status column with green "Active" and gray "Inactive" tags for clear visual differentiation
- Update validation to allow parentheses in mirror names for descriptive naming like "Rocky Linux 9 (BaseOS)"
- Fetch mirrors with explicit ordering in backend instead of relying on database insertion order
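
The three-level ordering can be expressed as a single tuple sort key. The sketch below sorts plain dictionaries in memory with made-up mirror entries; the actual change applies the ordering in the backend query.

```python
mirrors = [
    {"name": "Rocky Linux 8 x86_64", "active": True, "match_major_version": 8},
    {"name": "Rocky Linux 9 x86_64", "active": True, "match_major_version": 9},
    {"name": "Rocky Linux 9 aarch64", "active": True, "match_major_version": 9},
    {"name": "Rocky Linux 10 x86_64", "active": False, "match_major_version": 10},
]

sorted_mirrors = sorted(
    mirrors,
    key=lambda m: (
        not m["active"],            # False sorts first, so active mirrors lead
        -m["match_major_version"],  # negate for descending major version
        m["name"],                  # ascending name as the final tiebreaker
    ),
)

for mirror in sorted_mirrors:
    print(mirror["name"], "Active" if mirror["active"] else "Inactive")
```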

The RHMatcherWorkflow was processing all mirrors regardless of their active status, causing unnecessary fetches from mirrors that should be skipped. This adds a check to skip mirrors where active=False in the match_rh_repos activity.

The block_remaining_rh_advisories function had a nested loop bug where it would iterate over all mirrors from a prefetch, then inside that loop query for active mirrors and iterate over them again. This caused:

1. Redundant database queries (N queries for N total mirrors)
2. Processing each active mirror N times instead of once
3. Variable shadowing with the reused 'mirror' variable name

Simplified to a single query for active mirrors and one processing loop.

Removed unnecessary hidden input fields and simplified the form parsing logic for the active checkbox in mirror creation and editing forms.

Changes:
- Replaced complex list indexing with simple membership check
- Removed hidden input fields from both Jinja templates
- Updated comments to reflect simpler approach

The functionality remains identical, but the code is more readable and maintainable.
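
The membership check relies on browsers omitting unchecked checkboxes from the submitted form. A minimal sketch, assuming the form data is available as a mapping (the helper name is hypothetical):

```python
from typing import Mapping


def parse_active_checkbox(form: Mapping[str, str]) -> bool:
    """A checkbox field is present in the form only when it was checked."""
    return "active" in form


print(parse_active_checkbox({"name": "Rocky Linux 9 x86_64", "active": "true"}))
print(parse_active_checkbox({"name": "Rocky Linux 9 x86_64"}))
```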

Add comprehensive tests for the simplified checkbox parsing logic and active field functionality:
- Checkbox parsing for checked/unchecked/missing states
- Active field in configuration export (true/false cases)
- Active field in configuration import validation
- Backwards compatibility for imports without active field

All tests pass successfully.

Both admin_supported_product_mirror_repomd_new_post and admin_supported_product_mirror_repomd_post had identical code for building form_data and calling validation. Extracted this into _validate_repomd_form helper that returns validated_data, errors, and the original form_data for use in error templates. This eliminates 14 lines of duplication across the two functions.

Implements a new updateinfo.xml API endpoint that supports:
- Product slug-based routing (rocky-linux, rocky-linux-sig-cloud)
- Major version aggregation across all minor versions
- Required architecture filtering to prevent cross-contamination
- Data integrity validation with supported_product_id checks
- Shared XML generation function for code reuse

Key improvements over v1:
- Prevents cross-product package contamination via explicit FK filtering
- Cleaner URL structure using product slugs
- Better error handling and validation
- Comprehensive unit test coverage

Changes:
- apollo/server/routes/api_updateinfo.py: Add v2 endpoint and shared XML generation
- apollo/tests/test_api_updateinfo.py: Add unit tests with mock database
- apollo/tests/BUILD.bazel: Register new test target

Extract duplicated source RPM mapping logic from v1 endpoint and generate_updateinfo_xml() into a shared build_source_rpm_mapping() function.

This refactoring:
- Eliminates code duplication between v1 and v2 endpoints
- Centralizes the logic for finding source RPMs for packages
- Includes fix for module packages where package_name has 'module.' prefix
- Makes future improvements easier to apply consistently

The fix handles the case where:
- Binary packages have package_name = 'delve'
- Source packages have package_name = 'module.delve'
- Strip 'module.' prefix for proper matching

Module packages were not appearing in updateinfo.xml due to a bug in the source RPM mapping logic. The issue was that binary and source packages were being grouped into different dictionary keys:
- Binary packages: package_name='delve', key='go-toolset:delve:rhel8'
- Source packages: package_name='module.delve', key='go-toolset:module.delve:rhel8'

Because the keys didn't match, the source RPM lookup failed, causing all packages in the advisory to be filtered out, and the entire advisory to be removed from updateinfo.xml.

Fix: Strip the 'module.' prefix from package_name when building dictionary keys, so both binary and source packages map to the same key. This allows the source RPM matching to work correctly.

Impact: Fixes 181+ advisories with module packages (go-toolset, rust-toolset, etc.) that were previously missing from updateinfo.
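
The key mismatch and its fix can be reproduced with a minimal sketch. The `mapping_key` helper and its three-part key layout are assumptions modeled on the example keys above, not the actual build_source_rpm_mapping() code.

```python
MODULE_PREFIX = "module."


def normalize_package_name(package_name: str) -> str:
    """Strip a leading 'module.' so source and binary package names compare equal."""
    if package_name.startswith(MODULE_PREFIX):
        return package_name[len(MODULE_PREFIX):]
    return package_name


def mapping_key(module_name: str, package_name: str, stream: str) -> str:
    """Build the grouping key from the normalized package name."""
    return f"{module_name}:{normalize_package_name(package_name)}:{stream}"


binary_key = mapping_key("go-toolset", "delve", "rhel8")
source_key = mapping_key("go-toolset", "module.delve", "rhel8")
print(binary_key == source_key)
```

Only a leading prefix is removed, so a package whose name merely contains "module." elsewhere is left unchanged.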

rockythorn added 30 commits November 12, 2025 14:51

Simplify conditional logic in generate_rocky_config.py

95319c3

- Remove redundant None and empty string checks in mirror name building - Consolidate version filtering logic into single condition block - Eliminate unnecessary ternary operator in version parsing

Remove redundant comments from OSV API

531bbd0

Remove self-explanatory comments that restate what the code does: - Removed obvious filter condition comments - Removed type conversion comment - Removed severity calculation comment

Simplify OSV API advisory filtering logic

f1d918b

Simplify package extraction logic

12695ae

- Move product_name and cpe declarations closer to usage - Simplify modular package NEVRA extraction using split directly - Remove redundant nevra variable and empty string check

Use Pythonic empty set check

43ee733

Replace explicit length comparison with truthiness check for red_hat_affected_products set.

Remove redundant comments from CSAF processing code

6b997e5

Removed comments that simply restated what the code clearly does. Kept only comments that provide non-obvious context such as: - CPE format examples - Product ID format variations - Business logic explanations

Improve exception handling in database_service

c87bb75

- Remove redundant str() calls in f-strings - Use 'raise ... from e' to preserve exception chain

Remove unnecessary comments

6c7eb9e

Remove redundant comments from validation module

98d9112

Skip inactive mirrors in RHMatcherWorkflow

bd667c9

The RHMatcherWorkflow was processing all mirrors regardless of their active status, causing unnecessary fetches from mirrors that should be skipped. This adds a check to skip mirrors where active=False in the match_rh_repos activity.

Remove redundant comments from mirror active field code

c2d0f59

Remove unnecessary comments

6a47716

rockythorn added 3 commits November 13, 2025 18:12

rockythorn force-pushed the feature/updateinfo-v2-implementation branch from 002cf2b to 09ffcd3 Compare November 14, 2025 01:12

rockythorn closed this Nov 17, 2025
