Skip to content

Conversation

@FabriciaDinizRH
Copy link
Contributor

@FabriciaDinizRH FabriciaDinizRH commented Dec 9, 2025

Overview

This PR is being created to address RHINENG-22633.
(A description of your PR's changes, along with why/context to the PR, goes here.)

PR Checklist

  • Keep PR title short, ideally under 72 characters
  • Descriptive comments provided in complex code blocks
  • Include raw query examples in the PR description, if adding/modifying SQL query
  • Tests: validate optimal/expected output
  • Tests: validate exceptions and failure scenarios
  • Tests: edge cases
  • Recovers or fails gracefully during potential resource outages (e.g. DB, Kafka)
  • Uses type hinting, if convenient
  • Documentation, if this PR changes the way other services interact with host inventory
  • Links to related PRs

Secure Coding Practices Documentation Reference

You can find documentation on this checklist here.

Secure Coding Checklist

  • Input Validation
  • Output Encoding
  • Authentication and Password Management
  • Session Management
  • Access Control
  • Cryptographic Practices
  • Error Handling and Logging
  • Data Protection
  • Communication Security
  • System Configuration
  • Database Security
  • File Management
  • Memory Management
  • General Coding Practices

Summary by Sourcery

Remove the denormalized canonical_facts JSON column from the Host model, switching all read/write paths, filters, events, and jobs to use first-class host columns and a helper for extracting canonical facts.

Enhancements:

  • Adjust in-memory and SQL canonical fact matching logic to operate on dedicated columns, including proper containment handling for array fields like IP and MAC addresses.
  • Introduce an extract_canonical_facts_from_host helper and use it across repository, API, job, and serialization code to build canonical fact dictionaries from column attributes.
  • Update display name fallback and related host field behavior to rely on direct fqdn and ID fields rather than the removed canonical_facts blob.

Tests:

  • Update and expand unit, integration, and job tests to construct hosts without the canonical_facts field and assert behavior against the new column-based model, including deduplication, staleness, delete flows, and messaging paths.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Dec 9, 2025

Reviewer's Guide

Refactors the Host model and related logic to remove the JSONB canonical_facts column in favor of first‑class columns for each canonical field, updates matching/serialization/notification logic to use those columns (with a helper to reconstruct canonical fact dicts when needed), and adjusts queries, jobs, and tests accordingly, including special handling for JSONB array canonical fields.

Sequence diagram for host matching using per-column canonical facts and in-memory reconstruction

sequenceDiagram
    actor Client
    participant API as HostAPI
    participant Repo as HostRepository
    participant DB as Database
    participant Ser as Serialization

    Client->>API: POST /hosts (host payload)
    API->>Repo: add_host(identity, input_host, existing_hosts?)
    Repo->>Ser: extract_canonical_facts_from_host(input_host)
    Ser-->>Repo: canonical_facts_dict
    alt existing_hosts provided
        Repo->>Repo: find_existing_host(identity, canonical_facts_dict, existing_hosts)
        Repo-->>API: matched_host or None
    else no in-memory match
        Repo->>Repo: find_existing_host(identity, canonical_facts_dict)
        Repo->>DB: query Host using
        Repo->>DB: contains_no_incorrect_facts_filter(canonical_facts_dict)
        Repo->>DB: matches_at_least_one_canonical_fact_filter(canonical_facts_dict)
        DB-->>Repo: existing Host or None
        Repo-->>API: matched_host or None
    end
    API->>Ser: serialize_host(matched_or_new_host,...)
    Ser->>Ser: extract_canonical_facts_from_host(host)
    Ser-->>API: serialized_host (canonical facts reconstructed)
    API-->>Client: response with serialized_host
Loading

ER diagram for Hosts table after dropping canonical_facts JSONB column

erDiagram
    HOSTS ||--o{ GROUPS : has
    HOSTS {
        uuid id PK
        varchar account
        varchar org_id
        varchar display_name
        varchar ansible_host
        jsonb facts
        jsonb tags
        jsonb tags_alt
        jsonb system_profile_facts
        jsonb groups
        uuid insights_id
        uuid subscription_manager_id
        uuid satellite_id
        varchar fqdn
        jsonb ip_addresses
        jsonb mac_addresses
        varchar provider_id
        varchar provider_type
        uuid openshift_cluster_id
        timestamptz created
        timestamptz modified
        timestamptz stale_timestamp
        jsonb per_reporter_staleness
        varchar reporter
        timestamptz last_check_in
    }

    GROUPS {
        uuid id
        varchar name
    }
Loading

Class diagram for updated Host and LimitedHost models without canonical_facts JSONB

classDiagram
    class LimitedHost {
        <<SQLAlchemyModel>>
        +UUID id
        +String display_name
        +String ansible_host
        +String account
        +String org_id
        +JSONB facts
        +JSONB tags
        +JSONB tags_alt
        +JSONB system_profile_facts
        +JSONB groups
        +UUID insights_id
        +UUID subscription_manager_id
        +UUID satellite_id
        +String fqdn
        +JSONB ip_addresses
        +JSONB mac_addresses
        +String provider_id
        +String provider_type
        +UUID openshift_cluster_id
        +DateTime created
        +DateTime modified
        +DateTime stale_timestamp
        +JSONB per_reporter_staleness
        +String reporter
        +DateTime last_check_in
        +save()
    }

    class Host {
        <<SQLAlchemyModel>>
        +UUID id
        +String display_name
        +String ansible_host
        +String account
        +String org_id
        +JSONB facts
        +JSONB tags
        +JSONB tags_alt
        +JSONB system_profile_facts
        +JSONB groups
        +UUID insights_id
        +UUID subscription_manager_id
        +UUID satellite_id
        +String fqdn
        +JSONB ip_addresses
        +JSONB mac_addresses
        +String provider_id
        +String provider_type
        +UUID openshift_cluster_id
        +DateTime created
        +DateTime modified
        +DateTime stale_timestamp
        +JSONB per_reporter_staleness
        +String reporter
        +DateTime last_check_in
        +save()
        +update(input_host, update_system_profile)
        +update_display_name(input_display_name, input_reporter, input_fqdn)
        +_should_ignore_display_name_update(input_reporter) bool
        +_apply_display_name_fallback(input_fqdn)
        +_update_ansible_host(ansible_host)
        +update_facts(facts_dict)
    }

    LimitedHost <|-- Host
Loading

File-Level Changes

Change Details Files
Stop using Host.canonical_facts for host matching and fact correctness checks, instead deriving canonical facts from typed columns and updating the DB filters and in‑memory logic, including JSONB array handling.
  • add extract_canonical_facts_from_host(host) helper that builds a canonical facts dict from Host/LimitedHost scalar and array canonical columns, omitting null/empty values
  • change add_host and update_system_profile to call find_existing_host with canonical facts produced by extract_canonical_facts_from_host instead of passing input_host.canonical_facts
  • rewrite contains_no_incorrect_facts_filter to resolve Host columns dynamically per canonical fact key, using column.isnot(None) with != for scalars and contains(@>) for ip_addresses/mac_addresses arrays
  • mirror the DB filter logic in contains_no_incorrect_facts_filter_in_memory by using extract_canonical_facts_from_host(host) and performing list‑containment checks for array canonical facts
  • rewrite matches_at_least_one_canonical_fact_filter and matches_at_least_one_canonical_fact_filter_in_memory to use Host columns and in‑memory values instead of Host.canonical_facts JSON contents, keeping COMPOUND_ID_FACTS behavior
lib/host_repository.py
app/serialization.py
Remove the canonical_facts JSONB column and constructor parameter from Host/LimitedHost, relying on explicit canonical columns with updated display name logic and model utilities.
  • comment out canonical_facts‑based GIN/functional indexes on LimitedHost and subscription_manager_id index based on canonical_facts
  • remove canonical_facts parameter and assignment from LimitedHost.init and Host.init, and move openshift_cluster_id initialization into LimitedHost
  • delete Host.canonical_facts column definition and related update_canonical_facts/update_canonical_facts_columns methods and associated ORM flagging
  • update Host.update to no longer merge canonical_facts, and to pass input_host.fqdn into update_display_name
  • simplify _apply_display_name_fallback and _set_display_name_on_save to derive fallback from fqdn or id instead of the canonical_facts JSON field
  • remove canonical_facts from Host.repr and from DEFAULT_COLUMNS in host_query_db
app/models/host.py
app/models/utils.py
api/host_query_db.py
Adjust host serialization, deserialization, and notifications to work directly with canonical fields instead of a nested canonical_facts structure.
  • change serialize_host to start from extract_canonical_facts_from_host(host) instead of serialize_canonical_facts(host.canonical_facts) and remove remove_null_canonical_facts helper
  • update tests expecting canonical_facts in serialized/created host data to now use top‑level canonical fields (insights_id, subscription_manager_id, fqdn, etc.)
  • update build_base_notification_obj and populate_events to read fqdn/insights_id/subscription_manager_id/satellite_id directly from host dict, no longer using host["canonical_facts"] or deserialize_canonical_facts
  • update host_delete_event and related notification builders to no longer inject serialized canonical_facts into the delete payload
  • adjust HostSchema.build_model / LimitedHostSchema.build_model to construct models without passing canonical_facts, relying on canonical fields in data instead
app/serialization.py
app/queue/notifications.py
app/queue/events.py
app/models/schemas.py
tests/test_models.py
tests/test_unit.py
tests/test_outbox_end_to_end_cases.py
tests/test_tags.py
Update all event, MQ, API, and job flows to reference typed canonical columns and helper extraction instead of Host.canonical_facts JSON.
  • replace host.canonical_facts[...] with direct attribute access (e.g., host.insights_id, host.subscription_manager_id, host.fqdn, host.mac_addresses, provider_id/provider_type/openshift_cluster_id) across MQ handlers, group repository, staleness API, and instrumentation
  • update message_headers and cache invalidation calls to pass host.insights_id instead of host.canonical_facts.get("insights_id")
  • change host_checkin to compute canonical_facts from the request payload using canonical fields rather than deserialize_canonical_facts(body) and use extract_canonical_facts_from_host/post_doc in tests
  • update _set_owner in host_mq to use host.subscription_manager_id (or dict.get("subscription_manager_id")) instead of canonical_facts lookups
  • fix host_synchronize to read insights_id from the serialized host dict (host.get("insights_id")) when building message headers
app/queue/host_mq.py
lib/group_repository.py
api/host.py
api/staleness.py
lib/host_synchronize.py
app/instrumentation.py
tests/test_host_mq_service.py
tests/helpers/mq_utils.py
tests/test_api_hosts_update.py
tests/test_api_hosts_delete.py
tests/test_rhsm_staleness.py
tests/test_utils.py
tests/test_jobs/test_host_reaper.py
tests/test_jobs/test_update_rhsm_host_timestamps.py
Switch queries, jobs, fixtures, and tests that filter or build hosts using canonical_facts JSON to use the new canonical columns directly.
  • update DB fixtures/helpers (db_host, minimal_db_host_dict, db_host_with_custom_canonical_facts_dict, create_reference_host_in_db, create_rhsm_only_host, etc.) so that they accept and populate canonical fields as top‑level kwargs instead of a canonical_facts dict
  • replace Host.canonical_facts[...] filters with column filters (Host.insights_id, Host.subscription_manager_id) in DB helpers, delete_hosts_s3, and jobs like add_inventory_view and host_delete_duplicates
  • update deduplication, delete‑duplicates, and culling jobs to construct canonical_facts by calling extract_canonical_facts_from_host when needed for comparison/matching instead of reusing host.canonical_facts
  • fix dozens of tests to construct hosts with top‑level canonical fields, to assert against fqdn/insights_id/subscription_manager_id/satellite_id/mac_addresses columns, and to adjust expectations around canonical_facts column updates/tests that used to verify JSONB behavior
  • ensure batch MQ dedup tests, RHSM staleness tests, and API GET/DELETE tests all use the new field layout and query paths
tests/helpers/db_utils.py
tests/fixtures/db_fixtures.py
jobs/delete_hosts_s3.py
jobs/add_inventory_view.py
jobs/host_delete_duplicates.py
tests/test_deduplication.py
tests/test_jobs/test_duplicate_hosts.py
tests/test_jobs/test_host_reaper.py
tests/test_jobs/test_update_rhsm_host_timestamps.py
tests/test_api_hosts_get.py
tests/test_api_hosts_delete.py
tests/test_api_hosts_update.py
tests/test_api_hosts_get.py
tests/test_outbox_end_to_end_cases.py
tests/test_jobs/test_duplicate_hosts.py
tests/test_jobs/test_host_reaper.py
tests/test_utils.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant