Conversation

@FabriciaDinizRH FabriciaDinizRH commented Dec 15, 2025

Overview

This PR is being created to address RHINENG-xxxx.
(A description of your PR's changes, along with why/context to the PR, goes here.)

PR Checklist

  • Keep PR title short, ideally under 72 characters
  • Descriptive comments provided in complex code blocks
  • Include raw query examples in the PR description, if adding/modifying SQL query
  • Tests: validate optimal/expected output
  • Tests: validate exceptions and failure scenarios
  • Tests: edge cases
  • Recovers or fails gracefully during potential resource outages (e.g. DB, Kafka)
  • Uses type hinting, if convenient
  • Documentation, if this PR changes the way other services interact with host inventory
  • Links to related PRs

Secure Coding Practices Documentation Reference

You can find documentation on this checklist here.

Secure Coding Checklist

  • Input Validation
  • Output Encoding
  • Authentication and Password Management
  • Session Management
  • Access Control
  • Cryptographic Practices
  • Error Handling and Logging
  • Data Protection
  • Communication Security
  • System Configuration
  • Database Security
  • File Management
  • Memory Management
  • General Coding Practices

Summary by Sourcery

Remove usage of the host.canonical_facts field in favor of first-class canonical fact columns and update APIs, serialization, and repository logic accordingly.

Enhancements:

  • Construct canonical fact data from individual host attributes instead of relying on a canonical_facts dict, and centralize canonical fact field definitions in config.
  • Adjust host model initialization, validation, update logic, and display-name behavior to operate directly on canonical fact columns.
  • Update serialization, deserialization, and MQ/event emission utilities to read and emit canonical facts from host attributes rather than the canonical_facts field.

Tests:

  • Update and extend unit, integration, and job tests to construct hosts without canonical_facts, validate the new canonical fact handling, and keep deduplication and staleness behavior unchanged.

sourcery-ai bot commented Dec 15, 2025

Reviewer's Guide

Refactors host canonical facts handling by removing the Host.canonical_facts JSON field from the API/model surface and tests. Canonical fact values now flow as top-level fields, canonical facts are constructed internally from those columns, and serialization, deserialization, matching, and event emission logic are updated accordingly.

Sequence diagram for updated host deserialization and construction

sequenceDiagram
    actor Client
    participant API_host as API_host_endpoint
    participant Serialization as app_serialization
    participant Schema as HostSchema
    participant Host as Host_model

    Client->>API_host: POST /hosts body
    API_host->>Serialization: deserialize_host(raw_data, schema)

    Serialization->>Serialization: deserialize_canonical_facts(raw_data)
    Serialization->>Serialization: _deserialize_facts(raw_data.facts)
    Serialization->>Serialization: _deserialize_tags(raw_data.tags)

    Note over Serialization: Build data dict containing both
    Note over Serialization: payload fields and canonical facts columns

    Serialization->>Schema: build_model(validated_data_with_cf, facts, tags, tags_alt)

    Schema->>Host: __init__(canonical_facts=None, display_name, ansible_host, account, org_id, facts, tags, tags_alt, system_profile_facts, stale_timestamp, reporter, groups, insights_id, subscription_manager_id, satellite_id, fqdn, bios_uuid, ip_addresses, mac_addresses, provider_id, provider_type, openshift_cluster_id, per_reporter_staleness)

    Note over Host: Construct constructed_canonical_facts from
    Note over Host: individual canonical fact attributes

    Host->>Host: update_canonical_facts_columns(constructed_canonical_facts)

    Host-->>Schema: Host instance
    Schema-->>Serialization: Host instance
    Serialization-->>API_host: Host instance
    API_host-->>Client: HTTP 201 with serialized host

Sequence diagram for add_host matching using extracted canonical facts

sequenceDiagram
    actor Service as Caller_service
    participant HostRepo as host_repository
    participant Extract as extract_canonical_facts_from_host
    participant Finder as find_existing_host
    participant DB as database

    Service->>HostRepo: add_host(input_host, existing_hosts, identity, operation_args)

    HostRepo->>Extract: extract_canonical_facts_from_host(input_host)
    Extract-->>HostRepo: input_host_canonical_facts

    alt existing_hosts provided
        HostRepo->>Finder: find_existing_host(identity, input_host_canonical_facts, existing_hosts)
        Finder-->>HostRepo: matched_host or None
    end

    alt matched_host is None
        HostRepo->>Finder: find_existing_host(identity, input_host_canonical_facts)
        Finder->>DB: query hosts by canonical fact columns
        DB-->>Finder: existing host or None
        Finder-->>HostRepo: matched_host or None
    end

    Note over HostRepo: Canonical facts are now matched
    Note over HostRepo: via top level columns, not host.canonical_facts JSON

    HostRepo->>DB: insert or update Host row
    DB-->>HostRepo: persisted Host
    HostRepo-->>Service: resulting Host
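
The two-stage lookup in the diagram above can be sketched as follows. This is a minimal illustration, not the repository code: `find_existing_host_sketch` and `match_host_sketch` are hypothetical names, hosts are plain dicts, the "database" is a list, and the rule "match on any shared canonical fact value" is an assumed simplification of the real matching logic, which this page does not show.

```python
from typing import Any, Optional


def find_existing_host_sketch(
    canonical_facts: dict[str, Any], candidates: list[dict[str, Any]]
) -> Optional[dict[str, Any]]:
    """Return the first candidate sharing at least one canonical fact value (assumed rule)."""
    for host in candidates:
        if any(host.get(key) == value for key, value in canonical_facts.items()):
            return host
    return None


def match_host_sketch(
    input_facts: dict[str, Any],
    existing_hosts: Optional[list[dict[str, Any]]],
    db_hosts: list[dict[str, Any]],
) -> Optional[dict[str, Any]]:
    matched = None
    # Stage 1: search the caller-supplied hosts, if any were provided.
    if existing_hosts:
        matched = find_existing_host_sketch(input_facts, existing_hosts)
    # Stage 2: fall back to the "database" (a plain list in this sketch).
    if matched is None:
        matched = find_existing_host_sketch(input_facts, db_hosts)
    return matched
```

In both stages the comparison reads top-level keys of the host, mirroring the note that matching no longer goes through a nested canonical_facts JSON value.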

Class diagram for updated Host canonical facts handling

classDiagram
    class Host {
        +UUID id
        +str account
        +str org_id
        +str display_name
        +str ansible_host
        +dict facts
        +list tags
        +list tags_alt
        +dict system_profile_facts
        +datetime stale_timestamp
        +str reporter
        +list groups
        +datetime last_check_in
        +UUID insights_id
        +str subscription_manager_id
        +str satellite_id
        +str fqdn
        +str bios_uuid
        +list ip_addresses
        +list mac_addresses
        +str provider_id
        +str provider_type
        +str openshift_cluster_id
        +dict per_reporter_staleness
        +__init__(canonical_facts, display_name, ansible_host, account, org_id, facts, tags, tags_alt, system_profile_facts, stale_timestamp, reporter, groups, insights_id, subscription_manager_id, satellite_id, fqdn, bios_uuid, ip_addresses, mac_addresses, provider_id, provider_type, openshift_cluster_id, per_reporter_staleness)
        +update(input_host, update_system_profile)
        +update_display_name(display_name, reporter, input_fqdn)
        +update_canonical_facts_columns(host)
        +save()
    }

    class LimitedHost {
        +UUID id
        +str account
        +str org_id
        +str display_name
        +str ansible_host
        +dict facts
        +list tags
        +list tags_alt
        +dict system_profile_facts
        +list groups
        +UUID insights_id
        +str subscription_manager_id
        +str satellite_id
        +str fqdn
        +str bios_uuid
        +list ip_addresses
        +list mac_addresses
        +str provider_id
        +str provider_type
        +str openshift_cluster_id
        +__init__(display_name, ansible_host, account, org_id, facts, tags, tags_alt, system_profile_facts, groups, insights_id, subscription_manager_id, satellite_id, fqdn, bios_uuid, ip_addresses, mac_addresses, provider_id, provider_type, openshift_cluster_id)
    }

    class HostSchema {
        +build_model(data, facts, tags, tags_alt)
    }

    class LimitedHostSchema {
        +build_model(data, facts, tags, tags_alt)
    }

    class Config {
        +tuple CANONICAL_FACTS_FIELDS
    }

    class Serialization {
        +deserialize_host(raw_data, schema)
        +deserialize_canonical_facts(raw_data, all)
        +_deserialize_canonical_facts(data)
        +_deserialize_all_canonical_facts(data)
        +serialize_host(host, timestamps, for_mq, additional_fields, staleness, system_profile_fields)
        +serialize_canonical_facts(canonical_facts)
        +extract_canonical_facts_from_host(host)
    }

    Host <|-- LimitedHost
    HostSchema --> Host
    LimitedHostSchema --> LimitedHost
    Serialization --> HostSchema
    Serialization --> LimitedHostSchema
    Host ..> Config : uses CANONICAL_FACTS_FIELDS
    Serialization ..> Config : uses CANONICAL_FACTS_FIELDS

File-Level Changes

Change Details Files
Refactor schema build_model helpers to no longer accept or use a canonical_facts dict, instead sourcing canonical fact fields directly from validated data.
  • LimitedHostSchema.build_model signature changed to drop canonical_facts parameter and map canonical fact fields from data to LimitedHost constructor arguments.
  • HostSchema.build_model now constructs Host using explicit keyword arguments, sets canonical_facts=None, and passes canonical fact values from data instead of canonical_facts dict.
app/models/schemas.py
Change Host model to construct and use canonical facts from individual columns rather than a stored canonical_facts JSON field.
  • Host.__init__ now takes canonical_facts with a default of None, builds a constructed_canonical_facts dict from CANONICAL_FACTS_FIELDS using locals(), and validates presence of canonical and ID facts from that dict.
  • Removed update_canonical_facts and its call sites; Host now only updates per-field canonical fact columns through update_canonical_facts_columns.
  • update_canonical_facts_columns now takes a host-like dict, filters by CANONICAL_FACTS_FIELDS, and unconditionally sets/flags modified attributes instead of diffing values.
  • openshift_cluster_id assignment moved earlier in __init__ so it is always set.
  • Host.update was updated to call update_display_name using input_host.get("fqdn") and update_canonical_facts_columns(input_host).
  • Host.__repr__ no longer prints canonical_facts.
app/models/host.py
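
A minimal sketch of the constructor pattern described in this entry, under stated assumptions: `SketchHost` is a hypothetical stand-in for the real model, the field tuple is shortened, the "at least one canonical fact" check and the `is not None` filter are assumptions, and the SQLAlchemy modified-flagging mentioned above is omitted.

```python
# Shortened, assumed field list; the real tuple lives in app.config.
CANONICAL_FACTS_FIELDS = ("insights_id", "subscription_manager_id", "fqdn")


class SketchHost:
    def __init__(self, canonical_facts=None, insights_id=None,
                 subscription_manager_id=None, fqdn=None):
        # Capture the arguments before anything else: calling locals() inside
        # a comprehension would not reliably see the function's parameters.
        args = locals()
        constructed_canonical_facts = {
            name: args[name] for name in CANONICAL_FACTS_FIELDS if args[name] is not None
        }
        if not constructed_canonical_facts:
            raise ValueError("At least one canonical fact must be provided")
        # Give every canonical fact attribute a default, then apply the values.
        for name in CANONICAL_FACTS_FIELDS:
            setattr(self, name, None)
        self.update_canonical_facts_columns(constructed_canonical_facts)

    def update_canonical_facts_columns(self, host_like: dict) -> None:
        # Keep only known canonical fact names and set each one unconditionally,
        # without comparing against the current attribute value.
        for name in CANONICAL_FACTS_FIELDS:
            if name in host_like:
                setattr(self, name, host_like[name])
```

Keys outside the field tuple are ignored, so a full host-like dict can be passed to `update_canonical_facts_columns` without pre-filtering.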
Centralize canonical facts field list and update serialization/deserialization to operate on host columns instead of Host.canonical_facts.
  • Introduced CANONICAL_FACTS_FIELDS constant in app.config and replaced the local _CANONICAL_FACTS_FIELDS in serialization with it.
  • deserialize_host now merges canonical_facts into validated_data and passes the combined dict into schema.build_model, eliminating the canonical_facts argument.
  • _deserialize_canonical_facts and _deserialize_all_canonical_facts now iterate over CANONICAL_FACTS_FIELDS.
  • serialize_canonical_facts now uses CANONICAL_FACTS_FIELDS, and serialize_host now serializes canonical facts via a new extract_canonical_facts_from_host helper that reads attributes directly from Host/LimitedHost.
  • remove_null_canonical_facts, serialize_host field mapping, and other helpers updated to rely on CANONICAL_FACTS_FIELDS and direct host attributes rather than host.canonical_facts JSON.
app/config.py
app/serialization.py
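
The deserialization change in this entry could look roughly like the sketch below. Function names ending in `_sketch` are hypothetical, the field tuple is assumed from the canonical fact attributes in the class diagram, `include_all` stands in for the `all` parameter, and the treatment of empty values is an assumption.

```python
from typing import Any

# Assumed contents, based on the canonical fact attributes in the class diagram.
CANONICAL_FACTS_FIELDS = (
    "insights_id", "subscription_manager_id", "satellite_id", "fqdn", "bios_uuid",
    "ip_addresses", "mac_addresses", "provider_id", "provider_type",
)


def deserialize_canonical_facts_sketch(
    data: dict[str, Any], include_all: bool = False
) -> dict[str, Any]:
    if include_all:
        # Keep every canonical fact key; absent or empty values become None.
        return {name: data.get(name) or None for name in CANONICAL_FACTS_FIELDS}
    # Keep only the canonical facts that carry a non-empty value.
    return {name: data[name] for name in CANONICAL_FACTS_FIELDS if data.get(name)}


def merge_for_build_model_sketch(
    validated_data: dict[str, Any], raw_data: dict[str, Any]
) -> dict[str, Any]:
    # One combined dict replaces the former separate canonical_facts argument.
    return {**validated_data, **deserialize_canonical_facts_sketch(raw_data)}
```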
Introduce helper to extract canonical facts from Host/LimitedHost and use it across repository and API layers instead of relying on host.canonical_facts.
  • Added extract_canonical_facts_from_host(host) to build a canonical facts dict from CANONICAL_FACTS_FIELDS attributes, normalizing non-string values.
  • lib.host_repository.add_host and update_system_profile now compute canonical facts via extract_canonical_facts_from_host for matching, instead of using input_host.canonical_facts.
  • matches_at_least_one_canonical_fact_filter_in_memory now compares against host.get(key) rather than host.canonical_facts.get(key).
app/serialization.py
lib/host_repository.py
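
A hedged sketch of what a helper like extract_canonical_facts_from_host might do. The body is an assumption: this page only states that it reads CANONICAL_FACTS_FIELDS attributes and normalizes non-string values, so converting UUIDs to strings and skipping None values are illustrative guesses, and `extract_canonical_facts_sketch` is a hypothetical name.

```python
import uuid
from types import SimpleNamespace
from typing import Any

# Assumed contents, based on the canonical fact attributes in the class diagram.
CANONICAL_FACTS_FIELDS = (
    "insights_id", "subscription_manager_id", "satellite_id", "fqdn", "bios_uuid",
    "ip_addresses", "mac_addresses", "provider_id", "provider_type",
)


def extract_canonical_facts_sketch(host: Any) -> dict[str, Any]:
    facts: dict[str, Any] = {}
    for name in CANONICAL_FACTS_FIELDS:
        value = getattr(host, name, None)
        if value is None:
            continue  # assumed: unset canonical facts are left out
        # Assumed normalization: UUID objects become strings, other values pass through.
        facts[name] = str(value) if isinstance(value, uuid.UUID) else value
    return facts


# Usage with a stand-in object that exposes canonical facts as attributes.
example_host = SimpleNamespace(
    insights_id=uuid.UUID("12345678-1234-5678-1234-567812345678"),
    fqdn="a.example.com",
    ip_addresses=["10.0.0.1"],
    satellite_id=None,
)
example_facts = extract_canonical_facts_sketch(example_host)
```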
Update API/event emission logic to reference canonical fact columns instead of Host.canonical_facts JSON field.
  • _emit_patch_event, patch_host_by_id, update_facts_by_namespace, host_checkin, staleness._build_host_updated_event_params, and queue.host_mq.write_add_update_event_message now read insights_id from host.insights_id, casting it to a string where needed.
  • models.utils._set_display_name_on_save now falls back to params["fqdn"] instead of params["canonical_facts"]["fqdn"].
  • api.host_query_db.DEFAULT_COLUMNS no longer includes Host.canonical_facts.
api/host.py
api/staleness.py
app/models/utils.py
app/queue/host_mq.py
api/host_query_db.py
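
The display-name fallback change can be illustrated with a small sketch. `resolve_display_name_sketch` is a hypothetical function, not the body of _set_display_name_on_save, and the final fallback to the host id is an assumption; only the switch from params["canonical_facts"]["fqdn"] to params["fqdn"] comes from the entry above.

```python
from typing import Any, Optional


def resolve_display_name_sketch(params: dict[str, Any]) -> Optional[str]:
    if params.get("display_name"):
        return params["display_name"]
    # Before this PR the lookup was params["canonical_facts"]["fqdn"];
    # the value is now read from the top-level "fqdn" key.
    if params.get("fqdn"):
        return params["fqdn"]
    # Assumed last resort: fall back to the host id, if one is present.
    return str(params["id"]) if params.get("id") is not None else None
```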
Adjust test helpers and fixtures to construct hosts using top-level canonical fact fields and to use extract_canonical_facts_from_host where canonical_facts was previously read or written.
  • db_host and related helpers now expand canonical facts into top-level fields (insights_id, subscription_manager_id, fqdn, etc.) instead of populating a canonical_facts dict.
  • db_get_host_by_insights_id and db_get_hosts_by_subman_id now query Host.insights_id/subscription_manager_id columns instead of JSON index access.
  • Tests that instantiated Host or minimal_db_host via canonical_facts dict were updated to pass canonical fact fields as keyword arguments or use **canonical_facts expansion.
  • Host check-in tests now build POST documents using extract_canonical_facts_from_host and ensure UUIDs are stringified where required.
  • Dropped Host.canonical_facts from iqe schema-related tests and fixtures that constructed DB objects.
tests/test_models.py
tests/test_deduplication.py
tests/test_api_hosts_get.py
tests/test_api_hosts_update.py
tests/test_rhsm_staleness.py
tests/test_outbox_end_to_end_cases.py
tests/test_jobs/test_host_reaper.py
tests/test_jobs/test_duplicate_hosts.py
tests/test_jobs/test_update_rhsm_host_timestamps.py
tests/test_utils.py
tests/test_host_mq_service.py
tests/test_unit.py
tests/helpers/db_utils.py
tests/fixtures/db_fixtures.py
iqe-host-inventory-plugin/iqe_host_inventory/tests/db/test_schema.py
iqe-host-inventory-plugin/iqe_host_inventory/utils/db_utils.py
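
As an illustration of the test-helper change, the sketch below expands a canonical facts dict into top-level keyword arguments. `minimal_host_kwargs_sketch` and its default values are hypothetical and only mimic the shape of helpers such as db_host and minimal_db_host.

```python
import uuid
from typing import Any, Optional


def minimal_host_kwargs_sketch(
    canonical_facts: Optional[dict[str, Any]] = None, **overrides: Any
) -> dict[str, Any]:
    # Placeholder defaults, chosen only for this example.
    kwargs: dict[str, Any] = {"org_id": "test-org", "reporter": "test-reporter"}
    # Expand the facts into top-level fields instead of nesting them
    # under a "canonical_facts" key.
    kwargs.update(canonical_facts or {"insights_id": str(uuid.uuid4())})
    kwargs.update(overrides)
    return kwargs
```

A test would then pass the result straight to the model constructor, for example `Host(**minimal_host_kwargs_sketch({"fqdn": "a.example.com"}))`.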

@ezr-ondrej ezr-ondrej left a comment

I've done a quick pass, didn't review the more complex changes yet :)

Comment on lines +282 to +283
if current_app.config["USE_SUBMAN_ID"] and "subscription_manager_id" in constructed_canonical_facts:
    id = constructed_canonical_facts["subscription_manager_id"]

Suggested change
- if current_app.config["USE_SUBMAN_ID"] and "subscription_manager_id" in constructed_canonical_facts:
-     id = constructed_canonical_facts["subscription_manager_id"]
+ if current_app.config["USE_SUBMAN_ID"] and subscription_manager_id:
+     id = subscription_manager_id
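
The two conditions in this suggestion are not strictly equivalent, which may be worth checking before applying it. A minimal illustration, assuming the dict could hold a key whose value is None or empty (whether it can depends on how constructed_canonical_facts is built, which this excerpt does not show); both function names are hypothetical:

```python
from typing import Any, Optional


def uses_subman_id_by_key(constructed_canonical_facts: dict[str, Any]) -> bool:
    # Original condition: true whenever the key exists, even with a None value.
    return "subscription_manager_id" in constructed_canonical_facts


def uses_subman_id_by_value(subscription_manager_id: Optional[str]) -> bool:
    # Suggested condition: true only for a non-empty value.
    return bool(subscription_manager_id)


facts_with_null = {"subscription_manager_id": None}
key_check = uses_subman_id_by_key(facts_with_null)
value_check = uses_subman_id_by_value(facts_with_null["subscription_manager_id"])
```

If the dict is built with None values filtered out, the two checks agree and the shorter form is a safe simplification.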


- self.update_canonical_facts(canonical_facts)
- self.update_canonical_facts_columns(canonical_facts)
+ self.update_canonical_facts_columns(constructed_canonical_facts)

I think this is no longer needed: the fields can no longer be set from canonical facts, only directly, so there is no need to sync them from canonical facts.


def __repr__(self):
    return (
        f"<Host id='{self.id}' account='{self.account}' org_id='{self.org_id}' display_name='{self.display_name}' "

It might be good to add at least subscription_manager_id, or all of CANONICAL_FACTS_FIELDS directly:

Suggested change
- f"<Host id='{self.id}' account='{self.account}' org_id='{self.org_id}' display_name='{self.display_name}' "
+ f"<Host id='{self.id}' account='{self.account}' org_id='{self.org_id}' display_name='{self.display_name}' "
+ f"{' '.join(f'{key}={getattr(self, key)!r}' for key in CANONICAL_FACTS_FIELDS)}>"

At a minimum, the closing bracket (>) that was part of the removed line is missing.
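
A runnable sketch of a __repr__ along these lines, assuming a simplified stand-in class (`SketchHost` and the shortened field tuple are hypothetical). It uses `!r` so that non-string values such as None, UUIDs, or lists format without a TypeError, which plain string concatenation would raise.

```python
# Shortened, assumed field list; the real tuple lives in app.config.
CANONICAL_FACTS_FIELDS = ("insights_id", "subscription_manager_id", "fqdn")


class SketchHost:
    def __init__(self, id, **canonical_fact_columns):
        self.id = id
        for name in CANONICAL_FACTS_FIELDS:
            setattr(self, name, canonical_fact_columns.get(name))

    def __repr__(self):
        # Render every canonical fact as name=value, separated by spaces.
        facts = " ".join(
            f"{name}={getattr(self, name)!r}" for name in CANONICAL_FACTS_FIELDS
        )
        return f"<Host id='{self.id}' {facts}>"
```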

