Skip to content

Conversation

@FabriciaDinizRH
Copy link
Contributor

@FabriciaDinizRH FabriciaDinizRH commented Dec 15, 2025

Overview

This PR is being created to address RHINENG-22723.

  • remove canonical_facts from host_mq.py
  • fix code directly related mq to fix broken tests
  • updated outdated tests

PR Checklist

  • Keep PR title short, ideally under 72 characters
  • Descriptive comments provided in complex code blocks
  • Include raw query examples in the PR description, if adding/modifying SQL query
  • Tests: validate optimal/expected output
  • Tests: validate exceptions and failure scenarios
  • Tests: edge cases
  • Recovers or fails gracefully during potential resource outages (e.g. DB, Kafka)
  • Uses type hinting, if convenient
  • Documentation, if this PR changes the way other services interact with host inventory
  • Links to related PRs

Secure Coding Practices Documentation Reference

You can find documentation on this checklist here.

Secure Coding Checklist

  • Input Validation
  • Output Encoding
  • Authentication and Password Management
  • Session Management
  • Access Control
  • Cryptographic Practices
  • Error Handling and Logging
  • Data Protection
  • Communication Security
  • System Configuration
  • Database Security
  • File Management
  • Memory Management
  • General Coding Practices

Summary by Sourcery

Refactor host messaging and serialization to stop relying on the legacy canonical_facts dict and instead use dedicated Host fields for MQ events and related logic.

Enhancements:

  • Update host MQ event construction and headers to read identifiers directly from Host fields (e.g. insights_id, subscription_manager_id, fqdn) instead of the canonical_facts mapping.
  • Refine Host initialization and validation to work when canonical_facts is omitted while still enforcing presence of canonical ID fields, including subscription_manager_id when configured.
  • Adjust serialization utilities to use a centralized CANONICAL_FACTS_FIELDS config and to serialize canonical fact fields from Host attributes, including proper UUID-to-string handling.
  • Update host owner assignment to derive owner_id from subscription_manager_id stored on the Host model rather than from canonical_facts.
  • Simplify Host repr to reflect the primary identifier (insights_id) rather than the full canonical_facts structure.

Tests:

  • Remove and update unit and API tests that depended on canonical_facts in MQ paths to validate the new Host-field-based serialization and event payloads.

Summary by Sourcery

Refactor host canonical facts handling to rely on dedicated Host model fields instead of the legacy canonical_facts dict, updating serialization, MQ event production, and tests accordingly.

Enhancements:

  • Update Host model initialization, validation, and repr to support missing canonical_facts while enforcing required identifier fields and using subscription_manager_id directly when configured.
  • Change schema build_model logic and serialization utilities to read and write canonical fact fields from Host attributes via a centralized CANONICAL_FACTS_FIELDS config, including UUID-to-string normalization.
  • Adjust API, DB query defaults, and MQ/event-producer code to emit identifiers (e.g., insights_id, subscription_manager_id, fqdn) from Host fields and to derive owner_id from subscription_manager_id on the Host model.
  • Add Flask application context helpers and mixin base classes to simplify tests that depend on current_app and connexion identity context.

Tests:

  • Update serialization, MQ, and API tests to work with Host field–based canonical fact handling rather than the canonical_facts dict, and to validate the new serialize_canonical_facts behavior.

@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Dec 15, 2025

Reviewer's Guide

Refactors host serialization, MQ event construction, and ownership logic to stop using the canonical_facts dict, instead reading canonical identifiers directly from Host model fields and updating tests and helpers to operate in a Flask app context and validate the new behavior.

Sequence diagram for host MQ add/update event using Host field-based identifiers

sequenceDiagram
  participant ApiHost
  participant Host
  participant Serialization
  participant HostMQ
  participant EventProducer
  participant Cache

  ApiHost->>Host: apply update (set insights_id, subscription_manager_id, fqdn)
  ApiHost->>Serialization: serialize_host(host, staleness_timestamps, staleness)
  Serialization->>Host: serialize_canonical_facts(host)
  Serialization-->>ApiHost: serialized_host

  ApiHost->>HostMQ: write_add_update_event_message(event_producer, result)
  HostMQ->>Serialization: serialize_host(result.row, result.staleness_timestamps, staleness)
  Serialization->>Host: read CANONICAL_FACTS_FIELDS from attributes
  Serialization-->>HostMQ: output_host (canonical facts from Host fields)

  HostMQ->>Host: read insights_id
  HostMQ->>EventProducer: write_event(event, host.id, headers with insights_id)

  alt cache_cleanup_needed
    HostMQ->>Host: read insights_id and system_profile_facts.owner_id
    HostMQ->>Cache: delete_cached_system_keys(insights_id, org_id, owner_id)
  end
Loading

Class diagram for Host model and related serialization/MQ changes

classDiagram

class Config {
  +ID_FACTS
  +ID_FACTS_USE_SUBMAN_ID
  +CANONICAL_FACTS_FIELDS
}

class Host {
  +UUID id
  +String account
  +String org_id
  +String display_name
  +String ansible_host
  +JSONB canonical_facts
  +UUID insights_id
  +UUID subscription_manager_id
  +UUID satellite_id
  +String fqdn
  +String bios_uuid
  +JSONB ip_addresses
  +JSONB mac_addresses
  +String provider_id
  +String provider_type
  +UUID openshift_cluster_id
  +JSONB system_profile_facts
  +String reporter
  +dict per_reporter_staleness
  +__init__(canonical_facts=None, display_name=None, ansible_host=None, account=None, org_id=None, facts=None, reporter=None, tags=None, system_profile_facts=None, groups=None, insights_id=None, subscription_manager_id=None, satellite_id=None, fqdn=None, bios_uuid=None, ip_addresses=None, mac_addresses=None, provider_id=None, provider_type=None, openshift_cluster_id=None)
  +update(input_host, update_system_profile=False) void
  +save() Host
  +update_canonical_facts(canonical_facts) void
  +update_canonical_facts_columns(canonical_facts) void
  +__repr__() String
}

class Serialization {
  +serialize_host(host, staleness_timestamps, rbac_filtered=False, staleness=None, fields=None, host_type=None) dict
  +serialize_canonical_facts(host) dict
  +deserialize_canonical_facts(raw_data, all=False) dict
  +remove_null_canonical_facts(serialized_host) void
  +_deserialize_canonical_facts(data) dict
  +_deserialize_all_canonical_facts(data) dict
}

class Events {
  +host_delete_event(event_type, host, initiated_by_frontend=False, platform_metadata=None) dict
}

class HostMQ {
  +_set_owner(host, identity) Host
  +sync_event_message(message, session, event_producer) None
  +write_delete_event_message(event_producer, result) None
  +write_add_update_event_message(event_producer, result) None
}

class ApiHost {
  +_emit_patch_event(serialized_host, host) None
  +patch_host_by_id(host_id_list, body, rbac_filter=None) dict
  +update_facts_by_namespace(operation, host_id_list, namespace, fact_dict, rbac_filter=None) dict
}

Config <.. Host : uses
Config <.. Serialization : uses
Host <.. Serialization : reads attributes
Host <.. Events : reads attributes
Host <.. HostMQ : reads identifiers
Host <.. ApiHost : reads identifiers
Serialization <.. HostMQ : serialize_host
Serialization <.. ApiHost : serialize_host
Events <.. HostMQ : build_event
Loading

File-Level Changes

Change Details Files
Host model now validates and stores canonical identifiers on dedicated columns instead of relying on the canonical_facts dict, and uses insights_id in its representation.
  • Allow Host.init to accept optional canonical_facts while requiring at least one canonical fact field or ID fact to be present, considering both the dict and model attributes.
  • Adjust ID fact validation to check Host fields as well as canonical_facts and honor USE_SUBMAN_ID using subscription_manager_id from either source.
  • Guard canonical_facts updates in Host.update/save so they only run when canonical_facts is provided, avoiding mandatory dict usage.
  • Simplify Host.repr to log only the primary identifier insights_id instead of the full canonical_facts mapping.
app/models/host.py
Canonical facts configuration and serialization/deserialization now use a centralized CANONICAL_FACTS_FIELDS setting and operate directly on Host instances.
  • Introduce CANONICAL_FACTS_FIELDS in app.config and remove the private _CANONICAL_FACTS_FIELDS tuple from serialization.
  • Update deserialize/remove-null helpers to iterate over CANONICAL_FACTS_FIELDS instead of a local constant.
  • Refactor serialize_canonical_facts to accept a Host object, reading attributes by name and converting UUIDs to strings for JSON compatibility.
  • Change serialize_host and host delete event construction to spread serialize_canonical_facts(host) instead of host.canonical_facts.
  • Ensure canonical fact deserializers still casefold values but work with the shared field list.
app/config.py
app/serialization.py
app/queue/events.py
MQ and API event production now construct payloads and headers from Host fields (e.g., insights_id, fqdn, subscription_manager_id) rather than from canonical_facts.
  • Update MQ helper assertions to read insights_id, fqdn, bios_uuid, ip_addresses, mac_addresses, provider_id, provider_type, subscription_manager_id, and openshift_cluster_id directly from Host attributes when verifying events.
  • Use str(host.insights_id) in delete, patch, and synchronize event headers and payloads, and when purging cached system keys.
  • Refactor host_mq delete/sync event writers and API patch/facts update flows to pass host.insights_id into message_headers instead of canonical_facts['insights_id'].
  • Adjust FakeHostRow in tests to provide insights_id and subscription_manager_id attributes instead of exclusively canonical_facts.
tests/helpers/mq_utils.py
app/queue/host_mq.py
api/host.py
tests/test_host_mq_service.py
Owner assignment for RHSM-related reporters now derives owner_id from Host.subscription_manager_id instead of the canonical_facts mapping.
  • Change _set_owner to check host.subscription_manager_id when reporter is rhsm-conduit or rhsm-system-profile-bridge and to format that UUID into owner_id.
  • Mirror this behavior for static_system_profile.owner_id, again preferring the Host.subscription_manager_id attribute.
  • Retain validation that identity CN matches existing owner_id when a matching subscription_manager_id is not present.
app/queue/host_mq.py
Host schema/model building now passes canonical identifier fields from the main data payload into Host instead of via the canonical_facts dict, and query defaults align with the new model layout.
  • Modify build_model to read insights_id, subscription_manager_id, satellite_id, fqdn, bios_uuid, ip_addresses, mac_addresses, provider_id, provider_type, and openshift_cluster_id from the top-level data dict.
  • Update DEFAULT_COLUMNS used in host_query_db to include Host.insights_id, Host.subscription_manager_id, and Host.fqdn instead of Host.canonical_facts.
  • Relax host model tests that required canonical_facts as a mandatory field, since canonical identifiers now live on dedicated columns.
app/models/schemas.py
api/host_query_db.py
tests/test_models.py
Serialization and deserialization tests have been updated to work with Host attributes instead of canonical_facts dicts, and to use Flask application context helpers where required.
  • Introduce flask_app_context context manager and FlaskAppTestCase mixin to consistently set up Flask app and connexion identity for tests that touch current_app or Host initialization.
  • Update multiple serialization test cases to inherit from FlaskAppTestCase, construct Host objects with canonical fields as top-level kwargs, and call serialize_canonical_facts(host) rather than with a dict.
  • Adjust expectations in serialize/deserialize tests to spread canonical fact fields into expected dicts instead of nesting under canonical_facts.
  • Update mocked serialization tests to expect serialize_canonical_facts(host) calls and to treat subscription_manager_id as a direct Host attribute.
  • Tweak error assertion logic in deserialize tests to compare ValidationError string directly to the raised context.
tests/helpers/test_utils.py
tests/test_unit.py
API tests that previously mutated or asserted canonical_facts now interact with the corresponding Host attributes, and some outdated MQ-specific tests depending on canonical_facts were removed.
  • Remove check-in tests that posted canonical_facts directly for MQ-based host updates that are no longer supported in this form.
  • Change tests that simulate missing insights_id to delete host.insights_id instead of deleting from canonical_facts.
  • Align host delete/patch tests with the new owner/insights_id handling behavior in events and cache invalidation.
tests/test_api_hosts_update.py
tests/test_api_hosts_delete.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@FabriciaDinizRH FabriciaDinizRH force-pushed the remove-cf-from-mq branch 2 times, most recently from 53f0423 to 3974151 Compare December 16, 2025 17:00
@FabriciaDinizRH FabriciaDinizRH marked this pull request as ready for review December 16, 2025 17:00
@FabriciaDinizRH FabriciaDinizRH requested a review from a team as a code owner December 16, 2025 17:00
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey there - I've reviewed your changes and found some issues that need to be addressed.

  • The new Host.init validation logic appears incorrect: any(value is not None for value in CANONICAL_FACTS_FIELDS) and all(value is None for value in ID_FACTS) iterate over field-name strings rather than the actual constructor arguments/attributes, so these checks will never behave as intended; consider using the passed-in kwargs or getattr on the instance instead.
  • In Host.init, the USE_SUBMAN_ID gate around subscription_manager_id is effectively bypassed because of operator precedence: if current_app.config["USE_SUBMAN_ID"] and (canonical_facts and ... or subscription_manager_id is not None) should be parenthesized so that USE_SUBMAN_ID is required even when only subscription_manager_id is provided directly.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new Host.__init__ validation logic appears incorrect: `any(value is not None for value in CANONICAL_FACTS_FIELDS)` and `all(value is None for value in ID_FACTS)` iterate over field-name strings rather than the actual constructor arguments/attributes, so these checks will never behave as intended; consider using the passed-in kwargs or `getattr` on the instance instead.
- In Host.__init__, the `USE_SUBMAN_ID` gate around subscription_manager_id is effectively bypassed because of operator precedence: `if current_app.config["USE_SUBMAN_ID"] and (canonical_facts and ... or subscription_manager_id is not None)` should be parenthesized so that `USE_SUBMAN_ID` is required even when only `subscription_manager_id` is provided directly.

## Individual Comments

### Comment 1
<location> `app/models/host.py:320-321` </location>
<code_context>
             groups = []

-        if not canonical_facts:
+        # Leaving canonical_facts to avoid changing too much of the code at once
+        if not canonical_facts and not any(value is not None for value in CANONICAL_FACTS_FIELDS):
             raise ValidationException("At least one of the canonical fact fields must be present.")

</code_context>

<issue_to_address>
**issue (bug_risk):** Canonical-facts presence check is ineffective because it inspects field names instead of values

`any(value is not None for value in CANONICAL_FACTS_FIELDS)` will always be `True` because it iterates over field-name strings, not host values. This means the `ValidationException` will never be raised for missing canonical facts. To validate correctly, you need to inspect the actual attribute values for those fields (e.g., via `getattr(self, field)` or the constructor args), not the constants themselves.
</issue_to_address>

### Comment 2
<location> `app/models/host.py:324-325` </location>
<code_context>
             raise ValidationException("At least one of the canonical fact fields must be present.")

-        if all(id_fact not in canonical_facts for id_fact in ID_FACTS):
+        if (canonical_facts and all(id_fact not in canonical_facts for id_fact in ID_FACTS)) or all(
+            value is None for value in ID_FACTS
+        ):
             raise ValidationException(f"At least one of the ID fact fields must be present: {ID_FACTS}")
</code_context>

<issue_to_address>
**issue (bug_risk):** ID fact validation also checks field names instead of values, so the `all(... is None ...)` branch is always false

Because `ID_FACTS` is a tuple of constant strings, `all(value is None for value in ID_FACTS)` is always `False`. As a result, when `canonical_facts` is `None`, this condition never passes and the validation doesn’t run, even if all ID-related data is missing. To enforce “at least one ID fact” across both `canonical_facts` and the new columns, this needs to check the actual values (e.g., `canonical_facts.get(...)` and/or the relevant attributes/constructor args), not the constant field names.
</issue_to_address>

### Comment 3
<location> `app/queue/host_mq.py:683-686` </location>
<code_context>
     headers = message_headers(
         EventType.updated,
-        host.canonical_facts.get("insights_id"),
+        str(host.insights_id),
         host.reporter,
         host.system_profile_facts.get("host_type"),
</code_context>

<issue_to_address>
**issue (bug_risk):** Blindly stringifying insights_id risks propagating "None" instead of null/absence

Several call sites now pass `str(host.insights_id)` (and similar) into headers and cache-key helpers. If `insights_id` is `None`, this becomes the string `'None'`, which is truthy and semantically different from an absent/`None` ID, leading to headers or cache keys literally containing `'None'`. Please avoid stringifying `None` (e.g., use `insights_id and str(insights_id)` or pass `None` through and let downstream code handle it).
</issue_to_address>

### Comment 4
<location> `tests/test_unit.py:987` </location>
<code_context>

                     expected_errors = HostSchema().validate(inp)
-                    self.assertEqual(str(expected_errors), str(context.exception))
+                    self.assertEqual(str(expected_errors), str(context))

     # Test that both of the host schemas will pass all of these fields
</code_context>

<issue_to_address>
**issue (testing):** The invalid-input test is now asserting against the context manager, not the raised exception

Within `with self.assertRaises(ValidationException) as context:`, `context` is the context manager; the raised exception is `context.exception`. The previous assertion compared `str(expected_errors)` to `str(context.exception)`, which correctly checks the exception message. Using `str(context)` instead will not reflect the exception and can make the test pass or fail for the wrong reason. Please compare against `context.exception` (or otherwise explicitly assert on the exception) so the test validates the error correctly.
</issue_to_address>

### Comment 5
<location> `tests/test_models.py:257` </location>
<code_context>
     assert error_messages["tags"] == {0: {"key": ["Missing data for required field."]}}


[email protected]("missing_field", ["canonical_facts", "stale_timestamp", "reporter"])
[email protected]("missing_field", ["stale_timestamp", "reporter"])
 def test_host_models_missing_fields(missing_field):
     limited_values = {
</code_context>

<issue_to_address>
**suggestion (testing):** Add explicit tests for Host creation without `canonical_facts` but with canonical fact fields set on the model

This parametrization now reflects that `canonical_facts` is no longer required, but we’re missing a positive test for the new allowed path: creating a Host with `canonical_facts=None` (or omitted) while providing top-level identifiers (e.g. `insights_id`, `subscription_manager_id` when `USE_SUBMAN_ID` is enabled). Please add tests that (1) confirm creation succeeds in that scenario, and (2) confirm creation still fails when both `canonical_facts` and all ID fields are missing/None, ideally colocated with `test_host_models_missing_fields` to keep validation coverage together.

Suggested implementation:

```python
    assert error_messages["tags"] == {0: {"key": ["Missing data for required field."]}}


def test_host_creation_with_ids_without_canonical_facts(mocker):
    """
    Creating a Host without canonical_facts should succeed as long as
    at least one top-level identifier is present (e.g. insights_id or
    subscription_manager_id when USE_SUBMAN_ID is enabled).
    """
    # NOTE: adapt these to whichever ID fields your Host model actually uses
    host_data = {
        "account": USER_IDENTITY["account_number"],
        "display_name": "host-with-ids-no-cf",
        "stale_timestamp": STALE_TIMESTAMP,
        "reporter": "test-reporter",
        # canonical_facts is intentionally omitted / None
        "canonical_facts": None,
        "insights_id": "00000000-0000-0000-0000-000000000001",
    }

    # When USE_SUBMAN_ID is enabled, make sure we also support using that as an identifier
    if getattr(settings, "USE_SUBMAN_ID", False):
        host_data["subscription_manager_id"] = "subman-id-0001"

    host = HostSchema().load(host_data)

    # canonical_facts should be allowed to be empty/None, but identifiers should be set
    assert host.account == USER_IDENTITY["account_number"]
    assert host.display_name == "host-with-ids-no-cf"
    assert host.reporter == "test-reporter"
    assert host.insights_id == "00000000-0000-0000-0000-000000000001"
    if getattr(settings, "USE_SUBMAN_ID", False):
        assert host.subscription_manager_id == "subman-id-0001"
    # Depending on implementation this may be {} or None; keep the assertion loose
    assert host.canonical_facts in ({}, None)


def test_host_creation_fails_without_ids_or_canonical_facts():
    """
    Creating a Host must still fail when both canonical_facts and all
    top-level identifier fields are missing/None.
    """
    host_data = {
        "account": USER_IDENTITY["account_number"],
        "display_name": "host-without-ids-or-cf",
        "stale_timestamp": STALE_TIMESTAMP,
        "reporter": "test-reporter",
        # canonical_facts explicitly empty / None
        "canonical_facts": None,
        # All identifier fields omitted / None on purpose
        "insights_id": None,
    }

    if getattr(settings, "USE_SUBMAN_ID", False):
        host_data["subscription_manager_id"] = None

    with pytest.raises(ValidationError):
        HostSchema().load(host_data)


@pytest.mark.parametrize("missing_field", ["stale_timestamp", "reporter"])

```

To make this compile and match your existing conventions, you will likely need to:

1. **Imports**  
   - Ensure `ValidationError` is imported at the top of `tests/test_models.py`, usually:  
     `from marshmallow import ValidationError`  
   - Ensure `settings` and `STALE_TIMESTAMP` are imported/available. If your tests already have a shared timestamp fixture or constant, reuse that instead of `STALE_TIMESTAMP`, or replace with whatever is used in `test_host_models_missing_fields`.
   - If you don’t have a `settings` object in tests, replace `getattr(settings, "USE_SUBMAN_ID", False)` with how you normally access the `USE_SUBMAN_ID` flag (e.g. `from app import config` or `from app.environment import USE_SUBMAN_ID`).

2. **Schema / Model names**  
   - If your schema is not named `HostSchema` or is accessed differently (e.g. `schema.HostSchema()` or `HostSchemaStrict`), adjust the calls accordingly.
   - If your `Host` model has different field names for identifiers (e.g. `insights_id`, `satellite_id`, `subscription_manager_id`), update `host_data` and the assertions to match the actual field names.

3. **Fixtures**  
   - If your tests rely on a DB/session or app fixture (e.g. `db`, `db_session`, `app`), and schema loading requires it, add the appropriate fixture parameters to the test function signatures instead of `mocker` (or remove `mocker` entirely if not needed).
   - If `USER_IDENTITY` is provided via a fixture instead of a module-level constant, update the test signatures and usages to accept and use that fixture.

4. **Canonical facts default behavior**  
   - If your schema normalizes `canonical_facts` to `{}` rather than `None`, you can tighten the assertion to `assert host.canonical_facts == {}` to match the actual behavior.
</issue_to_address>

### Comment 6
<location> `tests/test_unit.py:1902-1905` </location>
<code_context>
         self.assertEqual(result, expected)


-class SerializationSerializeCanonicalFactsTestCase(TestCase):
+class SerializationSerializeCanonicalFactsTestCase(FlaskAppTestCase, TestCase):
     def test_contains_all_values_unchanged(self):
         canonical_facts = {
</code_context>

<issue_to_address>
**suggestion (testing):** Strengthen tests for `serialize_canonical_facts` to cover UUID-to-string conversion and host attributes

Since the tests now pass a `Host` instance into `serialize_canonical_facts`, please add a test that builds a `Host` with one or more canonical-fact fields as actual `UUID` objects (e.g., `insights_id`, `subscription_manager_id`, or `openshift_cluster_id`) and asserts that the serialized output contains those fields as strings. This will exercise the UUID-to-string conversion path and help prevent UUID objects from leaking into MQ payloads.

Suggested implementation:

```python
class SerializationSerializeCanonicalFactsTestCase(FlaskAppTestCase, TestCase):
    def test_uuid_canonical_facts_are_serialized_as_strings(self):
        insights_id = uuid4()
        subscription_manager_id = uuid4()
        openshift_cluster_id = uuid4()

        host = Host(
            insights_id=insights_id,
            subscription_manager_id=subscription_manager_id,
            openshift_cluster_id=openshift_cluster_id,
            display_name="uuid-test-host",
            stale_timestamp=now(),
            reporter="test",
        )

        serialized = serialize_canonical_facts(host)

        self.assertEqual(serialized["insights_id"], str(insights_id))
        self.assertEqual(serialized["subscription_manager_id"], str(subscription_manager_id))
        self.assertEqual(serialized["openshift_cluster_id"], str(openshift_cluster_id))

        self.assertIsInstance(serialized["insights_id"], str)
        self.assertIsInstance(serialized["subscription_manager_id"], str)
        self.assertIsInstance(serialized["openshift_cluster_id"], str)

```

1. Ensure that `serialize_canonical_facts` is imported into `tests/test_unit.py` if it is not already, e.g.:
   `from app.serialization import serialize_canonical_facts` (adjust the module path to match your project).
2. `Host`, `now`, `FlaskAppTestCase`, and `uuid4` appear to already be in use in this file; if any are missing, add the appropriate imports consistent with existing conventions.
</issue_to_address>

### Comment 7
<location> `tests/test_host_mq_service.py:3060-3062` </location>
<code_context>
         id = "host-id"
         org_id = "org-id"
         account = "acct"
-        canonical_facts = {"insights_id": str(generate_uuid())}
+        canonical_facts = {"insights_id": str(generate_uuid())}  # this line will be removed
+        insights_id = generate_uuid()
+        subscription_manager_id = generate_uuid()
         reporter = "puptoo"
         system_profile_facts = {"owner_id": "owner-id"}
</code_context>

<issue_to_address>
**suggestion (testing):** Add tests to cover `_set_owner` behavior when using `subscription_manager_id` instead of `canonical_facts`

The MQ path in `host_mq.py` now derives `owner_id` from `host.subscription_manager_id` for certain reporters, but there’s no end-to-end test validating this behavior.

Please add tests (likely alongside the existing MQ owner tests) that:
- Use a host with `subscription_manager_id` set, no `canonical_facts` dependency, and a reporter of `"rhsm-conduit"` or `"rhsm-system-profile-bridge"`, then assert `system_profile_facts["owner_id"]` (and static profile, if used) matches the formatted `subscription_manager_id` after `_set_owner` runs.
- Cover the case where `subscription_manager_id` is missing but identity contains an owner, and confirm that a `ValidationException` is still raised on mismatch.

This will ensure the refactor away from `canonical_facts` preserves owner assignment semantics.

Suggested implementation:

```python
        id = "host-id"
        org_id = "org-id"
        account = "acct"
        insights_id = generate_uuid()
        subscription_manager_id = generate_uuid()
        reporter = "puptoo"
        system_profile_facts = {"owner_id": "owner-id"}
        groups = [serialized_group]

```

To fully implement the test coverage requested in your review comment, you will also need to:

1. **Add an end-to-end MQ test for `subscription_manager_id` owner assignment (success case)**  
   Place this near the existing MQ owner-related tests (e.g., close to tests that currently verify `_set_owner` behavior based on `canonical_facts` or identity).  
   The new test should:
   - Construct a host payload with:
     - `subscription_manager_id = generate_uuid()`
     - `insights_id` set (if required by the payload schema)
     - No dependency on `canonical_facts` for determining the owner (i.e., the test should not set a `canonical_facts["insights_id"]` that is used by `_set_owner`).
     - `reporter` equal to `"rhsm-conduit"` (and another test for `"rhsm-system-profile-bridge"` if behavior is identical but you want explicit coverage).
   - Call the same helper / path that current MQ tests use to send a message through `host_mq` (e.g., something like `emit_event` -> handler -> `_set_owner`), so `_set_owner` is exercised in the same way as in production.
   - Assert that:
     - `system_profile_facts["owner_id"]` on the resulting host matches the formatted `subscription_manager_id` (whatever transformation `_set_owner` currently applies).
     - If the static system profile is persisted separately in your tests (e.g., `host.system_profile_facts` or a separate static profile store), assert it is consistent with the formatted `subscription_manager_id`.

   A skeleton that you will need to adjust to your test utilities and models might look like:

   ```python
   def test_mq_set_owner_from_subscription_manager_id_rhsm_conduit(
       mq_test_client,
       db_session,
       generate_uuid,
       identity_header_factory,
   ):
       subscription_manager_id = generate_uuid()

       host_payload = {
           "id": "host-id",
           "org_id": "org-id",
           "account": "acct",
           "insights_id": str(generate_uuid()),
           "subscription_manager_id": str(subscription_manager_id),
           "reporter": "rhsm-conduit",
           "system_profile": {
               # owner_id will be overridden by _set_owner
               "owner_id": "pre-existing-owner",
           },
       }

       identity = identity_header_factory(account_number="acct", org_id="org-id")
       # Use whatever helper you have that goes through the MQ flow and `_set_owner`
       created_host = send_mq_host_and_get_created_host(
           mq_test_client, host_payload, identity
       )

       expected_owner = format_subscription_manager_id(subscription_manager_id)
       assert created_host.system_profile_facts["owner_id"] == expected_owner
       # If you have a second representation (static stored profile), assert that too.
   ```

   Then duplicate / parametrize this test for `"rhsm-system-profile-bridge"` as the reporter.

2. **Add a test for the mismatch ValidationException with subscription_manager_id missing but identity owner present**  
   Near the existing tests that confirm `_set_owner` raises `ValidationException` when identity owner mismatches what’s on the host, add a new test that:
   - Constructs a host payload **without** `subscription_manager_id`.
   - Uses an identity whose owner is set (e.g., an `x-rh-identity` with an `entitlements` owner or something equivalent in your code base).
   - Sets `system_profile_facts["owner_id"]` on the host payload such that it conflicts with the identity’s owner.
   - Sends the host through the same MQ path that triggers `_set_owner`.
   - Asserts that a `ValidationException` is raised (using `pytest.raises(ValidationException)` or your existing pattern).

   Example skeleton (adapt names to match your code):

   ```python
   def test_mq_owner_mismatch_without_subscription_manager_id_raises_validation(
       mq_test_client,
       db_session,
       identity_header_factory,
   ):
       conflicting_owner = "owner-from-host"
       identity_owner = "owner-from-identity"

       host_payload = {
           "id": "host-id",
           "org_id": "org-id",
           "account": "acct",
           "insights_id": str(generate_uuid()),
           # subscription_manager_id intentionally omitted
           "reporter": "rhsm-conduit",
           "system_profile": {
               "owner_id": conflicting_owner,
           },
       }

       identity = identity_header_factory(
           account_number="acct",
           org_id="org-id",
           owner=identity_owner,
       )

       with pytest.raises(ValidationException):
           send_mq_host_and_get_created_host(
               mq_test_client, host_payload, identity
           )
   ```

3. **Reuse existing helpers and conventions**  
   - Replace `send_mq_host_and_get_created_host`, `format_subscription_manager_id`, and any other placeholder helpers in the skeletons with the actual functions/fixtures used elsewhere in `tests/test_host_mq_service.py` for MQ flows and owner tests.
   - Ensure imports (`pytest`, `ValidationException`, any MQ client fixtures, identity helpers, etc.) are consistent with the rest of the file. If similar tests already import `ValidationException` or identity factories, reuse those imports and fixtures.

With these additional tests in place, you will have explicit coverage for `_set_owner` behavior when using `subscription_manager_id` and when falling back to identity with `subscription_manager_id` missing, including the mismatch error path.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@FabriciaDinizRH
Copy link
Contributor Author

/retest

@FabriciaDinizRH FabriciaDinizRH force-pushed the remove-cf-from-mq branch 7 times, most recently from 96c42af to 2395cbe Compare December 18, 2025 16:13
@FabriciaDinizRH
Copy link
Contributor Author

/retest

@FabriciaDinizRH FabriciaDinizRH force-pushed the remove-cf-from-mq branch 5 times, most recently from 970428d to f213120 Compare December 20, 2025 00:02
@FabriciaDinizRH
Copy link
Contributor Author

/retest

@FabriciaDinizRH
Copy link
Contributor Author

/retest

1 similar comment
@FabriciaDinizRH
Copy link
Contributor Author

/retest

ezr-ondrej
ezr-ondrej previously approved these changes Dec 22, 2025
Copy link
Member

@ezr-ondrej ezr-ondrej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 Let see what Fero thinks :)

@FabriciaDinizRH
Copy link
Contributor Author

I'll update the branch once all tests pass

@rodrigonull
Copy link
Member

@FabriciaDinizRH The changes look good, you will need to update the iqe test: test_hosts_mq_events_kafka_update_preserves_insights_id to account for the insights_id that could be none in the old canonical facts json data and it defaults to 00000000-0000-0000-0000-000000000000 in the new insights_id individual field.

@FabriciaDinizRH FabriciaDinizRH force-pushed the remove-cf-from-mq branch 3 times, most recently from 39029b5 to f614bf2 Compare December 22, 2025 18:40
refactor(RHINENG-22723): remove canonical_facts from MQ
@FabriciaDinizRH
Copy link
Contributor Author

/retest

2 similar comments
@FabriciaDinizRH
Copy link
Contributor Author

/retest

@FabriciaDinizRH
Copy link
Contributor Author

/retest

Copy link
Contributor

@fstavela fstavela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this. I left a few comments/questions, please take a look.

"context": {
"inventory_id": host.get("id"),
"hostname": canonical_facts.get("fqdn", ""),
"hostname": host.get("fqdn") or "",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional:

Suggested change
"hostname": host.get("fqdn") or "",
"hostname": host.get("fqdn", ""),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fstavela for some reason this breaks IQE notification tests, I'll undo this change

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a situation similar to the insights_id

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is strange, the two statements should be exactly the same 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to approve, as I think it doesn't really matter, but it's possible that the failure was caused just by some flakiness, because I don't know how this could affect anything.

@FabriciaDinizRH
Copy link
Contributor Author

@fstavela before we were getting the insights_id from the canonical facts, and it already was a string. Now we're getting it directly from the host, where it is stored as UUID. Since insights_id is not supposed to be None, only a UUID or the zero UUID, the str() is not a major concern

@fstavela
Copy link
Contributor

@fstavela before we were getting the insights_id from the canonical facts, and it already was a string. Now we're getting it directly from the host, where it is stored as UUID. Since insights_id is not supposed to be None, only a UUID or the zero UUID, the str() is not a major concern

Thank you for the explanation!

fstavela
fstavela previously approved these changes Dec 23, 2025
Copy link
Contributor

@fstavela fstavela left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Co-authored-by: František Viktor Stavěla <[email protected]>
fstavela
fstavela previously approved these changes Dec 23, 2025
@FabriciaDinizRH FabriciaDinizRH merged commit bd0395a into RedHatInsights:master Dec 23, 2025
17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants