Skip to content

Add support for handling system profiles#1388

Open
rtamalin wants to merge 30 commits intomasterfrom
data_profiles_support
Open

Add support for handling system profiles#1388
rtamalin wants to merge 30 commits intomasterfrom
data_profiles_support

Conversation

@rtamalin
Copy link
Collaborator

@rtamalin rtamalin commented Nov 11, 2025

Define two new tables:

  • profiles This table stores the complete profiles that have been provided by clients as part of the system registration (announce_system) or keepalive (update) request handling, which consists of three fields, product_type, identifier and data.
  • system_profiles This table is used to link systems records to their associated profiles in a one to many relationship, allowing for the same profile records to be shared between multiple systems

Add support for handling any provided profiles to the handlers for the announce_system and update requests in the connect V3 API.

This support involves checking that the provided profiles are valid and, when provided for the first time, complete, with new profiles being added to the profiles table, and appropriate references being added or updated via the system_profiles table to associate systems with their corresponding profiles.

Similarly, when incomplete profiles are included in update requests, the corresponding complete profiles will be retrieved and used, with any missing profiles being dropped and considered problematic.

Additionally, when problematic profiles are detected, whether invalid or incomplete, they will be ignored and the X-System-Profiles-Action header will be included in the response, with a value of clear-cache.

In the System model a new custom attribute, complete_profiles, with an accompanying assignment method, complete_profiles=, is used to handle the assignment of complete profiles to be associated with a system record as part of record creation or update.

If racing requests attempt to create the same new profile, one will succeed and the others will rescue the ActiveRecord:RecordNotUnique exception and instead lookup the newly created profile record.

Add support for optimizing the content of send_bulk_system_update() request to only include the full profile, including both identifier and data fields, on the first occurrence of a profile within the list of serialized systems, with subsequent profile occurrences dropping the data field.

Update the test cases to validate correct operation of new models, and associated request handling changes.

Related Jira: TEL-265

@rtamalin
Copy link
Collaborator Author

FYI while doing some heavy stress testing I'm seeing some DB stall issues

@rtamalin
Copy link
Collaborator Author

Investigating and the issue may be related to recent changes on the master branch

@rtamalin
Copy link
Collaborator Author

rtamalin commented Nov 12, 2025

Yup having reverted my test env to be based upon current master branch I still see the same issue... Digging further.

Copy link
Member

@rjschwei rjschwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments should start with uppercase to stay consistent throughout the code base

@rtamalin
Copy link
Collaborator Author

Determined that the issue was due to a recent upgrade of containerd.io on my system to 2.1.5, which has a very low default soft file limit, leading to problems when there were lots of active connections...

@rtamalin rtamalin marked this pull request as ready for review November 13, 2025 16:18
@ngetahun ngetahun added the 2 reviewers A second reviewer is requested. label Nov 17, 2025
@ngetahun ngetahun self-assigned this Nov 17, 2025
@rtamalin
Copy link
Collaborator Author

rtamalin commented Nov 17, 2025

@ngetahun We don't plan for this to merge until after the 2.24 release goes out as well... If there is some sort of label pattern that should be used to indicate that, I'm happy to use it. Or should I just go ahead and just add a Post-v2.24 label?

Never mind I see the 2.25 label - so I added that...

@rtamalin rtamalin added the 2.25 label Nov 17, 2025
@rtamalin
Copy link
Collaborator Author

@paragjain0910 @felixsch @mssola I've updated the PR with the implementation of the optimized serialization of systems in the send_bulk_system_update() request payload. I identified relatively minimal changes to the existing problematic test case implementation to get it passing again.

I also spotted and fixed a minor error in the update request handler that could wipe existing profile associations if profiles were provided to the update, but all were either unknown incomplete or invalid profiles. An additional test case has been added to cover this scenario.

@rtamalin rtamalin marked this pull request as draft November 19, 2025 22:25
@rtamalin
Copy link
Collaborator Author

rtamalin commented Nov 19, 2025

Found an issue when stress testing under heavy load, have identified a promising fix that I will try out tomorrow.

@paragjain0910
Copy link
Contributor

@rtamalin It would be great if you could also include the testing steps for the endpoints. That would really help me in testing the scenarios.

@rtamalin
Copy link
Collaborator Author

@rtamalin It would be great if you could also include the testing steps for the endpoints. That would really help me in testing the scenarios.

I use the github.com/rtamalin/rmt-client-tester tooling that I created in August for exploring the RMT performance when creating profiles with my initial investigative hacking. It is a simple Go based app framework that can be used to precreate a (large) set of system hardware info JSON blobs, currently with the system profiles included (though I plan to move them out to simplify/clean up the code this week, time permitting) and then I can repeatably run a tool that submits register, update and deregister requests based upon those generated hardware info blobs... I think the current README.md is reasonably up to date but will check today...

@rtamalin
Copy link
Collaborator Author

FYI @paragjain0910 @rjschwei over the weekend I was able to validate with some heavy stress testing that my proposed solution to the DB deadlock issues I was seeing in the Profile ensure_profile_exists() routine (https://github.com/SUSE/rmt/pull/1388/files#diff-0414a45947b697e51d740c5ba32eb6c6445f4c795807250ce68106de5ff7e414R73 in the current state of things) works.

This involves using the "LOCK IN SHARE MODE" directive to the lock associated with the where() query to ensure that shared rather than exclusive locking is being used when trying to lookup the Profile entry created by a racing request handler.

However when investigating this I realized that the current PubCloud interception of the announce_system request handling is by passing the system profile processing that is implemented in the standard RMT workflow... I will look in addressing this today.

rtamalin and others added 28 commits February 27, 2026 13:25
Add an addition unique index spanning the system_id and profile_id
fields in the system_profiles table to ensure that a given profile
can only be associated with a given system once.
We should consider profiles that have an empty identifier value as
invalid, so update the filter_profiles() method to check for and
treat them as invalid.
Update the System.complete_profiles=() method to avoid deletion and
recreation of linking records for profile associations that haven't
changed.
Enhance the SystemSerializer to take an optional serialized_profiles
set as an initialize() argument, defaulting a new empty set if not
specified, and setup a serializer instance variable holding it.

This serialized_profiles instance variable set tracks profile.id's
and is used to determine if the serializer has previously serialized
a specific profile or not, with the first serialization including
the data field, and subsequent serializations dropping it.

Update the send_bulk_system_update() request generation to setup a
new serialized_profiles set for each batch of systems being processed
ensuring that only the first occurrence of a given profile includes
the data field.

Update tests to exercise the new SystemSerializer initialization and
optional system profiles data field inclusion, and verify that the
expected profiles are serialized by send_bulk_system_update().
Improve the debug messages logged by the announce_system and update
request handlers to report the profile types for problematic profiles
identified.

Additionally enhance the Profile.filter_profiles() method to return
hashes with symbolized keys to simplify determining which incomplete
profiles are unknown.

Only update the profiles associated with a system if valid complete
profiles were either provided or identified from incomplete profiles,
and add a test to ensure that existing profile associations are not
replaced if not valid complete profiles were provided in the update.
Under heavy stress test loading a subtle race was identified where
the find_by() in the rescue block was failing to find the Profile
record created by a racing request handler. By locking the table
before calling find_by() we ensure that the query is run against the
current state of the table, rather than a potentially stale read-only
version that was restored as part of the rollback, which wouldn't
include the yet to be created new record.
The recent ensure_profile_exists() changes used nested transactions
to minimize the rollback cost of potential collisions, but doing so
incurs a general overhead that may not be beneficial unless racing
collisions occur very frequently. For now the nested transaction
wrapping has been converted to a comment about it's potential use
if the need arrises at a future point.
In ensure_profile_exists() use shared locking in the rescue block when
querying for the profile created by a racing request handler to avoid
triggering a deadlock.
The scc_proxy engine supports PubCloud specific handling of the
announce_system and other requests both in the case of BYOS and
PAYG instances, which needs to be updated appropriately handle
profiles:

   * For BYOS clients, when proxying announce_system requests to
     the SCC, any provided system_profiles should be included in
     the request sent to the SCC, and if the response from the SCC
     includes the X-System-Profiles-Action header then set the
     equivalent header in the response generated by the RMT.
   * For PAYG clients, check for the existence of system_profiles
     in the request and handle them appropriately, similarly to
     the standard RMT announce_system request handling.

To support this the SccProxy.announce_system_scc() has been changed
to return both the JSON parsed response body, and response headers,
allowing them to be checked by callers.

Additionally the announce_system handler in the scc_proxy engine has
been updated to avoid replacing the standard Rails response object
with the JSON parse response body returned by announce_system_scc().

Also refactored the standard RMT announce_system handling to move
system profile processing to a helper method which can be called
by the scc_proxy implementation as needed.
Enhance announce_base_product_hybrid() to include a system's profiles
in the announce_system request that it sends to the SCC.

Add a new as_payload() helper method to the Profile model that returns
the payload representation of a Profile record.

Update the SystemSerializer to leverage the as_payload() helper when
serializing systems.

Simplify a number of unit tests by leveraging the as_payload() helper
and a new test to verify correct operation of the helper.

Update scc_proxy unit tests to ensure that the inclusion of the system
profiles by announce_base_product_hybrid() is exercised.
Co-authored-by: Thomas Schmidt <tschmidt@suse.de>
Add an explanatory comment as to why locking is used in the orphaned
profile destroy handler.

Add a log message in the rescue block for the case where a profile
entry was already destroyed by a racing orphaned profile destroy
handler.
Move the changelog entry for this work to the newly added 2.26 section.
Previously Parag identified that the Rails automatic system.profiles
update management in the complete_profiles=() custom attribute handler
was deleting and recreating entries in the linking system_profiles
table even when the corresponding profiles weren't changing. I had
implemented explicit management of the profiles updates to avoid this,
and a corresponding test case to catch the problem.

Tom recently questioned why we needed the explcit update management
code, and retesting with the automatic Rails update management handling
restored no longer sees the system_profiles entries changing.

It appears that some other change that was made to the code base has
had a side effect that fixed whatever was causing the Rails automatic
update management to recreate the entries, so we no longer need to
implement explicit management of the system.profiles update.
Have also updated associated design doc to call out why these choices
differ from how things are implemented in the equivalent SCC profile
support.
Remove TODO that is no longer relevant - destroy_async is not currently available in the RMT ecosystem, and not sure there is a benefit anyway.
Add comment summarising reason for using an after_commit.
Combine multiple class method definitions inside a block.
The recent rebase to pull in the most recent changes from the master
branch resulted in rubocop failures in the scc_proxy engine changes:
  * class too long for the class Engine definition
  * method too long for the announce_system method in that class

I have addressed the latter by defining an additional helper method
for handling the X-System-Profiles-Action header processing.

For the former I've chosen to just disable this class too long check
for this class definition.
@rtamalin rtamalin force-pushed the data_profiles_support branch from b40ae6c to 9ddefdc Compare February 27, 2026 18:30
@rtamalin
Copy link
Collaborator Author

Rebased again to pick up additional recently landed changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

2 reviewers A second reviewer is requested. 2.26 do not merge do not merge until this label is removed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants