Opt into purging by destroy for container entities, use delete elsewhere #23389
Conversation
@jrafanie Just for clarification: this changes everything to use destroy as part of the purger, but what schedules the purging of those? Or is that still TODO? Is the plan to let things orphan out the way they do now and then change that in a follow-up? Or is the plan to switch to destroy and also change the models to use :dependent => :destroy?
Just the container entities handled by the purger were changed to destroy. Everything else is still delete. The schedules remain the same:
under the covers, the scheduler calls the same methods with the same interval. When they're executed in batches, they'll now use destroy.
Most of the container* associations of these container entities that were missed (as highlighted in #23307), such as container conditions/volumes/etc., should already be :dependent => :destroy from the entities (container/group/node/project/etc.).
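For context, the batched purge loop being discussed can be sketched in plain Ruby. This is an illustrative stand-in, not the actual PurgingMixin API: `PurgeRecord` and `purge_in_batches` are hypothetical names, and the real purger works against ActiveRecord relations.

```ruby
# Toy stand-in for an AR record; a real destroy would also run
# :dependent callbacks (destroying/nullifying associated rows).
PurgeRecord = Struct.new(:id) do
  def destroy
    @destroyed = true
    self
  end

  def destroyed?
    !!@destroyed
  end
end

# Same batching for both strategies; only the per-batch removal differs.
def purge_in_batches(records, batch_size:, purge_method: :delete)
  purged = 0
  records.each_slice(batch_size) do |batch|
    if purge_method == :destroy
      batch.each(&:destroy) # row-by-row, callbacks fire per record
    # else: a :delete purger would issue one bulk SQL DELETE, no callbacks
    end
    purged += batch.size
  end
  purged
end
```

The point of the opt-in is exactly this trade-off: :destroy pays a per-row cost to run the :dependent callbacks, which is fine only when the associated row counts are small.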
FYI, I added a bullet list to the PR description of the container entities and associations I'm verifying.
I'd like @agrare to also review this so it plays nice with the refresh workers possibly deleting entities.
Anything being considered for purging should already have been disconnected/archived by the RefreshWorker, so there should be no contention here.
@@ -38,9 +38,9 @@ class ContainerImage < ApplicationRecord
    :inverse_of => :resource
  has_one :last_scan_result, :class_name => "ScanResult", :as => :resource, :dependent => :destroy, :autosave => true

  has_many :metric_rollups, :as => :resource, :dependent => :nullify, :inverse_of => :resource
  has_many :metrics, :as => :resource, :dependent => :nullify, :inverse_of => :resource
  has_many :vim_performance_states, :as => :resource, :dependent => :nullify, :inverse_of => :resource
Not sure why these were nullify
Yeah weird - in other models we just let those be orphaned, right?
Correct, this is the only one that does nullify on metrics* and vim_perf*
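For reference, the three behaviors being compared here — :dependent => :nullify, :dependent => :destroy, and no :dependent option at all — can be modeled in plain Ruby. `FakeMetric` and `remove_resource` are illustrative stand-ins, not the real models:

```ruby
FakeMetric = Struct.new(:id, :resource_id)

# Simulates removing a resource (e.g. a ContainerImage) whose metrics
# association uses the given cleanup strategy.
def remove_resource(metrics, resource_id, strategy)
  owned, rest = metrics.partition { |m| m.resource_id == resource_id }
  case strategy
  when :nullify # :dependent => :nullify clears the FK but keeps the rows
    owned.each { |m| m.resource_id = nil }
    owned + rest
  when :destroy # :dependent => :destroy removes the dependent rows too
    rest
  else          # no :dependent option: rows survive with a dangling FK
    owned + rest
  end
end
```

The "let them be orphaned" behavior in other models is the last branch: rows remain, still pointing at a resource id that no longer exists, until a purger sweeps them.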
@@ -2,13 +2,13 @@ class ContainerImageRegistry < ApplicationRecord
  belongs_to :ext_management_system, :foreign_key => "ems_id"

  # Associated with images in the registry.
  has_many :container_images, :dependent => :nullify
This should be handled by the purger as long as refresh marks them as archived via deleted_on.
ContainerImages are also related to Containers which could still be around after a container_image_registry is deleted. I think dependent nullify is appropriate here
@@ -44,7 +44,7 @@ class ContainerNode < ApplicationRecord
  has_many :metrics, :as => :resource
  has_many :metric_rollups, :as => :resource
  has_many :vim_performance_states, :as => :resource
- has_many :miq_alert_statuses, :as => :resource
+ has_many :miq_alert_statuses, :as => :resource, :dependent => :destroy
not sure why these 2 were not being destroyed ☝️
Seeing thousands of these, likely because nodes don't have the same fast lifecycle as containers/pods... should be handled by the existing node purgers once this is in.
👍 this looks like a good fix
TODO: consider a data migration to do orphan cleanup for anything like this that was already removed, leaving associated rows orphaned.
Added a bullet item to do a data migration in the high level issue: #23394
app/models/container_project.rb
Outdated
  has_many :all_container_groups, :class_name => "ContainerGroup", :inverse_of => :container_project
  has_many :archived_container_groups, -> { archived }, :class_name => "ContainerGroup"
- has_many :persistent_volume_claims
+ has_many :persistent_volume_claims, :dependent => :destroy
WAT, yes, we probably don't have that many container projects but still, I think these should all be destroyed... otherwise the purger for projects will just leave these orphaned.
@agrare do any of these have a chance to be impossible to delete in the UI/backend due to many tens of thousands of rows? Maybe builds? I'm not sure.
persistent_volume_claims are associated to a container_volume, which does the dependent destroy:
app/models/container_volume.rb: belongs_to :persistent_volume_claim, :dependent => :destroy
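So the cleanup chains through the volume rather than through the project. A hypothetical miniature of that chain (FakeClaim/FakeVolume are stand-ins, not the real models):

```ruby
# Destroying a volume destroys its claim, mirroring
# `belongs_to :persistent_volume_claim, :dependent => :destroy`
# on ContainerVolume.
class FakeClaim
  def destroy
    @destroyed = true
  end

  def destroyed?
    !!@destroyed
  end
end

class FakeVolume
  def initialize(claim)
    @claim = claim
  end

  def destroy
    @claim&.destroy # the :dependent => :destroy hop on the belongs_to
    @destroyed = true
  end

  def destroyed?
    !!@destroyed
  end
end
```

This is why the project doesn't also need :dependent => :destroy on persistent_volume_claims: destroying the project's volumes already takes the claims with them.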
Thanks!
app/models/container_service.rb
Outdated
@@ -6,7 +6,7 @@ class ContainerService < ApplicationRecord

  belongs_to :ext_management_system, :foreign_key => "ems_id"
  has_and_belongs_to_many :container_groups, :join_table => :container_groups_container_services
- has_many :container_routes
+ has_many :container_routes, :dependent => :destroy
Is this one correct, @agrare? Projects have many routes and services have many routes? Should projects have routes through services? Or maybe there can be routes not attached to a service? 🤷
Are we leaving routes orphaned @jrafanie ?
A container route is its own "top-level" managed object, so we would destroy it when we get the destroy event from k8s (versus e.g. a container, which is only part of a container_group and wouldn't get its own destroy event). I'm surprised it isn't at least dependent nullify, though.
No. I'm seeing fewer than 200 routes, whereas container conditions, env vars, volumes, security contexts, port configs, and custom attributes are the big ones in the container area, at over 1 million rows.
I can change it to nullify. Are there others here that should be treated in the same way? I don't want to change behavior. I'll verify with my table counts from various databases.
Kept it the same and added a comment
@@ -1,7 +1,7 @@
  class PersistentVolumeClaim < ApplicationRecord
  belongs_to :ext_management_system, :foreign_key => "ems_id"
  belongs_to :container_project
- has_many :container_volumes
+ has_many :container_volumes, :dependent => :destroy
not sure why this wasn't being destroyed... too many maybe? We don't have a separate purger for volumes though.
A persistent volume claim will point to a persistent volume when the claim is satisfied, but it doesn't own the volume. The volume could be reused by a future claim, so deleting the claim leaves the volume around.
Or more accurately, if the reclaim policy is "Retain", then deleting the PVC won't also delete the PV.
Kubernetes is hard this way, because technically all these objects are loosely bound to each other.
Will add a comment. That does make sense when thinking about a claim.
app/models/container_build.rb
Outdated
@@ -9,7 +9,7 @@ class ContainerBuild < ApplicationRecord
    :as => :resource,
    :inverse_of => :resource

- has_many :container_build_pods
+ has_many :container_build_pods, :dependent => :destroy
Seeing only 10s of these; perhaps they're managed elsewhere, such as delete events.
container_build_pods are a top-level managed object which should be deleted during refresh.
  has_many :containers, :through => :container_images
  has_many :container_groups, :through => :container_images

  # Associated with serving the registry itself - for openshift's internal
  # image registry. These will be empty for external registries.
- has_many :container_services
+ has_many :container_services, :dependent => :destroy
Seeing tens of registries with many hundreds of services. Should be handled if registries are removed with the container manager.
ContainerServices should be deleted by refresh when they are removed.
I'm starting to wonder if this is a refresh problem. I thought just about everything in Kubernetes was a "top-level" object, since you can create objects willy-nilly that do anything, and there are loose associations between many things via labels and selectors. There aren't really ownership references between them. Are we just missing events during refresh for destroying these orphaned things, or perhaps they just aren't in the events/watches?
app/models/container_project.rb
Outdated
- has_many :container_routes
  has_many :container_replicators
  has_many :container_services
+ has_many :container_routes, :dependent => :destroy
Many 10s of these. Maybe handled by event handling or just lower lifecycle churn.
container_routes are a top-level managed entity that should be deleted by refresh when they are removed from k8s.
app/models/container_project.rb
Outdated
- has_many :container_replicators
  has_many :container_services
  has_many :container_routes, :dependent => :destroy
+ has_many :container_replicators, :dependent => :destroy
1 of these
Same here
app/models/container_project.rb
Outdated
- has_many :container_services
  has_many :container_routes, :dependent => :destroy
  has_many :container_replicators, :dependent => :destroy
+ has_many :container_services, :dependent => :destroy
See ☝️ registries: https://github.com/ManageIQ/manageiq/pull/23389/files#r2007811459
There are hundreds of these across hundreds of projects. Low churn, or handled elsewhere?
Same here
app/models/container_project.rb
Outdated
  has_many :containers, :through => :container_groups
  has_many :container_images, -> { distinct }, :through => :container_groups
  has_many :container_nodes, -> { distinct }, :through => :container_groups
  has_many :container_quotas, -> { active }, :inverse_of => :container_project
  has_many :container_quota_scopes, :through => :container_quotas
  has_many :container_quota_items, :through => :container_quotas
- has_many :container_limits
+ has_many :container_limits, :dependent => :destroy
0 from example data
These are a top-level object which should be deleted by the refresher
Oh, the k8s uid is a really good indicator - I assume they'd always be the
app/models/container_service.rb
Outdated
@@ -6,7 +6,7 @@ class ContainerService < ApplicationRecord

  belongs_to :ext_management_system, :foreign_key => "ems_id"
  has_and_belongs_to_many :container_groups, :join_table => :container_groups_container_services
- has_many :container_routes
+ has_many :container_routes, :dependent => :destroy # TODO: consider nullify for things like routes, which should be removed by event handling
Routes are a top-level openshift collection that should be deleted by the refresher
app/models/mixins/purging_mixin.rb
Outdated
destroyed = batch_records.destroy_all
destroyed.detect { |d| !d.destroyed? }.tap do |failed|
  raise "failed removing record: #{failed.class.name} with id: #{failed.id} with error: #{failed.errors.full_messages}" if failed
end
To test this, here's what I did: I made containers fail to delete if their port configs still exist... It's a silly failure, but it gives an idea of how it looks in the logs:
index f2093b1fa4..63352d093c 100644
--- a/app/models/container.rb
+++ b/app/models/container.rb
@@ -10,7 +10,7 @@ class Container < ApplicationRecord
has_one :container_replicator, :through => :container_group
has_one :container_project, :through => :container_group
belongs_to :container_image
- has_many :container_port_configs, :dependent => :destroy
+ has_many :container_port_configs, :dependent => :restrict_with_error
has_many :container_env_vars, :dependent => :destroy
has_one :container_image_registry, :through => :container_image
has_one :security_context, :as => :resource, :dependent => :destroy
diff --git a/config/settings.yml b/config/settings.yml
[----] I, [2025-03-24T18:05:13.358085#93726:88cc] INFO -- evm: MIQ(Container.purge_by_date) Purging containers older than [2025-03-23 22:05:13 UTC]...
[----] I, [2025-03-24T18:05:13.358273#93726:88cc] INFO -- evm: MIQ(Container.purge_in_batches) Purging 1000 containers.
[----] E, [2025-03-24T18:05:15.513443#93726:88cc] ERROR -- evm: MIQ(MiqQueue#deliver) Message id: [25979608], Error: [failed removing record: ManageIQ::Providers::Openshift::ContainerManager::Container with id: 1835 with error: ["Cannot delete record because dependent manageiq::providers::openshift::containermanager::container: container port configs exist"]]
[----] E, [2025-03-24T18:05:15.513538#93726:88cc] ERROR -- evm: [RuntimeError]: failed removing record: ManageIQ::Providers::Openshift::ContainerManager::Container with id: 1835 with error: ["Cannot delete record because dependent manageiq::providers::openshift::containermanager::container: container port configs exist"] Method:[block (2 levels) in <class:LogProxy>]
[----] E, [2025-03-24T18:05:15.513581#93726:88cc] ERROR -- evm: /Users/joerafaniello/Code/manageiq/app/models/mixins/purging_mixin.rb:235:in `block (2 levels) in purge_in_batches'
<internal:kernel>:90:in `tap'
/Users/joerafaniello/Code/manageiq/app/models/mixins/purging_mixin.rb:234:in `block in purge_in_batches'
<internal:kernel>:187:in `loop'
/Users/joerafaniello/Code/manageiq/app/models/mixins/purging_mixin.rb:208:in `purge_in_batches'
/Users/joerafaniello/Code/manageiq/app/models/mixins/purging_mixin.rb:93:in `purge_by_date'
/Users/joerafaniello/Code/manageiq/app/models/miq_queue.rb:517:in `block in dispatch_method'
/Users/joerafaniello/.gem/ruby/3.3.6/gems/timeout-0.4.3/lib/timeout.rb:185:in `block in timeout'
/Users/joerafaniello/.gem/ruby/3.3.6/gems/timeout-0.4.3/lib/timeout.rb:38:in `handle_timeout'
/Users/joerafaniello/.gem/ruby/3.3.6/gems/timeout-0.4.3/lib/timeout.rb:194:in `timeout'
/Users/joerafaniello/Code/manageiq/app/models/miq_queue.rb:515:in `dispatch_method'
/Users/joerafaniello/Code/manageiq/app/models/miq_queue.rb:484:in `block in deliver'
/Users/joerafaniello/Code/manageiq/app/models/user.rb:390:in `with_user_group'
/Users/joerafaniello/Code/manageiq/app/models/miq_queue.rb:484:in `deliver'
/Users/joerafaniello/Code/manageiq/app/models/miq_queue.rb:508:in `deliver_and_process'
/Users/joerafaniello/Code/manageiq/lib/vmdb/console_methods.rb:67:in `block in simulate_queue_worker'
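The failure check being exercised here can be modeled in plain Ruby. These are stand-ins, not the real PurgingMixin: `destroy_all` returns every record it attempted, and any record whose destroy was blocked by a callback (e.g. :dependent => :restrict_with_error above) reports destroyed? == false, which the detect/raise surfaces.

```ruby
# Stand-in record whose destroy can be vetoed, like AR's
# :dependent => :restrict_with_error aborting a destroy.
BlockableRecord = Struct.new(:id, :deletable) do
  def destroy
    @destroyed = deletable
    self
  end

  def destroyed?
    !!@destroyed
  end
end

# Mirrors the purge_in_batches failure check: attempt every destroy,
# then raise on the first record that was not actually removed.
def purge!(records)
  destroyed = records.each(&:destroy)
  destroyed.detect { |d| !d.destroyed? }.tap do |failed|
    raise "failed removing record with id: #{failed.id}" if failed
  end
  destroyed.size
end
```

Raising (rather than silently skipping) ensures a stuck record surfaces in the queue worker's error log instead of being retried forever.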
Fixes most of the issues in ManageIQ#23307. We were leaving around lots of orphaned container* rows when we removed the container entities. This change allows us to opt into using destroy on the primary table in situations where we know the associated records are NOT going to be many tens of thousands of rows. If they are, those associations should NOT be using dependent :destroy, and should have their own purger.
Add comment about PVCs from projects, as PVCs are removed via the container volume belongs_to. Add comment about a PVC having container_volumes that can live on their own and be used by a different claim; no need to delete the volume when a claim is removed. Leave it to the purger where we already have purgers. No other model nullifies metrics|states.
@@ -1,6 +1,19 @@
  RSpec.describe PurgingMixin do
  let(:example_class) { PolicyEvent }
  let(:purge_date) { 2.weeks.ago }
+ purge_by_delete_classes, purge_by_destroy_classes = ActiveRecord::Base.descendants.select { |m| m.ancestors.include?(PurgingMixin) && m.base_model == m }.partition { |m| m.purge_method == :delete }
I had to add the base_model check as we don't need to test all the descendant classes such as:
ManageIQ::Providers::Azure::ContainerManager::ContainerGroup.purge_method is destroy
ManageIQ::Providers::Kubernetes::ContainerManager::ContainerGroup.purge_method is destroy
ManageIQ::Providers::Vmware::ContainerManager::ContainerGroup.purge_method is destroy
ManageIQ::Providers::OracleCloud::ContainerManager::Container.purge_method is destroy
TODO: add base_model? check and use it here
Yeah, as discussed, there's a base_class and a base_class? method, so base_model should match that pattern with a base_model? method. Then we can use that here.
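With a base_model? predicate, the spec's select/partition reads roughly like the sketch below. FakeModel is a stand-in for the ActiveRecord descendants so the shape of the partition is visible without a database:

```ruby
# Stand-in for an AR model class that includes PurgingMixin.
FakeModel = Struct.new(:name, :purge_method, :base) do
  def base_model?
    base
  end
end

models = [
  FakeModel.new("PolicyEvent", :delete, true),
  FakeModel.new("ContainerGroup", :destroy, true),
  # STI subclasses report the same purge_method, so they're skipped:
  FakeModel.new("Kubernetes::ContainerGroup", :destroy, false)
]

# Only base models are tested; split them by purge strategy.
by_delete, by_destroy = models.select(&:base_model?)
                              .partition { |m| m.purge_method == :delete }
```

Filtering to base models first is what avoids re-running the same shared examples for every provider-specific subclass.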
@@ -3,6 +3,9 @@ module Purging
  extend ActiveSupport::Concern
  include PurgingMixin

+ # According to 022e15256fd07fa7bf5b3ade7ce16b13daa87b84
+ # This is necessary because ContainerQuotaItem may be archived due to edits
+ # to parent ContainerQuota that is still alive.
Added a bullet item to review if archiving/purging is needed for container quota/quota scopes/quota items in #23394
@@ -1,7 +1,8 @@
  module ActiveRecord
    class Base
      class << self
        alias_method :base_model, :base_class
+       alias_method :base_model?, :base_class?
This is the new alias to simplify the purging_mixin test 👇
Fixes most of the issues in #23307
We were leaving around lots of orphaned container* rows when we removed the container entities. This change allows us to opt into using destroy on the primary table in situations where we know the associated records are NOT going to be many tens of thousands of rows. If they are, those associations should NOT be using dependent :destroy, and should have their own purger.
TODO:
Note: I've moved additional tasks for more non-container purging to #23394