
fix(removal): add conditional lifecycle guards for storage attachment advancement#21750

Open
nvinuesa wants to merge 1 commit into juju:4.0 from nvinuesa:app_removal_fk

@nvinuesa
Member

@nvinuesa nvinuesa commented Feb 10, 2026

This patch ensures that storage attachments are removed before we attempt to remove the machine. This had to be done from the removal domain; it was initially done in a separate patch (#21761).

Storage attachments, filesystem attachments, and volume attachments were
being unconditionally advanced from dying to dead in removal job
processors. This violates lifecycle rules when provisioners may still be
responsible for them.

Add guards so that:

  • Storage attachment dying->dead: only if force, unit is dead, or the
    storage-attached hook never fired (checked via unit_state YAML)
  • Filesystem/volume attachment dying->dead: only if force, or the
    storage attachment is dead/gone AND the filesystem/volume is
    machine-scoped AND the owning machine is gone
  • Filesystem/volume attachments are now marked as dead before deletion
    when dying
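The guards above can be read as two boolean predicates. The following is a minimal sketch under stated assumptions, not the code in this PR: the function names and parameters are hypothetical, and each boolean stands in for a value the removal job would read from the database.

```go
package main

import "fmt"

// canAdvanceStorageAttachment (hypothetical) reports whether a dying storage
// attachment may move to dead: the removal is forced, the unit is dead, or
// the storage-attached hook never fired.
func canAdvanceStorageAttachment(force, unitDead, hookFired bool) bool {
	return force || unitDead || !hookFired
}

// canAdvanceProvisionedAttachment (hypothetical) reports whether a dying
// filesystem or volume attachment may move to dead: the removal is forced,
// or the storage attachment is dead/gone AND the entity is machine-scoped
// AND the owning machine is gone.
func canAdvanceProvisionedAttachment(
	force, storageAttachmentDeadOrGone, machineScoped, machineGone bool,
) bool {
	return force || (storageAttachmentDeadOrGone && machineScoped && machineGone)
}

func main() {
	// Unit not dead and hook already fired: the attachment must stay dying.
	fmt.Println(canAdvanceStorageAttachment(false, false, true)) // false
	// Hook never fired: safe to advance even though the unit is not dead.
	fmt.Println(canAdvanceStorageAttachment(false, false, false)) // true
	// Owning machine still present: a non-forced removal must wait.
	fmt.Println(canAdvanceProvisionedAttachment(false, true, true, false)) // false
}
```

Keeping the decision in small pure functions like these makes each guard easy to table-test independently of the SQL that gathers its inputs.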

The rules requested in #21761 (comment) were taken into account for the storage attachment removal: new types and queries were added to determine whether the storage-attached hook has fired, and we query the state of the storage attachment before deciding whether to advance its life.

QA steps

First pack the dummy-storage charm (from /testcharms). Then deploy it and remove the app before the machine reaches the RUNNING state:

juju deploy ./testcharms/charms/dummy-storage/dummy-storage_amd64.charm --storage multi-fs=10M
juju status # just to check that the app is deploying
juju remove-application dummy-storage

You shouldn't see any errors in the logs and the app/unit/machine should be removed (after a few moments).

Links

Issue: Fixes #21717.

Jira card: JUJU-9147

"DELETE FROM unit_agent_presence WHERE unit_uuid = $entityUUID.uuid",
"DELETE FROM secret_unit_consumer WHERE unit_uuid = $entityUUID.uuid",
"DELETE FROM storage_unit_owner WHERE unit_uuid = $entityUUID.uuid",
"DELETE FROM storage_attachment WHERE unit_uuid = $entityUUID.uuid",
Member

A storage attachment is its own entity; it should not be deleted here. That is like deleting the machine because the unit is being deleted.

Suggested change
"DELETE FROM storage_attachment WHERE unit_uuid = $entityUUID.uuid",

Member Author

This is no longer applicable: I applied the fix you requested in the other PR (#21761 (comment)) and now do the storage attachment removal in this patch. See cdc6e00.

Member

@hpidcock hpidcock left a comment

Thanks, getting closer.

Comment on lines +506 to +508
MachineScopeProvisioned: dbVal.ProvisionScopeID == 1,
StorageAttachmentDeadOrGone: dbVal.StorageAttachmentLifeID == int(life.Dead) || dbVal.StorageAttachmentLifeID == -1,
MachineGone: dbVal.MachineGone == 1,
Member

This is business logic in the state layer. I know I'm being pedantic, but these values should just be passed back.
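A minimal sketch of what passing the values back could look like, assuming a hypothetical `attachmentAdvanceRow` type returned unchanged by the state layer and hypothetical service-side helpers. The scope value `1` and the `-1` sentinel are taken from the diff above; the numeric value used for dead (`2`) is an assumption, and an empty `MachineUUID` stands in for a NULL produced by the LEFT JOIN.

```go
package main

import "fmt"

// Encodings assumed by this sketch. machineProvisionScope and lifeMissing
// come from the diff (provision_scope_id == 1, COALESCE(sa.life_id, -1));
// lifeDead is an assumed numeric value for life.Dead.
const (
	machineProvisionScope = 1
	lifeDead              = 2
	lifeMissing           = -1
)

// attachmentAdvanceRow is a hypothetical row type the state layer could
// return as-is: raw column values with no interpretation.
type attachmentAdvanceRow struct {
	ProvisionScopeID        int
	StorageAttachmentLifeID int    // -1 when no storage attachment row exists
	MachineUUID             string // empty when the join found no owning machine
}

// The service layer then owns the interpretation of those raw values.

func machineScoped(r attachmentAdvanceRow) bool {
	return r.ProvisionScopeID == machineProvisionScope
}

func storageAttachmentDeadOrGone(r attachmentAdvanceRow) bool {
	return r.StorageAttachmentLifeID == lifeDead ||
		r.StorageAttachmentLifeID == lifeMissing
}

func machineGone(r attachmentAdvanceRow) bool {
	return r.MachineUUID == ""
}

func main() {
	row := attachmentAdvanceRow{ProvisionScopeID: 1, StorageAttachmentLifeID: -1}
	fmt.Println(machineScoped(row), storageAttachmentDeadOrGone(row), machineGone(row))
	// true true true
}
```

With this split the SQL only selects columns, and any change to the advancement rules stays in Go code that can be unit-tested without a database.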

// The following errors may be returned:
// - [storageerrors.StorageAttachmentNotFound] if the storage attachment
// no longer exists in the model.
func (st *State) GetStorageAttachmentHookInfo(
Member

Suggested change
func (st *State) GetStorageAttachmentHookInfo(
func (st *State) GetUniterInternalStorageStateByStorageAttachmentUUID(

Comment on lines +373 to +375
CASE WHEN mf.machine_uuid IS NULL THEN 1
WHEN m.uuid IS NULL THEN 1
ELSE 0
Member

This is business logic.

FROM storage_filesystem_attachment sfa
JOIN storage_filesystem sf ON sfa.storage_filesystem_uuid = sf.uuid
LEFT JOIN storage_instance_filesystem sif ON sf.uuid = sif.storage_filesystem_uuid
LEFT JOIN storage_attachment sa ON sif.storage_instance_uuid = sa.storage_instance_uuid
Member

We don't care about the storage attachment life, because the filesystem attachment should only go to dying/dead once the storage attachment is dead.

q := `
SELECT sf.provision_scope_id AS &provisionedAttachmentAdvanceInfo.provision_scope_id,
COALESCE(sa.life_id, -1) AS &provisionedAttachmentAdvanceInfo.storage_attachment_life_id,
CASE WHEN mf.machine_uuid IS NULL THEN 1
Member

machine_filesystem records a filesystem owned by the machine. If this row has been deleted, the filesystem is no longer owned by a machine, and we should not do anything with this filesystem attachment.

The machine_filesystem should only be removed when the filesystem that is owned by the machine is removed, i.e. before the machine.

SELECT sf.provision_scope_id AS &provisionedAttachmentAdvanceInfo.provision_scope_id,
COALESCE(sa.life_id, -1) AS &provisionedAttachmentAdvanceInfo.storage_attachment_life_id,
CASE WHEN mf.machine_uuid IS NULL THEN 1
WHEN m.uuid IS NULL THEN 1
Member

This is tautological business logic.

Comment on lines +461 to +473
SELECT sv.provision_scope_id AS &provisionedAttachmentAdvanceInfo.provision_scope_id,
COALESCE(sa.life_id, -1) AS &provisionedAttachmentAdvanceInfo.storage_attachment_life_id,
CASE WHEN mv.machine_uuid IS NULL THEN 1
WHEN m.uuid IS NULL THEN 1
ELSE 0
END AS &provisionedAttachmentAdvanceInfo.machine_gone
FROM storage_volume_attachment sva
JOIN storage_volume sv ON sva.storage_volume_uuid = sv.uuid
LEFT JOIN storage_instance_volume siv ON sv.uuid = siv.storage_volume_uuid
LEFT JOIN storage_attachment sa ON siv.storage_instance_uuid = sa.storage_instance_uuid
LEFT JOIN machine_volume mv ON sv.uuid = mv.volume_uuid
LEFT JOIN machine m ON mv.machine_uuid = m.uuid
WHERE sva.uuid = $entityUUID.uuid
Member

Same comments as for filesystems.

}
var state map[string]bool
if err := yaml.Unmarshal([]byte(storageStateYAML), &state); err != nil {
return false
Member

If you get an error, that is an error; it is not proof that the hook did not fire.

)
}

if !hookInfo.UnitDead && storageAttachedHookFired(hookInfo.StorageID, hookInfo.StorageState) {
Member

  1. These should be two different errors.
  2. This says: if the unit is alive and the storage-attached hook fired, then exit. This is wrong; it should be: if the unit is not dead OR storageAttachedHookFired.

Contributor

To me this is logically equivalent to the rule you stated in the previous PR:
#21761 (comment)

Storage Attachments MUST not go to Dead until the Uniter says it is Dead or the removal was forceful. The exception here would be if the Uniter never started or the Storage Attached hook never fired.

Here we check that the unit is not dead (so it has started) and that the storage-attached hook fired:

  • !(started && fired) === !started || !fired === notStarted || !fired

So started && fired is exactly the case in which the storage attachment cannot go to dead; every other case falls under the exception.
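The equivalence argued here can be checked exhaustively over all four input combinations. A small sketch, treating "unit not dead" as "uniter started" as the comment does and leaving the force flag aside; the helper names are hypothetical.

```go
package main

import "fmt"

// blocked mirrors the condition under discussion: the attachment is held
// back when the uniter has started (unit not dead) and the hook fired.
func blocked(started, fired bool) bool {
	return started && fired
}

// exceptionApplies mirrors the quoted rule: advancing is allowed when the
// uniter never started or the storage-attached hook never fired.
func exceptionApplies(started, fired bool) bool {
	return !started || !fired
}

func main() {
	for _, started := range []bool{false, true} {
		for _, fired := range []bool{false, true} {
			// De Morgan: !(started && fired) == !started || !fired,
			// so the last column is true for every row.
			fmt.Println(started, fired, !blocked(started, fired) == exceptionApplies(started, fired))
		}
	}
}
```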

Comment on lines +1450 to +1452
return info.MachineScopeProvisioned &&
info.StorageAttachmentDeadOrGone &&
info.MachineGone
Member

For the attachment to be dying, the storage attachment must already be dead, so we don't need to check whether the storage attachment is dead or gone.

For the rest, the logic is:

  1. The Filesystem Attachment is Machine Provisioned, and
  2. The Filesystem for the Filesystem Attachment is Machine owned, or
    the Volume backing the Filesystem is Machine Provisioned and Machine owned, and
  3. The Machine is Dead.
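Rules 1-3 above can be sketched as a single predicate for the non-forced path. The struct and its fields are hypothetical placeholders for whatever the state layer would return; where each fact comes from in the schema is not shown.

```go
package main

import "fmt"

// filesystemAttachmentFacts gathers the facts named in the review comment.
// The type and field names are hypothetical.
type filesystemAttachmentFacts struct {
	AttachmentMachineProvisioned    bool // rule 1
	FilesystemMachineOwned          bool // rule 2, first branch
	BackingVolumeMachineProvisioned bool // rule 2, second branch
	BackingVolumeMachineOwned       bool // rule 2, second branch
	MachineDead                     bool // rule 3
}

// canAdvanceFilesystemAttachment applies rules 1-3 for a non-forced removal.
func canAdvanceFilesystemAttachment(f filesystemAttachmentFacts) bool {
	machineOwned := f.FilesystemMachineOwned ||
		(f.BackingVolumeMachineProvisioned && f.BackingVolumeMachineOwned)
	return f.AttachmentMachineProvisioned && machineOwned && f.MachineDead
}

func main() {
	// Filesystem backed by a machine-provisioned, machine-owned volume.
	facts := filesystemAttachmentFacts{
		AttachmentMachineProvisioned:    true,
		BackingVolumeMachineProvisioned: true,
		BackingVolumeMachineOwned:       true,
		MachineDead:                     true,
	}
	fmt.Println(canAdvanceFilesystemAttachment(facts)) // true

	facts.MachineDead = false
	fmt.Println(canAdvanceFilesystemAttachment(facts)) // false
}
```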

Member

For the Volume attachment to be dying, the storage attachment must already be dead, so we don't need to check whether the storage attachment is dead or gone.

For the rest, the logic is:

  1. The Volume for the Volume attachment is Machine Provisioned and Machine owned.
  2. The Machine is Dead.
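The volume rules are simpler. The sketch below also applies the force override described in the PR description before checking rules 1-2; the type and function names are hypothetical.

```go
package main

import "fmt"

// volumeAttachmentFacts gathers the facts named in the review comment.
// The type and field names are hypothetical.
type volumeAttachmentFacts struct {
	VolumeMachineProvisioned bool
	VolumeMachineOwned       bool
	MachineDead              bool
}

// canAdvanceVolumeAttachment checks the force override first, then
// rules 1-2 from the review comment.
func canAdvanceVolumeAttachment(force bool, v volumeAttachmentFacts) bool {
	if force {
		return true
	}
	return v.VolumeMachineProvisioned && v.VolumeMachineOwned && v.MachineDead
}

func main() {
	// Machine-owned volume whose machine is still alive: wait unless forced.
	v := volumeAttachmentFacts{VolumeMachineProvisioned: true, VolumeMachineOwned: true}
	fmt.Println(canAdvanceVolumeAttachment(false, v)) // false
	fmt.Println(canAdvanceVolumeAttachment(true, v))  // true
}
```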

fix(removal): add conditional lifecycle guards for storage attachment advancement

Also remove storage_attachment and storage_unit_owner cleanup from
deleteForeignKeyUnitReferences, since storage attachments are their own
entities with independent lifecycles and must be removed through their
own removal jobs before unit deletion.
@nvinuesa nvinuesa changed the title from "fix(removal): clean up FK references on app, unit, and relation delete" to "fix(removal): add conditional lifecycle guards for storage attachment" Feb 17, 2026
@nvinuesa nvinuesa changed the title from "fix(removal): add conditional lifecycle guards for storage attachment" to "fix(removal): add conditional lifecycle guards for storage attachment advancement" Feb 17, 2026
@gfouillet gfouillet self-requested a review February 18, 2026 08:24
Contributor

@gfouillet gfouillet left a comment

QA OK; will take another look once Harry's comments are addressed.


// StorageAttachmentHookInfo contains the information required to determine
// if a dying storage attachment can be safely advanced to dead.
type StorageAttachmentHookInfo struct {
Contributor

nitpick: Naming.

Both StorageAttachmentHookInfo and ProvisionedAttachmentAdvanceInfo have the same responsibility (holding the data needed to decide whether an entity can be advanced to dead).

"Hook" is contextual information that might not remain correct.

Suggested change
type StorageAttachmentHookInfo struct {
type StorageAttachmentAdvanceInfo struct {


// StorageAttachmentDeadOrGone is true when the associated storage
// attachment is dead or no longer exists.
StorageAttachmentDeadOrGone bool
Contributor

thought: I am unsure about this naming, which is very precise and leaks implementation details (which may or may not be a good thing).

I wonder whether semantically less precise wording for all the boolean gates here would be more informative.

Instead of Dead/Gone naming, maybe take a step back and use "Hold" or a similar word that conveys the idea of dependencies between entities.

Just a thought; if no better idea comes up, I am fine with it.

storageerrors "github.com/juju/juju/domain/storage/errors"
storageprovisioningerrors "github.com/juju/juju/domain/storageprovisioning/errors"
"github.com/juju/juju/internal/errors"
"gopkg.in/yaml.v3"
Contributor

todo: GCI me

Comment on lines +296 to +304
SELECT u.life_id AS &storageAttachmentUnitInfo.unit_life_id,
si.storage_id AS &storageAttachmentUnitInfo.storage_id,
COALESCE(us.storage_state, '') AS &storageAttachmentUnitInfo.storage_state
FROM storage_attachment sa
JOIN unit u ON sa.unit_uuid = u.uuid
JOIN storage_instance si ON sa.storage_instance_uuid = si.uuid
LEFT JOIN unit_state us ON sa.unit_uuid = us.unit_uuid
WHERE sa.uuid = $entityUUID.uuid
`
Contributor

todo:

Suggested change
SELECT u.life_id AS &storageAttachmentUnitInfo.unit_life_id,
si.storage_id AS &storageAttachmentUnitInfo.storage_id,
COALESCE(us.storage_state, '') AS &storageAttachmentUnitInfo.storage_state
FROM storage_attachment sa
JOIN unit u ON sa.unit_uuid = u.uuid
JOIN storage_instance si ON sa.storage_instance_uuid = si.uuid
LEFT JOIN unit_state us ON sa.unit_uuid = us.unit_uuid
WHERE sa.uuid = $entityUUID.uuid
`
SELECT u.life_id AS &storageAttachmentUnitInfo.unit_life_id,
si.storage_id AS &storageAttachmentUnitInfo.storage_id,
COALESCE(us.storage_state, '') AS &storageAttachmentUnitInfo.storage_state
FROM storage_attachment AS sa
JOIN unit AS u ON sa.unit_uuid = u.uuid
JOIN storage_instance AS si ON sa.storage_instance_uuid = si.uuid
LEFT JOIN unit_state AS us ON sa.unit_uuid = us.unit_uuid
WHERE sa.uuid = $entityUUID.uuid

It is good practice to follow the sqlfluff rules we enforce in our DDL, even in embedded queries.
