This repository was archived by the owner on Oct 23, 2025. It is now read-only.

Error during traffic switch causes 0% traffic for all stacks #560

@ePaul

Background

We had two stacks, one with 100% weight, and a broken one (failed deployment) with 0% weight.
After deploying a third stack, our CD system tried to switch traffic to it:

13:35:56.202 Running: /tools/run registry.opensource.zalan.do/stups/toolchain-stups:22 -- senza traffic purchase-orders-management.yaml 201904041320 100 --region eu-central-1
13:35:59.030 Calculating new weights.. OK
13:35:59.031 Stack Name                │Version     │Identifier                             │Old Weight%│Delta │Compensation│New Weight%│Current
13:35:59.031 purchase-orders-management              purchase-orders-management-201904031151         0.0                             0.0         
13:35:59.031 purchase-orders-management 201903281417 purchase-orders-management-201903281417       100.0 -100.0                      0.0         
13:35:59.031 purchase-orders-management 201904041320 purchase-orders-management-201904041320         0.0  100.0                    100.0 <       
13:36:01.074 Setting weights for purchase-orders-management.goodbuy.zalan.do...Validation Error: Stack:arn:aws:cloudformation:eu-central-1:383379053614:stack/purchase-orders-management-201904031151/0ecefee0-56ca-11e9-99be-026d43bbed96 is in CREATE_FAILED state and can not be updated.

So the traffic switching failed because of the broken stack. So far, so good.

Problem

But when we looked at the settings later, they looked like this:

$ senza traffic purchase-orders-management
Stack Name                │Version     │Identifier                             │Weight%
purchase-orders-management              purchase-orders-management-201904031151     0.0 
purchase-orders-management 201903281417 purchase-orders-management-201903281417     0.0 
purchase-orders-management 201904041320 purchase-orders-management-201904041320     0.0 

So now all stacks (including the broken one) had a weight of 0.0. That is definitely not correct.

Guess on what happened

Looking into the code of senza traffic, it looks like the command computes the new percentages (and displays them, as we can see), and then goes through them one-by-one, issuing the API call to change the weights. As soon as one of them fails, the whole command stops.

The effect seems to be that version 201903281417 is first set to 0, then the update of the broken stack is attempted (and fails), and setting 201904041320 to 100 is never attempted.
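To illustrate the guess above, here is a minimal Python sketch of that failure mode. It is not senza's actual code: `set_weight`, `apply_sequentially` and `ValidationError` are hypothetical stand-ins, and the update order is the one assumed in the guess.

```python
class ValidationError(Exception):
    """Hypothetical stand-in for the error raised when a stack cannot be updated."""


def set_weight(live, broken, identifier, weight):
    # Hypothetical stand-in for the per-stack API call that changes a weight.
    if identifier in broken:
        raise ValidationError(f"{identifier} is in CREATE_FAILED state")
    live[identifier] = weight


def apply_sequentially(live, broken, updates):
    # Apply the updates one by one; the first failure aborts the rest.
    for identifier, weight in updates:
        set_weight(live, broken, identifier, weight)


# Weights before the switch, keyed by stack version.
live = {"201904031151": 0.0, "201903281417": 100.0, "201904041320": 0.0}
broken = {"201904031151"}

# Assumed order: old stack first, then the broken one, then the new one.
updates = [
    ("201903281417", 0.0),
    ("201904031151", 0.0),
    ("201904041320", 100.0),
]

try:
    apply_sequentially(live, broken, updates)
except ValidationError as error:
    print(f"Validation Error: {error}")

# The old stack was already set to 0 and the new one was never reached.
print(live)
```

Under these assumptions, every stack ends up with a weight of 0.0, which matches the output shown under "Problem".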

What should happen

When switching traffic, the weights that go up should be applied before the weights that go down, so that a failure partway through cannot leave every stack at 0%.
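One way to get that ordering, sketched in Python. The function `order_updates` is hypothetical and not part of senza. Skipping stacks whose weight does not change is an extra assumption of this sketch, not something the issue asks for.

```python
def order_updates(current, target):
    """Return (identifier, new_weight) pairs with increases before decreases.

    Stacks whose weight does not change are left out (an assumption of
    this sketch), so an unchanged broken stack would not be touched.
    """
    changed = [
        (identifier, weight)
        for identifier, weight in target.items()
        if weight != current.get(identifier, 0.0)
    ]
    # The key is False for increases and True for decreases, and False
    # sorts first. sorted() is stable, so ties keep their original order.
    return sorted(
        changed,
        key=lambda item: item[1] < current.get(item[0], 0.0),
    )


current = {"201904031151": 0.0, "201903281417": 100.0, "201904041320": 0.0}
target = {"201904031151": 0.0, "201903281417": 0.0, "201904041320": 100.0}

ordered = order_updates(current, target)
print(ordered)
# [('201904041320', 100.0), ('201903281417', 0.0)]
```

With this ordering, a failure between the two updates would leave both the old and the new stack with a nonzero weight rather than leaving all stacks at zero.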
