feat(e2e): Crossplane DLQ support + drift detection tests + cold-start scaling by atemate · Pull Request #270 · deliveryhero/asya

atemate · 2026-03-06T20:03:39Z

Summary

SQS DLQ queue: Add a shared asya-{namespace}-dlq Crossplane Queue resource to the asya-crew chart. Each actor queue gets a redrivePolicy pointing at it, with the DLQ ARN constructed from awsAccountId and awsRegion in the chart values.
Un-skip DLQ tests: Remove @pytest.mark.skip from test_poison_message_moves_to_dlq_e2e and test_dlq_preserves_message_metadata_e2e.
Crossplane drift tests: Rewrite 4 queue-health tests, replacing the "operator recreates queue" framing with Crossplane drift reconciliation, and reduce the timeout from 360s to CROSSPLANE_RECONCILE_TIMEOUT_SECONDS (default 120s).
Cold-start test: Add test_cold_start_backlog_processing, which scales the actor to 0, enqueues a 20-message backlog, and asserts that at least 90% of the messages complete after KEDA scales the actor back up.
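The redrivePolicy wiring described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the chart's template: the helper name build_redrive_policy, the default maxReceiveCount of 3, and reading {namespace} as the Kubernetes namespace are all hypothetical. It relies only on the documented SQS RedrivePolicy shape (a JSON object with deadLetterTargetArn and maxReceiveCount) and the standard SQS ARN format, and it returns None for an empty awsAccountId to mirror the test-plan expectation that no redrivePolicy is rendered in that case.

```python
import json
from typing import Optional


def build_redrive_policy(
    namespace: str,
    aws_account_id: str,
    aws_region: str,
    max_receive_count: int = 3,  # hypothetical default, not taken from the chart
) -> Optional[str]:
    """Return the SQS RedrivePolicy JSON for an actor queue, or None when no account ID is set."""
    if not aws_account_id:
        # An empty awsAccountId means no DLQ ARN can be built, so no policy is emitted.
        return None
    # SQS queue ARNs have the form arn:aws:sqs:<region>:<account-id>:<queue-name>.
    dlq_arn = f"arn:aws:sqs:{aws_region}:{aws_account_id}:asya-{namespace}-dlq"
    # json.dumps escapes every value, so no field can break out of the JSON document.
    return json.dumps({"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count})
```

For example, build_redrive_policy("e2e", "000000000000", "us-east-1") yields a policy whose deadLetterTargetArn is arn:aws:sqs:us-east-1:000000000000:asya-e2e-dlq.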

Merges aints 1f3k (DLQ un-xfail), 1fbq (queue health un-skip), and 1f2y (cold-start scaling) into the single task 1f2y.

Test plan

helm lint deploy/helm-charts/asya-crew/ passes
helm lint deploy/helm-charts/asya-crossplane/ passes
helm template with awsAccountId=000000000000 renders a redrivePolicy on the SQS queue
helm template with an empty awsAccountId (awsAccountId=) renders no redrivePolicy
DLQ tests collected without skip (grep h6h2 returns 0)
Queue health tests collected without skip (grep pwx6 returns 0) and renamed to test_crossplane_recreates_*
Cold-start test collected; it polls for actual pod drain before enqueuing
make lint is clean
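The cold-start check in the test plan (wait until the pods have actually drained, then enqueue the backlog and require at least 90% completion) can be sketched as follows. The helper names and the get_ready_pods callable, which might wrap a Kubernetes API query, are hypothetical. Integer arithmetic keeps the 90% threshold exact, so 18 of 20 passes and 17 does not.

```python
import time
from typing import Callable


def wait_for_scale_to_zero(
    get_ready_pods: Callable[[], int],  # hypothetical: returns the actor's running pod count
    timeout_s: float = 300.0,
    poll_s: float = 5.0,
) -> None:
    """Poll until the actor has no pods, so the backlog lands on a true cold start."""
    deadline = time.monotonic() + timeout_s
    while get_ready_pods() > 0:
        if time.monotonic() >= deadline:
            raise TimeoutError("actor did not scale to 0 before the timeout")
        time.sleep(poll_s)


def backlog_threshold_met(completed: int, total: int = 20, min_percent: int = 90) -> bool:
    """True when at least min_percent of the backlog completed, using exact integer math."""
    return completed * 100 >= total * min_percent
```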


Add app.kubernetes.io/managed-by and app.kubernetes.io/instance labels to the Queue resource metadata. Remove the redundant "| default 1209600" filter from messageRetentionSeconds, since the default is already set in values.yaml.
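For reference, 1209600 seconds is exactly 14 days (14 × 24 × 3600), the maximum message retention SQS allows; the minimum is 60 seconds. With the fallback gone from the template, the value has to come from values.yaml, so a pre-render check is one way to catch a bad override. The sketch below is hypothetical: the values path sqs.dlq.messageRetentionSeconds and the helper name check_retention are assumptions, not taken from the chart.

```python
SQS_MIN_RETENTION_S = 60
SQS_MAX_RETENTION_S = 14 * 24 * 60 * 60  # 1209600 seconds = 14 days


def check_retention(values: dict) -> int:
    """Return sqs.dlq.messageRetentionSeconds, raising if it is missing or outside SQS limits."""
    # Assumed values layout. This sketch treats a missing key as an error (KeyError),
    # since the template no longer supplies a fallback.
    seconds = int(values["sqs"]["dlq"]["messageRetentionSeconds"])
    if not SQS_MIN_RETENTION_S <= seconds <= SQS_MAX_RETENTION_S:
        raise ValueError(
            f"messageRetentionSeconds={seconds} is outside "
            f"[{SQS_MIN_RETENTION_S}, {SQS_MAX_RETENTION_S}]"
        )
    return seconds
```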


gemini-code-assist

Code Review

This pull request effectively adds SQS DLQ support, enables the corresponding tests, and refactors the queue health tests to align with the new Crossplane-based reconciliation strategy. A new cold-start scaling test is also introduced, which is a great addition for ensuring scale-from-zero reliability. However, a security audit identified two injection vulnerabilities in the Crossplane composition template where Helm values are rendered without proper escaping: a potential template injection via awsAccountId and a JSON injection via maxReceiveCount. These issues should be remediated by using the toJson filter so that values are safely escaped during template rendering. Additionally, the new cold-start test could be optimized: the way it waits for task completions can be significantly improved to reduce test execution time.
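To illustrate the JSON injection the review describes, the following Python sketch contrasts splicing values into JSON text with serializing them. Here json.dumps plays the role of Helm's toJson; the ARNs and the payload are made-up examples, and this is not the composition template itself. In the unsafe version, a crafted maxReceiveCount smuggles in a second deadLetterTargetArn, and Python's JSON parser (like many others) keeps the last duplicate key.

```python
import json


def naive_policy(dlq_arn: str, max_receive_count: str) -> str:
    # Unsafe: values are pasted into the JSON text verbatim, like an unescaped template.
    return '{"deadLetterTargetArn": "%s", "maxReceiveCount": %s}' % (dlq_arn, max_receive_count)


def escaped_policy(dlq_arn: str, max_receive_count: str) -> str:
    # Safe: json.dumps quotes and escapes each value, analogous to Helm's toJson.
    return json.dumps({"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count})


# Made-up example values.
arn = "arn:aws:sqs:us-east-1:000000000000:asya-e2e-dlq"
payload = '3, "deadLetterTargetArn": "arn:aws:sqs:us-east-1:111111111111:attacker-dlq"'

# The naive policy now routes failed messages to the injected ARN;
# the escaped policy keeps the real ARN and treats the payload as a plain string.
hijacked = json.loads(naive_policy(arn, payload))["deadLetterTargetArn"]
intact = json.loads(escaped_policy(arn, payload))["deadLetterTargetArn"]
```

With toJson, a malicious value becomes an inert (and, for maxReceiveCount, invalid) string that the API would reject, instead of silently rewriting the policy.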

deploy/helm-charts/asya-crossplane/templates/composition-sqs.yaml

testing/e2e/tests/test_scaling_performance_e2e.py

deploy/helm-charts/asya-crossplane/templates/composition-sqs.yaml
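The review's point about waiting on task completions in testing/e2e/tests/test_scaling_performance_e2e.py can be illustrated with a concurrent wait, in which total wall time tracks the slowest task instead of the sum of all waits. This is a sketch under assumptions: wait_for_completion stands in for whatever blocking per-task poll the test uses, and the helper name wait_all_parallel is hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def wait_all_parallel(
    task_ids: List[str],
    wait_for_completion: Callable[[str], bool],  # hypothetical blocking per-task poll
    max_workers: int = 20,
) -> Dict[str, bool]:
    """Wait on all tasks at once and map each task ID to whether it completed."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so results line up with task_ids.
        return dict(zip(task_ids, pool.map(wait_for_completion, task_ids)))


if __name__ == "__main__":
    def fake_wait(task_id: str) -> bool:
        time.sleep(0.1)  # simulate a task that takes 100 ms to finish
        return task_id != "t3"  # pretend one task failed

    ids = [f"t{i}" for i in range(20)]
    start = time.monotonic()
    results = wait_all_parallel(ids, fake_wait)
    print(f"{sum(results.values())}/{len(ids)} completed in {time.monotonic() - start:.2f}s")
```

A thread pool suits this case because each wait is I/O-bound polling; with 20 workers, a 20-message backlog is awaited in a single round.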


Crossplane's SQS provider default drift detection cycle is ~10 minutes. Queue health chaos tests delete a queue and expect Crossplane to recreate it within 300s, which is impossible with the 10-min default poll interval. Add a DeploymentRuntimeConfig for provider-aws-sqs that passes --poll-interval to the provider pod, gated by providers.aws.pollInterval in values.yaml (empty = use provider default). Set it to "10s" in the sqs-s3 E2E test profile so drift is detected within seconds, making the chaos tests reliably complete within the 300s window. The annotation-based trigger (_trigger_crossplane_reconcile) remains as an explicit immediate trigger, complementing the short poll interval.
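The arithmetic behind this commit: with a default poll of roughly 10 minutes (about 600s), a deleted queue could go unnoticed for twice the 300s test window, so passive drift detection alone cannot make the chaos tests pass reliably. The waiting side of such a test might look like the following sketch. The helper names and the queue_exists callable are hypothetical, and the 300s fallback follows the later commit that raised the reconcile timeout (the PR summary gives 120s as the original default).

```python
import os
import time
from typing import Callable


def reconcile_timeout_seconds(default: int = 300) -> int:
    """Read CROSSPLANE_RECONCILE_TIMEOUT_SECONDS, falling back to `default`."""
    return int(os.environ.get("CROSSPLANE_RECONCILE_TIMEOUT_SECONDS", default))


def wait_for_queue_recreated(
    queue_exists: Callable[[], bool],  # hypothetical check, e.g. wrapping an SQS GetQueueUrl call
    timeout_s: float,
    poll_s: float = 5.0,
) -> float:
    """Poll until the deleted queue exists again; return the seconds waited."""
    start = time.monotonic()
    while not queue_exists():
        if time.monotonic() - start >= timeout_s:
            raise TimeoutError(f"queue was not recreated within {timeout_s:.0f}s")
        time.sleep(poll_s)
    return time.monotonic() - start
```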

provider-aws-sqs v1.19.0 (an upjet-generated family provider) does not expose --poll-interval as a CLI flag. Passing it via DeploymentRuntimeConfig makes the provider pod crash-loop with 'unknown long flag --poll-interval', which fails the entire 'Deploy E2E cluster' step before any tests can run. The annotation-based trigger (_trigger_crossplane_reconcile) added in the previous commit already forces immediate reconciliation after queue deletion, so the poll-interval override is unnecessary for the chaos tests.
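One way an annotation-based trigger can work: changing an annotation on a managed resource generates a watch event, which typically causes the provider's controller to reconcile that resource promptly instead of waiting for the next poll. The sketch below is hypothetical and is not the test suite's _trigger_crossplane_reconcile. It assumes the queues are cluster-scoped Queue resources in the sqs.aws.upbound.io API group and uses a made-up annotation key; a fresh timestamp guarantees the value actually changes on every call.

```python
import datetime
import subprocess
from typing import List


def build_reconcile_trigger_cmd(queue_name: str, now: datetime.datetime) -> List[str]:
    """Build a kubectl command that bumps a (hypothetical) annotation on a Queue resource."""
    stamp = now.astimezone(datetime.timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    return [
        "kubectl",
        "annotate",
        "--overwrite",  # replace the previous trigger value instead of failing
        f"queues.sqs.aws.upbound.io/{queue_name}",  # assumed resource type and group
        f"asya.test/reconcile-trigger={stamp}",  # hypothetical annotation key
    ]


def trigger_reconcile(queue_name: str) -> None:
    """Apply the annotation; raises CalledProcessError if kubectl fails."""
    now = datetime.datetime.now(datetime.timezone.utc)
    subprocess.run(build_reconcile_trigger_cmd(queue_name, now), check=True)
```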

atemate added 8 commits March 6, 2026 19:43

feat(crew): add shared SQS DLQ queue via Crossplane (sqs.dlq config + template)

18fc34f

feat(crossplane): add SQS redrivePolicy to actor queues using shared DLQ ARN

dc41641

feat(e2e): enable DLQ queue in sqs-s3 profile + un-skip DLQ tests

d325f90

test(e2e): rewrite queue-health tests for Crossplane drift reconciliation

267f75c

fix(e2e): remove stale operator refs + add sleep comments in queue-health tests

59208cd

test(e2e): add cold-start backlog processing test for KEDA minReplicas=0

8abc641

fix(e2e): cold-start test: verify scale-to-0, accurate sleep comment, rename timeout var

74e7b62

github-actions bot added the charts, feat (New feature implementation), and test labels on Mar 6, 2026

gemini-code-assist bot reviewed Mar 6, 2026

deploy/helm-charts/asya-crossplane/templates/composition-sqs.yaml (outdated, resolved)

testing/e2e/tests/test_scaling_performance_e2e.py (outdated, resolved)

deploy/helm-charts/asya-crossplane/templates/composition-sqs.yaml (outdated, resolved)

atemate added 4 commits March 6, 2026 21:51

fix: toJson escaping in composition, parallel completion in cold-start test, increase Crossplane reconcile timeout to 300s

ff5ae16

fix(e2e): trigger Crossplane reconciliation after queue deletion in chaos tests

26c13cb

atemate wants to merge 12 commits into main from tech-debt-crossplane-e2e-tests/1f2y.fix-scaling-performance-e2e-tests-cold-start-backlog


1 participant
