Conversation
Add app.kubernetes.io/managed-by and app.kubernetes.io/instance labels to the Queue resource metadata. Remove the redundant | default 1209600 filter from messageRetentionSeconds since the default is already set in values.yaml.
…, rename timeout var
There was a problem hiding this comment.
Code Review
This pull request effectively adds SQS DLQ support, enables the corresponding tests, and refactors the queue health tests to align with the new Crossplane-based reconciliation strategy. A new cold-start scaling test is also introduced, which is a great addition for ensuring scale-from-zero reliability. However, a security audit identified two injection vulnerabilities in the Crossplane composition template where Helm values are rendered without proper escaping. Specifically, a potential template injection via awsAccountId and a JSON injection via maxReceiveCount were found. These issues should be remediated by using the toJson filter to ensure values are safely handled during template rendering. Additionally, the implementation of the new cold-start test could be optimized, as waiting for task completions can be significantly improved to reduce test execution time.
deploy/helm-charts/asya-crossplane/templates/composition-sqs.yaml
Outdated
Show resolved
Hide resolved
deploy/helm-charts/asya-crossplane/templates/composition-sqs.yaml
Outdated
Show resolved
Hide resolved
…t test, increase Crossplane reconcile timeout to 300s
Crossplane's SQS provider default drift detection cycle is ~10 minutes. Queue health chaos tests delete a queue and expect Crossplane to recreate it within 300s, which is impossible with the 10-min default poll interval. Add a DeploymentRuntimeConfig for provider-aws-sqs that passes --poll-interval to the provider pod, gated by providers.aws.pollInterval in values.yaml (empty = use provider default). Set it to "10s" in the sqs-s3 E2E test profile so drift is detected within seconds, making the chaos tests reliably complete within the 300s window. The annotation-based trigger (_trigger_crossplane_reconcile) remains as an explicit immediate trigger, complementing the short poll interval.
…-aws-sqs provider-aws-sqs v1.19.0 (upjet-generated family provider) does not expose --poll-interval as a CLI flag. Passing it via DeploymentRuntimeConfig causes the provider pod to crash-loop with 'unknown long flag --poll-interval', which fails the entire 'Deploy E2E cluster' step before any tests can run. The annotation-based trigger (_trigger_crossplane_reconcile) added in the previous commit already forces immediate reconciliation after queue deletion, making the poll interval override unnecessary for the chaos tests.
Summary
asya-{namespace}-dlqCrossplane Queue resource inasya-crewchart. Each actor queue gets aredrivePolicypointing at it (constructed fromawsAccountId+awsRegionin chart values).@pytest.mark.skipfromtest_poison_message_moves_to_dlq_e2eandtest_dlq_preserves_message_metadata_e2e.CROSSPLANE_RECONCILE_TIMEOUT_SECONDS(default 120s).test_cold_start_backlog_processing— scales actor to 0, enqueues 20-message backlog, asserts 90%+ complete after KEDA scales up.Merges aints:
1f3k(DLQ un-xfail) +1fbq(queue health un-skip) +1f2y(cold-start scaling) → single task1f2y.Test plan
helm lint deploy/helm-charts/asya-crew/passeshelm lint deploy/helm-charts/asya-crossplane/passeshelm templatewithawsAccountId=000000000000showsredrivePolicyin SQS queuehelm templatewithawsAccountId=shows noredrivePolicytest_crossplane_recreates_*make lintclean