test: add chaos-mesh/toxiproxy based network fault injection for integration tests #152

@tinswzy

Background

Woodpecker depends heavily on object storage (e.g., S3/MinIO) and on network stability for its WAL write/read paths.
However, current integration tests assume a stable network and do not cover degraded scenarios such as latency spikes, jitter, or transient disconnections.

In real-world deployments (multi-AZ / cloud environments), network instability is common and may lead to:

  • increased write latency
  • retry amplification
  • LAC progression delay
  • read unavailability under certain conditions

Proposal

Introduce chaos-based network fault injection into integration tests using:

  • Toxiproxy (for local / docker-compose / CI)
  • Chaos Mesh (for k8s-based testing)

Scope

1. Toxiproxy-based tests (local / CI)

  • Add a Toxiproxy container to the integration test environment
  • Route object storage traffic through the proxy
  • Inject faults:
    • latency (e.g. 200ms / 500ms)
    • jitter (e.g. ±100ms)
    • connection toggle (simulate transient outage)

Example:

toxiproxy-cli toxic add minio -t latency -a latency=500 -a jitter=100
toxiproxy-cli toggle minio
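
For integration tests, the two commands above could be wrapped in small helpers so each test case can request a fault by name. The following is only a sketch: the function names `inject_latency` and `transient_outage` and the `TOXIPROXY_CLI` variable are hypothetical and not part of Woodpecker or Toxiproxy. It assumes a proxy named `minio` already exists and is enabled, and it keeps the argument order used in the example above, which may need adjusting for the installed `toxiproxy-cli` version.

```shell
#!/usr/bin/env bash
# Sketch only: helper functions wrapping the two commands from the example.
# TOXIPROXY_CLI and the function names are hypothetical; override
# TOXIPROXY_CLI (e.g. with "echo") to dry-run without a Toxiproxy server.
TOXIPROXY_CLI="${TOXIPROXY_CLI:-toxiproxy-cli}"

# inject_latency PROXY LATENCY_MS JITTER_MS
# Adds a latency toxic to PROXY, mirroring the command in the example.
inject_latency() {
  local proxy="$1" latency_ms="$2" jitter_ms="$3"
  "$TOXIPROXY_CLI" toxic add "$proxy" -t latency \
    -a "latency=${latency_ms}" -a "jitter=${jitter_ms}"
}

# transient_outage PROXY SECONDS
# Toggles PROXY off, waits, then toggles it back on.
# Assumes PROXY is enabled when called, since toggle flips the current state.
transient_outage() {
  local proxy="$1" seconds="$2"
  "$TOXIPROXY_CLI" toggle "$proxy"
  sleep "$seconds"
  "$TOXIPROXY_CLI" toggle "$proxy"
}
```

A test could then call `inject_latency minio 500 100` before a write workload, or `transient_outage minio 5` during one, and assert on the resulting write latency or retry behavior.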

Labels: enhancement