
[CI] test_vxlan_ecmp.py consistently failing on kvmtest-t1-lag-vpp #22659

@rustiqly

Description

vxlan/test_vxlan_ecmp.py is consistently failing on the kvmtest-t1-lag-vpp CI pipeline, causing the VPP test job to fail on virtually all PRs.

Evidence

Checked across multiple unrelated PRs (opened by rustiqly and by other authors):

| PR | Build | VPP result | Failing test |
|---|---|---|---|
| #22589 | 1047276 | ❌ FAIL | vxlan/test_vxlan_ecmp.py (2 failures) |
| #22591 | 1047540 | ❌ FAIL | vxlan/test_vxlan_ecmp.py (2 failures) |
| #22592 | 1047541 | ❌ FAIL | vxlan/test_vxlan_ecmp.py (2 failures) |
| #22594 | 1047367 | ❌ FAIL | vxlan/test_vxlan_ecmp.py (2 failures) |
| #22650 | 1047563 | ❌ FAIL | vxlan/test_vxlan_ecmp.py (2 failures) |
| #22581 | 1045657 | ❌ FAIL | Test plan stuck and timed out |
| #22575 | 1045455 | ✅ PASS | n/a (occasional pass) |
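A quick tally of the runs listed above, as a small Python sketch. The outcome labels (`test_failure`, `plan_timeout`, `pass`) are hypothetical categories introduced here for counting; they are not fields reported by the CI system.

```python
from collections import Counter

# Rows transcribed from the evidence table: (PR number, build ID, outcome).
# Outcome labels are hypothetical categories, not CI-reported values.
RUNS = [
    (22589, 1047276, "test_failure"),
    (22591, 1047540, "test_failure"),
    (22592, 1047541, "test_failure"),
    (22594, 1047367, "test_failure"),
    (22650, 1047563, "test_failure"),
    (22581, 1045657, "plan_timeout"),
    (22575, 1045455, "pass"),
]


def summarize(runs):
    """Count runs per outcome and return (counts, fraction of runs that did not pass)."""
    counts = Counter(outcome for _, _, outcome in runs)
    total = len(runs)
    not_passed = total - counts["pass"]
    return counts, not_passed / total


counts, failure_rate = summarize(RUNS)
```

With these rows, 6 of the 7 sampled runs did not pass (about 86%), and 5 of those 6 failed specifically in vxlan/test_vxlan_ecmp.py.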

The test fails with RUN_TEST_CASE_FAILED after 2 attempts, which triggers the test plan stop policy.
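The retry-then-stop behavior can be modeled roughly as follows. This is a hypothetical sketch, not the actual test plan runner: `run_with_retries` and `run_plan` are illustrative names, and it assumes the stop policy simply skips all remaining test cases once one case reports RUN_TEST_CASE_FAILED. It also shows why an occasional pass is possible: a single passing attempt is enough for the case to pass.

```python
from typing import Callable, Dict

PASSED = "PASSED"
RUN_TEST_CASE_FAILED = "RUN_TEST_CASE_FAILED"


def run_with_retries(test: Callable[[], bool], max_attempts: int = 2) -> str:
    """Hypothetical model: run `test` up to `max_attempts` times; pass on the first success."""
    for _ in range(max_attempts):
        if test():
            return PASSED
    return RUN_TEST_CASE_FAILED


def run_plan(tests: Dict[str, Callable[[], bool]], max_attempts: int = 2) -> Dict[str, str]:
    """Run test cases in order; the assumed stop policy halts at the first RUN_TEST_CASE_FAILED."""
    results: Dict[str, str] = {}
    for name, test in tests.items():
        results[name] = run_with_retries(test, max_attempts)
        if results[name] == RUN_TEST_CASE_FAILED:
            break  # stop policy (assumed): later test cases are never run
    return results
```

For example, `run_plan({"a": lambda: True, "vxlan/test_vxlan_ecmp.py": lambda: False, "c": lambda: True})` returns results for `a` and the vxlan case only; `c` is never executed.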

Impact

  • The kvmtest-t1-lag-vpp job is marked [OPTIONAL] so it doesn't block merges
  • However, the overall Azure.sonic-mgmt check shows as FAILURE, causing confusion for PR authors
  • None of the affected PRs change related code, so this is a pre-existing or infrastructure issue

Topology & Test Details

  • Pipeline: kvmtest-t1-lag-vpp
  • Test file: tests/vxlan/test_vxlan_ecmp.py
  • The test explicitly supports VPP (asic_type == 'vpp' is handled in the setUp fixture at line 188)
  • There have been no recent changes to tests/vxlan/test_vxlan_ecmp.py on master
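One way to re-check the "no recent changes" claim is to list the commits that touched the test file. This Python sketch wraps standard `git log` options; it assumes a local sonic-mgmt checkout where `master` is a valid ref (in a fresh clone it may need to be `origin/master`), and the three-month window is an arbitrary choice for illustration.

```python
import subprocess
from typing import List


def git_log_cmd(path: str, ref: str = "master", since: str = "3 months ago") -> List[str]:
    """Build a `git log` command listing commits on `ref` that touched `path` since `since`."""
    return ["git", "log", "--oneline", f"--since={since}", ref, "--", path]


def recent_commits(path: str, ref: str = "master", since: str = "3 months ago") -> List[str]:
    """Run the command in the current checkout; an empty list means no matching commits."""
    proc = subprocess.run(git_log_cmd(path, ref, since), capture_output=True, text=True, check=True)
    return [line for line in proc.stdout.splitlines() if line.strip()]
```

Run from the repository root, `recent_commits("tests/vxlan/test_vxlan_ecmp.py")` returns one `<sha> <subject>` line per commit in the window.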

Additional: kvmtest-t2 Failures

The kvmtest-t2 pipeline also frequently fails, with three separate issues:

  1. LOCK_TESTBED_FAILED: 7200 s (2 h) timeout waiting for testbed allocation (infrastructure/resource contention)
  2. everflow/test_everflow_testbed.py: intermittent test failures on the t2 topology
  3. telemetry/test_telemetry_poll.py: intermittent failures on t2

Metadata


Assignees: No one assigned
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
