[TST] Add stress tests for concurrent init/destroy #24

@tazarov

Description

Add comprehensive stress tests to verify the environment management code handles high-concurrency scenarios correctly, particularly around the reference counting and mutex protection.

Background

From PR #15 review:

While the current concurrency tests are good, we need stress tests to catch subtle race conditions that only appear under load. The race detector is disabled for FFI code, making manual stress testing critical.

Current Test Coverage

Existing tests (good foundation):

  • TestConcurrentInitialization - 10 goroutines, simple increment
  • TestConcurrentDestroy - 10 goroutines, simple decrement
  • Both test basic correctness but not stress scenarios

Proposed Stress Tests

1. High-Concurrency Init/Destroy Cycles

func TestStressConcurrentInitDestroy(t *testing.T) {
    if testing.Short() {
        t.Skip("Skipping stress test in short mode")
    }
    
    const (
        goroutines = 100
        iterations = 1000
    )
    
    // Test rapid init/destroy cycles from many goroutines
    // Verify refCount always returns to 0 at the end
}
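
The body of this test could be driven by a helper along the following lines. This is only a sketch: `fakeEnv` and `stressInitDestroy` are hypothetical names, and `fakeEnv` is a stand-in for the package's real mutex-protected reference count, not the actual implementation.

```go
package main

import (
	"errors"
	"sync"
)

// fakeEnv is a hypothetical stand-in for the environment state:
// a reference count guarded by a mutex.
type fakeEnv struct {
	mu       sync.Mutex
	refCount int
}

// Init increments the reference count.
func (e *fakeEnv) Init() {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.refCount++
}

// Destroy decrements the reference count and reports underflow.
func (e *fakeEnv) Destroy() error {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refCount == 0 {
		return errors.New("destroy called with refCount 0")
	}
	e.refCount--
	return nil
}

// RefCount returns the current reference count.
func (e *fakeEnv) RefCount() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.refCount
}

// stressInitDestroy runs paired Init/Destroy cycles from many goroutines and
// returns the final reference count plus the first error seen, if any.
func stressInitDestroy(env *fakeEnv, goroutines, iterations int) (int, error) {
	var wg sync.WaitGroup
	errs := make(chan error, goroutines) // each goroutine sends at most one error
	start := make(chan struct{})
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-start // release all goroutines together to maximize contention
			for i := 0; i < iterations; i++ {
				env.Init()
				if err := env.Destroy(); err != nil {
					errs <- err
					return
				}
			}
		}()
	}
	close(start)
	wg.Wait()
	close(errs)
	return env.RefCount(), <-errs // receiving from a closed, empty channel yields nil
}
```

Inside the test, the returned count would be compared against 0 and any error reported with `t.Fatalf`. In the real test, the calls to `Init` and `Destroy` would be replaced by `InitializeEnvironment()` and `DestroyEnvironment()`, whose exact signatures are not shown in this issue.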

2. Mixed Operations Under Load

func TestStressMixedOperations(t *testing.T) {
    // Concurrent:
    // - InitializeEnvironment()
    // - DestroyEnvironment()  
    // - IsInitialized()
    // - GetVersionString()
    // - SetSharedLibraryPath() (before init)
    // - SetLogLevel() (before init)
    
    // Run for 10 seconds, verify no panics or deadlocks
}

3. Rapid Init/Destroy Stress

func TestStressRapidInitDestroy(t *testing.T) {
    // Single goroutine, but very rapid cycles
    // Simulates applications that frequently restart environments
    // Run 10,000 init/destroy pairs
    // Verify no memory corruption or state corruption
}
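
A rough sketch of the single-goroutine loop follows. `rapidCycles` is a hypothetical helper that takes the three operations as function values, because the real signatures of `InitializeEnvironment()`, `DestroyEnvironment()`, and `IsInitialized()` are not shown here. It also assumes no other references to the environment are held while it runs.

```go
package main

import "fmt"

// rapidCycles runs n init/destroy pairs on the calling goroutine and checks
// the initialized flag after every step. It returns the first violation found.
func rapidCycles(n int, initEnv, destroyEnv func() error, isInit func() bool) error {
	for i := 0; i < n; i++ {
		if err := initEnv(); err != nil {
			return fmt.Errorf("cycle %d: init: %w", i, err)
		}
		if !isInit() {
			return fmt.Errorf("cycle %d: not initialized after init", i)
		}
		if err := destroyEnv(); err != nil {
			return fmt.Errorf("cycle %d: destroy: %w", i, err)
		}
		if isInit() {
			return fmt.Errorf("cycle %d: still initialized after destroy", i)
		}
	}
	return nil
}
```

Checking the flag after every step, rather than only at the end, pinpoints the cycle where state first goes wrong. This catches state corruption only; memory corruption on the native side would more likely surface as a crash.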

4. Stress Test with go test Flags

# Repeat each stress test 100 times (-parallel only affects tests that call t.Parallel())
go test -v -run=TestStress -count=100 -parallel=8 ./ort/...

# Set an explicit overall timeout (go test's default is 10m, so 5m shortens it)
go test -v -run=TestStress -timeout=5m ./ort/...

Why This Matters

Race Detector Limitation

From .github/workflows/ci.yml:

# Note: -race flag disabled because checkptr is incompatible with purego's
# C string conversion (unsafe.Slice on C allocations).

Since we can't use the -race flag, stress testing is our primary defense against concurrency bugs.

Potential Issues to Catch

  1. Deadlocks: Goroutine hangs waiting for mutex
  2. Race conditions: RefCount corruption under load
  3. Memory corruption: Unsafe operations on shared state
  4. Panic under load: Edge cases only visible with many goroutines
  5. Resource leaks: File handles not properly closed

Implementation Plan

Phase 1: Add Stress Tests

  1. Create environment_stress_test.go
  2. Implement the tests above, skipping them when -short is set
  3. Document how to run stress tests in TESTING.md

Phase 2: CI Integration

Add stress test job to CI:

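stress-test:
  name: Stress Test
  runs-on: ubuntu-latest
  steps:
    # Checkout and Go setup are assumed here; mirror the existing test job's setup steps.
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod
    - name: Run stress tests
      run: go test -v -run=TestStress -count=50 -parallel=4 ./ort/...
      timeout-minutes: 10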

Phase 3: Pre-Release Validation

Before any release, run extended stress tests:

# Long-running comprehensive stress
go test -v -run=TestStress -count=500 -parallel=16 -timeout=30m ./ort/...

Acceptance Criteria

  • At least 3 stress test functions added
  • Tests use testing.Short() to skip in normal runs
  • TESTING.md documents how to run stress tests
  • CI runs stress tests (with moderate load)
  • All stress tests pass consistently (run 100 times)

Success Metrics

  • Tests run successfully 100+ times without failures
  • No panics, deadlocks, or timeouts
  • RefCount always returns to expected state
  • No memory leaks visible in heap profiles (go test -memprofile)
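
As a rough complement to heap profiles, Go heap growth across many cycles can be checked directly. The helper below is a sketch; `heapGrowth` is a hypothetical name, and any pass/fail threshold is an assumption left to the caller. Note that `runtime.MemStats` only covers memory managed by the Go runtime, so allocations made by the native library through FFI would not show up here.

```go
package main

import "runtime"

// heapGrowth runs fn n times and returns the change in live Go heap bytes,
// measured after forcing a garbage collection before and after the loop.
func heapGrowth(n int, fn func()) int64 {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := 0; i < n; i++ {
		fn()
	}
	runtime.GC()
	runtime.ReadMemStats(&after)
	return int64(after.HeapAlloc) - int64(before.HeapAlloc)
}
```

A stress test could pass one init/destroy pair as `fn` and fail if the result exceeds a generous bound. The value can be slightly noisy or negative because of unrelated runtime activity, so a tight bound is likely to produce flaky failures.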

Related Issues

Priority

Medium - Should be done before v0.2.0 or before claiming production-ready status
