Description
Add comprehensive stress tests to verify the environment management code handles high-concurrency scenarios correctly, particularly around the reference counting and mutex protection.
Background
From PR #15 review:
While the current concurrency tests are good, we need stress tests to catch subtle race conditions that only appear under load. The race detector is disabled for FFI code, making manual stress testing critical.
Current Test Coverage
Existing tests (good foundation):
- TestConcurrentInitialization: 10 goroutines, simple increment
- TestConcurrentDestroy: 10 goroutines, simple decrement
- Both test basic correctness but not stress scenarios
Proposed Stress Tests
1. High-Concurrency Init/Destroy Cycles
```go
func TestStressConcurrentInitDestroy(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping stress test in short mode")
	}
	const (
		goroutines = 100
		iterations = 1000
	)
	// Test rapid init/destroy cycles from many goroutines
	// Verify refCount always returns to 0 at the end
}
```
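A minimal, self-contained sketch of the worker pattern for test 1. It assumes a hypothetical `env` type with a mutex-protected `refCount` that stands in for the package's real environment state; the actual `InitializeEnvironment`/`DestroyEnvironment` signatures are not given in this issue, so the method names here are placeholders.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// env is a HYPOTHETICAL stand-in for the package's environment state:
// a reference count protected by a mutex. The real ort API may differ.
type env struct {
	mu       sync.Mutex
	refCount int
	creates  int // how many times the underlying environment was "created"
}

// Initialize increments the reference count, "creating" the environment on 0 -> 1.
func (e *env) Initialize() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refCount == 0 {
		e.creates++
	}
	e.refCount++
}

// Destroy decrements the reference count and rejects unbalanced calls.
func (e *env) Destroy() error {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refCount == 0 {
		return errors.New("destroy called on uninitialized environment")
	}
	e.refCount--
	return nil
}

func (e *env) RefCount() int {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.refCount
}

// stressInitDestroy starts `goroutines` workers, each doing `iterations`
// Initialize/Destroy pairs, and returns the first error reported (or nil).
func stressInitDestroy(e *env, goroutines, iterations int) error {
	var wg sync.WaitGroup
	errCh := make(chan error, goroutines) // buffered: each worker sends at most once
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				e.Initialize()
				if err := e.Destroy(); err != nil {
					errCh <- err
					return
				}
			}
		}()
	}
	wg.Wait()
	close(errCh)
	return <-errCh // receiving from a closed, empty channel yields nil
}

func main() {
	e := &env{}
	err := stressInitDestroy(e, 100, 1000)
	fmt.Println("err:", err, "refCount:", e.RefCount())
}
```

In the real test, the same loop would call the package functions and finish with something like `if got := refCount(); got != 0 { t.Fatalf(...) }`, where `refCount()` is whatever accessor the package exposes to tests.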
2. Mixed Operations Under Load
```go
func TestStressMixedOperations(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping stress test in short mode")
	}
	// Concurrent:
	// - InitializeEnvironment()
	// - DestroyEnvironment()
	// - IsInitialized()
	// - GetVersionString()
	// - SetSharedLibraryPath() (before init)
	// - SetLogLevel() (before init)
	// Run for 10 seconds, verify no panics or deadlocks
}
```
3. Rapid Init/Destroy Stress
```go
func TestStressRapidInitDestroy(t *testing.T) {
	if testing.Short() {
		t.Skip("Skipping stress test in short mode")
	}
	// Single goroutine, but very rapid cycles
	// Simulates applications that frequently restart environments
	// Run 10,000 init/destroy pairs
	// Verify no memory corruption or state corruption
}
```
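For the single-goroutine case, a sketch that checks state invariants after every step rather than only at the end, so the first corrupted cycle is reported. The `env` type and its `handle` field are hypothetical stand-ins for the native environment pointer.

```go
package main

import (
	"fmt"
	"sync"
)

// handle is a HYPOTHETICAL stand-in for the native environment pointer.
type handle struct{ id int }

type env struct {
	mu       sync.Mutex
	refCount int
	h        *handle
	nextID   int // counts how many handles were created
}

func (e *env) Initialize() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refCount == 0 {
		e.nextID++
		e.h = &handle{id: e.nextID} // "create" the native environment
	}
	e.refCount++
}

func (e *env) Destroy() {
	e.mu.Lock()
	defer e.mu.Unlock()
	if e.refCount == 0 {
		return
	}
	e.refCount--
	if e.refCount == 0 {
		e.h = nil // "release" the native environment
	}
}

func (e *env) state() (int, *handle) {
	e.mu.Lock()
	defer e.mu.Unlock()
	return e.refCount, e.h
}

// rapidCycles performs n sequential Initialize/Destroy pairs and checks the
// invariants after each step, returning the first violation found.
func rapidCycles(e *env, n int) error {
	for i := 0; i < n; i++ {
		e.Initialize()
		if rc, h := e.state(); rc != 1 || h == nil {
			return fmt.Errorf("cycle %d after init: refCount=%d handle=%v", i, rc, h)
		}
		e.Destroy()
		if rc, h := e.state(); rc != 0 || h != nil {
			return fmt.Errorf("cycle %d after destroy: refCount=%d handle=%v", i, rc, h)
		}
	}
	return nil
}

func main() {
	e := &env{}
	fmt.Println(rapidCycles(e, 10000), e.nextID)
}
```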
4. Stress Test with go test Flags
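```shell
# Run each stress test 100 times; -parallel=8 only takes effect for
# tests (or subtests) that call t.Parallel()
go test -v -run=TestStress -count=100 -parallel=8 ./ort/...
# Extended run: -timeout is an upper bound on total run time
# (go test defaults to 10m), so raise it together with -count
go test -v -run=TestStress -timeout=5m ./ort/...
```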
Why This Matters
Race Detector Limitation
From .github/workflows/ci.yml:
```yaml
# Note: -race flag disabled because checkptr is incompatible with purego's
# C string conversion (unsafe.Slice on C allocations).
```
Since we can't use the -race flag, stress testing is our primary defense against concurrency bugs.
Potential Issues to Catch
- Deadlocks: Goroutine hangs waiting for mutex
- Race conditions: RefCount corruption under load
- Memory corruption: Unsafe operations on shared state
- Panic under load: Edge cases only visible with many goroutines
- Resource leaks: File handles not properly closed
Implementation Plan
Phase 1: Add Stress Tests
- Create environment_stress_test.go
- Implement the tests above with -short flag support
- Document how to run stress tests in TESTING.md
Phase 2: CI Integration
Add stress test job to CI:
```yaml
stress-test:
  name: Stress Test
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - uses: actions/setup-go@v5
      with:
        go-version-file: go.mod
    - name: Run stress tests
      run: go test -v -run=TestStress -count=50 -parallel=4 ./ort/...
      timeout-minutes: 10
```
Phase 3: Pre-Release Validation
Before any release, run extended stress tests:
```shell
# Long-running comprehensive stress
go test -v -run=TestStress -count=500 -parallel=16 -timeout=30m ./ort/...
```
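Acceptance Criteria
- Stress tests use testing.Short() to skip in normal runs
- TESTING.md documents how to run stress tests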
Success Metrics
- Tests run successfully 100+ times without failures
- No panics, deadlocks, or timeouts
- RefCount always returns to expected state
- No memory leaks visible (even with -memprofile)
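A sketch of a Go-side leak check that a stress test could run before and after its workload, comparing goroutine counts and live heap. The `snapshot`, `take`, and `checkLeaks` names and the `slack` threshold are hypothetical. Note that memory allocated by native code (the ONNX Runtime side of the FFI) is not visible to `runtime.MemStats` or to Go's heap profile, so this only covers the Go half.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
	"time"
)

// snapshot records Go-side resource usage; native (C) allocations are invisible here.
type snapshot struct {
	goroutines int
	heapAlloc  uint64
}

func take() snapshot {
	runtime.GC() // collect first so HeapAlloc reflects live objects
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return snapshot{goroutines: runtime.NumGoroutine(), heapAlloc: m.HeapAlloc}
}

// checkLeaks compares the current state with `before`. Goroutines get up to
// ~500ms to exit; heap growth is tolerated up to `slack` bytes.
func checkLeaks(before snapshot, slack uint64) error {
	var after snapshot
	for i := 0; i < 50; i++ {
		after = take()
		if after.goroutines <= before.goroutines {
			break
		}
		time.Sleep(10 * time.Millisecond)
	}
	if after.goroutines > before.goroutines {
		return fmt.Errorf("goroutine leak: %d before, %d after", before.goroutines, after.goroutines)
	}
	if after.heapAlloc > before.heapAlloc+slack {
		return fmt.Errorf("heap grew from %d to %d bytes", before.heapAlloc, after.heapAlloc)
	}
	return nil
}

func main() {
	before := take()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ { // stand-in workload: short-lived goroutines
		wg.Add(1)
		go func() { defer wg.Done(); _ = make([]byte, 1024) }()
	}
	wg.Wait()
	fmt.Println("leak check:", checkLeaks(before, 1<<20))
}
```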
Related Issues
Priority
Medium - Should be done before v0.2.0 or before claiming production-ready status