fix(transport,pool): resolve concurrency issues and add timeout policy #62

Merged: YuminosukeSato merged 2 commits into main from fix/concurrency-timeout-policy on Dec 25, 2025
@YuminosukeSato

Owner

This change addresses Issues #34 and #35 by fixing concurrency bugs in MultiplexedTransport and ensuring BatchCall always returns on context cancellation.

MultiplexedTransport Concurrency Fixes (#35)

  • Add writeMu to serialize frame writes and prevent frame corruption
  • Introduce cleanupOnce for safe pending request cleanup
  • Fix request ID handling to prefer payload ID over frame header
  • Prevent double-close of closeCh in handleReadError and Close
  • Use RLock instead of Lock when iterating pending requests

BatchCall Timeout Handling (#34)

  • BatchCall now returns immediately when ctx.Done() fires
  • Incomplete items receive TimeoutError in their error slots
  • No more hangs when context deadline expires

Timeout Policy (timeout-policy spec)

  • Add TimeoutError type with Kind (Context/PerCall/Transport)
  • TimeoutError.Unwrap returns underlying cause for errors.Is compat
  • effectiveDeadline selects min(ctx, perCall, transportDefault)
  • All timeout errors now use consistent classification

Test Coverage

  • Add concurrent request tests with unique payloads
  • Add BatchCall timeout tests ensuring no hangs
  • Add comprehensive TimeoutError unit tests
  • Add effectiveDeadline priority tests

Closes #34, Closes #35

🤖 Generated with Claude Code

Copilot AI review requested due to automatic review settings December 25, 2025 03:36
@github-actions (bot) added the labels bug (Something isn't working), lang/go (Go code changes), and area/pool (Worker pool management) on Dec 25, 2025

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses concurrency issues in MultiplexedTransport and fixes timeout handling in BatchCall to prevent hangs when context deadlines expire. It also introduces a structured TimeoutError type with timeout source classification.

Key Changes:

  • Fixed race conditions in MultiplexedTransport by adding write serialization, safe cleanup patterns, and proper channel close guards
  • Made BatchCall return immediately on context cancellation with TimeoutError for incomplete items
  • Added TimeoutError type with Kind classification and Unwrap support for errors.Is compatibility

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| pkg/pyproc/transport_multiplexed.go | Added writeMu for frame write serialization, cleanupOnce for safe pending cleanup, and comprehensive timeout handling with effectiveDeadline |
| pkg/pyproc/transport_multiplexed_unit_test.go | New unit tests covering concurrent requests, context cancellation, transport timeouts, and error handling paths |
| pkg/pyproc/transport_multiplexed_test.go | Enhanced concurrent request test with synchronization barriers and retry helper for transport creation |
| pkg/pyproc/timeout.go | New timeout infrastructure with TimeoutError type, effectiveDeadline for timeout priority selection, and helper functions |
| pkg/pyproc/timeout_test.go | Comprehensive tests for timeout error formatting, deadline selection, and context timeout classification |
| pkg/pyproc/pool_generic.go | Updated BatchCall to handle context cancellation and populate incomplete items with TimeoutError |
| pkg/pyproc/pool_generic_test.go | Added test verifying BatchCall returns on timeout with proper TimeoutError classification |
| pkg/pyproc/pool_generic_unit_test.go | New unit tests for typed pool lifecycle, batch operations, and error wrapping |

Comment on lines +20 to +21
tmpDir := filepath.Join("/tmp", "pyproc")
_ = os.MkdirAll(tmpDir, 0o755)

Copilot AI Dec 25, 2025

The error from os.MkdirAll is being ignored. If directory creation fails, subsequent socket operations will fail with unclear errors. Consider using t.TempDir() instead for automatic cleanup and error handling, or at minimum check and handle the error.

Suggested change
- tmpDir := filepath.Join("/tmp", "pyproc")
- _ = os.MkdirAll(tmpDir, 0o755)
+ tmpDir := t.TempDir()

opts := PoolOptions{
    Config: PoolConfig{Workers: 1, MaxInFlight: 2},
    WorkerConfig: WorkerConfig{
        SocketPath: "/tmp/test-typed-batch-timeout.sock",

Copilot AI Dec 25, 2025

Hard-coded socket paths in /tmp can cause conflicts in parallel test execution and leave orphaned sockets. Use t.TempDir() combined with filepath.Join to generate unique socket paths, or reuse the newShortSocketPath helper from the unit test file.

    return nil, fmt.Errorf("request timeout after %v", timeout)
case <-ctx.Done():
    if ctx.Err() == context.DeadlineExceeded {
        return nil, newTimeoutError(TimeoutKindContext, timeoutDuration(start, deadline), context.DeadlineExceeded)

Copilot AI Dec 25, 2025

The deadline variable may still hold its zero value if effectiveDeadline returned hasDeadline=false (Go zero-initializes variables, so it is never truly uninitialized). This would pass a zero time to timeoutDuration, returning 0. Either use the context deadline directly from ctx.Deadline(), or ensure the deadline variable is always set when ctx.Err() == context.DeadlineExceeded.

Suggested change
- return nil, newTimeoutError(TimeoutKindContext, timeoutDuration(start, deadline), context.DeadlineExceeded)
+ // Prefer the context's own deadline when computing the timeout duration
+ ctxDeadline, ok := ctx.Deadline()
+ var timeout time.Duration
+ if ok {
+     timeout = timeoutDuration(start, ctxDeadline)
+ } else if hasDeadline {
+     timeout = timeoutDuration(start, deadline)
+ } else {
+     timeout = time.Since(start)
+ }
+ return nil, newTimeoutError(TimeoutKindContext, timeout, context.DeadlineExceeded)

if ctx.Err() != nil {
    t.Fatalf("Failed to create transport: %v", err)
}
_ = sleepWithCtx(ctx, 10*time.Millisecond)

Copilot AI Dec 25, 2025

The retry loop ignores the error from sleepWithCtx. If the context is cancelled during sleep, the loop continues and checks ctx.Err() on the next iteration, but it would be clearer to check the sleep error directly and return immediately.

Suggested change
- _ = sleepWithCtx(ctx, 10*time.Millisecond)
+ if err := sleepWithCtx(ctx, 10*time.Millisecond); err != nil {
+     t.Fatalf("Failed to wait for transport creation: %v", err)
+ }

Comment on lines +354 to +356
transport := &MultiplexedTransport{
    config: TransportConfig{
        Address: "/tmp/nonexistent-transport.sock",

Copilot AI Dec 25, 2025

Hard-coded path in /tmp may cause issues on systems where /tmp is mounted with noexec or has restricted permissions. Consider using t.TempDir() to generate a guaranteed-writable temporary directory path for more robust testing across different environments.

Suggested change
- transport := &MultiplexedTransport{
-     config: TransportConfig{
-         Address: "/tmp/nonexistent-transport.sock",
+ tmpDir := t.TempDir()
+ socketPath := filepath.Join(tmpDir, "nonexistent-transport.sock")
+ transport := &MultiplexedTransport{
+     config: TransportConfig{
+         Address: socketPath,

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@YuminosukeSato force-pushed the fix/concurrency-timeout-policy branch from 67789ab to f4c424a on December 25, 2025 03:38
To satisfy per-function 100% coverage for the A1 scope, add a
marshal hook for deterministic error tests and remove an
unreachable branch in BatchCall to exercise all paths.
@YuminosukeSato merged commit 31a3d8c into main on Dec 25, 2025
11 checks passed