Fix TaskContext leak and resource cleanup in getRDDPartition #463

Open
rexminnis wants to merge 1 commit into ray-project:master from rexminnis:fix/getRDDPartition-taskcontext-leak

Contributor

rexminnis commented Feb 17, 2026

Summary

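  • `getRDDPartition` sets a dummy `TaskContext` but never unsets it. Since executor actors reuse threads
    (`maxConcurrency=2`), a stale `TaskContext` from one call leaks into subsequent operations on the same thread,
    which can cause subtle issues with metrics, accumulators, and task-local state.
  • If an exception occurs mid-method, neither the `WriteChannel`/`ByteArrayOutputStream` nor the `BlockManager`
    read locks are cleaned up.
  • Wraps the method body in `try`/`finally` to guarantee:
    • `BlockManager` read locks are released via `releaseAllLocksForTask`
    • `TaskContext` is unset via `TaskContext.unset()`
    • `WriteChannel` and `ByteArrayOutputStream` are closed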
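
The cleanup guarantee listed above can be sketched in isolation. This is a minimal illustration, not the code in this PR: `CleanupSketch`, `CONTEXT`, `HELD_LOCKS`, and `getPartition` are hypothetical stand-ins for the thread-local `TaskContext`, the `BlockManager` read locks, and `getRDDPartition`, and the order of the cleanup steps is only illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins only: nothing here is a Spark or RayDP API.
final class CleanupSketch {
    // Stands in for the thread-local slot that holds the dummy TaskContext.
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();
    // Stands in for the read locks held on behalf of the task (single-threaded sketch).
    static final List<String> HELD_LOCKS = new ArrayList<>();

    static byte[] getPartition(String blockId, boolean failMidway) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        CONTEXT.set("dummy-task-context");
        try {
            HELD_LOCKS.add(blockId); // take a read lock on the block
            if (failMidway) {
                throw new IOException("simulated failure while writing " + blockId);
            }
            byte[] payload = blockId.getBytes(StandardCharsets.UTF_8);
            out.write(payload, 0, payload.length);
            return out.toByteArray();
        } finally {
            // Runs on both the normal and the exceptional path.
            HELD_LOCKS.clear(); // release every lock taken by this call
            CONTEXT.remove();   // unset the thread-local context
            out.close();        // close the output stream
        }
    }
}
```

Calling `CleanupSketch.getPartition("rdd_0_0", true)` throws, yet afterwards `CONTEXT.get()` is `null` and `HELD_LOCKS` is empty, which is the property the `try`/`finally` wrapper is meant to provide.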

Test plan

  • Verify core compiles cleanly against all shim targets (3.2.2, 3.3.0, 3.4.0, 3.5.0)
  • Run round-trip conversion
  • Confirm no resource leak warnings in executor logs after repeated conversions.

   Fix TaskContext leak and resource cleanup in getRDDPartition

   getRDDPartition sets a dummy TaskContext but never unsets it. Since
   executor actors reuse threads (maxConcurrency=2), a stale TaskContext
   from one call leaks into subsequent operations on the same thread,
   which can cause subtle issues with metrics, accumulators, and
   task-local state.

   Additionally, if an exception occurs mid-method, neither the
   WriteChannel/ByteArrayOutputStream nor the BlockManager read locks
   are cleaned up.

   Wrap the method body in try/finally to guarantee:
   - BlockManager read locks are released via releaseAllLocksForTask
   - TaskContext is unset via TaskContext.unset()
   - WriteChannel and ByteArrayOutputStream are closed.
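
The thread-reuse leak described in the commit message can be reproduced outside Spark with a plain `ThreadLocal`. The sketch below is an assumption-laden illustration: `StaleContextDemo`, `CONTEXT`, and `observeAfter` are hypothetical names, and the pool has one thread (rather than the two implied by `maxConcurrency=2`) so that thread reuse is deterministic.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in: a ThreadLocal playing the role of the thread-local TaskContext.
final class StaleContextDemo {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    // Runs two tasks back to back on the same worker thread and returns
    // what the second task sees in the thread-local slot.
    static String observeAfter(boolean unsetInFinally) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Runnable first = () -> {
                CONTEXT.set("dummy-task-context");
                try {
                    // ... work that needs the context would go here ...
                } finally {
                    if (unsetInFinally) {
                        CONTEXT.remove(); // the fix: always unset before returning
                    }
                }
            };
            pool.submit(first).get();

            // The second task is scheduled on the thread the first one used.
            Callable<String> second = CONTEXT::get;
            return pool.submit(second).get();
        } finally {
            pool.shutdown();
        }
    }
}
```

Without the unset, `observeAfter(false)` returns `"dummy-task-context"`: the second task inherits state it never set. With the unset in `finally`, `observeAfter(true)` returns `null`.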
@rexminnis
Contributor Author

The CI failure is pre-existing and unrelated to this change: actions/setup-python fails to install Python 3.9 on the runner (candidates is not iterable). Python 3.9 was removed from GitHub-hosted ubuntu-latest runners after it reached EOL.

All recent raydp.yml runs are failing with the same error, including PR #450.

I'll open a separate PR to fix the CI workflow.
