[Log](pyudf) Add progress logs for python process pool init #62974

Open
linrrzqqq wants to merge 2 commits into apache:master from linrrzqqq:add-py-pool-init-log

@linrrzqqq
Collaborator

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Improve observability for rare Python UDF process pool initialization stalls.

Previously, when initialization was blocked or unusually slow, logs only showed the start of pool creation, making it difficult to tell whether BE was still waiting on process startup or where time was being spent. This adds lightweight progress and elapsed-time logs to help diagnose initialization hangs without increasing normal-case log volume.

This PR also reduces max_python_process_num from 64 to 16 in the regression test configs, since testing does not need that many processes.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@linrrzqqq
Collaborator Author

run buildall

@linrrzqqq
Collaborator Author

/review

@github-actions bot left a comment
Contributor

Reviewed the PR diff from GitHub. I did not find a blocking issue.

Critical checkpoint conclusions:

  • Goal and tests: The change adds periodic elapsed-time/progress logging for Python UDF process pool initialization and reduces regression pipeline Python process counts. The code accomplishes the stated observability goal. No new test is required for the log-only path, and existing Python UDF regression coverage should still exercise pool initialization.
  • Scope: The code change is small and focused in PythonServerManager::ensure_pool_initialized; config changes are limited to regression pipeline BE configs.
  • Concurrency: Pool initialization still runs under _pools_mutex as before. The new wait_for loop does not introduce new shared mutable state; success_count/failure_count remain local to the initializing thread, and future ownership is unchanged.
  • Lifecycle/static initialization: No new static/global object or non-intuitive lifecycle dependency was added. The health-check thread lifecycle is unchanged.
  • Configuration: No new config item was added. Existing regression config values are reduced from 64 to 16 only.
  • Compatibility/storage/transactions: No protocol, storage format, transaction, visible-version, or persistence path is changed.
  • Parallel code paths: UDF/UDAF/UDTF clients all share this same PythonServerManager::get_client path, so the logging applies consistently.
  • Special conditions/error handling: Existing Status propagation remains checked. The new wait loop only observes future readiness before the existing get().
  • Test result changes: No result-file updates are part of the actual GitHub PR diff.
  • Observability: The added logs include Python version, waiting slot, success/failure counts, and elapsed time, which should help diagnose slow pool initialization without changing normal success behavior.
  • Performance: In normal initialization the added wait_for call should be negligible; in slow/hung cases it logs at a 20s interval, which is reasonable.
  • User focus: No additional user-provided review focus was specified.
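
The wait loop described in the checkpoints above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `wait_with_progress` and its `on_tick` callback are hypothetical names, and the real implementation logs inside `PythonServerManager::ensure_pool_initialized` at a 20s interval. The sketch only shows the pattern of polling `std::future::wait_for` and reporting elapsed time before the final `get()`.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <future>

// Hypothetical helper (not the Doris implementation): block on one future and
// call on_tick(elapsed) each time `interval` expires without a result.
template <typename T>
T wait_with_progress(std::future<T>& fut, std::chrono::milliseconds interval,
                     const std::function<void(std::chrono::milliseconds)>& on_tick) {
    assert(interval.count() > 0);
    const auto start = std::chrono::steady_clock::now();
    for (;;) {
        // wait_for only observes readiness; it does not consume the result.
        const std::future_status status = fut.wait_for(interval);
        if (status != std::future_status::timeout) {
            break; // ready, or deferred (get() below will then run the task)
        }
        const auto elapsed = std::chrono::duration_cast<std::chrono::milliseconds>(
                std::chrono::steady_clock::now() - start);
        on_tick(elapsed); // a real caller would emit a progress log line here
    }
    // Same call as before the loop was added, so error propagation is unchanged.
    return fut.get();
}
```

In the fast path the first `wait_for` returns `ready` immediately and `on_tick` is never called, which matches the claim that normal-case log volume does not increase.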

Residual note: the progress log is sequential by slot, so if an earlier slot is slow while later futures have already completed, success may not reflect those later completed futures until the earlier slot is collected. This is acceptable for diagnostic progress logging and not a correctness issue.
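
To make the residual note concrete, here is a small sketch of why an in-order collection loop can under-report progress. It is an illustration under assumptions, not part of the PR: `count_ready` is a hypothetical helper that probes futures without blocking, so its result can be compared with the number of slots already collected.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical helper: count futures in [from, size) whose result is already
// available, using a zero-length wait so the call never blocks.
template <typename T>
std::size_t count_ready(std::vector<std::future<T>>& futures, std::size_t from) {
    assert(from <= futures.size());
    std::size_t ready = 0;
    for (std::size_t i = from; i < futures.size(); ++i) {
        // valid() is false once get() has consumed the future; skip those.
        if (futures[i].valid() &&
            futures[i].wait_for(std::chrono::seconds(0)) == std::future_status::ready) {
            ++ready;
        }
    }
    return ready;
}
```

If slot 0 is still pending while slots 1 and 2 have finished, a loop that collects in slot order reports zero successes, whereas `count_ready(futures, 0)` returns 2. The PR accepts this lag because the log is diagnostic only.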

@linrrzqqq
Collaborator Author

run buildall


2 participants