
Introduce a helper to reduce logs in frequently invoked codepaths. #34977


Draft: wants to merge 7 commits into master

Conversation

tvalentyn
Contributor

@tvalentyn tvalentyn commented May 16, 2025

We have several places in the Python SDK where a log entry is emitted multiple times when a single warning would be sufficient. This causes excessive noise, which is particularly visible during job submission when INFO logs are enabled.

We also have codepaths where we want to log a particular message, but no more often than once every x minutes. Each time this comes up, we reimplement the same time-tracking logic, for example:

if self._large_flush_last_observed_timestamp + 600 < time.time():
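For context, here is a minimal sketch of what this hand-rolled pattern tends to look like at a call site. The class name, method name, size threshold, and injectable clock are hypothetical and used only for illustration; only the timestamp attribute and the 600-second interval come from the line above.

```python
import logging
import time

_LOGGER = logging.getLogger(__name__)


class Flusher:
  """Hypothetical class; only the timestamp check mirrors the line above."""
  def __init__(self, clock=time.time):
    # The injectable clock is an assumption of this sketch, added for testing.
    self._clock = clock
    self._large_flush_last_observed_timestamp = 0.0

  def maybe_warn_large_flush(self, size_bytes, threshold_bytes=10 << 20):
    """Warns about a large flush at most once per 600 seconds.

    Returns True if a warning was logged for this call.
    """
    if size_bytes < threshold_bytes:
      return False
    now = self._clock()
    if self._large_flush_last_observed_timestamp + 600 < now:
      # Every call site has to remember to update its own timestamp.
      self._large_flush_last_observed_timestamp = now
      _LOGGER.warning('Observed a large flush of %d bytes.', size_bytes)
      return True
    return False
```

Each call site that wants this behavior needs its own timestamp field and its own comparison, which is the duplication a shared helper would remove.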

I suggest introducing a helper that streamlines this functionality at the cost of storing an identifier for each log entry in memory, such as the position in the codebase where the log is emitted, or some other identifier used to dedup or group similar messages.

For example, a caller can use an ID to print a GCS bucket configuration note only once per bucket, using the bucket name as the ID.
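One way such a helper could look is sketched below. This is not the implementation in this PR: the class name `ThrottledLogger`, its method names, and the `log_id` parameter are all hypothetical. As a simplification, the sketch falls back to the message template as the ID when none is given, rather than the call-site position.

```python
import logging
import threading
import time


class ThrottledLogger:
  """Hypothetical wrapper that suppresses repeats of a log entry by ID."""
  def __init__(self, logger, clock=time.monotonic):
    self._logger = logger
    self._clock = clock  # injectable for tests
    self._lock = threading.Lock()
    # ID -> time of last emission. This dict is the memory cost:
    # one entry is kept per distinct ID.
    self._last_logged = {}

  def log_every_n_seconds(self, level, msg, n_seconds, *args, log_id=None):
    """Logs at most once per n_seconds per ID. Returns True if logged."""
    key = log_id if log_id is not None else msg
    now = self._clock()
    with self._lock:
      last = self._last_logged.get(key)
      if last is not None and now - last < n_seconds:
        return False
      self._last_logged[key] = now
    self._logger.log(level, msg, *args)
    return True

  def log_once(self, level, msg, *args, log_id=None):
    """Logs only the first time a given ID is seen. Returns True if logged."""
    return self.log_every_n_seconds(
        level, msg, float('inf'), *args, log_id=log_id)


# Usage: emit the note once per bucket, using the bucket name as the ID.
throttled = ThrottledLogger(logging.getLogger(__name__))
for bucket in ['bucket-a', 'bucket-a', 'bucket-b']:
  throttled.log_once(
      logging.INFO, 'Configuration note for bucket %s', bucket, log_id=bucket)
```

Wrapping an existing named logger, rather than replacing it, keeps whatever level configuration is already applied to that logger.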




codecov bot commented May 16, 2025

Codecov Report

Attention: Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.

Project coverage is 54.51%. Comparing base (5f9cd73) to head (3fbe327).
Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
...dks/python/apache_beam/options/pipeline_options.py 90.90% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #34977   +/-   ##
=========================================
  Coverage     54.50%   54.51%           
  Complexity     1479     1479           
=========================================
  Files          1012     1013    +1     
  Lines        160667   160691   +24     
  Branches       1079     1079           
=========================================
+ Hits          87577    87603   +26     
+ Misses        70990    70988    -2     
  Partials       2100     2100           
Flag Coverage Δ
python 81.00% <96.87%> (+<0.01%) ⬆️


@Abacn
Contributor

Abacn commented May 16, 2025

+1, this is a useful helper. I've always wondered whether there is a Python implementation of absl's LOG_EVERY_N_SEC.

Contributor

Assigning reviewers:

R: @claudevdm for label python.


@tvalentyn tvalentyn force-pushed the less_logs branch 2 times, most recently from f6384e1 to 995e489 Compare May 17, 2025 01:06
@tvalentyn
Contributor Author

I've always wondered whether there is a Python implementation of absl's LOG_EVERY_N_SEC.

Looks like there is:
https://github.com/abseil/abseil-py/blob/369ce9badbda914b7d3b975b7272e1194b419213/absl/logging/__init__.py#L494

Looks interesting. I wonder if we can use absl.logging instead without causing unintended consequences. It already has all these helpers.

@tvalentyn tvalentyn marked this pull request as draft May 17, 2025 03:16
@tvalentyn
Contributor Author

tvalentyn commented May 19, 2025

Looks interesting. I wonder if we can use absl.logging instead without causing unintended consequences. It already has all these helpers.

Took a look. I think the main issue is that we adopted hierarchical logging with named loggers to allow per-file logging levels. Absl uses a different method for per-file logging that would require code and documentation changes to adopt, so it does not seem to be a drop-in replacement.

The absl-py APIs look better; I'll take a look at whether we can have something similar.
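To illustrate what hierarchical named loggers provide, here is a small sketch using only the standard library `logging` module. The logger names are hypothetical and do not correspond to real Beam modules; this shows the stdlib mechanism, not Beam's actual configuration.

```python
import logging

# Hypothetical module names, used only for illustration. In a real module
# the logger would typically be created with logging.getLogger(__name__).
io_logger = logging.getLogger('myapp.io.gcs')
worker_logger = logging.getLogger('myapp.runners.worker')

# Set a default level for everything under the 'myapp' hierarchy...
logging.getLogger('myapp').setLevel(logging.WARNING)
# ...and override it for a single module.
io_logger.setLevel(logging.DEBUG)

# Loggers without an explicit level inherit it from the nearest ancestor.
print(io_logger.getEffectiveLevel() == logging.DEBUG)
print(worker_logger.getEffectiveLevel() == logging.WARNING)
```

A throttling helper that accepts or wraps a named logger keeps this per-module control intact.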

@tvalentyn
Contributor Author

Found an implementation that accomplishes what I'd like to have; I will need to polish the change a bit.
