Introduce a helper to reduce logs in frequently invoked codepaths. #34977

tvalentyn · 2025-05-16T19:03:29Z

We have several places in the Python SDK where the same log entry is emitted multiple times when a single warning would be sufficient. This causes excessive noise, which is particularly visible during job submission when INFO logs are enabled.

We also have codepaths where we may want to log a particular message, but no more often than once per x minutes. Each time this comes up, we reimplement the same time-tracking logic, for example:

beam/sdks/python/apache_beam/runners/worker/data_plane.py

Line 167 in bea0444

if self._large_flush_last_observed_timestamp + 600 < time.time():

I suggest introducing a helper that streamlines this functionality at the cost of storing log-entry identifiers in memory, such as the position in the codebase where the log is emitted, or some other identifier used to dedup or group similar messages.
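
A minimal sketch of what such a time-throttled helper could look like, using only the standard library. The function name `log_every_n_seconds`, its parameters, and the module-level `_last_logged` dict are assumptions made for illustration; they are not necessarily the API added in this PR.

```python
import logging
import time
from typing import Callable, Dict, Hashable, Optional

# Hypothetical module-level state: last emission time per identifier.
# This is the memory cost mentioned above: one dict entry per distinct key.
_last_logged: Dict[Hashable, float] = {}


def log_every_n_seconds(
    logger: logging.Logger,
    level: int,
    msg: str,
    n_seconds: float,
    *args,
    key: Optional[Hashable] = None,
    now: Callable[[], float] = time.monotonic) -> bool:
    """Logs msg at most once per n_seconds for a given key.

    Returns True if the message was emitted, False if it was suppressed.
    """
    if key is None:
        # Fall back to the message template as the identifier.
        key = msg
    current = now()
    last = _last_logged.get(key)
    if last is not None and current - last < n_seconds:
        return False
    _last_logged[key] = current
    logger.log(level, msg, *args)
    return True
```

With something like this, the hand-written timestamp comparison at each call site would collapse into a single call that passes `600` as `n_seconds`.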

For example, a caller can use an ID to print a GCS bucket configuration note only once per bucket, using the bucket name as the ID.

codecov · 2025-05-16T19:49:49Z

Codecov Report

Attention: Patch coverage is 96.87500% with 1 line in your changes missing coverage. Please review.

Project coverage is 54.51%. Comparing base (5f9cd73) to head (3fbe327).
Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
...dks/python/apache_beam/options/pipeline_options.py	90.90%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff            @@
##             master   #34977   +/-   ##
=========================================
  Coverage     54.50%   54.51%           
  Complexity     1479     1479           
=========================================
  Files          1012     1013    +1     
  Lines        160667   160691   +24     
  Branches       1079     1079           
=========================================
+ Hits          87577    87603   +26     
+ Misses        70990    70988    -2     
  Partials       2100     2100

Flag	Coverage Δ
python	`81.00% <96.87%> (+<0.01%)`	⬆️

Abacn · 2025-05-16T20:06:12Z

+1, this is a useful helper. I have always wondered whether there is a Python implementation of absl's LOG_EVERY_N_SEC.

github-actions · 2025-05-16T20:07:40Z

Assigning reviewers:

R: @claudevdm for label python.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

stop reviewer notifications - opt out of the automated review tooling
remind me after tests pass - tag the comment author after tests pass
waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).

tvalentyn · 2025-05-17T03:16:05Z

> I have always wondered whether there is a Python implementation of absl's LOG_EVERY_N_SEC.

Looks like there is:
https://github.com/abseil/abseil-py/blob/369ce9badbda914b7d3b975b7272e1194b419213/absl/logging/__init__.py#L494

Looks interesting. I wonder if we can use absl.logging instead without unintended consequences; it already has all these helpers.

tvalentyn · 2025-05-19T19:43:27Z

> Looks interesting. I wonder if we can use absl.logging instead without unintended consequences; it already has all these helpers.

Took a look. I think the main issue is that we adopted hierarchical logging with named loggers to allow for per-file logging levels. Absl uses a different method of per-file logging that would require code and documentation changes to adopt, so it does not seem to be a drop-in replacement.
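
For context, per-file levels with named loggers work roughly like this in the standard `logging` module. This is a minimal sketch: the logger names mirror module paths mentioned in this PR, and the chosen levels are purely illustrative.

```python
import logging

# Each module normally creates its logger with logging.getLogger(__name__).
# The names are spelled out here so the example runs without importing Beam.
data_plane_log = logging.getLogger("apache_beam.runners.worker.data_plane")
options_log = logging.getLogger("apache_beam.options.pipeline_options")

# Set a default level for the whole package via the parent logger...
logging.getLogger("apache_beam").setLevel(logging.WARNING)
# ...and override it for a single file.
data_plane_log.setLevel(logging.DEBUG)

# Child loggers without an explicit level inherit from the nearest ancestor.
print(logging.getLevelName(data_plane_log.getEffectiveLevel()))
print(logging.getLevelName(options_log.getEffectiveLevel()))
```

Any throttling helper that takes a `logging.Logger` argument keeps this per-file control intact, which is the property a switch to absl.logging would put at risk.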

The absl-py APIs look better; I'll take a look at whether we can have something similar.

tvalentyn · 2025-05-20T00:03:09Z

Found an implementation that accomplishes what I'd like to have; I will need to polish the change a bit.

github-actions · 2025-05-27T12:15:25Z

Reminder, please take a look at this pr: @claudevdm

tvalentyn · 2025-05-27T18:49:08Z

waiting on author

github-actions · 2025-05-30T12:15:02Z

Assigning a new set of reviewers because the PR has gone too long without review. If you would like to opt out of this review, comment assign to next reviewer:

R: @jrmccluskey for label python.

github-actions bot added python io gcp labels May 16, 2025

github-actions bot added the Next Action: Reviewers label May 16, 2025

tvalentyn force-pushed the less_logs branch 2 times, most recently from f6384e1 to 995e489 Compare May 17, 2025 01:06

tvalentyn marked this pull request as draft May 17, 2025 03:16

tvalentyn added 3 commits May 19, 2025 16:59

Introduce a helper to reduce logs in frequently invoked codepaths.

ebbd482

Reduce repetitive logs in pipeline_options. Driveby: fix soft-delete warning.

3e871ba

lint

7dbec31

tvalentyn force-pushed the less_logs branch from 995e489 to 0ca9652 Compare May 20, 2025 00:02

tvalentyn added 4 commits May 19, 2025 21:25

Add logger helper functions from detectron2 (licensed as Apache 2.0)

3bb669b

Type hints

51c412f

Add some tests.

c0586cc

Allow *args.

0b01c48

tvalentyn force-pushed the less_logs branch from 0ca9652 to 0b01c48 Compare May 20, 2025 04:25

github-actions bot added the slow-review label May 27, 2025

github-actions bot added Next Action: Author and removed Next Action: Reviewers labels May 27, 2025

github-actions bot added reassigned-reviewers and removed slow-review labels May 30, 2025
