Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

stats: add output_file option to output the stats to a file #20512

Conversation

AlexTereshenkov
Copy link
Member

@AlexTereshenkov AlexTereshenkov commented Feb 8, 2024

Currently when running a Pants goal with the --stats-log enabled, the stats are showing at the end of the goal's output. This makes distinguishing between the goal output and stats difficult. One cannot redirect only the stats data to a file as it's just part of the goal output.

This PR adds a new option to request saving the stats into a file to be able to parse it later or archive.

As a next step, I'd like to see a structured data produced. JSON seems sensible so that one can mung it to obtain data in any desired format such as Prometheus. The logged stats data is not really suitable for parsing.


Usage:

$ rm -rf stats.txt; pants --stats-log --stats-output-file="stats.txt" roots; head stats.txt
.
3rdparty/jvm
3rdparty/python
3rdparty/tools
build-support/bin
build-support/flake8
build-support/migration-support
pants-plugins
src/assets
src/python
src/rust
src/rust/engine/dep_inference/src/javascript
src/rust/engine/dep_inference/src/python
src/rust/engine/src/externs
src/rust/engine/ui/src/instance
testprojects/pants-plugins/src/python
testprojects/src/go
testprojects/src/js
testprojects/src/js/coverage_workspaces/jest_coverage/src/test
testprojects/src/js/coverage_workspaces/mocha_coverage/src/test
testprojects/src/js/jest_coverage/src/test
testprojects/src/js/mocha_coverage/src/test
testprojects/src/jvm
testprojects/src/python
testprojects/src/ts
tests/python
Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
  docker_execution_requests: 0
  docker_execution_successes: 0
  local_cache_read_errors: 0
  local_cache_requests: 0
  local_cache_requests_cached: 0
  local_cache_requests_uncached: 0
  local_cache_total_time_saved_ms: 0

Comment on lines 166 to 167
with open(self.output_file, "w") as fh:
fh.write(output_contents)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this doesn't happen in a any rule, I can't use https://www.pantsbuild.org/2.18/docs/writing-plugins/the-rules-api/file-system#workspacewrite_digest-save-to-disk, can I? Would this be an appropriate way to write into the build root directory?

Copy link
Contributor

@huonw huonw Feb 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially the construct_callback rule could request the Workspace and pass it in, although I don't know if holding on to that object outside the single rule invocation is a problem (i.e. if it's stashed in the callback for later).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I've experimented a bit, but it was a mess adding a workspace writing digest call as I have to construct a Digest outside of rule as well and to have it, notable amount of refactoring would be required which I'd like to avoid. It looks like open is used often in the repo to write things in the build root, so I assume this approach should be legit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea, the thing to consider is if this may be called as part of a remote process invocation (i.e. on a remote build server rather than the local machine where pants was invoked.) In this case, I imagine that it being processed during the "finished" state, that it will be on the local machine only.. and as such, using open is legit. c.f.

if publish.output:
with open(publish.output, mode="w") as fd:
fd.write(output_data)

@AlexTereshenkov AlexTereshenkov force-pushed the python/add-output-file-option-stats-subsystem branch from abf88b7 to bbf3b99 Compare February 9, 2024 15:42
@AlexTereshenkov AlexTereshenkov marked this pull request as ready for review February 9, 2024 16:10
@AlexTereshenkov AlexTereshenkov requested review from huonw and kaos February 9, 2024 16:10
@AlexTereshenkov AlexTereshenkov requested a review from kaos February 12, 2024 10:49
Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I really like the idea here. It's been something I've been annoyed by on our internal CI too!

"""Send text to the stdout or write to the output file."""
if text:
if output_file:
with open(output_file, "w") as fh:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, people seem to generally run pants in CI as multiple invocations, e.g. pants lint check :: followed by pants test :: (https://github.com/pantsbuild/example-python/blob/26d55f2772b72944d7f092019e263f5a5d8984d2/.github/workflows/pants.yaml#L50-L63). It'd be kinda convenient if each invocation could either:

  • have its own output file automatically in some way
  • be appended to a single output file without overwriting

In particular, it'd be nifty to be able to have a static pants.ci.toml that configures this, like:

[stats]
log = true
output_file = "path/to/file.txt"

As it is, it seems that CI that runs multiple separate pants invocations will need to add a unique --stats-output-file=path/to/file-lint.txt to each invocation and then save/export multiple all of those files.

Two ideas:

  • open this file in append mode, and print a header about which goal and maybe some metadata like timestamp for each 'write'?
  • support some sort of placeholder in the output file name, e.g. output_file = "path/to/file-{unique}.txt" that has a unique random value inserted

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the append idea over using unique files.

Copy link
Member Author

@AlexTereshenkov AlexTereshenkov Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion, I think indeed providing individual files for each Pants invocation may be unfeasible, so aggregating the stats is better. When I'll submit a PR for creating a JSON, it will be easier as each invocation will be a separate object in the list of invocations. For now, the plain text is added in this manner.

$ pants --stats-log --stats-output-file="stats.txt" --stats-memory-summary list filedeps roots ::
$ pants --stats-log --stats-output-file="stats.txt" --stats-memory-summary peek roots ::
2024-02-13 15:31:51 Executing goals: list,filedeps,roots
Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
...
  p75: 131199
  p90: 221183
  p95: 255487
  p99: 258303

2024-02-13 15:32:47 Executing goals: peek,roots
Counters:
  backtrack_attempts: 0
  docker_execution_errors: 0
  docker_execution_requests: 0
  docker_execution_successes: 0
  local_cache_read_errors: 0
  local_cache_requests: 92
...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timestamp and the executed goals are only shown when a file is produced, the console output is unchanged.

@AlexTereshenkov AlexTereshenkov force-pushed the python/add-output-file-option-stats-subsystem branch from b790073 to e845d53 Compare February 13, 2024 16:44
Copy link
Contributor

@huonw huonw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! Looks great other than the minor comments about run_tracker.

I'm excited to clean up our CI logs with this!

Copy link
Member

@kaos kaos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚀 (same caveats as from Huon's excellent review)

@AlexTereshenkov AlexTereshenkov merged commit e9d9941 into pantsbuild:main Feb 14, 2024
24 checks passed
@AlexTereshenkov AlexTereshenkov deleted the python/add-output-file-option-stats-subsystem branch February 14, 2024 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants