-
-
Notifications
You must be signed in to change notification settings - Fork 652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
stats: add output_file option to output the stats to a file #20512
stats: add output_file option to output the stats to a file #20512
Conversation
with open(self.output_file, "w") as fh: | ||
fh.write(output_contents) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As this doesn't happen in a any rule, I can't use https://www.pantsbuild.org/2.18/docs/writing-plugins/the-rules-api/file-system#workspacewrite_digest-save-to-disk, can I? Would this be an appropriate way to write into the build root directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potentially the construct_callback
rule could request the Workspace
and pass it in, although I don't know if holding on to that object outside the single rule invocation is a problem (i.e. if it's stashed in the callback for later).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I've experimented a bit, but it was a mess adding a workspace writing digest call as I have to construct a Digest outside of rule as well and to have it, notable amount of refactoring would be required which I'd like to avoid. It looks like open
is used often in the repo to write things in the build root, so I assume this approach should be legit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yea, the thing to consider is if this may be called as part of a remote process invocation (i.e. on a remote build server rather than the local machine where pants
was invoked.) In this case, I imagine that it being processed during the "finished" state, that it will be on the local machine only.. and as such, using open
is legit. c.f.
pants/src/python/pants/core/goals/publish.py
Lines 288 to 290 in e30da1c
if publish.output: | |
with open(publish.output, mode="w") as fd: | |
fd.write(output_data) |
abf88b7
to
bbf3b99
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice, I really like the idea here. It's been something I've been annoyed by on our internal CI too!
"""Send text to the stdout or write to the output file.""" | ||
if text: | ||
if output_file: | ||
with open(output_file, "w") as fh: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, people seem to generally run pants in CI as multiple invocations, e.g. pants lint check ::
followed by pants test ::
(https://github.com/pantsbuild/example-python/blob/26d55f2772b72944d7f092019e263f5a5d8984d2/.github/workflows/pants.yaml#L50-L63). It'd be kinda convenient if each invocation could either:
- have its own output file automatically in some way
- be appended to a single output file without overwriting
In particular, it'd be nifty to be able to have a static pants.ci.toml
that configures this, like:
[stats]
log = true
output_file = "path/to/file.txt"
As it is, it seems that CI that runs multiple separate pants invocations will need to add a unique --stats-output-file=path/to/file-lint.txt
to each invocation and then save/export multiple all of those files.
Two ideas:
- open this file in append mode, and print a header about which goal and maybe some metadata like timestamp for each 'write'?
- support some sort of placeholder in the output file name, e.g.
output_file = "path/to/file-{unique}.txt"
that has a unique random value inserted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the append idea over using unique files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion, I think indeed providing individual files for each Pants invocation may be unfeasible, so aggregating the stats is better. When I'll submit a PR for creating a JSON, it will be easier as each invocation will be a separate object in the list of invocations. For now, the plain text is added in this manner.
$ pants --stats-log --stats-output-file="stats.txt" --stats-memory-summary list filedeps roots ::
$ pants --stats-log --stats-output-file="stats.txt" --stats-memory-summary peek roots ::
2024-02-13 15:31:51 Executing goals: list,filedeps,roots
Counters:
backtrack_attempts: 0
docker_execution_errors: 0
...
p75: 131199
p90: 221183
p95: 255487
p99: 258303
2024-02-13 15:32:47 Executing goals: peek,roots
Counters:
backtrack_attempts: 0
docker_execution_errors: 0
docker_execution_requests: 0
docker_execution_successes: 0
local_cache_read_errors: 0
local_cache_requests: 92
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The timestamp and the executed goals are only shown when a file is produced, the console output is unchanged.
b790073
to
e845d53
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice one! Looks great other than the minor comments about run_tracker
.
I'm excited to clean up our CI logs with this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀 (same caveats as from Huon's excellent review)
Currently when running a Pants goal with the
--stats-log
enabled, the stats are showing at the end of the goal's output. This makes distinguishing between the goal output and stats difficult. One cannot redirect only the stats data to a file as it's just part of the goal output.This PR adds a new option to request saving the stats into a file to be able to parse it later or archive.
As a next step, I'd like to see a structured data produced. JSON seems sensible so that one can mung it to obtain data in any desired format such as Prometheus. The logged stats data is not really suitable for parsing.
Usage: