-
Notifications
You must be signed in to change notification settings - Fork 548
Feature:4015 Log metadata changes #4049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
6784dcb
to
67dfc61
Compare
|
||
def __hash__(self) -> int: | ||
"""Hash operator.""" | ||
# Only safe if _key() is made from immutable values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does _key()
here refer to?
The `log_metadata` function does not support logging metadata for | ||
multiple entities of the same type. To do it, you can use the ZenML Client | ||
functionality directly: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The `log_metadata` function does not support logging metadata for | |
multiple entities of the same type. To do it, you can use the ZenML Client | |
functionality directly: | |
The `log_metadata` function does not support logging the same metadata for | |
multiple entities of the same type at once. To do it, you can use the ZenML Client | |
functionality directly: |
src/zenml/utils/metadata_utils.py
Outdated
# Manual logging to a step | ||
log_metadata(metadata={}, step_name=..., run_id_name_or_prefix=...) | ||
log_metadata(metadata={}, step_id=...) | ||
# Manual logging to a run | ||
log_metadata(metadata={}, run_id_name_or_prefix=...) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure how to fix this without breaking the functionality but I think there is something off here. Previously, if I would provide:
- Step id -> I log it to the step
- Run id name or prefix -> I log it to the run
- Run id name or prefix and step name -> I log it to the step
While the first two stay the same, the third one now logs it not only to the step but also the run. In a setting, where there are a lot of steps, this might end up overloading the pipeline run. Any ideas on how to fix it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if I provide all three of them together, I think step_id
will be processed but step_run_name
and run_id_name_or_prefix
will be silently ignored. I haven't checked it yet but the same might apply to other entities as well. I am not sure if ignoring it is the best idea here. I was more leaning towards raising a ValueError or in the worst case, throwing a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add a validation mechanism, sure. So you either pass values by id or name/version. I feel like the best possible solution here is to introduce a new function with a new signature that groups those options to valid parameter groups:
def log_metadata(
step_identifier: StepIdentifier | None,
model_identifier: VersionedIdentifier | None,
artifact_identifier: VersionedIdentifier | None,
pipeline_run_id
):
Where for instance identifiers are:
class VersionedIdentifier:
uuid: UUID | None
name: str | None
version: str | None
def validate_options():
# validate either id or name/version are set
...
For step we could have something like:
class StepRunIdentifier:
id: UUID | None
name: str | None
pipeline_identifier: str | UUID | None
def validate_options():
# validate either id or name/pipeline are used
...
With this implementation when you specify the StepRunIdentifier it doesn't get mixed up with the pipeline_run_id, so you can control and separate when it is logged for the pipeline or not.
a1ac646
to
f65865f
Compare
ZenML CLI Performance Comparison (Threshold: 1.0s, Timeout: 60s, Slow: 5s)❌ Failed Commands on Current Branch (feature/4015-multi-log-metadata)
🚨 New Failures IntroducedThe following commands fail on your branch but worked on the target branch:
Performance Comparison
Summary
Environment Info
|
ed97a14
to
1e71ad9
Compare
1e71ad9
to
5f1c747
Compare
Describe changes
I implemented/fixed _ to achieve _.
Pre-requisites
Please ensure you have done the following:
develop
and the open PR is targetingdevelop
. If your branch wasn't based on develop read Contribution guide on rebasing branch to develop.Types of changes