-
Notifications
You must be signed in to change notification settings - Fork 151
Add support for Experiment tracking in Model Registry, fixes #1224 #1318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for Experiment tracking in Model Registry, fixes #1224 #1318
Conversation
|
I'm working on a couple of more fixes for minor issues in testing against the MLflow plugin POC. |
|
/hold ref #1318 (comment) |
61e46d2 to
1aa1492
Compare
|
Fixed the last few issues with metric history. All tests now pass with the mlflow plugin coming in another PR. |
|
Looks like the embedmd DB testing classes were modified on |
pboyd
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is bigger than I can effectively review in one sitting. Is it possible to split it up?
|
Would it be worth it to remove the |
|
@syntaxsdev I added the generated client files to pass a check that was failing. But they are not part of the MR client SDK and we'd still need the client SDK work you did in another PR after merging this PR. |
ce11089 to
f721ae7
Compare
Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
…rvice tests Signed-off-by: Dhiraj Bokde <[email protected]>
…ype properties only support int32, not int64 Signed-off-by: Dhiraj Bokde <[email protected]>
…es in artifact queries Signed-off-by: Dhiraj Bokde <[email protected]>
…riment run metrics Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
…nflicts Signed-off-by: Dhiraj Bokde <[email protected]>
…ies and custom properties Signed-off-by: Dhiraj Bokde <[email protected]>
Signed-off-by: Dhiraj Bokde <[email protected]>
fege
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
Signed-off-by: Dhiraj Bokde <[email protected]>
|
Identified one more thing that needs to be fixed and detected by fuzzy tests. Working on a fix now. That should be the last one for all fuzzy tests. |
Signed-off-by: Dhiraj Bokde <[email protected]>
|
All fuzzer tests are now passing. The PR is ready to merge. |
|
/verified |
|
Any concerns @Al-Pragliola @tarilabs in moving forward and merging this? |
|
/lgtm |
we should account for unresolved comment and ensure the "edge cases" in the backend do not spawn further same goes for general comment that doing feature this way is not efficient that said we already discussed 1. Release, 2. merge this, 3. merge MLMD codepath removal PR, did you had another plan of step in mind @rareddy ? 🤔 |
|
/lgtm reviewed delta changes since last approval |
tarilabs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- #1318 (comment)
- #1318 (comment)
- #1318 (comment)
- #1318 (review)
- #1318 (comment)
- #1318 (review)
- #1318 (review)
again many thanks to @dhirajsb for leading this impressive work and the exceptional review efforts by all involved
per discussions
/approve
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: tarilabs The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
Great work by all the folks in seeing this through! |
|
Thanks for the detailed reviews on a big PR everyone. Great team work for a huge feature improvement in the next phase of this project. |
…#1224 (kubeflow#1318) * feat: initial version of experiments and runs API Signed-off-by: Dhiraj Bokde <[email protected]> * feat: experiments and runs initial implementation (wip) Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fixed failing unit tests for experiments and runs Signed-off-by: Dhiraj Bokde <[email protected]> * fix: added experiment and experimentrun tests Signed-off-by: Dhiraj Bokde <[email protected]> * feat: added DataSet, Metric, and Parameter types Signed-off-by: Dhiraj Bokde <[email protected]> * feat: added implementatio of DataSet, Metric, and Param, including service tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: replace int properties for timestamps with string because mlmd type properties only support int32, not int64 Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add support for artifactType query param to filter artifact types in artifact queries Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add metrics history endpoint and metric history storage for experiment run metrics Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix artifactType query param type in generated service Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix go lint error in unit test Signed-off-by: Dhiraj Bokde <[email protected]> * fix: filter out metric history from artifacts endpoints Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix metric history name to use last update time to avoid name conflicts Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add filterQuery param on all context types to search by properties and custom properties Signed-off-by: Dhiraj Bokde <[email protected]> * feat: initial version of experiment tracking implemented on embedmd, rebased on main Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add support for filterQuery parameter for all ListResponse endpoints for embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add support for stepIds query parameter in embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * feat: refactor embedmd db service to use generic repository implementation to reduce code duplication Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add support for artifactType query parameter for embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * fix: use mysql 8.3 in unit tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: refactor name mapping and default name handling in embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * feat: support updating metrics and parameters by name, fix ignoring metric history when retrieving all artifacts for runs and versions Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add missing generated openapi python client files for PR github action check Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix failing shared db tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add support for metric and parameter description, add missing type property migraiton Signed-off-by: Dhiraj Bokde <[email protected]> * chore: update files from main Signed-off-by: Alessio Pragliola <[email protected]> * fix: added missing godoc comments in pkg/api/api.go Signed-off-by: Dhiraj Bokde <[email protected]> * fix: replace ambiguous ArtifactListReponse return type from GetExperimentRunMetricHistory with MetricListResponse Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fixed incorrect artifactType in dataset response, added tests to verify all artifact types Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add validation for endTimeSinceEpoch property on experiment run updates Signed-off-by: Dhiraj Bokde <[email protected]> * Replace value type validation map with a switch in query_translator.go Co-authored-by: Paul Boyd <[email protected]> Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add service e2e tests for filterQuery, fix name query param handling, fix DB tests that didn't use parent id prefix Signed-off-by: Dhiraj Bokde <[email protected]> * chore: code cleanup, replace interface{} with any, added vetting for internal/db/filter Signed-off-by: Dhiraj Bokde <[email protected]> * chore: added flag vF for fixed string grep exclude Signed-off-by: Dhiraj Bokde <[email protected]> * fix: copied orderby and parameters back to registry and catalog to have different values Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fixed mlmd query translator handling of escaped backslashes Signed-off-by: Dhiraj Bokde <[email protected]> * chore: add test to verify parseCustomPropertyField won't panic with a property name ending in dot Signed-off-by: Dhiraj Bokde <[email protected]> * fix: sync generated python client code Signed-off-by: Dhiraj Bokde <[email protected]> * fix: readiness probe tests and new types Signed-off-by: Alessio Pragliola <[email protected]> * chore: refactor readiness_test Signed-off-by: Alessio Pragliola <[email protected]> * fix: ensure parentResourceId is used to filter resource lookup by params, add unit tests for duplicate child resource lookups Signed-off-by: Dhiraj Bokde <[email protected]> * fix: throw an error if a metric value is missing, add test to validate Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix http status error code for invalid ids Signed-off-by: Dhiraj Bokde <[email protected]> * fix: more id validation, fixed filterQuery passing to DB layer Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix failing unit test Signed-off-by: Dhiraj Bokde <[email protected]> * fix: validate experiment id when listing runs Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix failing validation test after fixing http status codes Signed-off-by: Dhiraj Bokde <[email protected]> * fix: avoid duplicate key errors if externalid is set in metric when creating metric history entries Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add fuzzer tests for experiment runs and new artifact types Signed-off-by: Dhiraj Bokde <[email protected]> * chore: code cleanup and format fuzzer tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: log error in fuzzer test Signed-off-by: Dhiraj Bokde <[email protected]> * fix: handle null artifact names correctly on create Signed-off-by: Dhiraj Bokde <[email protected]> --------- Signed-off-by: Dhiraj Bokde <[email protected]> Signed-off-by: Alessio Pragliola <[email protected]> Signed-off-by: Alessio Pragliola <[email protected]> Co-authored-by: Alessio Pragliola <[email protected]> Co-authored-by: Alessio Pragliola <[email protected]> Co-authored-by: Paul Boyd <[email protected]> Signed-off-by: syntaxsdev <[email protected]>
…#1224 (kubeflow#1318) * feat: initial version of experiments and runs API Signed-off-by: Dhiraj Bokde <[email protected]> * feat: experiments and runs initial implementation (wip) Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fixed failing unit tests for experiments and runs Signed-off-by: Dhiraj Bokde <[email protected]> * fix: added experiment and experimentrun tests Signed-off-by: Dhiraj Bokde <[email protected]> * feat: added DataSet, Metric, and Parameter types Signed-off-by: Dhiraj Bokde <[email protected]> * feat: added implementatio of DataSet, Metric, and Param, including service tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: replace int properties for timestamps with string because mlmd type properties only support int32, not int64 Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add support for artifactType query param to filter artifact types in artifact queries Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add metrics history endpoint and metric history storage for experiment run metrics Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix artifactType query param type in generated service Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix go lint error in unit test Signed-off-by: Dhiraj Bokde <[email protected]> * fix: filter out metric history from artifacts endpoints Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix metric history name to use last update time to avoid name conflicts Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add filterQuery param on all context types to search by properties and custom properties Signed-off-by: Dhiraj Bokde <[email protected]> * feat: initial version of experiment tracking implemented on embedmd, rebased on main Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add support for filterQuery parameter for all ListResponse endpoints for embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add support for stepIds query parameter in embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * feat: refactor embedmd db service to use generic repository implementation to reduce code duplication Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add support for artifactType query parameter for embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * fix: use mysql 8.3 in unit tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: refactor name mapping and default name handling in embedmd datastore Signed-off-by: Dhiraj Bokde <[email protected]> * feat: support updating metrics and parameters by name, fix ignoring metric history when retrieving all artifacts for runs and versions Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add missing generated openapi python client files for PR github action check Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix failing shared db tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add support for metric and parameter description, add missing type property migraiton Signed-off-by: Dhiraj Bokde <[email protected]> * chore: update files from main Signed-off-by: Alessio Pragliola <[email protected]> * fix: added missing godoc comments in pkg/api/api.go Signed-off-by: Dhiraj Bokde <[email protected]> * fix: replace ambiguous ArtifactListReponse return type from GetExperimentRunMetricHistory with MetricListResponse Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fixed incorrect artifactType in dataset response, added tests to verify all artifact types Signed-off-by: Dhiraj Bokde <[email protected]> * feat: add validation for endTimeSinceEpoch property on experiment run updates Signed-off-by: Dhiraj Bokde <[email protected]> * Replace value type validation map with a switch in query_translator.go Co-authored-by: Paul Boyd <[email protected]> Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add service e2e tests for filterQuery, fix name query param handling, fix DB tests that didn't use parent id prefix Signed-off-by: Dhiraj Bokde <[email protected]> * chore: code cleanup, replace interface{} with any, added vetting for internal/db/filter Signed-off-by: Dhiraj Bokde <[email protected]> * chore: added flag vF for fixed string grep exclude Signed-off-by: Dhiraj Bokde <[email protected]> * fix: copied orderby and parameters back to registry and catalog to have different values Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fixed mlmd query translator handling of escaped backslashes Signed-off-by: Dhiraj Bokde <[email protected]> * chore: add test to verify parseCustomPropertyField won't panic with a property name ending in dot Signed-off-by: Dhiraj Bokde <[email protected]> * fix: sync generated python client code Signed-off-by: Dhiraj Bokde <[email protected]> * fix: readiness probe tests and new types Signed-off-by: Alessio Pragliola <[email protected]> * chore: refactor readiness_test Signed-off-by: Alessio Pragliola <[email protected]> * fix: ensure parentResourceId is used to filter resource lookup by params, add unit tests for duplicate child resource lookups Signed-off-by: Dhiraj Bokde <[email protected]> * fix: throw an error if a metric value is missing, add test to validate Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix http status error code for invalid ids Signed-off-by: Dhiraj Bokde <[email protected]> * fix: more id validation, fixed filterQuery passing to DB layer Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix failing unit test Signed-off-by: Dhiraj Bokde <[email protected]> * fix: validate experiment id when listing runs Signed-off-by: Dhiraj Bokde <[email protected]> * fix: fix failing validation test after fixing http status codes Signed-off-by: Dhiraj Bokde <[email protected]> * fix: avoid duplicate key errors if externalid is set in metric when creating metric history entries Signed-off-by: Dhiraj Bokde <[email protected]> * fix: add fuzzer tests for experiment runs and new artifact types Signed-off-by: Dhiraj Bokde <[email protected]> * chore: code cleanup and format fuzzer tests Signed-off-by: Dhiraj Bokde <[email protected]> * fix: log error in fuzzer test Signed-off-by: Dhiraj Bokde <[email protected]> * fix: handle null artifact names correctly on create Signed-off-by: Dhiraj Bokde <[email protected]> --------- Signed-off-by: Dhiraj Bokde <[email protected]> Signed-off-by: Alessio Pragliola <[email protected]> Signed-off-by: Alessio Pragliola <[email protected]> Co-authored-by: Alessio Pragliola <[email protected]> Co-authored-by: Alessio Pragliola <[email protected]> Co-authored-by: Paul Boyd <[email protected]> Signed-off-by: syntaxsdev <[email protected]>
Description
Add support for Experiment Tracking APIs similare to MLflow and other registries like wandb.
This is required for experiment tracking using model registry SDK as well as forms the basis of the MLflow SDK integration feature #1225.
See the feature request #1224 for detailed description of enhancements in this PR.
How Has This Been Tested?
Added extensive unit and integration tests for every entity and endpoint for Experiment tracking.
Detailed unit tests for the filterQuery search enhancement.
Merge criteria:
DCOcheck)ok-to-testhas been added to the PR.If you have UI changes