NoSQL: Metastore maintenance #3268
base: main
Implements NoSQL meta-store maintenance. It adds meta-store-specific handling to the existing NoSQL maintenance service so that unreferenced and unneeded data can be purged from the database.
Thanks @snazy!

- The maintenance configs are not scoped to NoSQL, which is required per the consensus reached on the dev list.
  Previous discussion: #3135 (comment)
  Dev list discussion: https://lists.apache.org/thread/vmdpb45j1nmmlhswz7fm87gxgoldmr28
- It is unclear why the maintenance module should be part of the Polaris repo rather than Polaris tools. This module is also not mentioned in the NoSQL presentation: https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
  Can you please open a dev list thread for this?
- The rules use https://github.com/projectnessie/cel-java/. Why are such rules required in the first place? Why not use https://github.com/google/cel-java directly, which is actively maintained by Google developers, instead of https://github.com/projectnessie/cel-java/, which has only seen dependency updates this year (https://github.com/projectnessie/cel-java/pulls?page=8&q=is%3Apr+is%3Aclosed)?
    }
  }
},
x -> {});
Can we use a meaningful name here? What does `x` denote?
 * Polaris stores a history of changes per kind of object (principals, principal roles, grants,
 * immediate tasks, catalog roles and catalog state).
This is not true for all persistence implementations in Apache Polaris.
Suggested change:
- * Polaris stores a history of changes per kind of object (principals, principal roles, grants,
- * immediate tasks, catalog roles and catalog state).
+ * The NoSQL persistence implementation of Polaris stores a history of changes per kind of object (principals, principal roles, grants,
+ * immediate tasks, catalog roles and catalog state).
 * Polaris stores a history of changes per kind of object (principals, principal roles, grants,
 * immediate tasks, catalog roles and catalog state).
 *
 * <p>The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL
Why are we using this library to define the rules? Have we considered alternatives? What kinds of rules are required?
 *   <li>{@code false} retains the most recent commit
 * </ul>
 */
@ConfigMapping(prefix = "polaris.persistence.maintenance.catalog")
Suggested change:
- @ConfigMapping(prefix = "polaris.persistence.maintenance.catalog")
+ @ConfigMapping(prefix = "polaris.persistence.nosql.maintenance.catalog")

This prefix is not persistence-implementation agnostic.
Previous discussion: https://lists.apache.org/thread/vmdpb45j1nmmlhswz7fm87gxgoldmr28
Similar call-out: #3135 (comment)
+1, I know there was some ongoing debate about config keys, but IIRC we did move ahead with the polaris.persistence.nosql key prefix in #3135 so I'm assuming additional config keys will follow the same pattern.
Given the discussion in #3135, I think it is reasonable to use the polaris.persistence.nosql config prefix in this PR.
The end-to-end maintenance workflow naturally includes some "service" code and some "tool / CLI" code. This PR contains only the service portion, which has to align with the NoSQL metastore implementation, so it belongs in this repository, IMHO.
dennishuo left a comment
It looks like understanding #3028 and #3077 is important for understanding this PR in context, but there isn't much discussion in those PRs explaining the overall maintenance design.
The main maintenance README (https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/README.md) just points at https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/api/src/main/java/org/apache/polaris/persistence/nosql/maintenance/api/package-info.java. That file is certainly useful for understanding the high-level mark-and-sweep approach, but it doesn't quite cover how admins are actually expected to interact with the maintenance service. For example: is it normally bundled into the main Polaris service fat jar but invoked via a separate main entry point? Is it an always-running singleton service that runs in an infinite loop? Is it scheduled via crontab?
Providing a user guide README (or pointing to one in these PRs if it already exists elsewhere) could help answer some of the questions others have raised, such as how to actually configure the CEL expressions and what exactly is meant by "The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL script</a>".
Seeing the end-to-end admin flow would also help contextualize the split between "server-side" and "client-side" maintenance code that @dimas-b mentioned.
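For readers unfamiliar with the mark-and-sweep approach that the package-info describes, here is a generic, minimal pass over an in-memory object graph: everything reachable from a set of roots is marked live, and every stored object that was not marked is reported as garbage. This is not the Polaris NoSQL maintenance code; the string object IDs, the `references` map, the root set, and the class and method names are all hypothetical.

```java
import java.util.*;

/**
 * Generic mark-and-sweep sketch (hypothetical, not the Polaris implementation).
 * Objects are identified by strings; "references" maps an object ID to the IDs it points to.
 */
public class MarkAndSweepSketch {

    /** Mark phase: collect every object reachable from the roots (cycles are handled). */
    static Set<String> mark(Set<String> roots, Map<String, List<String>> references) {
        Set<String> live = new HashSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String id = pending.pop();
            if (live.add(id)) {
                // First time we see this object: follow its outgoing references.
                pending.addAll(references.getOrDefault(id, List.of()));
            }
        }
        return live;
    }

    /** Sweep phase: every stored object that was not marked is unreferenced. */
    static Set<String> sweep(Set<String> allObjects, Set<String> live) {
        Set<String> garbage = new TreeSet<>(allObjects);
        garbage.removeAll(live);
        return garbage;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "catalog-head", List.of("commit-2"),
            "commit-2", List.of("commit-1", "state-b"),
            "commit-1", List.of("state-a"),
            "orphan", List.of("state-c"));
        Set<String> all = Set.of("catalog-head", "commit-2", "commit-1",
            "state-a", "state-b", "orphan", "state-c");
        Set<String> live = mark(Set.of("catalog-head"), refs);
        System.out.println(sweep(all, live)); // prints [orphan, state-c]
    }
}
```

In a real store the sweep step would also have to account for objects written concurrently with the mark phase; the sketch ignores concurrency entirely.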
String DEFAULT_CATALOGS_HISTORY_RETAIN = "false";
String DEFAULT_CATALOG_ROLES_RETAIN = "false";
String DEFAULT_CATALOG_POLICIES_RETAIN = "ageDays < 30 || commits <= 1";
String DEFAULT_CATALOG_STATE_RETAIN = "ageDays < 30 || commits <= 1";
Is the ageDays condition the only thing to distinguish between "staged but not yet committed" states that are being assembled as part of a transaction and old states that were once committed/valid but are expired now?
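To make the default retain expressions easier to reason about, the sketch below hard-codes two of them as plain Java lambdas and applies them to a commit history. It does not use CEL or cel-java. The meaning of the variables is an assumption: `ageDays` is taken as the commit's age in whole days and `commits` as the commit's 1-based position counted from the newest one. Following the Javadoc statement that `false` retains the most recent commit, the newest commit is always kept. `RetainRule`, `retained`, and the class name are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

public class RetainRuleSketch {

    /** Hypothetical stand-in for a compiled retain expression. */
    @FunctionalInterface
    interface RetainRule {
        boolean retain(long ageDays, int commits);
    }

    // Plain-Java equivalents of "false" and "ageDays < 30 || commits <= 1".
    static final RetainRule LATEST_ONLY = (ageDays, commits) -> false;
    static final RetainRule THIRTY_DAYS = (ageDays, commits) -> ageDays < 30 || commits <= 1;

    /**
     * Returns the commit timestamps that survive, newest first.
     * Assumptions: "commits" is the 1-based position from the newest commit,
     * and the newest commit is always retained.
     */
    static List<Instant> retained(List<Instant> newestFirst, Instant now, RetainRule rule) {
        List<Instant> kept = new ArrayList<>();
        for (int i = 0; i < newestFirst.size(); i++) {
            Instant commitTime = newestFirst.get(i);
            long ageDays = Duration.between(commitTime, now).toDays();
            int commits = i + 1;
            if (i == 0 || rule.retain(ageDays, commits)) {
                kept.add(commitTime);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2025-01-31T00:00:00Z");
        List<Instant> history = List.of(
            Instant.parse("2025-01-30T00:00:00Z"),  // 1 day old
            Instant.parse("2025-01-10T00:00:00Z"),  // 21 days old
            Instant.parse("2024-11-01T00:00:00Z")); // 91 days old
        System.out.println(retained(history, now, LATEST_ONLY).size()); // prints 1
        System.out.println(retained(history, now, THIRTY_DAYS).size()); // prints 2
    }
}
```

Note that a predicate over `ageDays` and `commits` alone sees nothing about whether a commit was ever part of a completed transaction, which is exactly the gap the question above is probing.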
Hypothetically (at the moment), the Admin tool could invoke the maintenance code to perform its job. Other implementations are possible too. It's an open-ended design where downstream projects are free to implement their own triggering mechanisms. I imagine k8s jobs are one possible option.
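Since triggering is left open, one more option besides the Admin tool, crontab, or k8s jobs is an in-process scheduler that keeps running as long as its host JVM does. The minimal sketch below uses only `java.util.concurrent`; the `maintenanceRun` hook and the class name are hypothetical, and nothing in this PR says the service exposes such an entry point.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class MaintenanceTriggerSketch {

    /**
     * Runs "maintenanceRun" immediately and then repeatedly with a fixed delay between runs.
     * "maintenanceRun" is a hypothetical hook standing in for the service-side maintenance entry point.
     */
    static ScheduledExecutorService schedule(Runnable maintenanceRun, long delay, TimeUnit unit) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Fixed delay: the next run starts "delay" after the previous one finishes,
        // so a slow run is followed by a full pause instead of immediate catch-up runs.
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                maintenanceRun.run();
            } catch (RuntimeException e) {
                // An exception escaping the task would cancel all future runs, so report and continue.
                System.err.println("maintenance run failed: " + e);
            }
        }, 0, delay, unit);
        return scheduler;
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch twoRuns = new CountDownLatch(2);
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService scheduler = schedule(() -> {
            runs.incrementAndGet();
            twoRuns.countDown();
        }, 50, TimeUnit.MILLISECONDS);
        twoRuns.await(5, TimeUnit.SECONDS);
        scheduler.shutdownNow();
        System.out.println("completed runs: " + runs.get());
    }
}
```

A crontab entry or a Kubernetes CronJob would replace this in-process loop with external scheduling; the trade-off is that the in-process scheduler lives and dies with the JVM that hosts it, while an external job needs a separate entry point to call.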