@snazy
Member

@snazy snazy commented Dec 14, 2025

Implements maintenance for the NoSQL meta-store. It adds the meta-store specific handling to the existing NoSQL maintenance service to purge unreferenced and unneeded data from the database.

Contributor

@singhpk234 singhpk234 left a comment

Thanks @snazy !

  1. The maintenance configs are not scoped to NoSQL, which is required per the consensus reached on the dev list:
    Previous discussion: #3135 (comment)
    Dev list discussion: https://lists.apache.org/thread/vmdpb45j1nmmlhswz7fm87gxgoldmr28

  2. It is unclear why the maintenance module should be part of the Polaris repo rather than Polaris tools. The NoSQL presentation does not mention this module either:
    https://docs.google.com/presentation/d/1lX2EdvM0SeyuOdO_u1idlWfmnlH3hFE16JEyWo45Bdo/edit?slide=id.p24#slide=id.p24
    Can you please open a dev list thread for this?

  3. The rules use https://github.com/projectnessie/cel-java/. Why are such rules required in the first place? And why not use https://github.com/google/cel-java directly, which is actively maintained by Google developers, rather than https://github.com/projectnessie/cel-java/, which has only had dependency updates this year? See https://github.com/projectnessie/cel-java/pulls?page=8&q=is%3Apr+is%3Aclosed

}
}
},
x -> {});
Contributor

Can we have a meaningful name here? What does x denote?

Comment on lines +29 to +30
* Polaris stores a history of changes per kind of object (principals, principal roles, grants,
* immediate tasks, catalog roles and catalog state).
Contributor

This is not true for all persistence in Apache Polaris.

Suggested change
- * Polaris stores a history of changes per kind of object (principals, principal roles, grants,
- * immediate tasks, catalog roles and catalog state).
+ * No SQL persistence implementation of Polaris stores a history of changes per kind of object (principals, principal roles, grants,
+ * immediate tasks, catalog roles and catalog state).

* Polaris stores a history of changes per kind of object (principals, principal roles, grants,
* immediate tasks, catalog roles and catalog state).
*
* <p>The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL
Contributor

Why are we using this library to define the rules? Have we considered alternatives? What kind of rules are required?

* <li>{@code false} retains the most recent commit
* </ul>
*/
@ConfigMapping(prefix = "polaris.persistence.maintenance.catalog")
Contributor

Suggested change
- @ConfigMapping(prefix = "polaris.persistence.maintenance.catalog")
+ @ConfigMapping(prefix = "polaris.persistence.nosql.maintenance.catalog")

This config prefix is not persistence-implementation agnostic.

Previous discussion: https://lists.apache.org/thread/vmdpb45j1nmmlhswz7fm87gxgoldmr28
Similar callout: #3135 (comment)

Contributor

+1, I know there was some ongoing debate about config keys, but IIRC we did move ahead with the polaris.persistence.nosql key prefix in #3135 so I'm assuming additional config keys will follow the same pattern.

Contributor

Given the discussion in #3135, I think it is reasonable to use the polaris.persistence.nosql config prefix in this PR.

@singhpk234 singhpk234 requested a review from dennishuo December 15, 2025 04:25
@dimas-b
Contributor

dimas-b commented Dec 15, 2025

Unclear why maintenance module should be part of Polaris repo and not Polaris tools ?

The end-to-end maintenance workflow naturally includes some "service" code and "tool / CLI" code. This PR contains only the service portion, which has to align with the NoSQL metastore implementation, so it belongs in this repository, IMHO.

Contributor

@dennishuo dennishuo left a comment

Understanding #3028 and #3077 looks somewhat important to understanding this PR in context, and there's not a lot of discussion in those to help explain the overall maintenance design here.

Looks like the main maintenance README https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/README.md just points at https://github.com/apache/polaris/blob/main/persistence/nosql/persistence/maintenance/api/src/main/java/org/apache/polaris/persistence/nosql/maintenance/api/package-info.java which is certainly useful for understanding the high-level mark-and-sweep approach, but doesn't quite cover how admins are actually expected to interact with the maintenance service (e.g. is it normally supposed to be bundled into the main Polaris service fat jar but invoked as a separate main? Is it supposed to be an always-running singleton service that runs in an infinite loop? Scheduled as a crontab? etc)

Maybe providing a user guide README (or pointing at it in these PRs if it already exists somewhere else) could help answer some of the questions others had, such as how to actually configure the CEL expressions and what exactly is meant by "The rules are defined using a <a href="https://github.com/projectnessie/cel-java/">CEL script</a>".

Seeing the end-to-end admin flow would also help contextualize the split of "server-side" and "client-side" code related to this maintenance, that @dimas-b talked about
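
For readers who have not opened the linked package-info.java, the "high-level mark-and-sweep approach" mentioned above can be illustrated with a generic sketch. This is not the Polaris maintenance service: the class name `MarkAndSweepSketch`, the string object ids, and the in-memory reference map are all hypothetical and only show the general technique of marking everything reachable from a set of roots and treating the rest as purge candidates.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Generic mark-and-sweep illustration; all names here are hypothetical.
final class MarkAndSweepSketch {

  /** Mark phase: collect every object id reachable from the given roots. */
  static Set<String> mark(Set<String> roots, Map<String, List<String>> references) {
    Set<String> live = new HashSet<>();
    Deque<String> pending = new ArrayDeque<>(roots);
    while (!pending.isEmpty()) {
      String id = pending.pop();
      // Only follow references of ids that have not been visited yet.
      if (live.add(id)) {
        pending.addAll(references.getOrDefault(id, List.of()));
      }
    }
    return live;
  }

  /** Sweep phase: every stored id that was not marked is a purge candidate. */
  static Set<String> sweep(Set<String> stored, Set<String> live) {
    Set<String> purge = new HashSet<>(stored);
    purge.removeAll(live);
    return purge;
  }

  public static void main(String[] args) {
    Map<String, List<String>> references =
        Map.of("root", List.of("a"), "a", List.of("b"));
    Set<String> live = mark(Set.of("root"), references);
    // "orphan" is stored but unreachable from "root".
    System.out.println(sweep(Set.of("root", "a", "b", "orphan"), live)); // prints [orphan]
  }
}
```

How the real service discovers roots, tracks references, and is triggered is exactly what the requested user guide would need to spell out.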

String DEFAULT_CATALOGS_HISTORY_RETAIN = "false";
String DEFAULT_CATALOG_ROLES_RETAIN = "false";
String DEFAULT_CATALOG_POLICIES_RETAIN = "ageDays < 30 || commits <= 1";
String DEFAULT_CATALOG_STATE_RETAIN = "ageDays < 30 || commits <= 1";
Contributor

Is the ageDays condition the only thing to distinguish between "staged but not yet committed" states that are being assembled as part of a transaction and old states that were once committed/valid but are expired now?
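
To make the default expression concrete, here is a plain-Java sketch of what `ageDays < 30 || commits <= 1` would evaluate to for a commit history. It does not use CEL and is not the PR's code. It assumes (none of this is confirmed in the thread) that `ageDays` is the age of a commit in days, that `commits` is the 1-based position of the commit counted from the most recent one, and that every commit is evaluated independently. The only part taken from the Javadoc is that the most recent commit is retained even when the rule is `false`. `RetainRuleSketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-in for the default retain expression; names are hypothetical.
final class RetainRuleSketch {

  /** Java equivalent of the expression "ageDays < 30 || commits <= 1". */
  static boolean retainDefault(long ageDays, int commits) {
    return ageDays < 30 || commits <= 1;
  }

  /** Returns the commit ages that are kept, given ages ordered newest first. */
  static List<Long> retained(List<Long> agesNewestFirst) {
    List<Long> kept = new ArrayList<>();
    for (int i = 0; i < agesNewestFirst.size(); i++) {
      long ageDays = agesNewestFirst.get(i);
      int commits = i + 1; // assumption: 1-based position from the most recent commit
      // The most recent commit (i == 0) is always retained, per the Javadoc.
      if (i == 0 || retainDefault(ageDays, commits)) {
        kept.add(ageDays);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    // Commits that are 2, 12, 45 and 90 days old, newest first.
    System.out.println(retained(List.of(2L, 12L, 45L, 90L))); // prints [2, 12]
  }
}
```

Under these assumptions the rule only looks at age and position, so it cannot by itself tell a staged-but-uncommitted state from an expired one.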

@dimas-b
Contributor

dimas-b commented Dec 16, 2025

Seeing the end-to-end admin flow would also help contextualize the split of "server-side" and "client-side" code related to this maintenance, that @dimas-b talked about

Hypothetically (ATM), the Admin tool could involve the maintenance code to perform its job.

Other implementations are possible too. It's an open-ended design where downstream projects are free to implement their own triggering mechanisms. I imagine k8s jobs are one possible option.
