Skip to content

Conversation

@1daidai1
Copy link
Collaborator

@1daidai1 1daidai1 commented Jul 29, 2025

添加
image
删除
image
修改默认default-wckey
image
提交作业带参数
image
提交作业不带wckey,采用默认wckey
image

Summary by CodeRabbit

  • New Features
    • Workload-key (wckey) management: add, delete, query, set default via new service endpoints and APIs; persistent storage and lifecycle handling; runtime flag to enable wckey.
  • Behavior
    • Tasks may include or inherit a default wckey; validation enforces existence and marks default usage in task attributes.
  • Bug Fixes
    • Data wipe tools updated to clear wckey data when requested or in default wipe modes.
  • Documentation
    • Added Wckey Management docs (add/delete/modify/show) and sample commented config lines.
  • Style
    • New user-facing error messages for clearer wckey diagnostics.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jul 29, 2025

Warning

Rate limit exceeded

@L-Xiafeng has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 24 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 24a99bf and 7b56f5b.

📒 Files selected for processing (20)
  • docs/en/command/cacctmgr.md
  • docs/zh/command/cacctmgr.md
  • etc/config.yaml
  • protos/Crane.proto
  • protos/PublicDefs.proto
  • scripts/wipe_data.py
  • scripts/wipe_data.sh
  • src/CraneCtld/AccountManager.cpp
  • src/CraneCtld/AccountManager.h
  • src/CraneCtld/CraneCtld.cpp
  • src/CraneCtld/CtldPreCompiledHeader.h
  • src/CraneCtld/CtldPublicDefs.cpp
  • src/CraneCtld/CtldPublicDefs.h
  • src/CraneCtld/DbClient.cpp
  • src/CraneCtld/DbClient.h
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
  • src/CraneCtld/RpcService/CtldGrpcServer.h
  • src/CraneCtld/TaskScheduler.cpp
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h
  • src/Utilities/PublicHeader/include/crane/String.h
📝 Walkthrough

Walkthrough

Adds end-to-end "wckey" support: protobufs and gRPC RPCs; AccountManager in-memory + transactional Wckey CRUD and default management; DbClient persistence and document mapping to new wckey_table; TaskScheduler wckey handling; config parsing flag, wipe scripts, error strings, and docs.

Changes

Cohort / File(s) Change Summary
Config & Scripts
etc/config.yaml, scripts/wipe_data.sh, scripts/wipe_data.py
Added commented TrackWCKey hint in etc/config.yaml; added wckey_table wipe options and updated default/flag-driven wipe behavior in shell and Python wipe scripts.
Protobuf & Public defs
protos/Crane.proto, protos/PublicDefs.proto
Added Wckey RPCs/messages (AddWckey, DeleteWckey, ModifyDefaultWckey, QueryWckeyInfo, WckeyInfo, QueryWckeyInfo); added TaskToCtld.wckey and TaskInfo.wckey; appended ErrCode values and ModifyField::DefaultWckey.
Account manager
src/CraneCtld/AccountManager.h, src/CraneCtld/AccountManager.cpp
Introduced Wckey type, in-memory m_wckey_map_ + RW mutex; public APIs: AddUserWckey, DeleteWckey, QueryAllWckeyInfo, ModifyDefaultWckey, GetExistedDefaultWckeyName, GetExistedWckeyInfo; private transactional helpers and default-management.
DB client & storage
src/CraneCtld/DbClient.h, src/CraneCtld/DbClient.cpp
Added EntityType::WCKEY, wckey_table collection name, InsertWckey/UpdateWckey/SelectAllWckey, WckeyToDocument_/ViewToWckey_; extended User/Task document mappings to include default_wckey/wckey; added UpdateEntityOneByFields helper and document mapping specialization.
gRPC service
src/CraneCtld/RpcService/CtldGrpcServer.h, src/CraneCtld/RpcService/CtldGrpcServer.cpp
Added gRPC handlers and declarations for AddWckey, DeleteWckey, ModifyDefaultWckey, QueryWckeyInfo; handlers validate caller and delegate to AccountManager and populate replies.
Task scheduler & public defs
src/CraneCtld/TaskScheduler.cpp, src/CraneCtld/CtldPublicDefs.cpp, src/CraneCtld/CtldPublicDefs.h
Added Ctld::Wckey struct and fields (TaskInCtld.wckey, User.default_wckey, Ctld::Config.WckeyValid); TaskScheduler assigns/validates task wckey, marks default usage with *, and returns ERR_INVALID_WCKEY when invalid; TaskInfo.wckey populated.
Precompiled header & strings
src/CraneCtld/CtldPreCompiledHeader.h, src/Utilities/PublicHeader/include/crane/PublicHeader.h, src/Utilities/PublicHeader/include/crane/String.h
Added #include <absl/hash/hash.h>; appended five wckey-related error strings (indices 85–89) to error catalog; minor whitespace formatting change.
Docs / CLI
docs/en/command/cacctmgr.md, docs/zh/command/cacctmgr.md
Added "Wckey Management" section covering Add/Delete/ModifyDefault/Show commands with syntax and examples.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Ctld as Ctld gRPC
    participant AM as AccountManager
    participant DB as DbClient
    participant Mongo as MongoDB

    Client->>Ctld: AddWckeyRequest(uid, WckeyInfo)
    Ctld->>AM: AddUserWckey(uid, Wckey)
    AM->>DB: InsertWckey(wckey)
    DB->>Mongo: insert into wckey_table
    Mongo-->>DB: ack
    DB-->>AM: success / error
    AM-->>Ctld: result (ok / ErrCode)
    Ctld-->>Client: AddWckeyReply

    Client->>Ctld: QueryWckeyInfoRequest(uid, optional list)
    Ctld->>AM: QueryAllWckeyInfo(uid, filter_set)
    AM->>DB: SelectAllWckey()
    DB->>Mongo: query wckey_table
    Mongo-->>DB: documents
    DB-->>AM: wckey list
    AM-->>Ctld: populated WckeyInfo list
    Ctld-->>Client: QueryWckeyInfoReply
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Focus areas:
    • AccountManager: transactional flows for Add/Update/Delete, default is_default transitions, lock ordering and concurrency around m_wckey_map_ and user.default_wckey.
    • DbClient: BSON filter/update correctness, UpdateEntityOneByFields template, collection routing and document field ordering consistency (task/user wckey slots).
    • Protos & enums: tag numbering and ErrCode append alignment with error strings and generated code.
    • TaskScheduler and ParseConfig: correct gating by WckeyValid and error propagation.
    • gRPC handlers: validation of permissions/cert UID and mapping of internal errors to reply fields.

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • L-Xiafeng
  • NamelessOIer
  • RileyWen

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 6.90% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Add job wckey' directly and clearly describes the main change—introducing wckey (wallet key) support for jobs throughout the codebase.

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

🧹 Nitpick comments (8)
src/CraneCtld/CraneCtld.cpp (1)

551-555: Consider explicit initialization for consistency.

The logic correctly parses the comma-separated values and sets MustNeedWckey to true when "wckeys" is present. However, consider explicitly setting it to false when "wckeys" is not present to ensure consistent behavior during configuration reloads.

Apply this diff for explicit initialization:

 if (config["AccountingStorageEnforce"]) {
   auto val = config["AccountingStorageEnforce"].as<std::string>();
   std::unordered_set<std::string> items = util::SplitStr(val, ',');
-  if (items.contains("wckeys")) g_config.MustNeedWckey = true;
+  g_config.MustNeedWckey = items.contains("wckeys");
 }
protos/PublicDefs.proto (2)

319-320: Consider using optional for wckey field consistency

Instead of using a separate wckey_valid boolean field, consider making wckey an optional field. This would be more consistent with modern protobuf practices and the pattern used in TaskToCtld (line 168).

-  bool wckey_valid = 41;
-  string wckey = 42;
+  optional string wckey = 41;

553-557: Potential duplication between UserInfo.Wckey and WckeyInfo messages

The UserInfo.Wckey nested message and the top-level WckeyInfo message share most fields. Consider consolidating these to avoid duplication and maintain consistency.

You could either:

  1. Reuse WckeyInfo within UserInfo instead of defining a nested Wckey message
  2. Or extend the nested Wckey message with additional fields when needed

This would reduce code duplication and ensure consistent field definitions across the codebase.

Also applies to: 578-583

protos/Crane.proto (1)

411-414: Clarify wckey_list format in QueryWckeyInfoRequest

The field name wckey_list is ambiguous. Based on other Query operations, this should contain identifiers to filter the query.

Consider renaming for clarity:

 message QueryWckeyInfoRequest {
   uint32 uid = 1;
-  repeated string wckey_list = 2;
+  repeated string wckey_names = 2;  // Filter by wckey names
 }
src/CraneCtld/DbClient.h (1)

232-251: Complex error message construction could be simplified

The error message construction with manual JSON-like formatting is verbose and error-prone.

Consider using a JSON library or simplifying:

-      std::string filter_str = "{";
-      for (size_t i = 0; i < filter_fields.size(); ++i) {
-        filter_str += std::visit(
-            [&](auto&& arg) -> std::string {
-              if constexpr (std::is_same_v<std::decay_t<decltype(arg)>, bool>) {
-                return std::format("{}:{}", filter_fields[i].first,
-                                   arg ? "true" : "false");
-              } else {
-                return std::format("{}:{}", filter_fields[i].first, arg);
-              }
-            },
-            filter_fields[i].second);
-        if (i != filter_fields.size() - 1) filter_str += ", ";
-      }
-      filter_str += "}";
+      std::string filter_str = bsoncxx::to_json(filter.view());
src/CraneCtld/DbClient.cpp (2)

510-518: Consider pagination for SelectAllWckey

The method loads all Wckey documents into memory at once, which could be problematic with large datasets.

Consider adding pagination support similar to FetchJobRecords:

-void MongodbClient::SelectAllWckey(std::list<Ctld::Wckey>* wckey_list) {
+void MongodbClient::SelectAllWckey(std::list<Ctld::Wckey>* wckey_list,
+                                   size_t limit = 0) {
+  mongocxx::options::find options;
+  if (limit > 0) {
+    options = options.limit(limit);
+  }
   mongocxx::cursor cursor =
-      (*GetClient_())[m_db_name_][m_wckey_collection_name_].find({});
+      (*GetClient_())[m_db_name_][m_wckey_collection_name_].find({}, options);

846-848: Consider using uint32_t for uid field consistency

The uid is cast to int64_t, but it's uint32_t in the Wckey struct. While this works, it's inconsistent with the data type.

Consider using int32_t for consistency with other entity types:

-      wckey.cluster,   static_cast<int64_t>(wckey.uid),
+      wckey.cluster,   static_cast<int32_t>(wckey.uid),

And update the corresponding read operation:

-    wckey->uid = wckey_view["uid"].get_int64().value;
+    wckey->uid = wckey_view["uid"].get_int32().value;
src/CraneCtld/AccountManager.cpp (1)

1152-1155: Address TODO for multi-cluster support.

The hardcoded cluster validation will need revision for multi-cluster support. Consider tracking this technical debt.

Would you like me to create an issue to track the multi-cluster enhancement for wckey management?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 980704f and 787f658.

📒 Files selected for processing (16)
  • etc/config.yaml (1 hunks)
  • protos/Crane.proto (5 hunks)
  • protos/PublicDefs.proto (6 hunks)
  • scripts/wipe_data.sh (2 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/AccountManager.h (10 hunks)
  • src/CraneCtld/CraneCtld.cpp (1 hunks)
  • src/CraneCtld/CtldPreCompiledHeader.h (1 hunks)
  • src/CraneCtld/CtldPublicDefs.h (4 hunks)
  • src/CraneCtld/DbClient.cpp (15 hunks)
  • src/CraneCtld/DbClient.h (7 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.h (4 hunks)
  • src/CraneCtld/TaskScheduler.cpp (1 hunks)
  • src/Utilities/PublicHeader/String.cpp (1 hunks)
  • src/Utilities/PublicHeader/include/crane/String.h (1 hunks)
🧰 Additional context used
🧠 Learnings (8)
src/CraneCtld/CtldPreCompiledHeader.h (3)

Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, phmap::parallel_flat_hash_set from the Parallel Hashmap library (parallel_hashmap/phmap.h) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The phmap::parallel_flat_hash_set used in CraneSched's VaultClientWrapper is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.

src/Utilities/PublicHeader/include/crane/String.h (1)

Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.

src/CraneCtld/CtldPublicDefs.h (10)

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.

Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Learnt from: 1daidai1
PR: #458
File: src/CraneCtld/CtldPublicDefs.h:0-0
Timestamp: 2025-05-23T02:32:43.952Z
Learning: In the CraneSched project, allocated_res_view is handled/updated separately before calling SetAllocatedRes, so it does not need to be updated again within the method itself.

Learnt from: 1daidai1
PR: #458
File: src/CraneCtld/CtldPublicDefs.h:0-0
Timestamp: 2025-05-23T02:32:43.952Z
Learning: In the CraneSched project, allocated_res_view is updated before calling SetAllocatedRes, so it doesn't need to be updated again within the method itself.

Learnt from: 1daidai1
PR: #458
File: src/CraneCtld/CtldPublicDefs.h:0-0
Timestamp: 2025-05-23T02:32:43.952Z
Learning: In the CraneSched project, allocated_res_view is updated before calling SetAllocatedRes, so it doesn't need to be updated again within the method itself.

Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.

Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

src/CraneCtld/RpcService/CtldGrpcServer.h (3)

Learnt from: L-Xiafeng
PR: #520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, g_ctld_client is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.

Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.

src/CraneCtld/AccountManager.h (10)

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.

Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The phmap::parallel_flat_hash_set used in CraneSched's VaultClientWrapper is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, phmap::parallel_flat_hash_set from the Parallel Hashmap library (parallel_hashmap/phmap.h) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.

src/CraneCtld/TaskScheduler.cpp (7)

Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().

Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.

Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:396-396
Timestamp: 2025-05-25T04:11:27.740Z
Learning: In TaskManager.cpp, GetCrunMeta() calls don't need null checks because they're only called in contexts where the task is guaranteed to be a CRUN task (e.g., SetupChildProcessCrunX11_ is only called when step->IsCrun() && x11() conditions are met), ensuring the metadata will always be valid.

Learnt from: huerni
PR: #537
File: src/CraneCtld/CtldGrpcServer.cpp:1662-1669
Timestamp: 2025-06-30T08:38:18.748Z
Learning: In CraneSched's CPU frequency feature, validation is split between controller and compute nodes: CtldGrpcServer only validates governor against configuration (g_config.CpuFreqGovernors), while detailed validation of low/high frequency values is deferred to TaskManager.cpp via CpuFreqValidateAndSet() where actual hardware capabilities can be checked.

Learnt from: L-Xiafeng
PR: #520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, g_ctld_client is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.

src/CraneCtld/RpcService/CtldGrpcServer.cpp (3)

Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Learnt from: L-Xiafeng
PR: #520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, g_ctld_client is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.

src/CraneCtld/AccountManager.cpp (8)

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.

Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.

🧬 Code Graph Analysis (6)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
  • wipe_collection (49-57)
src/Utilities/PublicHeader/include/crane/String.h (1)
src/Utilities/PublicHeader/String.cpp (4)
  • SplitStr (414-427)
  • SplitStr (414-414)
  • ToLower (429-434)
  • ToLower (429-429)
src/CraneCtld/CraneCtld.cpp (2)
src/Utilities/PublicHeader/String.cpp (4)
  • SplitStr (414-427)
  • SplitStr (414-414)
  • ToLower (429-434)
  • ToLower (429-429)
src/Craned/CranedPublicDefs.h (1)
  • g_config (107-107)
src/CraneCtld/CtldPublicDefs.h (2)
src/Craned/CranedPublicDefs.h (1)
  • g_config (107-107)
src/CraneCtld/DbClient.h (2)
  • string (323-348)
  • string (324-332)
src/CraneCtld/RpcService/CtldGrpcServer.h (2)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (8)
  • AddWckey (546-570)
  • AddWckey (546-548)
  • ModifyWckey (796-814)
  • ModifyWckey (796-799)
  • QueryWckeyInfo (987-1020)
  • QueryWckeyInfo (987-990)
  • DeleteWckey (1151-1172)
  • DeleteWckey (1151-1154)
src/CraneCtld/AccountManager.cpp (4)
  • ModifyWckey (1136-1164)
  • ModifyWckey (1136-1139)
  • DeleteWckey (272-300)
  • DeleteWckey (272-275)
src/CraneCtld/DbClient.h (1)
src/CraneCtld/DbClient.cpp (14)
  • InsertWckey (414-426)
  • InsertWckey (414-414)
  • SelectAllWckey (510-518)
  • SelectAllWckey (510-510)
  • GetClient_ (728-734)
  • GetClient_ (728-728)
  • GetSession_ (736-744)
  • GetSession_ (736-736)
  • UpdateWckey (557-575)
  • UpdateWckey (557-557)
  • WckeyToDocument_ (840-850)
  • WckeyToDocument_ (840-841)
  • ViewToWckey_ (852-864)
  • ViewToWckey_ (852-853)
🔇 Additional comments (34)
src/Utilities/PublicHeader/include/crane/String.h (1)

93-96: LGTM! Clean utility function additions.

The function declarations are well-defined and match their implementations. These utilities will support the wckey configuration parsing functionality introduced in this PR.

src/CraneCtld/CtldPreCompiledHeader.h (1)

64-64: LGTM! Necessary include for hash functionality.

The addition of absl/hash/hash.h is properly placed with other Abseil includes and supports the WckeyKey hashing functionality introduced in this PR.

scripts/wipe_data.sh (2)

3-3: LGTM! Usage message correctly updated.

The usage message accurately reflects that modes 4 and 6 now include wiping the wckey_table alongside existing tables.


55-55: LGTM! Consistent wiping logic for wckey_table.

The addition of wckey_table wiping follows the existing pattern and is appropriately included in modes 4, 5, and 6 where user-related data is cleared.

etc/config.yaml (1)

125-128: LGTM! Helpful configuration documentation.

The commented configuration options provide clear examples for users wanting to enable wckey tracking and enforcement. Keeping them commented prevents unintended activation while serving as documentation.

src/Utilities/PublicHeader/String.cpp (2)

414-427: LGTM! Well-implemented string splitting with proper trimming.

The SplitStr function correctly handles:

  • String splitting by delimiter using stringstream
  • Whitespace trimming from both ends
  • Automatic deduplication via unordered_set
  • Edge cases where trimmed strings are empty

429-434: LGTM! Modern and correct lowercase conversion.

The ToLower function uses std::ranges::transform with proper unsigned char casting in the lambda, which is the correct approach for std::tolower to avoid undefined behavior with signed char values.

src/CraneCtld/CraneCtld.cpp (1)

556-567: LGTM! Well-implemented configuration parsing with proper error handling.

The implementation correctly handles case-insensitive parsing of boolean values with clear error messaging. The explicit boolean assignment and consistent error handling align well with the existing codebase patterns.

src/CraneCtld/TaskScheduler.cpp (1)

3112-3129: LGTM! Well-structured wckey validation and assignment logic.

The implementation correctly handles wckey requirements based on configuration:

  1. Proper validation: Validates user-provided wckeys against the account manager database
  2. Fallback mechanism: Assigns default wckey with "*" prefix when none provided
  3. Clean separation: Clears wckey when not required by configuration
  4. Error handling: Returns appropriate error codes (ERR_INVALID_WCKEY) with descriptive logging

The integration with g_account_manager follows the established patterns in the codebase and aligns well with the comprehensive wckey support introduced in this PR.

src/CraneCtld/RpcService/CtldGrpcServer.h (4)

101-103: LGTM - Wckey management method declarations are consistent.

The new AddWckey method declaration follows the established pattern of other CRUD operations in the service interface.


118-120: LGTM - ModifyWckey method declaration follows conventions.

The method signature is consistent with other modify operations in the service.


132-135: LGTM - QueryWckeyInfo method declaration is well-structured.

The method follows the established query pattern used by other info query methods.


154-156: LGTM - DeleteWckey method declaration completes the CRUD interface.

The method signature is consistent with other delete operations in the service.

src/CraneCtld/RpcService/CtldGrpcServer.cpp (5)

546-570: LGTM - AddWckey implementation follows established patterns.

The implementation correctly:

  • Checks server readiness
  • Constructs a Wckey object from the request
  • Calls g_account_manager->AddUserWckey with proper parameters
  • Sets response fields based on the result
  • Returns grpc::Status::OK as required by the learned conventions

796-814: LGTM - ModifyWckey implementation is consistent.

The method properly handles the modify operation by calling g_account_manager->ModifyWckey and setting appropriate response fields based on the result.


987-1020: LGTM - QueryWckeyInfo implementation is well-structured.

The implementation correctly:

  • Calls g_account_manager->QueryAllWckeyInfo
  • Handles error cases by populating rich_error_list
  • Iterates through results to populate wckey_list in the response
  • Sets response status based on error presence

1151-1172: LGTM - DeleteWckey implementation follows conventions.

The method properly calls g_account_manager->DeleteWckey and handles both success and error cases appropriately.


967-980: LGTM - QueryUserInfo enhancement for Wckey support.

The code properly iterates through the user's wckey_map and populates the wckey_list in the response. It correctly checks the default_wckey_map to set the is_def flag for default wckeys.

src/CraneCtld/CtldPublicDefs.h (7)

169-170: LGTM - Configuration flags for Wckey validation and enforcement.

The two boolean flags MustNeedWckey and WckeyValid provide appropriate control over Wckey functionality, with sensible default values of false.


399-399: LGTM - Wckey field addition to TaskInCtld.

The wckey string field is appropriately placed in the task structure and will be used to associate wckeys with tasks.


866-867: LGTM - WckeyMap type alias is well-defined.

The type alias clearly represents the mapping from cluster names to sets of wckey names.


873-873: LGTM - Default wckey mapping addition.

The default_wckey_map field appropriately maps cluster names to default wckey names for each user.


877-877: LGTM - Wckey map integration in User struct.

The wckey_map field of type WckeyMap properly stores user-associated wckeys organized by cluster.


880-892: LGTM - Wckey struct definition is comprehensive.

The struct includes all necessary fields:

  • deleted flag for soft deletion
  • name, cluster, uid, user_name for identification
  • is_def flag for default wckey indication
  • Equality operator for comparisons

The structure follows patterns established by other entity structs in the codebase.


713-718: LGTM - Wckey validation logic in SetFieldsOfTaskInfo.

The code correctly sets the wckey_valid field based on the configuration flags (WckeyValid or MustNeedWckey) and populates the wckey field from the task's wckey member. This provides proper integration between configuration and task information.

src/CraneCtld/AccountManager.h (1)

409-428: Well-implemented composite key structure

The WckeyKey struct with custom hash support is well-designed for the three-field composite key requirement.

src/CraneCtld/DbClient.cpp (2)

374-378: Good defensive programming for wckey field handling

The code properly checks for field existence and type before accessing the wckey value, providing good backward compatibility.


661-671: Template specialization for string maps looks correct

The template specialization properly handles serialization of std::unordered_map<std::string, std::string> to BSON subdocuments.

src/CraneCtld/AccountManager.cpp (6)

287-290: Good defensive check for default wckey deletion.

The check prevents deletion of default wckeys, which is essential for maintaining data consistency.


454-462: Consider aligning wckey query permissions with user query model.

Unlike QueryAllUserInfo which allows non-operators to query users within their account hierarchy, this method completely restricts non-operators. Consider whether wckey visibility should follow account boundaries for consistency.

If the current restriction is intentional, consider adding a comment explaining why wckeys have stricter visibility rules than users.


1771-1771: Proper initialization of wckey data with consistent lock ordering.

The wckey initialization follows the established pattern and maintains consistent lock ordering to prevent deadlocks.

Also applies to: 1791-1796


1970-2053: Well-structured transaction handling for wckey creation.

The implementation correctly handles all edge cases including updating existing deleted entries and managing default wckey transitions.

Consider defining constants for the map key patterns (e.g., "wckey_map." + cluster) to avoid string concatenation errors.


2260-2284: Correct implementation of wckey soft deletion.

The method properly maintains consistency between wckey and user collections using transactions, and follows the established soft-delete pattern.


2404-2458: Efficient default wckey update with proper state management.

The implementation correctly handles the transition of default wckeys, including the optimization to skip unnecessary updates when the wckey is already default.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/CraneCtld/AccountManager.cpp (2)

1152-1155: TODO: Multi-cluster support needs implementation.

The current implementation only supports a single cluster. This limitation should be addressed for multi-cluster deployments.

Would you like me to help implement multi-cluster support or create an issue to track this enhancement?


1970-2046: Robust implementation with proper transaction handling.

The function correctly manages default wckey state and ensures data consistency. Consider extracting the transaction callback into a separate named function for better readability given its complexity.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 787f658 and e13cc26.

📒 Files selected for processing (5)
  • protos/PublicDefs.proto (4 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/CtldPublicDefs.h (4 hunks)
  • src/CraneCtld/DbClient.cpp (15 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • protos/PublicDefs.proto
  • src/CraneCtld/CtldPublicDefs.h
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
🧰 Additional context used
🧠 Learnings (2)
src/CraneCtld/DbClient.cpp (1)

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

src/CraneCtld/AccountManager.cpp (11)

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.

Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.

Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Learnt from: L-Xiafeng
PR: #520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.

Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.

🔇 Additional comments (17)
src/CraneCtld/DbClient.cpp (12)

374-378: LGTM! Defensive null check for wckey field.

Good defensive programming by checking if the field exists and has the correct type before accessing it, with a sensible default empty string fallback.


414-426: LGTM! Consistent implementation with other entity insertion methods.

The InsertWckey method follows the established pattern for entity insertion with proper session handling and creation timestamp.


469-471: The missing break statement issue has been resolved.

The switch case for EntityType::WCKEY now correctly includes a break statement at line 471.


510-518: LGTM! Consistent implementation with other entity selection methods.

The SelectAllWckey method follows the established pattern for retrieving all entities of a specific type.


557-575: LGTM! Appropriate use of composite key for Wckey updates.

The use of a composite filter (name + cluster + user_name) is appropriate for uniquely identifying Wckey entities across users and clusters.


661-685: LGTM! Well-structured template specializations for map serialization.

The template specializations correctly handle the serialization of both simple string-to-string maps and the more complex WckeyMap (string to set of strings) structure.


781-791: LGTM! Robust deserialization with proper type checking.

The implementation properly validates field existence and types before deserialization, preventing potential runtime errors.


799-819: LGTM! User document structure properly extended.

The field array and tuple are correctly updated to include the new default_wckey_map field.


821-831: LGTM! Consistent Wckey serialization implementation.

The WckeyToDocument_ method properly serializes all Wckey fields with appropriate type conversions.


833-845: LGTM! Proper Wckey deserialization with error handling.

The ViewToWckey_ method correctly deserializes all fields with appropriate type conversions and exception handling.


1062-1123: LGTM! Task document structure properly extended for embedded DB.

The fields array and tuple are correctly updated to include the wckey field from the task's gRPC message.


1152-1205: LGTM! Task document structure properly extended for in-memory tasks.

The fields array and tuple are correctly updated to include the wckey field from the TaskInCtld object.

src/CraneCtld/AccountManager.cpp (5)

272-300: Well-structured deletion logic with proper validation checks.

The implementation correctly prevents deletion of default wckeys and follows the established locking patterns.


443-465: Clean implementation with proper permission checks.

The function correctly restricts wckey queries to operators and filters out deleted entries appropriately.


2253-2274: Clean soft-delete implementation.

The function correctly implements soft deletion with proper transaction handling and maintains consistency between database and in-memory state.


2394-2448: Efficient implementation with proper state management.

The function correctly handles default wckey transitions and includes an optimization to skip unnecessary updates when the wckey is already default.


1771-1796: Proper initialization of wckey data structures.

The changes correctly integrate wckey loading into the initialization sequence with appropriate locking.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
src/CraneCtld/AccountManager.h (1)

157-159: Rename method to clarify its purpose

The method name ModifyDefaultWckey is misleading as it only sets the default Wckey without general modification capabilities.

-  CraneExpected<void> ModifyDefaultWckey(uint32_t uid, const std::string& name,
-                                  const std::string& cluster,
-                                  const std::string& user_name);
+  CraneExpected<void> SetDefaultWckey(uint32_t uid, const std::string& name,
+                                      const std::string& cluster,
+                                      const std::string& user_name);
src/CraneCtld/DbClient.h (1)

181-254: Fix debug logging level

The method implementation is solid, but the debug logging uses an inappropriate error level.

Remove or change the log level at lines 247-250:

-      CRANE_ERROR(
-          "Update date in database fail(filter:{}, opt:{}, key:{}, value:{}, "
-          "{})",
-          filter_str, opt, key, value, result ? result->modified_count() : -1);
+      CRANE_ERROR(
+          "Update data in database failed (filter:{}, opt:{}, key:{}, value:{}, "
+          "modified_count:{})",
+          filter_str, opt, key, value, result ? result->modified_count() : -1);

Also note the typo "date" should be "data".

src/CraneCtld/AccountManager.cpp (2)

379-392: RAII locking pattern correctly implemented.

The method now uses util::read_lock_guard for proper RAII-style mutex management, which addresses the previous concern about manual lock/unlock operations.


1854-1860: Deletion flag check is properly implemented.

The GetExistedWckeyInfoNoLock_ method correctly checks both existence and the deleted flag before returning the wckey.

🧹 Nitpick comments (2)
src/CraneCtld/AccountManager.h (1)

409-428: Fix naming inconsistency

There's an inconsistency between WckeyKey and WcKeyHash. Use consistent naming throughout.

-  struct WcKeyHash {
+  struct WckeyKeyHash {
     std::size_t operator()(const WckeyKey& key) const {
       return absl::Hash<WckeyKey>{}(key);
     }
   };

Also update the usage in line 438:

-  std::unordered_map<WckeyKey, std::unique_ptr<Wckey>, WcKeyHash> m_wckey_map_;
+  std::unordered_map<WckeyKey, std::unique_ptr<Wckey>, WckeyKeyHash> m_wckey_map_;
src/CraneCtld/AccountManager.cpp (1)

1156-1159: Track the multi-cluster TODO.

The current implementation correctly validates the cluster name, but the TODO comment indicates future work is needed for multi-cluster support.

Would you like me to create an issue to track the multi-cluster support enhancement?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e13cc26 and 9acba49.

📒 Files selected for processing (7)
  • protos/Crane.proto (5 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/AccountManager.h (10 hunks)
  • src/CraneCtld/DbClient.cpp (15 hunks)
  • src/CraneCtld/DbClient.h (7 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.h (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/CraneCtld/RpcService/CtldGrpcServer.h
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
  • protos/Crane.proto
🧰 Additional context used
🧠 Learnings (4)
src/CraneCtld/AccountManager.cpp (11)

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.

Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.

Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Learnt from: L-Xiafeng
PR: #520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.

Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.

src/CraneCtld/DbClient.h (1)

Learnt from: MCKZX-llx
PR: #513
File: src/Utilities/EmbeddedDb/EmbeddedDb.cpp:411-411
Timestamp: 2025-06-08T07:45:15.739Z
Learning: In the CraneSched project's Berkeley DB implementation in src/Utilities/EmbeddedDb/EmbeddedDb.cpp, explicit deletion of DbTxn objects after commit() or abort() is not required, contrary to general Berkeley DB documentation. The project maintainer confirmed that the memory management is handled automatically in their specific implementation.

src/CraneCtld/AccountManager.h (10)

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.

Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The phmap::parallel_flat_hash_set used in CraneSched's VaultClientWrapper is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, phmap::parallel_flat_hash_set from the Parallel Hashmap library (parallel_hashmap/phmap.h) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.

src/CraneCtld/DbClient.cpp (1)

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

🔇 Additional comments (24)
src/CraneCtld/AccountManager.h (5)

42-42: LGTM!

The WckeyMutexSharedPtr type alias follows the established pattern for other entity types in the class.


54-55: LGTM!

The AddUserWckey method signature is consistent with other entity addition methods.


63-66: LGTM!

The DeleteWckey method correctly uses a composite key (name, cluster, user_name) to identify Wckey entities, which aligns with the WckeyKey struct design.


91-96: LGTM!

The Wckey getter methods correctly use composite keys and follow the established patterns. The GetExistedDefaultWckeyName method appropriately returns the default Wckey name for a given cluster/user combination.


438-439: LGTM!

The m_wckey_map_ member correctly uses a composite key for Wckey storage and follows the established thread-safety pattern with m_rw_wckey_mutex_.

src/CraneCtld/DbClient.h (3)

47-47: LGTM!

The WCKEY entity type is correctly added to the enum.


161-165: LGTM!

The WCKEY case correctly returns false with an appropriate error message, addressing the previous review feedback about silent failures.


68-68: LGTM!

The Wckey-related method declarations and collection name member follow the established patterns for entity management.

Also applies to: 110-110, 257-257, 294-297, 324-324

src/CraneCtld/DbClient.cpp (9)

374-378: LGTM!

The wckey field extraction is implemented defensively with proper type checking and fallback to empty string.


414-426: LGTM!

The InsertWckey implementation follows the established pattern for entity insertion.


470-472: LGTM!

The WCKEY case correctly includes the break statement, properly addressing the previous review feedback about fall-through.


511-519: LGTM!

The SelectAllWckey implementation follows the established pattern for entity selection.


558-576: LGTM!

The UpdateWckey implementation correctly uses the composite key (name, cluster, user_name) for filtering.


662-686: LGTM!

The template specializations for wckey-related map types correctly handle BSON serialization of complex data structures.


782-792: LGTM!

The User schema changes correctly handle the default_wckey_map field with proper backward compatibility checks.

Also applies to: 800-819


822-846: LGTM!

The Wckey serialization/deserialization methods correctly handle all fields with appropriate type conversions.


1063-1063: LGTM!

The task schema changes consistently add the wckey field to both embedded and in-memory task document structures.

Also applies to: 1079-1080, 1090-1090, 1124-1124, 1153-1153, 1169-1169, 1180-1180, 1206-1206

src/CraneCtld/AccountManager.cpp (7)

169-201: Good implementation of double-checked locking pattern.

The re-validation of user existence under write lock (lines 190-193) properly addresses the potential race condition where a user could be deleted between releasing the read lock and acquiring the write lock.


277-305: Proper validation logic for wckey deletion.

The method correctly prevents deletion of default wckeys and follows the established permission checking pattern.


394-406: Manual locking is appropriate for shared pointer pattern.

The manual lock/unlock pattern here is intentional and necessary since the method returns a WckeyMutexSharedPtr that manages the mutex lifecycle. The lock must remain held until the caller releases the shared pointer.


447-469: Consistent permission model and filtering logic.

The method correctly restricts wckey queries to operators and above, and properly filters out deleted entries.


1771-1801: Proper initialization of wckey data structures.

The method correctly loads wckey data from the database and uses an appropriate composite key structure for the wckey map.


1974-2050: Well-implemented atomic wckey addition with proper default handling.

The method correctly manages default wckey transitions and ensures database and in-memory state consistency through transactions.


2398-2452: Correct implementation of default wckey switching.

The method properly handles the atomic transition of default wckey status, ensuring only one wckey is marked as default per user/cluster combination.

@1daidai1 1daidai1 force-pushed the dev/wckey branch 2 times, most recently from fe51039 to f9bfa94 Compare July 31, 2025 03:19
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/CraneCtld/AccountManager.cpp (2)

390-402: Consider using RAII for lock management in the future.

While this method follows the existing pattern used by other GetExisted*Info methods, the manual lock/unlock approach could be improved with RAII. Consider refactoring all these methods together in a future update to use std::shared_lock or similar RAII wrappers.


1136-1164: TODO: Multi-cluster support needs implementation.

The current implementation only supports a single cluster as indicated by the TODO comment. The hardcoded cluster validation will need to be updated when multi-cluster support is added.

Would you like me to create an issue to track the multi-cluster support implementation?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fe51039 and f9bfa94.

📒 Files selected for processing (18)
  • etc/config.yaml (1 hunks)
  • protos/Crane.proto (5 hunks)
  • protos/PublicDefs.proto (4 hunks)
  • scripts/wipe_data.sh (2 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/AccountManager.h (10 hunks)
  • src/CraneCtld/CraneCtld.cpp (1 hunks)
  • src/CraneCtld/CtldPreCompiledHeader.h (1 hunks)
  • src/CraneCtld/CtldPublicDefs.cpp (1 hunks)
  • src/CraneCtld/CtldPublicDefs.h (3 hunks)
  • src/CraneCtld/DbClient.cpp (16 hunks)
  • src/CraneCtld/DbClient.h (7 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.h (4 hunks)
  • src/CraneCtld/TaskScheduler.cpp (1 hunks)
  • src/Utilities/PublicHeader/String.cpp (1 hunks)
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h (1 hunks)
  • src/Utilities/PublicHeader/include/crane/String.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
  • src/CraneCtld/CtldPreCompiledHeader.h
  • etc/config.yaml
  • scripts/wipe_data.sh
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h
  • src/Utilities/PublicHeader/String.cpp
  • src/CraneCtld/CtldPublicDefs.cpp
  • src/CraneCtld/TaskScheduler.cpp
  • src/Utilities/PublicHeader/include/crane/String.h
  • src/CraneCtld/CtldPublicDefs.h
  • src/CraneCtld/CraneCtld.cpp
  • protos/PublicDefs.proto
  • protos/Crane.proto
  • src/CraneCtld/RpcService/CtldGrpcServer.h
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
  • src/CraneCtld/AccountManager.h
🧰 Additional context used
🧠 Learnings (2)
src/CraneCtld/AccountManager.cpp (13)

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.

Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.

Learnt from: L-Xiafeng
PR: #461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.

Learnt from: L-Xiafeng
PR: #520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.

Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.

Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.

src/CraneCtld/DbClient.h (1)

Learnt from: MCKZX-llx
PR: #513
File: src/Utilities/EmbeddedDb/EmbeddedDb.cpp:411-411
Timestamp: 2025-06-08T07:45:15.739Z
Learning: In the CraneSched project's Berkeley DB implementation in src/Utilities/EmbeddedDb/EmbeddedDb.cpp, explicit deletion of DbTxn objects after commit() or abort() is not required, contrary to general Berkeley DB documentation. The project maintainer confirmed that the memory management is handled automatically in their specific implementation.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: format
🔇 Additional comments (25)
src/CraneCtld/DbClient.h (5)

47-47: LGTM!

The WCKEY entity type addition follows the existing enum pattern correctly.


68-68: LGTM!

The Wckey method declarations follow the established patterns for other entity types and maintain consistency in the interface design.

Also applies to: 110-110, 250-250


161-166: Proper error handling for WCKEY updates.

Good improvement! The explicit error message clearly directs users to the appropriate method for WCKEY updates, addressing the previous concern about silent failures.


181-247: Well-designed multi-field update method.

The UpdateEntityOneByFields template method provides a flexible solution for complex entity updates. The use of std::variant for FilterValue and comprehensive error logging with proper boolean formatting demonstrates good design practices.


287-290: LGTM!

The private helper method declarations and collection name constant follow the established patterns and naming conventions for other entity types.

Also applies to: 317-317

src/CraneCtld/DbClient.cpp (11)

375-379: LGTM!

The wckey field extraction includes proper BSON type checking and safe fallback handling, following the defensive programming patterns used elsewhere in the method.


415-427: LGTM!

The InsertWckey method follows the established pattern for insert operations, including proper serialization and timestamp handling.


471-473: LGTM!

The WCKEY case is properly implemented with the break statement, addressing the previous review feedback about missing break statements.


512-520: LGTM!

The SelectAllWckey method follows the established pattern for select-all operations with proper deserialization handling.


559-577: LGTM!

The UpdateWckey method correctly uses a multi-field filter (name, cluster, user_name) which explains why WCKEY entities require the specialized UpdateEntityOneByFields method rather than the single-name-based UpdateEntityOne.


663-673: LGTM!

The template specialization for std::unordered_map<std::string, std::string> correctly handles BSON serialization for the User entity's default_wckey_map field.


769-779: LGTM!

The default_wckey_map field parsing includes proper BSON type checking and safe access patterns. Clearing the map before population ensures clean state.


787-807: LGTM!

The UserToDocument_ method correctly incorporates the default_wckey_map field with proper field count updates and tuple type modifications.


809-819: LGTM!

The WckeyToDocument_ method follows the established serialization pattern and includes all necessary Wckey fields with appropriate type casting for BSON compatibility.


821-833: LGTM!

The ViewToWckey_ method correctly deserializes BSON documents with proper exception handling and type conversions for all Wckey fields.


1048-1048: LGTM!

Both task serialization methods are consistently updated to include the wckey field with proper field count increments and tuple type modifications. The wckey sources are appropriate for each context.

Also applies to: 1066-1066, 1077-1077, 1111-1111, 1137-1137, 1156-1156, 1167-1167, 1193-1193

src/CraneCtld/AccountManager.cpp (9)

169-197: Good job addressing the race condition!

The implementation correctly re-validates user existence under write lock after releasing the read lock, preventing use-after-free issues.


273-301: Proper validation to prevent default wckey deletion.

Good implementation that prevents deletion of default wckeys, maintaining system consistency.


375-388: Excellent use of RAII for lock management.

The method correctly uses util::read_lock_guard which ensures automatic unlocking even on early returns.


443-465: Correct permission enforcement for wckey queries.

The implementation properly restricts wckey queries to operators and above, maintaining security boundaries.


1771-1771: Correct initialization of wckey data structures.

The wckey loading follows the established pattern and properly uses composite keys for unique identification.

Also applies to: 1791-1796


1836-1856: Correct implementation of wckey lookup methods.

The methods properly handle the deleted flag check and follow the established pattern for entity lookups.


1970-2046: Well-implemented default wckey management.

The method correctly handles the complex logic of updating default wckeys, ensuring both database and in-memory state remain consistent. The transactional approach guarantees atomicity of all related updates.


2253-2274: Correct implementation of wckey soft deletion.

The method properly uses filter fields for composite key lookup and maintains consistency between database and in-memory state.


2394-2448: Excellent implementation of default wckey updates.

The method comprehensively handles all aspects of changing default wckeys, including updating both old and new default wckeys in both tables, with proper transaction support.

@Nativu5 Nativu5 linked an issue Aug 4, 2025 that may be closed by this pull request
@1daidai1 1daidai1 force-pushed the dev/wckey branch 2 times, most recently from 45bdcc2 to d9484ea Compare August 4, 2025 08:54
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
src/CraneCtld/AccountManager.cpp (3)

390-403: Consider using RAII for mutex management.

The manual lock_shared()/unlock_shared() pattern could lead to issues if exceptions are thrown. Consider refactoring to use RAII or document why manual locking is necessary here.

If the current design is intentional for returning a custom shared pointer wrapper, consider adding a comment explaining the design choice. Otherwise, consider using RAII-based locking similar to GetExistedDefaultWckeyName.


1136-1165: TODO indicates incomplete multi-cluster support.

The function correctly validates inputs and permissions. However, the TODO comment suggests multi-cluster support needs implementation.

Would you like me to create an issue to track the multi-cluster support implementation for Wckey functionality?


2394-2449: Robust default Wckey switching logic.

The function correctly handles all cases of default Wckey updates with proper transactional consistency.

Consider adding an early return at line 2402 to avoid unnecessary operations when the Wckey is already the default:

 if (m_wckey_map_[new_wckey_key] && m_wckey_map_[new_wckey_key]->is_def) {
+  CRANE_DEBUG("Wckey {} is already default for user {} in cluster {}", 
+              new_def_wckey, user_name, cluster);
   return {};
 }
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 45bdcc2 and d9484ea.

📒 Files selected for processing (3)
  • protos/PublicDefs.proto (4 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/CraneCtld.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • protos/PublicDefs.proto
  • src/CraneCtld/CraneCtld.cpp
🧰 Additional context used
🧠 Learnings (13)
📚 Learning: in cranesched's accountmetacontainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe...
Learnt from: huerni
PR: PKUHPC/CraneSched#499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranesched project, the `m_allowed_certs_` is an `std::unordered_set` which is n...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in cranectld/cranedkeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guarant...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: the userhastask() method in accountmetacontainer does not attempt to acquire any locks, so there is ...
Learnt from: huerni
PR: PKUHPC/CraneSched#528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in the cgroupmanager's getjobbpfmapcgroupsv2 method, the developer has confirmed that cg_ino_job_id_...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in src/cranectld/cranedkeeper.h, the m_shutting_down_ member in cranedstub class is declared as std:...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranesched project, the `m_allowed_certs_` in `vaultclientwrapper` is implemented as a `phmap...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranectld codebase, the variables m_disconnected_ and m_registered_ in cranedstub class are a...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: the cranesched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps,...
Learnt from: huerni
PR: PKUHPC/CraneSched#499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: the configurecraned function in src/cranectld/cranedkeeper.cpp is called from a thread pool, so ther...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in taskmanager.cpp, the security model relies on administrator-configured command templates in parse...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranesched codebase, src/craned/craned.cpp guarantees that g_config.cranedres.contains(g_conf...
Learnt from: huerni
PR: PKUHPC/CraneSched#537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: the `atomichashmap::operator[]` implementation in the cranesched project does not create elements wh...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The `AtomicHashMap::operator[]` implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a `ValueExclusivePtr` that will be empty if the key is not found.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
🔇 Additional comments (8)
src/CraneCtld/AccountManager.cpp (8)

169-198: LGTM! Proper thread-safe implementation.

The function correctly validates user existence under write lock after permission checks, preventing race conditions. The handling of stale (deleted) Wckey entries is appropriate.


273-302: Good validation logic for default Wckey protection.

The function appropriately prevents deletion of default Wckeys, ensuring data consistency. The error code ERR_DEL_DEFAULT_WCKEY clearly indicates why the deletion failed.


375-389: Clean implementation with proper RAII.

The function correctly uses util::read_lock_guard for automatic lock management and provides clear error codes for different failure scenarios.


443-466: Appropriate permission model for Wckey queries.

The function correctly restricts Wckey queries to operators and above, maintaining consistency with the security model. The filtering of deleted entries is properly implemented.


1767-1797: Proper initialization of Wckey data structures.

The function correctly loads Wckey data alongside other entities, using an appropriate composite key structure for the map.


1836-1857: Consistent implementation of Wckey lookup methods.

The helper methods follow the established pattern of having separate functions for retrieving all entries vs. only non-deleted entries.


1970-2047: Well-implemented transactional Wckey addition.

The function correctly handles the complexity of default Wckey management within a transaction, ensuring atomicity when switching defaults. The update of both Wckey and User documents maintains data consistency.


2253-2275: Consistent soft-delete implementation.

The function properly implements soft deletion using the established pattern, with appropriate use of FilterFields for composite key operations.

@RileyWen RileyWen marked this pull request as draft August 7, 2025 09:55
@1daidai1 1daidai1 marked this pull request as ready for review November 11, 2025 08:29
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d9484ea and c59afaf.

📒 Files selected for processing (4)
  • protos/PublicDefs.proto (4 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/CraneCtld.cpp (1 hunks)
  • src/CraneCtld/CtldPublicDefs.h (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/CraneCtld/CraneCtld.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-30T02:09:41.625Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
🧬 Code graph analysis (2)
src/CraneCtld/CtldPublicDefs.h (1)
src/CraneCtld/DbClient.h (2)
  • string (316-341)
  • string (317-325)
src/CraneCtld/AccountManager.cpp (2)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
  • DeleteWckey (1177-1198)
  • DeleteWckey (1177-1180)
  • ModifyDefaultWckey (836-854)
  • ModifyDefaultWckey (836-839)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (135-135)
🔇 Additional comments (18)
protos/PublicDefs.proto (4)

171-171: LGTM!

The optional wckey field is correctly added to TaskToCtld with appropriate field numbering and type.


329-330: LGTM!

The wckey fields are appropriately added to TaskInfo with sequential field numbering.


494-498: LGTM!

The new error codes comprehensively cover wckey-related error scenarios with sequential numbering.


582-587: LGTM!

The WckeyInfo message structure appropriately represents wckey data for gRPC communication.

src/CraneCtld/CtldPublicDefs.h (4)

169-170: LGTM!

The configuration flags for wckey support are appropriately added with safe defaults maintaining backward compatibility.


388-388: LGTM!

The wckey field is appropriately added to TaskInCtld in the submission-time fields section.


611-611: LGTM!

The default_wckey_map appropriately supports per-cluster default wckey configuration for users.


617-629: LGTM!

The Wckey struct is well-designed with appropriate fields and follows the established pattern for entity structs in the codebase.

src/CraneCtld/AccountManager.cpp (10)

169-197: LGTM!

The AddUserWckey method correctly addresses the race condition from previous reviews by re-validating user existence under write lock.


273-301: LGTM!

The DeleteWckey method correctly prevents deletion of the default wckey and follows proper locking patterns.


375-388: LGTM!

The GetExistedDefaultWckeyName method correctly uses RAII locking, addressing the previous review concern.


390-402: LGTM!

The GetExistedWckeyInfo method follows the established manual locking pattern consistent with other similar methods in this file (GetExistedUserInfo, GetExistedAccountInfo, etc.).


443-465: LGTM!

The QueryAllWckeyInfo method follows the established pattern for query methods with appropriate privilege checks.


1136-1164: LGTM!

The ModifyDefaultWckey method includes proper validation and locking. Note the TODO comment about multi-cluster support being a current limitation.


1767-1797: LGTM!

The InitDataMap_ method is correctly updated to load wckey data from the database into the in-memory map.


1836-1856: LGTM!

The wckey lookup methods correctly implement the deletion flag check, addressing the previous review concern.


1970-2046: LGTM!

The AddWckey_ method correctly manages wckey creation and default wckey handling. Note that newly added wckeys always become the default for their cluster (line 1980), requiring ModifyDefaultWckey to change to a different default later.


2253-2274: LGTM!

The DeleteWckey_ method correctly implements soft deletion consistent with other entity deletion methods in the codebase.

Copy link
Collaborator

@Nativu5 Nativu5 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase 一下吧。

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
protos/Crane.proto (2)

417-427: Missing modification fields in ModifyDefaultWckeyRequest.

The request only contains identifier fields (uid, name, cluster, user_name) but lacks explicit fields indicating what modification to perform. While the RPC name suggests setting a default wckey, making this explicit in the message (e.g., bool set_as_default = 5;) would improve clarity and allow future extensibility.

Based on past review comments.


363-373: Inconsistent deletion API compared to other entity types.

The DeleteWckeyRequest accepts individual fields (name, cluster, user_name) while DeleteAccountRequest, DeleteUserRequest, and DeleteQosRequest accept repeated lists, allowing batch operations. This inconsistency makes the API harder to use and less efficient for bulk deletions.

Based on past review comments.

src/CraneCtld/AccountManager.cpp (1)

2212-2283: Use of operator[] on m_wckey_map_ with std::unique_ptr risks null dereference

Both AddWckey_ and SetUserDefaultWckey_ rely on operator[] on m_wckey_map_ and then immediately dereference the stored std::unique_ptr<Wckey>:

  • In AddWckey_:
    • m_wckey_map_[{old_def_wckey, res_wckey.cluster, res_wckey.user_name}]->is_def = false;
  • In SetUserDefaultWckey_:
    • if (m_wckey_map_[new_wckey_key] && m_wckey_map_[new_wckey_key]->is_def) ...
    • m_wckey_map_[new_wckey_key]->is_def = true;
    • m_wckey_map_[{old_def_wckey, cluster, user_name}]->is_def = false;

operator[] will insert a new entry with a default‑constructed (null) unique_ptr when the key does not exist, and subsequent ->is_def dereferences will crash. Even if invariants should guarantee presence, the pattern is fragile and was previously flagged.

Recommend switching to find-based access and checking both iterator and unique_ptr before dereferencing, e.g.:

@@ SetUserDefaultWckey_
-  WckeyKey new_wckey_key = {.name = new_def_wckey,
-                            .cluster = cluster,
-                            .user_name = user_name};
-  if (m_wckey_map_[new_wckey_key] && m_wckey_map_[new_wckey_key]->is_def) {
-    return {};
-  }
+  WckeyKey new_wckey_key{.name = new_def_wckey,
+                         .cluster = cluster,
+                         .user_name = user_name};
+  auto it_new = m_wckey_map_.find(new_wckey_key);
+  if (it_new != m_wckey_map_.end() && it_new->second &&
+      it_new->second->is_def) {
+    return {};
+  }
@@
-  m_wckey_map_[new_wckey_key]->is_def = true;
+  it_new = m_wckey_map_.find(new_wckey_key);
+  if (it_new != m_wckey_map_.end() && it_new->second) {
+    it_new->second->is_def = true;
+  }
   m_user_map_[user_name]->default_wckey_map[cluster] = new_def_wckey;
   if (need_update_table) {
-    m_wckey_map_[{old_def_wckey, cluster, user_name}]->is_def = false;
+    WckeyKey old_key{.name = old_def_wckey,
+                     .cluster = cluster,
+                     .user_name = user_name};
+    auto it_old = m_wckey_map_.find(old_key);
+    if (it_old != m_wckey_map_.end() && it_old->second) {
+      it_old->second->is_def = false;
+    }
   }

Similarly in AddWckey_:

-  if (update_wckey)
-    m_wckey_map_[{old_def_wckey, res_wckey.cluster, res_wckey.user_name}]
-        ->is_def = false;
+  if (update_wckey) {
+    WckeyKey old_key{.name = old_def_wckey,
+                     .cluster = res_wckey.cluster,
+                     .user_name = res_wckey.user_name};
+    auto it_old = m_wckey_map_.find(old_key);
+    if (it_old != m_wckey_map_.end() && it_old->second) {
+      it_old->second->is_def = false;
+    }
+  }

This removes the possibility of accidentally inserting null entries and then dereferencing them.

Also applies to: 2579-2633

🧹 Nitpick comments (6)
src/CraneCtld/CraneCtld.cpp (1)

644-664: Consider case-insensitive matching for AccountingStorageEnforce.

Line 651 checks for an exact match of "wckeys". Users might reasonably enter "wckey" (singular) or mixed-case variants, which would silently fail to enable wckey enforcement. Consider normalizing to lowercase (already done for TrackWCKey on line 655) and accepting both singular and plural forms.

Apply this diff to improve robustness:

       std::unordered_set<std::string> items;
       for (absl::string_view item : absl::StrSplit(item_list, ',')) {
         item = absl::StripAsciiWhitespace(item);
-        if (!item.empty()) items.emplace(item);
+        if (!item.empty()) items.emplace(absl::AsciiStrToLower(item));
       }
-      if (items.contains("wckeys")) g_config.MustNeedWckey = true;
+      if (items.contains("wckey") || items.contains("wckeys")) 
+        g_config.MustNeedWckey = true;
     }
src/CraneCtld/DbClient.h (1)

244-310: Consider refactoring verbose error logging.

Lines 287-306 manually construct a filter string for error logging. While functional, this is verbose and duplicates formatting logic. Consider extracting this to a helper function if similar logging is needed elsewhere, or simplifying to log the BSON filter directly.

Example simplification:

     if (!result || !result->modified_count()) {
-      std::string filter_str = "{";
-      for (size_t i = 0; i < filter_fields.size(); ++i) {
-        filter_str += std::visit(
-            [&](auto&& arg) -> std::string {
-              if constexpr (std::is_same_v<std::decay_t<decltype(arg)>, bool>) {
-                return std::format("{}:{}", filter_fields[i].first,
-                                   arg ? "true" : "false");
-              } else {
-                return std::format("{}:{}", filter_fields[i].first, arg);
-              }
-            },
-            filter_fields[i].second);
-        if (i != filter_fields.size() - 1) filter_str += ", ";
-      }
-      filter_str += "}";
-
       CRANE_ERROR(
-          "Update date in database fail(filter:{}, opt:{}, key:{}, value:{}, "
-          "{})",
-          filter_str, opt, key, value, result ? result->modified_count() : -1);
+          "Update in database failed: opt={}, key={}, value={}, modified={}",
+          opt, key, value, result ? result->modified_count() : -1);
       return false;
src/CraneCtld/AccountManager.cpp (1)

389-402: Minor behavior considerations for default Wckey lookup and QueryAllWckeyInfo

  • GetExistedDefaultWckeyName and GetExistedWckeyInfo are straightforward and correctly respect deletion flags; thread‑safety via read locks and WckeyMutexSharedPtr is fine.
  • QueryAllWckeyInfo:
    • Only operators/admins can list Wckeys; non‑privileged users always get ERR_PERMISSION_USER. This is stricter than QueryAllUserInfo, which returns a scoped subset. If normal users are expected to introspect their own Wckeys, you might want to relax this.
    • When wckey_list is non‑empty, filtering is by wckey->name only. If name collisions across clusters/users are possible, callers may get more entries than intended.

If those semantics are deliberate, no changes are needed; otherwise consider:

  • Allowing non‑admin users to query only their own Wckeys.
  • Extending the filter API to accept (name, cluster, user) triples, or documenting that Wckey names are globally unique.

Also applies to: 404-416, 457-487

src/CraneCtld/DbClient.cpp (2)

370-378: FetchJobRecords: wckey extraction is robust to missing/legacy documents

The new wckey handling:

  • Safely checks both existence and k_utf8 type before reading.
  • Falls back to empty string when absent or non‑UTF‑8, which is compatible with older job documents that don’t carry wckey.

If you like, you could avoid two lookups by binding the element once (auto ele = view["wckey"];), but behavior is already correct.

Also applies to: 460-464


620-633: Wckey CRUD integration is correct; consider matching exception‑handling style in SelectAllWckey

  • InsertWckey / UpdateWckey use the same WckeyToDocument_ schema as ViewToWckey_, and filter on (name, cluster, user_name), which matches how AccountManager looks up Wckeys.
  • These methods are only invoked inside CommitTransaction callbacks in the current code, so letting Mongo exceptions bubble to with_transaction is acceptable.
  • SelectAllWckey mirrors other SelectAll* methods but lacks a try/catch around the find/cursor loop; SelectAllUser, SelectAllAccount, and SelectAllQos already guard those calls.

To keep behavior consistent and avoid an uncaught exception during initialization, consider wrapping SelectAllWckey in a try/catch like the others and logging errors via CRANE_LOGGER_ERROR.

Also applies to: 732-740, 791-809

src/CraneCtld/AccountManager.h (1)

43-44: Wckey public APIs and mutex alias are coherent and match implementation

  • WckeyMutexSharedPtr mirrors the existing UserMutexSharedPtr/AccountMutexSharedPtr pattern, enabling safe shared‑lock access.
  • Public methods AddUserWckey, DeleteWckey, QueryAllWckeyInfo, GetExistedWckeyInfo, GetExistedDefaultWckeyName, and ModifyDefaultWckey align with the corresponding definitions in AccountManager.cpp (same parameter sets and semantics).
  • Grouping ModifyDefaultWckey with other “Modify*” methods keeps the API surface consistent.

Only minor nit: if QueryAllWckeyInfo is expected to accept large filter sets, you might eventually switch wckey_list to const std::unordered_set<std::string>& to avoid copies, but current signature is acceptable.

Also applies to: 57-58, 66-74, 95-100, 161-163

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c59afaf and dde33bc.

📒 Files selected for processing (15)
  • etc/config.yaml (1 hunks)
  • protos/Crane.proto (5 hunks)
  • protos/PublicDefs.proto (4 hunks)
  • scripts/wipe_data.sh (2 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/AccountManager.h (10 hunks)
  • src/CraneCtld/CraneCtld.cpp (1 hunks)
  • src/CraneCtld/CtldPreCompiledHeader.h (1 hunks)
  • src/CraneCtld/CtldPublicDefs.cpp (1 hunks)
  • src/CraneCtld/CtldPublicDefs.h (4 hunks)
  • src/CraneCtld/DbClient.cpp (16 hunks)
  • src/CraneCtld/DbClient.h (7 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.h (4 hunks)
  • src/CraneCtld/TaskScheduler.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • src/CraneCtld/CtldPublicDefs.cpp
  • src/CraneCtld/RpcService/CtldGrpcServer.h
  • src/CraneCtld/TaskScheduler.cpp
  • protos/PublicDefs.proto
  • etc/config.yaml
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Applied to files:

  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.

Applied to files:

  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
  • src/CraneCtld/AccountManager.h
📚 Learning: 2025-05-30T02:09:41.625Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, `phmap::parallel_flat_hash_set` from the Parallel Hashmap library (`parallel_hashmap/phmap.h`) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.h
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The `phmap::parallel_flat_hash_set` used in CraneSched's `VaultClientWrapper` is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.h
🧬 Code graph analysis (6)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
  • wipe_collection (51-59)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
src/CraneCtld/AccountManager.cpp (4)
  • ModifyDefaultWckey (1181-1209)
  • ModifyDefaultWckey (1181-1183)
  • DeleteWckey (287-315)
  • DeleteWckey (287-290)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
  • string (70-72)
  • string (384-426)
  • string (385-394)
src/CraneCtld/AccountManager.cpp (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
  • DeleteWckey (1612-1633)
  • DeleteWckey (1612-1615)
  • ModifyDefaultWckey (1235-1253)
  • ModifyDefaultWckey (1235-1238)
src/CraneCtld/CraneCtld.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (160-160)
src/CraneCtld/AccountManager.h (1)
src/CraneCtld/AccountManager.cpp (30)
  • AddUserWckey (179-207)
  • AddUserWckey (179-180)
  • DeleteUser (209-230)
  • DeleteUser (209-211)
  • DeleteAccount (232-259)
  • DeleteAccount (232-233)
  • DeleteQos (261-285)
  • DeleteQos (261-262)
  • DeleteWckey (287-315)
  • DeleteWckey (287-290)
  • QueryAllUserInfo (418-455)
  • QueryAllUserInfo (418-419)
  • QueryAllWckeyInfo (457-487)
  • QueryAllWckeyInfo (457-458)
  • GetExistedWckeyInfo (404-416)
  • GetExistedWckeyInfo (404-406)
  • GetExistedDefaultWckeyName (389-402)
  • GetExistedDefaultWckeyName (389-390)
  • ModifyDefaultWckey (1181-1209)
  • ModifyDefaultWckey (1181-1183)
  • GetExistedWckeyInfoNoLock_ (1994-2000)
  • GetExistedWckeyInfoNoLock_ (1994-1996)
  • GetWckeyInfoNoLock_ (1983-1989)
  • GetWckeyInfoNoLock_ (1983-1985)
  • AddWckey_ (2212-2288)
  • AddWckey_ (2212-2214)
  • DeleteWckey_ (2425-2446)
  • DeleteWckey_ (2425-2427)
  • SetUserDefaultWckey_ (2579-2633)
  • SetUserDefaultWckey_ (2579-2581)
🔇 Additional comments (13)
scripts/wipe_data.sh (2)

3-3: ✓ Usage message accurately updated.

The help text correctly reflects the expanded mode definitions, with wckey_table now included in modes 4 and 6.


53-56: ✓ wckey_table wiping correctly implemented.

The wckey_table wipe is properly added alongside user_table, and the conditional logic (mode 4 || 5 || 6) ensures it is wiped:

  • Mode 4: user_table + wckey_table
  • Mode 5: all tables including wckey_table
  • Mode 6: acct_table + qos_table + user_table + wckey_table

The placement with user_table is logical, and MongoDB's deleteMany({}) safely handles non-existent collections.

src/CraneCtld/CtldPreCompiledHeader.h (1)

64-64: LGTM!

The addition of absl/hash/hash.h appropriately enables Abseil hashing utilities for the new Wckey-related data structures introduced in this PR.

src/CraneCtld/CtldPublicDefs.h (1)

199-200: LGTM!

The Wckey-related additions to Config, TaskInCtld, User, and the new Wckey struct are well-structured and follow existing patterns in the codebase. The equality operator in Wckey correctly compares all relevant fields.

Also applies to: 710-710, 1019-1020, 1075-1087

src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)

1415-1447: LGTM!

The QueryWckeyInfo RPC correctly includes the authentication check and follows the established pattern of other query methods. The implementation properly delegates to AccountManager and populates the response fields.

src/CraneCtld/DbClient.h (2)

224-228: LGTM!

The explicit error handling for WCKEY in UpdateEntityOne addresses the past review comment correctly. This prevents silent failures and directs developers to use UpdateEntityOneByFields for wckey updates.

Based on past review comments.


104-104: LGTM!

The Wckey-related additions to DbClient follow the established patterns for Account, User, and Qos entities. The EntityType enum extension, new public methods, and private helpers are well-structured and consistent with the existing codebase.

Also applies to: 131-131, 173-173, 313-313, 351-354, 385-385

src/CraneCtld/AccountManager.cpp (3)

179-207: AddUserWckey / ModifyDefaultWckey locking and privilege checks look consistent

  • Privilege checks (CheckIfUserHasHigherPrivThan_(..., User::None)) mirror AddQos/AddAccount patterns.
  • Re-validating the target user under m_rw_user_mutex_ write lock in AddUserWckey closes the earlier race on user_exist.
  • Write-lock acquisition order m_rw_wckey_mutex_m_rw_user_mutex_ is consistent across AddUserWckey, DeleteWckey, and ModifyDefaultWckey, avoiding deadlock with other Wckey paths.

No functional or concurrency issues stand out here.

Also applies to: 1181-1209


287-315: Wckey delete flow correctly guards default keys and maintains in‑memory/DB consistency

  • DeleteWckey refuses to delete a Wckey if it is currently the user’s default for that cluster (ERR_DEL_DEFAULT_WCKEY), which matches the intended invariant.
  • User existence and wckey existence are revalidated under write locks before calling DeleteWckey_.
  • DeleteWckey_ marks the Wckey as deleted both in MongoDB (via deleted: true) and in m_wckey_map_, without touching default_wckey_map (safe because default deletions are already blocked).

This path looks correct and thread‑safe.

Also applies to: 2425-2446


1915-1940: InitDataMap_ and Wckey getters integrate cleanly with the new in‑memory map

  • InitDataMap_ now acquires m_rw_wckey_mutex_ alongside other write locks and bulk‑loads all Wckeys into m_wckey_map_ keyed by (name, cluster, user_name), matching the lookup key.
  • GetWckeyInfoNoLock_ / GetExistedWckeyInfoNoLock_ use find + deleted flag consistently, aligned with how users/accounts/QoS are handled.

This initialization and lookup path looks correct and consistent with the rest of AccountManager.

Also applies to: 1980-2000

src/CraneCtld/DbClient.cpp (2)

1258-1268: User.default_wckey_map BSON mapping is symmetric and backward‑compatible

  • DocumentAppendItem_<std::unordered_map<std::string,std::string>> serializes the map as a nested document { key: value, ... }, which is simple and efficient.
  • UserToDocument_ adds default_wckey_map as a top‑level field, and ViewToUser_:
    • Clears the in‑memory map first.
    • Checks that the BSON element exists and is a document.
    • Populates only k_utf8 entries, which avoids crashes on malformed documents.
  • For legacy user documents lacking default_wckey_map, the code leaves the field empty, so old data remains valid.

This end‑to‑end mapping looks solid.

Also applies to: 1274-1298, 962-972


1300-1325: Wckey entity and task document schema are consistent across DB and RPC

  • WckeyToDocument_ / ViewToWckey_ round‑trip all essential fields: deleted, name, cluster, uid, user_name, is_def, matching the Ctld::Wckey struct used by AccountManager.
  • TaskInEmbeddedDbToDocument_ and TaskInCtldToDocument_:
    • Extend the fields array to include "wckey" as the final field.
    • Add the corresponding tuple element (task_to_ctld.wckey() / task->wckey) with type std::string.
    • Keep field names aligned with FetchJobRecords, which now reads view["wckey"].

This keeps the task schema consistent between embedded‑DB and live Ctld representations.

Also applies to: 1666-1702, 1770-1806

src/CraneCtld/AccountManager.h (1)

212-218: WckeyKey hashing, helpers, and storage integrate cleanly into AccountManager

  • Private getters GetWckeyInfoNoLock_ / GetExistedWckeyInfoNoLock_ and helpers AddWckey_, DeleteWckey_, SetUserDefaultWckey_ are neatly declared here and match the cpp implementations.
  • struct WckeyKey encapsulates the composite key (name, cluster, user_name) with both operator== and AbslHashValue, which keeps equality and hashing consistent.
  • struct WcKeyHash wraps the hash functor cleanly, and m_wckey_map_ uses it together with util::rw_mutex m_rw_wckey_mutex_, mirroring the patterns already used for users/accounts/QoS.

The type and member design looks sound and consistent with the rest of the class.

Also applies to: 322-324, 338-341, 360-362, 446-465, 467-476

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (1)
protos/PublicDefs.proto (1)

473-474: Clarify the semantics of the wckey_valid field.

The wckey_valid boolean field is added to TaskInfo, but its meaning is not documented. Please clarify:

  • Does it indicate whether the wckey field has been populated?
  • Does it mean the wckey was validated against the database?
  • Does it indicate whether Wckey validation is enabled for this task?
  • Should it reflect the g_config.MustNeedWckey configuration?

Additionally, note that TaskToCtld only has optional string wckey while TaskInfo has both bool wckey_valid and string wckey (not optional). This asymmetry suggests different use cases—consider adding a comment explaining the distinction.

Consider adding a comment like:

+  // wckey_valid indicates whether wckey validation was enabled when the task was submitted
   bool wckey_valid = 41;
+  // wckey contains the workload key assigned to this task (may be empty if wckey_valid is false)
   string wckey = 42;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dde33bc and cb622f9.

📒 Files selected for processing (6)
  • protos/PublicDefs.proto (4 hunks)
  • src/CraneCtld/AccountManager.cpp (11 hunks)
  • src/CraneCtld/CtldPublicDefs.h (4 hunks)
  • src/CraneCtld/DbClient.cpp (16 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
  • src/CraneCtld/TaskScheduler.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/CraneCtld/CtldPublicDefs.h
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-30T02:09:41.625Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Applied to files:

  • src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
🧬 Code graph analysis (2)
src/CraneCtld/AccountManager.cpp (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
  • DeleteWckey (1615-1637)
  • DeleteWckey (1615-1618)
  • ModifyDefaultWckey (1236-1256)
  • ModifyDefaultWckey (1236-1239)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
  • string (70-72)
  • string (384-426)
  • string (385-394)
🔇 Additional comments (4)
protos/PublicDefs.proto (2)

184-184: LGTM: Wckey field addition to TaskToCtld.

The addition of the optional wckey field follows protobuf3 conventions and is appropriately placed with other task submission parameters.


788-793: LGTM: WckeyInfo message structure is well-defined.

The WckeyInfo message provides a clear structure for Wckey information with appropriate fields for name, cluster, user association, and default status. The field types and naming are consistent with the rest of the protobuf definitions.

src/CraneCtld/AccountManager.cpp (1)

179-207: LGTM! Wckey management implementation looks solid.

The Wckey integration in AccountManager follows established patterns and addresses all previously identified issues:

  • Proper RAII lock usage throughout
  • Race conditions eliminated by re-validating under write locks
  • Consistent locking order (wckey → user when both needed)
  • Deleted flag checks in place
  • Safe map access using find() instead of operator[]

The implementation provides complete CRUD operations with appropriate permission checks and transactional database updates.

Also applies to: 287-315, 389-402, 404-416, 457-487, 1181-1209, 1915-1915, 1983-2000, 2212-2288, 2425-2446, 2579-2634

src/CraneCtld/DbClient.cpp (1)

460-464: LGTM! Wckey serialization and database schema updates are well-implemented.

The remaining Wckey changes are solid:

  • Safe type checking before accessing wckey field in FetchJobRecords
  • Proper serialization/deserialization for default_wckey_map in User documents
  • Clean integration of wckey field into task documents
  • ViewToWckey_ includes appropriate exception handling

Also applies to: 962-972, 1258-1268, 1276-1298, 1300-1324, 1670-1741, 1774-1833

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/CraneCtld/DbClient.cpp (2)

1675-1750: Minor comment inconsistency.

At lines 1696-1697, the comment says // 40 - 44 followed by just "wckey", but there's only one field at position 40. The comment should say // 40 instead of // 40 - 44 for clarity.

Apply this diff to fix the comment:

     "mem_alloc", "device_map", "meta_container", "has_job_info", "licenses_alloc",
-    // 40 - 44
+    // 40
     "wckey"

1779-1842: Minor comment inconsistency.

At lines 1800-1801, the comment says // 40 - 44 followed by just "wckey", but there's only one field at position 40. The comment should say // 40 instead of // 40 - 44 for clarity.

Apply this diff to fix the comment:

       "mem_alloc", "device_map", "meta_container", "has_job_info", "licenses_alloc",
-      // 40 - 44
+      // 40
        "wckey"

Additionally, line 1814 has a similar issue in the tuple comment:

              int64_t, DeviceMap, std::optional<ContainerMetaInTask>, /*35-37*/
              bool, std::unordered_map<std::string, uint32_t>,        /*38-39*/
-             std::string>                                            /*40-44*/
+             std::string>                                            /*40*/
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cb622f9 and 5254877.

📒 Files selected for processing (4)
  • protos/PublicDefs.proto (4 hunks)
  • src/CraneCtld/CtldPublicDefs.h (4 hunks)
  • src/CraneCtld/DbClient.cpp (15 hunks)
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/CraneCtld/CtldPublicDefs.h (1)
src/CraneCtld/DbClient.h (3)
  • string (70-72)
  • string (384-426)
  • string (385-394)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
  • string (70-72)
  • string (384-426)
  • string (385-394)
🔇 Additional comments (16)
src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)

262-262: LGTM! Good practice to add the trailing comma.

Adding a trailing comma to the last array element makes future diffs cleaner.

src/CraneCtld/CtldPublicDefs.h (4)

199-200: LGTM! Config flags added correctly.

The new boolean flags MustNeedWckey and WckeyValid provide configuration control for wckey validation and requirement enforcement, with sensible defaults.


710-710: LGTM! Field addition is correct.

The wckey field in TaskInCtld aligns with the wckey support being added across the system.


1019-1020: LGTM! Field addition is correct.

The default_wckey_map field properly tracks default wckey per cluster for each user.


1075-1087: LGTM! Wckey struct is well-defined.

The Wckey struct includes all necessary fields and provides an equality operator for comparison.

protos/PublicDefs.proto (4)

184-184: LGTM! Proto field addition is correct.

The wckey field is properly defined as optional with the correct field number.


473-474: LGTM! Proto fields added correctly.

The wckey_valid and wckey fields enable both validation status and value tracking in TaskInfo.


788-793: LGTM! WckeyInfo message is well-defined.

The message structure aligns with the C++ Wckey struct and includes all necessary fields.


700-704: LGTM! Error codes added correctly.

The five new wckey-related error codes are sequentially numbered and cover the expected error scenarios. The corresponding error string array syntax issue was already addressed in commit 03a3ba5.

src/CraneCtld/DbClient.cpp (7)

620-633: LGTM! Exception handling is present.

The InsertWckey method includes proper exception handling as addressed in commit 03a3ba5.


681-683: LGTM! Break statement is present.

The WCKEY case includes the required break statement as addressed in commits 172421f to 45005cc.


733-745: LGTM! Exception handling is present.

The SelectAllWckey method includes proper exception handling as addressed in commit 03a3ba5.


796-818: LGTM! Exception handling is present.

The UpdateWckey method includes proper exception handling as addressed in commit 03a3ba5.


971-981: LGTM! Template specialization is correct.

The DocumentAppendItem_ template for std::unordered_map<std::string, std::string> properly serializes the map to BSON.


1267-1306: LGTM! User serialization/deserialization updated correctly.

The ViewToUser_ and UserToDocument_ methods properly handle the new default_wckey_map field with appropriate type checking and defensive programming.


1309-1333: LGTM! Wckey serialization/deserialization is correct.

The WckeyToDocument_ and ViewToWckey_ methods properly serialize and deserialize all Wckey fields with appropriate type conversions and exception handling.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)

262-269: Capitalize “Crane” consistently in new error string.

Line 266 currently says "The wckey already exists in crane". Other messages (e.g., line 182) capitalize the product name as “Crane”, so this should be adjusted for consistency and polish.

-        "The wckey already exists in crane",
+        "The wckey already exists in Crane",
src/CraneCtld/DbClient.cpp (1)

452-464: Set wckey_valid alongside wckey when fetching jobs.

Here you populate TaskInfo.wckey from the Mongo document, but TaskInfo.wckey_valid is never set, so it stays at the default false even when a valid wckey string exists. This makes it impossible for callers to distinguish “no wckey stored” from “wckey present and OK”.

Recommend updating this block to drive both fields:

-      if (view["wckey"] && view["wckey"].type() == bsoncxx::type::k_utf8) {
-        task->set_wckey(view["wckey"].get_string().value.data());
-      } else {
-        task->set_wckey("");
-      }
+      auto wckey_elem = view["wckey"];
+      if (wckey_elem && wckey_elem.type() == bsoncxx::type::k_utf8) {
+        task->set_wckey_valid(true);
+        task->set_wckey(wckey_elem.get_string().value.data());
+      } else {
+        task->set_wckey_valid(false);
+        task->set_wckey("");
+      }

This keeps behavior correct for legacy documents (no wckey field) while making the new proto fields meaningful.

🧹 Nitpick comments (2)
src/CraneCtld/TaskScheduler.cpp (1)

3704-3731: Wckey acquisition flow looks solid; consider using const TaskToCtld accessor.

The validation/marking logic for explicit vs default wckey (including the "*" convention and ignoring non‑essential default lookup errors) is consistent and avoids the extra DB round‑trip from the earlier version. The only small polish would be to prefer task->TaskToCtld().has_wckey() / .wckey() over MutableTaskToCtld() here since these calls are read‑only.

protos/PublicDefs.proto (1)

184-184: Proto schema changes for wckey are backward‑compatible and coherent.

Appending TaskToCtld.wckey, TaskInfo.wckey_valid/wckey, the wckey error codes (84–88), and WckeyInfo at the tail of their enums/messages preserves wire compatibility and lines up with the Ctld/DB changes. If you expect downstream consumers to distinguish “no wckey recorded” from “wckey string present but invalid”, a one‑line comment on the wckey_valid/wckey pair in TaskInfo would help future readers.

Also applies to: 473-475, 700-704, 788-793

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5254877 and 1350035.

📒 Files selected for processing (5)
  • protos/PublicDefs.proto (4 hunks)
  • src/CraneCtld/CtldPublicDefs.h (4 hunks)
  • src/CraneCtld/DbClient.cpp (16 hunks)
  • src/CraneCtld/TaskScheduler.cpp (1 hunks)
  • src/Utilities/PublicHeader/include/crane/PublicHeader.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/CraneCtld/CtldPublicDefs.h
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
🧬 Code graph analysis (1)
src/CraneCtld/TaskScheduler.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (160-160)
🔇 Additional comments (2)
src/CraneCtld/DbClient.cpp (2)

620-633: Wckey MongoDB CRUD and (de)serialization look consistent with existing patterns.

InsertWckey, SelectAllWckey, UpdateWckey, DeleteEntity(WCKEY), and the WckeyToDocument_ / ViewToWckey_ helpers mirror the established User/Account/Qos patterns (same collection naming, key fields, and exception handling). Filters on {name, cluster, user_name} for updates and deletions match the logical key for a wckey, and the schema (deleted, name, cluster, uid, user_name, is_default) aligns with the Ctld side.

No functional issues spotted here.

Also applies to: 681-683, 733-745, 796-818, 1309-1333


971-981: User default_wckey_map and task wckey persistence are wired correctly.

  • The new DocumentAppendItem_<std::unordered_map<std::string, std::string>> specialization plus the extended UserToDocument_/ViewToUser_ handle default_wckey_map cleanly and tolerate old documents where the field is absent.
  • Both TaskInEmbeddedDbToDocument_ and TaskInCtldToDocument_ now include a wckey field in the job document layout, and comments at the top of each function have been updated to reflect the extra slot. This keeps the Mongo schema and the historical layout documentation in sync.

Overall, the new serialized fields and their read/write paths look coherent with the rest of the client.

Also applies to: 1267-1277, 1285-1305, 1675-1711, 1780-1784, 1813-1815, 1842-1844

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/CraneCtld/DbClient.cpp (1)

460-464: Missing wckey_valid field assignment.

The code sets wckey but does not set wckey_valid, leaving TaskInfo incomplete. Based on the proto definition, both fields must be set to properly indicate wckey presence.

Apply this diff:

 if (view["wckey"] && view["wckey"].type() == bsoncxx::type::k_utf8) {
   task->set_wckey(view["wckey"].get_string().value.data());
+  task->set_wckey_valid(true);
 } else {
   task->set_wckey("");
+  task->set_wckey_valid(false);
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1350035 and 0323ca4.

📒 Files selected for processing (7)
  • etc/config.yaml (1 hunks)
  • protos/PublicDefs.proto (5 hunks)
  • src/CraneCtld/CraneCtld.cpp (1 hunks)
  • src/CraneCtld/CtldPublicDefs.cpp (1 hunks)
  • src/CraneCtld/CtldPublicDefs.h (4 hunks)
  • src/CraneCtld/DbClient.cpp (16 hunks)
  • src/CraneCtld/TaskScheduler.cpp (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/CraneCtld/CraneCtld.cpp
  • src/CraneCtld/CtldPublicDefs.cpp
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.

Applied to files:

  • src/CraneCtld/TaskScheduler.cpp
🧬 Code graph analysis (2)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
  • string (70-72)
  • string (384-426)
  • string (385-394)
src/CraneCtld/TaskScheduler.cpp (3)
src/Craned/Core/CranedPublicDefs.h (1)
  • g_config (160-160)
src/CraneCtld/CtldPublicDefs.cpp (1)
  • result (66-119)
src/Utilities/PublicHeader/PublicHeader.cpp (1)
  • result (772-772)
🔇 Additional comments (18)
etc/config.yaml (1)

151-152: Clarify intent of commented Wckey configuration.

These lines add a commented example for AccountingStorageEnforce: wckeys, but a past reviewer (Nativu5) questioned whether this option has meaning in your system. Since these remain commented out and are non-functional, please clarify:

  1. Is this intended as documentation for future optional use?
  2. Or should these lines be removed if the option is not applicable to your system?

If this is meant to be example/reference documentation, consider adding an inline comment explaining when or why an operator might uncomment these lines.

src/CraneCtld/DbClient.cpp (7)

620-633: LGTM!

InsertWckey follows the same pattern as InsertUser and InsertAccount, with proper exception handling and consistent document construction.


681-683: LGTM!

The WCKEY case now includes a break statement, preventing unintended fall-through.


733-745: LGTM!

SelectAllWckey now includes proper exception handling, consistent with SelectAllUser and SelectAllAccount.


796-818: LGTM!

UpdateWckey now includes proper exception handling and correctly filters by the compound key (name, cluster, user_name).


971-981: LGTM!

Template specialization correctly serializes unordered_map<string, string> to BSON subdocument for the User's default_wckey_map field.


1267-1277: LGTM!

User serialization/deserialization correctly handles the new default_wckey_map field with proper type checking and symmetric BSON mapping.

Also applies to: 1285-1306


1309-1333: LGTM!

WckeyToDocument_ and ViewToWckey_ provide symmetric serialization/deserialization with proper type mapping and exception handling.

src/CraneCtld/TaskScheduler.cpp (1)

3704-3731: Well-structured Wckey handling implementation.

The refactored logic correctly:

  1. Validates the provided wckey before any default lookup
  2. Only fetches the default wckey when needed for marking purposes (using the if-init pattern)
  3. Ignores GetExistedDefaultWckeyName errors when the user's wckey was already validated (avoiding misleading error propagation)
  4. Documents the "*" prefix convention via inline comments

The conditional structure cleanly separates the "wckey provided" vs "use default" paths.

src/CraneCtld/CtldPublicDefs.h (4)

199-199: LGTM!

The MustNeedWckey config flag with a safe default of false allows for backward-compatible opt-in of wckey enforcement.


709-709: LGTM!

The wckey field addition to TaskInCtld aligns with the proto TaskInfo.wckey field and enables wckey tracking throughout task lifecycle.


1018-1019: LGTM!

The default_wckey_map keyed by cluster name appropriately supports multi-cluster environments where users may have different default wckeys per cluster.


1074-1086: Verify intent of operator== for Wckey comparisons before removing is_default.

The concern about inconsistent treatment of mutable state fields is valid: both deleted and is_default are mutable fields, yet only is_default is included in the equality operator. However, the appropriate fix depends on the intended semantics:

  • If operator== should compare only identity (wckey name/cluster/user identity), remove both is_default and deleted.
  • If operator== should compare complete state, include both fields for consistency.
  • Check how Wckey objects are used: in collections as keys, in comparisons for updates, or other contexts that clarify whether identity or value semantics are needed.

Without understanding the actual usage patterns and design intent in the codebase, removing is_default without also addressing deleted risks creating subtle equality semantics bugs.

protos/PublicDefs.proto (5)

184-184: LGTM!

Using optional for wckey in TaskToCtld correctly distinguishes between "not provided" (use default) and "explicitly set" cases at submission time.


473-473: LGTM!

The wckey field in TaskInfo is non-optional since the task always has a resolved wckey value (either user-provided or default) by the time it's reported.


699-703: LGTM!

The new wckey-related error codes follow the existing naming convention and are assigned sequential values (84-88) that don't conflict with existing codes.


732-733: LGTM!

The DefaultWckey = 40 enum value in ModifyField provides a clear identifier for wckey modification operations, following the established pattern of grouping related fields with appropriate spacing (30s for qos, 40 for wckey).


789-794: LGTM!

The WckeyInfo message properly exposes the public-facing wckey attributes needed for gRPC communication, appropriately excluding internal fields like uid and deleted that are handled server-side.

@github-actions github-actions bot added the docs label Dec 3, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0323ca4 and 47a4423.

📒 Files selected for processing (2)
  • docs/en/command/cacctmgr.md (1 hunks)
  • docs/zh/command/cacctmgr.md (1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/en/command/cacctmgr.md

[grammar] ~527-~527: Ensure spelling is correct
Context: ...action where Target=PKU ``` --- ## 5. Wckey Management ### 5.1 Add Wckey **Syntax...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

@github-actions github-actions bot added the test-passed Build and test success label Dec 25, 2025
@L-Xiafeng L-Xiafeng merged commit 5187844 into master Dec 25, 2025
7 checks passed
@L-Xiafeng L-Xiafeng deleted the dev/wckey branch December 25, 2025 14:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

docs test-passed Build and test success

Projects

None yet

Development

Successfully merging this pull request may close these issues.

新增 cbatch -wckey 功能

4 participants