-
Notifications
You must be signed in to change notification settings - Fork 32
Add job wckey #558
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add job wckey #558
Conversation
|
Warning Rate limit exceeded@L-Xiafeng has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 15 minutes and 24 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (20)
📝 WalkthroughWalkthroughAdds end-to-end "wckey" support: protobufs and gRPC RPCs; AccountManager in-memory + transactional Wckey CRUD and default management; DbClient persistence and document mapping to new Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Ctld as Ctld gRPC
participant AM as AccountManager
participant DB as DbClient
participant Mongo as MongoDB
Client->>Ctld: AddWckeyRequest(uid, WckeyInfo)
Ctld->>AM: AddUserWckey(uid, Wckey)
AM->>DB: InsertWckey(wckey)
DB->>Mongo: insert into wckey_table
Mongo-->>DB: ack
DB-->>AM: success / error
AM-->>Ctld: result (ok / ErrCode)
Ctld-->>Client: AddWckeyReply
Client->>Ctld: QueryWckeyInfoRequest(uid, optional list)
Ctld->>AM: QueryAllWckeyInfo(uid, filter_set)
AM->>DB: SelectAllWckey()
DB->>Mongo: query wckey_table
Mongo-->>DB: documents
DB-->>AM: wckey list
AM-->>Ctld: populated WckeyInfo list
Ctld-->>Client: QueryWckeyInfoReply
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (8)
src/CraneCtld/CraneCtld.cpp (1)
551-555: Consider explicit initialization for consistency.The logic correctly parses the comma-separated values and sets
MustNeedWckeyto true when "wckeys" is present. However, consider explicitly setting it to false when "wckeys" is not present to ensure consistent behavior during configuration reloads.Apply this diff for explicit initialization:
if (config["AccountingStorageEnforce"]) { auto val = config["AccountingStorageEnforce"].as<std::string>(); std::unordered_set<std::string> items = util::SplitStr(val, ','); - if (items.contains("wckeys")) g_config.MustNeedWckey = true; + g_config.MustNeedWckey = items.contains("wckeys"); }protos/PublicDefs.proto (2)
319-320: Consider usingoptionalfor wckey field consistencyInstead of using a separate
wckey_validboolean field, consider makingwckeyan optional field. This would be more consistent with modern protobuf practices and the pattern used inTaskToCtld(line 168).- bool wckey_valid = 41; - string wckey = 42; + optional string wckey = 41;
553-557: Potential duplication between UserInfo.Wckey and WckeyInfo messagesThe
UserInfo.Wckeynested message and the top-levelWckeyInfomessage share most fields. Consider consolidating these to avoid duplication and maintain consistency.You could either:
- Reuse
WckeyInfowithinUserInfoinstead of defining a nestedWckeymessage- Or extend the nested
Wckeymessage with additional fields when neededThis would reduce code duplication and ensure consistent field definitions across the codebase.
Also applies to: 578-583
protos/Crane.proto (1)
411-414: Clarify wckey_list format in QueryWckeyInfoRequestThe field name
wckey_listis ambiguous. Based on other Query operations, this should contain identifiers to filter the query.Consider renaming for clarity:
message QueryWckeyInfoRequest { uint32 uid = 1; - repeated string wckey_list = 2; + repeated string wckey_names = 2; // Filter by wckey names }src/CraneCtld/DbClient.h (1)
232-251: Complex error message construction could be simplifiedThe error message construction with manual JSON-like formatting is verbose and error-prone.
Consider using a JSON library or simplifying:
- std::string filter_str = "{"; - for (size_t i = 0; i < filter_fields.size(); ++i) { - filter_str += std::visit( - [&](auto&& arg) -> std::string { - if constexpr (std::is_same_v<std::decay_t<decltype(arg)>, bool>) { - return std::format("{}:{}", filter_fields[i].first, - arg ? "true" : "false"); - } else { - return std::format("{}:{}", filter_fields[i].first, arg); - } - }, - filter_fields[i].second); - if (i != filter_fields.size() - 1) filter_str += ", "; - } - filter_str += "}"; + std::string filter_str = bsoncxx::to_json(filter.view());src/CraneCtld/DbClient.cpp (2)
510-518: Consider pagination for SelectAllWckeyThe method loads all Wckey documents into memory at once, which could be problematic with large datasets.
Consider adding pagination support similar to
FetchJobRecords:-void MongodbClient::SelectAllWckey(std::list<Ctld::Wckey>* wckey_list) { +void MongodbClient::SelectAllWckey(std::list<Ctld::Wckey>* wckey_list, + size_t limit = 0) { + mongocxx::options::find options; + if (limit > 0) { + options = options.limit(limit); + } mongocxx::cursor cursor = - (*GetClient_())[m_db_name_][m_wckey_collection_name_].find({}); + (*GetClient_())[m_db_name_][m_wckey_collection_name_].find({}, options);
846-848: Consider using uint32_t for uid field consistencyThe uid is cast to int64_t, but it's uint32_t in the Wckey struct. While this works, it's inconsistent with the data type.
Consider using int32_t for consistency with other entity types:
- wckey.cluster, static_cast<int64_t>(wckey.uid), + wckey.cluster, static_cast<int32_t>(wckey.uid),And update the corresponding read operation:
- wckey->uid = wckey_view["uid"].get_int64().value; + wckey->uid = wckey_view["uid"].get_int32().value;src/CraneCtld/AccountManager.cpp (1)
1152-1155: Address TODO for multi-cluster support.The hardcoded cluster validation will need revision for multi-cluster support. Consider tracking this technical debt.
Would you like me to create an issue to track the multi-cluster enhancement for wckey management?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (16)
etc/config.yaml(1 hunks)protos/Crane.proto(5 hunks)protos/PublicDefs.proto(6 hunks)scripts/wipe_data.sh(2 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/AccountManager.h(10 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/CtldPreCompiledHeader.h(1 hunks)src/CraneCtld/CtldPublicDefs.h(4 hunks)src/CraneCtld/DbClient.cpp(15 hunks)src/CraneCtld/DbClient.h(7 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(4 hunks)src/CraneCtld/RpcService/CtldGrpcServer.h(4 hunks)src/CraneCtld/TaskScheduler.cpp(1 hunks)src/Utilities/PublicHeader/String.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/String.h(1 hunks)
🧰 Additional context used
🧠 Learnings (8)
src/CraneCtld/CtldPreCompiledHeader.h (3)
Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, phmap::parallel_flat_hash_set from the Parallel Hashmap library (parallel_hashmap/phmap.h) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The phmap::parallel_flat_hash_set used in CraneSched's VaultClientWrapper is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.
src/Utilities/PublicHeader/include/crane/String.h (1)
Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
src/CraneCtld/CtldPublicDefs.h (10)
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Learnt from: 1daidai1
PR: #458
File: src/CraneCtld/CtldPublicDefs.h:0-0
Timestamp: 2025-05-23T02:32:43.952Z
Learning: In the CraneSched project, allocated_res_view is handled/updated separately before calling SetAllocatedRes, so it does not need to be updated again within the method itself.
Learnt from: 1daidai1
PR: #458
File: src/CraneCtld/CtldPublicDefs.h:0-0
Timestamp: 2025-05-23T02:32:43.952Z
Learning: In the CraneSched project, allocated_res_view is updated before calling SetAllocatedRes, so it doesn't need to be updated again within the method itself.
Learnt from: 1daidai1
PR: #458
File: src/CraneCtld/CtldPublicDefs.h:0-0
Timestamp: 2025-05-23T02:32:43.952Z
Learning: In the CraneSched project, allocated_res_view is updated before calling SetAllocatedRes, so it doesn't need to be updated again within the method itself.
Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.
Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
src/CraneCtld/RpcService/CtldGrpcServer.h (3)
Learnt from: L-Xiafeng
PR: #520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, g_ctld_client is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
src/CraneCtld/AccountManager.h (10)
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The phmap::parallel_flat_hash_set used in CraneSched's VaultClientWrapper is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, phmap::parallel_flat_hash_set from the Parallel Hashmap library (parallel_hashmap/phmap.h) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.
src/CraneCtld/TaskScheduler.cpp (7)
Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().
Learnt from: huerni
PR: #469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.
Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:396-396
Timestamp: 2025-05-25T04:11:27.740Z
Learning: In TaskManager.cpp, GetCrunMeta() calls don't need null checks because they're only called in contexts where the task is guaranteed to be a CRUN task (e.g., SetupChildProcessCrunX11_ is only called when step->IsCrun() && x11() conditions are met), ensuring the metadata will always be valid.
Learnt from: huerni
PR: #537
File: src/CraneCtld/CtldGrpcServer.cpp:1662-1669
Timestamp: 2025-06-30T08:38:18.748Z
Learning: In CraneSched's CPU frequency feature, validation is split between controller and compute nodes: CtldGrpcServer only validates governor against configuration (g_config.CpuFreqGovernors), while detailed validation of low/high frequency values is deferred to TaskManager.cpp via CpuFreqValidateAndSet() where actual hardware capabilities can be checked.
Learnt from: L-Xiafeng
PR: #520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, g_ctld_client is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
src/CraneCtld/RpcService/CtldGrpcServer.cpp (3)
Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Learnt from: L-Xiafeng
PR: #520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, g_ctld_client is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
src/CraneCtld/AccountManager.cpp (8)
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.
Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.
🧬 Code Graph Analysis (6)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
src/Utilities/PublicHeader/include/crane/String.h (1)
src/Utilities/PublicHeader/String.cpp (4)
SplitStr(414-427)SplitStr(414-414)ToLower(429-434)ToLower(429-429)
src/CraneCtld/CraneCtld.cpp (2)
src/Utilities/PublicHeader/String.cpp (4)
SplitStr(414-427)SplitStr(414-414)ToLower(429-434)ToLower(429-429)src/Craned/CranedPublicDefs.h (1)
g_config(107-107)
src/CraneCtld/CtldPublicDefs.h (2)
src/Craned/CranedPublicDefs.h (1)
g_config(107-107)src/CraneCtld/DbClient.h (2)
string(323-348)string(324-332)
src/CraneCtld/RpcService/CtldGrpcServer.h (2)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (8)
AddWckey(546-570)AddWckey(546-548)ModifyWckey(796-814)ModifyWckey(796-799)QueryWckeyInfo(987-1020)QueryWckeyInfo(987-990)DeleteWckey(1151-1172)DeleteWckey(1151-1154)src/CraneCtld/AccountManager.cpp (4)
ModifyWckey(1136-1164)ModifyWckey(1136-1139)DeleteWckey(272-300)DeleteWckey(272-275)
src/CraneCtld/DbClient.h (1)
src/CraneCtld/DbClient.cpp (14)
InsertWckey(414-426)InsertWckey(414-414)SelectAllWckey(510-518)SelectAllWckey(510-510)GetClient_(728-734)GetClient_(728-728)GetSession_(736-744)GetSession_(736-736)UpdateWckey(557-575)UpdateWckey(557-557)WckeyToDocument_(840-850)WckeyToDocument_(840-841)ViewToWckey_(852-864)ViewToWckey_(852-853)
🔇 Additional comments (34)
src/Utilities/PublicHeader/include/crane/String.h (1)
93-96: LGTM! Clean utility function additions.The function declarations are well-defined and match their implementations. These utilities will support the wckey configuration parsing functionality introduced in this PR.
src/CraneCtld/CtldPreCompiledHeader.h (1)
64-64: LGTM! Necessary include for hash functionality.The addition of
absl/hash/hash.his properly placed with other Abseil includes and supports the WckeyKey hashing functionality introduced in this PR.scripts/wipe_data.sh (2)
3-3: LGTM! Usage message correctly updated.The usage message accurately reflects that modes 4 and 6 now include wiping the wckey_table alongside existing tables.
55-55: LGTM! Consistent wiping logic for wckey_table.The addition of wckey_table wiping follows the existing pattern and is appropriately included in modes 4, 5, and 6 where user-related data is cleared.
etc/config.yaml (1)
125-128: LGTM! Helpful configuration documentation.The commented configuration options provide clear examples for users wanting to enable wckey tracking and enforcement. Keeping them commented prevents unintended activation while serving as documentation.
src/Utilities/PublicHeader/String.cpp (2)
414-427: LGTM! Well-implemented string splitting with proper trimming.The
SplitStrfunction correctly handles:
- String splitting by delimiter using stringstream
- Whitespace trimming from both ends
- Automatic deduplication via unordered_set
- Edge cases where trimmed strings are empty
429-434: LGTM! Modern and correct lowercase conversion.The
ToLowerfunction usesstd::ranges::transformwith properunsigned charcasting in the lambda, which is the correct approach forstd::tolowerto avoid undefined behavior with signed char values.src/CraneCtld/CraneCtld.cpp (1)
556-567: LGTM! Well-implemented configuration parsing with proper error handling.The implementation correctly handles case-insensitive parsing of boolean values with clear error messaging. The explicit boolean assignment and consistent error handling align well with the existing codebase patterns.
src/CraneCtld/TaskScheduler.cpp (1)
3112-3129: LGTM! Well-structured wckey validation and assignment logic.The implementation correctly handles wckey requirements based on configuration:
- Proper validation: Validates user-provided wckeys against the account manager database
- Fallback mechanism: Assigns default wckey with "*" prefix when none provided
- Clean separation: Clears wckey when not required by configuration
- Error handling: Returns appropriate error codes (
ERR_INVALID_WCKEY) with descriptive loggingThe integration with
g_account_managerfollows the established patterns in the codebase and aligns well with the comprehensive wckey support introduced in this PR.src/CraneCtld/RpcService/CtldGrpcServer.h (4)
101-103: LGTM - Wckey management method declarations are consistent.The new AddWckey method declaration follows the established pattern of other CRUD operations in the service interface.
118-120: LGTM - ModifyWckey method declaration follows conventions.The method signature is consistent with other modify operations in the service.
132-135: LGTM - QueryWckeyInfo method declaration is well-structured.The method follows the established query pattern used by other info query methods.
154-156: LGTM - DeleteWckey method declaration completes the CRUD interface.The method signature is consistent with other delete operations in the service.
src/CraneCtld/RpcService/CtldGrpcServer.cpp (5)
546-570: LGTM - AddWckey implementation follows established patterns.The implementation correctly:
- Checks server readiness
- Constructs a Wckey object from the request
- Calls g_account_manager->AddUserWckey with proper parameters
- Sets response fields based on the result
- Returns grpc::Status::OK as required by the learned conventions
796-814: LGTM - ModifyWckey implementation is consistent.The method properly handles the modify operation by calling g_account_manager->ModifyWckey and setting appropriate response fields based on the result.
987-1020: LGTM - QueryWckeyInfo implementation is well-structured.The implementation correctly:
- Calls g_account_manager->QueryAllWckeyInfo
- Handles error cases by populating rich_error_list
- Iterates through results to populate wckey_list in the response
- Sets response status based on error presence
1151-1172: LGTM - DeleteWckey implementation follows conventions.The method properly calls g_account_manager->DeleteWckey and handles both success and error cases appropriately.
967-980: LGTM - QueryUserInfo enhancement for Wckey support.The code properly iterates through the user's wckey_map and populates the wckey_list in the response. It correctly checks the default_wckey_map to set the is_def flag for default wckeys.
src/CraneCtld/CtldPublicDefs.h (7)
169-170: LGTM - Configuration flags for Wckey validation and enforcement.The two boolean flags
MustNeedWckeyandWckeyValidprovide appropriate control over Wckey functionality, with sensible default values of false.
399-399: LGTM - Wckey field addition to TaskInCtld.The
wckeystring field is appropriately placed in the task structure and will be used to associate wckeys with tasks.
866-867: LGTM - WckeyMap type alias is well-defined.The type alias clearly represents the mapping from cluster names to sets of wckey names.
873-873: LGTM - Default wckey mapping addition.The
default_wckey_mapfield appropriately maps cluster names to default wckey names for each user.
877-877: LGTM - Wckey map integration in User struct.The
wckey_mapfield of typeWckeyMapproperly stores user-associated wckeys organized by cluster.
880-892: LGTM - Wckey struct definition is comprehensive.The struct includes all necessary fields:
deletedflag for soft deletionname,cluster,uid,user_namefor identificationis_defflag for default wckey indication- Equality operator for comparisons
The structure follows patterns established by other entity structs in the codebase.
713-718: LGTM - Wckey validation logic in SetFieldsOfTaskInfo.The code correctly sets the
wckey_validfield based on the configuration flags (WckeyValidorMustNeedWckey) and populates thewckeyfield from the task's wckey member. This provides proper integration between configuration and task information.src/CraneCtld/AccountManager.h (1)
409-428: Well-implemented composite key structureThe
WckeyKeystruct with custom hash support is well-designed for the three-field composite key requirement.src/CraneCtld/DbClient.cpp (2)
374-378: Good defensive programming for wckey field handlingThe code properly checks for field existence and type before accessing the wckey value, providing good backward compatibility.
661-671: Template specialization for string maps looks correctThe template specialization properly handles serialization of
std::unordered_map<std::string, std::string>to BSON subdocuments.src/CraneCtld/AccountManager.cpp (6)
287-290: Good defensive check for default wckey deletion.The check prevents deletion of default wckeys, which is essential for maintaining data consistency.
454-462: Consider aligning wckey query permissions with user query model.Unlike
QueryAllUserInfowhich allows non-operators to query users within their account hierarchy, this method completely restricts non-operators. Consider whether wckey visibility should follow account boundaries for consistency.If the current restriction is intentional, consider adding a comment explaining why wckeys have stricter visibility rules than users.
1771-1771: Proper initialization of wckey data with consistent lock ordering.The wckey initialization follows the established pattern and maintains consistent lock ordering to prevent deadlocks.
Also applies to: 1791-1796
1970-2053: Well-structured transaction handling for wckey creation.The implementation correctly handles all edge cases including updating existing deleted entries and managing default wckey transitions.
Consider defining constants for the map key patterns (e.g., "wckey_map." + cluster) to avoid string concatenation errors.
2260-2284: Correct implementation of wckey soft deletion.The method properly maintains consistency between wckey and user collections using transactions, and follows the established soft-delete pattern.
2404-2458: Efficient default wckey update with proper state management.The implementation correctly handles the transition of default wckeys, including the optimization to skip unnecessary updates when the wckey is already default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/CraneCtld/AccountManager.cpp (2)
1152-1155: TODO: Multi-cluster support needs implementation.The current implementation only supports a single cluster. This limitation should be addressed for multi-cluster deployments.
Would you like me to help implement multi-cluster support or create an issue to track this enhancement?
1970-2046: Robust implementation with proper transaction handling.The function correctly manages default wckey state and ensures data consistency. Consider extracting the transaction callback into a separate named function for better readability given its complexity.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
protos/PublicDefs.proto(4 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/CtldPublicDefs.h(4 hunks)src/CraneCtld/DbClient.cpp(15 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- protos/PublicDefs.proto
- src/CraneCtld/CtldPublicDefs.h
- src/CraneCtld/RpcService/CtldGrpcServer.cpp
🧰 Additional context used
🧠 Learnings (2)
src/CraneCtld/DbClient.cpp (1)
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
src/CraneCtld/AccountManager.cpp (11)
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.
Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.
Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Learnt from: L-Xiafeng
PR: #520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.
Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.
🔇 Additional comments (17)
src/CraneCtld/DbClient.cpp (12)
374-378: LGTM! Defensive null check for wckey field.Good defensive programming by checking if the field exists and has the correct type before accessing it, with a sensible default empty string fallback.
414-426: LGTM! Consistent implementation with other entity insertion methods.The
InsertWckeymethod follows the established pattern for entity insertion with proper session handling and creation timestamp.
469-471: The missing break statement issue has been resolved.The switch case for
EntityType::WCKEYnow correctly includes a break statement at line 471.
510-518: LGTM! Consistent implementation with other entity selection methods.The
SelectAllWckeymethod follows the established pattern for retrieving all entities of a specific type.
557-575: LGTM! Appropriate use of composite key for Wckey updates.The use of a composite filter (name + cluster + user_name) is appropriate for uniquely identifying Wckey entities across users and clusters.
661-685: LGTM! Well-structured template specializations for map serialization.The template specializations correctly handle the serialization of both simple string-to-string maps and the more complex WckeyMap (string to set of strings) structure.
781-791: LGTM! Robust deserialization with proper type checking.The implementation properly validates field existence and types before deserialization, preventing potential runtime errors.
799-819: LGTM! User document structure properly extended.The field array and tuple are correctly updated to include the new
default_wckey_mapfield.
821-831: LGTM! Consistent Wckey serialization implementation.The
WckeyToDocument_method properly serializes all Wckey fields with appropriate type conversions.
833-845: LGTM! Proper Wckey deserialization with error handling.The
ViewToWckey_method correctly deserializes all fields with appropriate type conversions and exception handling.
1062-1123: LGTM! Task document structure properly extended for embedded DB.The fields array and tuple are correctly updated to include the wckey field from the task's gRPC message.
1152-1205: LGTM! Task document structure properly extended for in-memory tasks.The fields array and tuple are correctly updated to include the wckey field from the TaskInCtld object.
src/CraneCtld/AccountManager.cpp (5)
272-300: Well-structured deletion logic with proper validation checks.The implementation correctly prevents deletion of default wckeys and follows the established locking patterns.
443-465: Clean implementation with proper permission checks.The function correctly restricts wckey queries to operators and filters out deleted entries appropriately.
2253-2274: Clean soft-delete implementation.The function correctly implements soft deletion with proper transaction handling and maintains consistency between database and in-memory state.
2394-2448: Efficient implementation with proper state management.The function correctly handles default wckey transitions and includes an optimization to skip unnecessary updates when the wckey is already default.
1771-1796: Proper initialization of wckey data structures.The changes correctly integrate wckey loading into the initialization sequence with appropriate locking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (4)
src/CraneCtld/AccountManager.h (1)
157-159: Rename method to clarify its purposeThe method name
ModifyDefaultWckeyis misleading as it only sets the default Wckey without general modification capabilities.- CraneExpected<void> ModifyDefaultWckey(uint32_t uid, const std::string& name, - const std::string& cluster, - const std::string& user_name); + CraneExpected<void> SetDefaultWckey(uint32_t uid, const std::string& name, + const std::string& cluster, + const std::string& user_name);src/CraneCtld/DbClient.h (1)
181-254: Fix debug logging levelThe method implementation is solid, but the debug logging uses an inappropriate error level.
Remove or change the log level at lines 247-250:
- CRANE_ERROR( - "Update date in database fail(filter:{}, opt:{}, key:{}, value:{}, " - "{})", - filter_str, opt, key, value, result ? result->modified_count() : -1); + CRANE_ERROR( + "Update data in database failed (filter:{}, opt:{}, key:{}, value:{}, " + "modified_count:{})", + filter_str, opt, key, value, result ? result->modified_count() : -1);Also note the typo "date" should be "data".
src/CraneCtld/AccountManager.cpp (2)
379-392: RAII locking pattern correctly implemented.The method now uses
util::read_lock_guardfor proper RAII-style mutex management, which addresses the previous concern about manual lock/unlock operations.
1854-1860: Deletion flag check is properly implemented.The
GetExistedWckeyInfoNoLock_method correctly checks both existence and the deleted flag before returning the wckey.
🧹 Nitpick comments (2)
src/CraneCtld/AccountManager.h (1)
409-428: Fix naming inconsistencyThere's an inconsistency between
WckeyKeyandWcKeyHash. Use consistent naming throughout.- struct WcKeyHash { + struct WckeyKeyHash { std::size_t operator()(const WckeyKey& key) const { return absl::Hash<WckeyKey>{}(key); } };Also update the usage in line 438:
- std::unordered_map<WckeyKey, std::unique_ptr<Wckey>, WcKeyHash> m_wckey_map_; + std::unordered_map<WckeyKey, std::unique_ptr<Wckey>, WckeyKeyHash> m_wckey_map_;src/CraneCtld/AccountManager.cpp (1)
1156-1159: Track the multi-cluster TODO.The current implementation correctly validates the cluster name, but the TODO comment indicates future work is needed for multi-cluster support.
Would you like me to create an issue to track the multi-cluster support enhancement?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
protos/Crane.proto(5 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/AccountManager.h(10 hunks)src/CraneCtld/DbClient.cpp(15 hunks)src/CraneCtld/DbClient.h(7 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(4 hunks)src/CraneCtld/RpcService/CtldGrpcServer.h(4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/CraneCtld/RpcService/CtldGrpcServer.h
- src/CraneCtld/RpcService/CtldGrpcServer.cpp
- protos/Crane.proto
🧰 Additional context used
🧠 Learnings (4)
src/CraneCtld/AccountManager.cpp (11)
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.
Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.
Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Learnt from: L-Xiafeng
PR: #520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.
Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.
src/CraneCtld/DbClient.h (1)
Learnt from: MCKZX-llx
PR: #513
File: src/Utilities/EmbeddedDb/EmbeddedDb.cpp:411-411
Timestamp: 2025-06-08T07:45:15.739Z
Learning: In the CraneSched project's Berkeley DB implementation in src/Utilities/EmbeddedDb/EmbeddedDb.cpp, explicit deletion of DbTxn objects after commit() or abort() is not required, contrary to general Berkeley DB documentation. The project maintainer confirmed that the memory management is handled automatically in their specific implementation.
src/CraneCtld/AccountManager.h (10)
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Learnt from: L-Xiafeng
PR: #496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The phmap::parallel_flat_hash_set used in CraneSched's VaultClientWrapper is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, phmap::parallel_flat_hash_set from the Parallel Hashmap library (parallel_hashmap/phmap.h) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.
src/CraneCtld/DbClient.cpp (1)
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
🔇 Additional comments (24)
src/CraneCtld/AccountManager.h (5)
42-42: LGTM!The
WckeyMutexSharedPtrtype alias follows the established pattern for other entity types in the class.
54-55: LGTM!The
AddUserWckeymethod signature is consistent with other entity addition methods.
63-66: LGTM!The
DeleteWckeymethod correctly uses a composite key (name, cluster, user_name) to identify Wckey entities, which aligns with theWckeyKeystruct design.
91-96: LGTM!The Wckey getter methods correctly use composite keys and follow the established patterns. The
GetExistedDefaultWckeyNamemethod appropriately returns the default Wckey name for a given cluster/user combination.
438-439: LGTM!The
m_wckey_map_member correctly uses a composite key for Wckey storage and follows the established thread-safety pattern withm_rw_wckey_mutex_.src/CraneCtld/DbClient.h (3)
47-47: LGTM!The
WCKEYentity type is correctly added to the enum.
161-165: LGTM!The WCKEY case correctly returns false with an appropriate error message, addressing the previous review feedback about silent failures.
68-68: LGTM!The Wckey-related method declarations and collection name member follow the established patterns for entity management.
Also applies to: 110-110, 257-257, 294-297, 324-324
src/CraneCtld/DbClient.cpp (9)
374-378: LGTM!The wckey field extraction is implemented defensively with proper type checking and fallback to empty string.
414-426: LGTM!The
InsertWckeyimplementation follows the established pattern for entity insertion.
470-472: LGTM!The WCKEY case correctly includes the break statement, properly addressing the previous review feedback about fall-through.
511-519: LGTM!The
SelectAllWckeyimplementation follows the established pattern for entity selection.
558-576: LGTM!The
UpdateWckeyimplementation correctly uses the composite key (name, cluster, user_name) for filtering.
662-686: LGTM!The template specializations for wckey-related map types correctly handle BSON serialization of complex data structures.
782-792: LGTM!The User schema changes correctly handle the
default_wckey_mapfield with proper backward compatibility checks.Also applies to: 800-819
822-846: LGTM!The Wckey serialization/deserialization methods correctly handle all fields with appropriate type conversions.
1063-1063: LGTM!The task schema changes consistently add the wckey field to both embedded and in-memory task document structures.
Also applies to: 1079-1080, 1090-1090, 1124-1124, 1153-1153, 1169-1169, 1180-1180, 1206-1206
src/CraneCtld/AccountManager.cpp (7)
169-201: Good implementation of double-checked locking pattern.The re-validation of user existence under write lock (lines 190-193) properly addresses the potential race condition where a user could be deleted between releasing the read lock and acquiring the write lock.
277-305: Proper validation logic for wckey deletion.The method correctly prevents deletion of default wckeys and follows the established permission checking pattern.
394-406: Manual locking is appropriate for shared pointer pattern.The manual lock/unlock pattern here is intentional and necessary since the method returns a
WckeyMutexSharedPtrthat manages the mutex lifecycle. The lock must remain held until the caller releases the shared pointer.
447-469: Consistent permission model and filtering logic.The method correctly restricts wckey queries to operators and above, and properly filters out deleted entries.
1771-1801: Proper initialization of wckey data structures.The method correctly loads wckey data from the database and uses an appropriate composite key structure for the wckey map.
1974-2050: Well-implemented atomic wckey addition with proper default handling.The method correctly manages default wckey transitions and ensures database and in-memory state consistency through transactions.
2398-2452: Correct implementation of default wckey switching.The method properly handles the atomic transition of default wckey status, ensuring only one wckey is marked as default per user/cluster combination.
fe51039 to
f9bfa94
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/CraneCtld/AccountManager.cpp (2)
390-402: Consider using RAII for lock management in the future.While this method follows the existing pattern used by other
GetExisted*Infomethods, the manual lock/unlock approach could be improved with RAII. Consider refactoring all these methods together in a future update to usestd::shared_lockor similar RAII wrappers.
1136-1164: TODO: Multi-cluster support needs implementation.The current implementation only supports a single cluster as indicated by the TODO comment. The hardcoded cluster validation will need to be updated when multi-cluster support is added.
Would you like me to create an issue to track the multi-cluster support implementation?
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (18)
etc/config.yaml(1 hunks)protos/Crane.proto(5 hunks)protos/PublicDefs.proto(4 hunks)scripts/wipe_data.sh(2 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/AccountManager.h(10 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/CtldPreCompiledHeader.h(1 hunks)src/CraneCtld/CtldPublicDefs.cpp(1 hunks)src/CraneCtld/CtldPublicDefs.h(3 hunks)src/CraneCtld/DbClient.cpp(16 hunks)src/CraneCtld/DbClient.h(7 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(4 hunks)src/CraneCtld/RpcService/CtldGrpcServer.h(4 hunks)src/CraneCtld/TaskScheduler.cpp(1 hunks)src/Utilities/PublicHeader/String.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/PublicHeader.h(1 hunks)src/Utilities/PublicHeader/include/crane/String.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (15)
- src/CraneCtld/CtldPreCompiledHeader.h
- etc/config.yaml
- scripts/wipe_data.sh
- src/Utilities/PublicHeader/include/crane/PublicHeader.h
- src/Utilities/PublicHeader/String.cpp
- src/CraneCtld/CtldPublicDefs.cpp
- src/CraneCtld/TaskScheduler.cpp
- src/Utilities/PublicHeader/include/crane/String.h
- src/CraneCtld/CtldPublicDefs.h
- src/CraneCtld/CraneCtld.cpp
- protos/PublicDefs.proto
- protos/Crane.proto
- src/CraneCtld/RpcService/CtldGrpcServer.h
- src/CraneCtld/RpcService/CtldGrpcServer.cpp
- src/CraneCtld/AccountManager.h
🧰 Additional context used
🧠 Learnings (2)
src/CraneCtld/AccountManager.cpp (13)
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Learnt from: huerni
PR: #499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ is an std::unordered_set<std::string> which is not inherently thread-safe. The IsCertAllowed method in VaultClientWrapper involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Learnt from: huerni
PR: #528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.
Learnt from: L-Xiafeng
PR: #461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Learnt from: L-Xiafeng
PR: #520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Learnt from: RileyWen
PR: #456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the m_allowed_certs_ in VaultClientWrapper is implemented as a phmap::parallel_flat_hash_set<std::string>, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Learnt from: RileyWen
PR: #475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.
Learnt from: L-Xiafeng
PR: #399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.
Learnt from: L-Xiafeng
PR: #496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Learnt from: huerni
PR: #537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Learnt from: L-Xiafeng
PR: #461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The AtomicHashMap::operator[] implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a ValueExclusivePtr that will be empty if the key is not found.
src/CraneCtld/DbClient.h (1)
Learnt from: MCKZX-llx
PR: #513
File: src/Utilities/EmbeddedDb/EmbeddedDb.cpp:411-411
Timestamp: 2025-06-08T07:45:15.739Z
Learning: In the CraneSched project's Berkeley DB implementation in src/Utilities/EmbeddedDb/EmbeddedDb.cpp, explicit deletion of DbTxn objects after commit() or abort() is not required, contrary to general Berkeley DB documentation. The project maintainer confirmed that the memory management is handled automatically in their specific implementation.
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: format
🔇 Additional comments (25)
src/CraneCtld/DbClient.h (5)
47-47: LGTM!The WCKEY entity type addition follows the existing enum pattern correctly.
68-68: LGTM!The Wckey method declarations follow the established patterns for other entity types and maintain consistency in the interface design.
Also applies to: 110-110, 250-250
161-166: Proper error handling for WCKEY updates.Good improvement! The explicit error message clearly directs users to the appropriate method for WCKEY updates, addressing the previous concern about silent failures.
181-247: Well-designed multi-field update method.The
UpdateEntityOneByFieldstemplate method provides a flexible solution for complex entity updates. The use ofstd::variantforFilterValueand comprehensive error logging with proper boolean formatting demonstrates good design practices.
287-290: LGTM!The private helper method declarations and collection name constant follow the established patterns and naming conventions for other entity types.
Also applies to: 317-317
src/CraneCtld/DbClient.cpp (11)
375-379: LGTM!The wckey field extraction includes proper BSON type checking and safe fallback handling, following the defensive programming patterns used elsewhere in the method.
415-427: LGTM!The
InsertWckeymethod follows the established pattern for insert operations, including proper serialization and timestamp handling.
471-473: LGTM!The WCKEY case is properly implemented with the break statement, addressing the previous review feedback about missing break statements.
512-520: LGTM!The
SelectAllWckeymethod follows the established pattern for select-all operations with proper deserialization handling.
559-577: LGTM!The
UpdateWckeymethod correctly uses a multi-field filter (name, cluster, user_name) which explains why WCKEY entities require the specializedUpdateEntityOneByFieldsmethod rather than the single-name-basedUpdateEntityOne.
663-673: LGTM!The template specialization for
std::unordered_map<std::string, std::string>correctly handles BSON serialization for the User entity's default_wckey_map field.
769-779: LGTM!The
default_wckey_mapfield parsing includes proper BSON type checking and safe access patterns. Clearing the map before population ensures clean state.
787-807: LGTM!The
UserToDocument_method correctly incorporates thedefault_wckey_mapfield with proper field count updates and tuple type modifications.
809-819: LGTM!The
WckeyToDocument_method follows the established serialization pattern and includes all necessary Wckey fields with appropriate type casting for BSON compatibility.
821-833: LGTM!The
ViewToWckey_method correctly deserializes BSON documents with proper exception handling and type conversions for all Wckey fields.
1048-1048: LGTM!Both task serialization methods are consistently updated to include the wckey field with proper field count increments and tuple type modifications. The wckey sources are appropriate for each context.
Also applies to: 1066-1066, 1077-1077, 1111-1111, 1137-1137, 1156-1156, 1167-1167, 1193-1193
src/CraneCtld/AccountManager.cpp (9)
169-197: Good job addressing the race condition!The implementation correctly re-validates user existence under write lock after releasing the read lock, preventing use-after-free issues.
273-301: Proper validation to prevent default wckey deletion.Good implementation that prevents deletion of default wckeys, maintaining system consistency.
375-388: Excellent use of RAII for lock management.The method correctly uses
util::read_lock_guardwhich ensures automatic unlocking even on early returns.
443-465: Correct permission enforcement for wckey queries.The implementation properly restricts wckey queries to operators and above, maintaining security boundaries.
1771-1771: Correct initialization of wckey data structures.The wckey loading follows the established pattern and properly uses composite keys for unique identification.
Also applies to: 1791-1796
1836-1856: Correct implementation of wckey lookup methods.The methods properly handle the deleted flag check and follow the established pattern for entity lookups.
1970-2046: Well-implemented default wckey management.The method correctly handles the complex logic of updating default wckeys, ensuring both database and in-memory state remain consistent. The transactional approach guarantees atomicity of all related updates.
2253-2274: Correct implementation of wckey soft deletion.The method properly uses filter fields for composite key lookup and maintains consistency between database and in-memory state.
2394-2448: Excellent implementation of default wckey updates.The method comprehensively handles all aspects of changing default wckeys, including updating both old and new default wckeys in both tables, with proper transaction support.
45bdcc2 to
d9484ea
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
src/CraneCtld/AccountManager.cpp (3)
390-403: Consider using RAII for mutex management.The manual
lock_shared()/unlock_shared()pattern could lead to issues if exceptions are thrown. Consider refactoring to use RAII or document why manual locking is necessary here.If the current design is intentional for returning a custom shared pointer wrapper, consider adding a comment explaining the design choice. Otherwise, consider using RAII-based locking similar to
GetExistedDefaultWckeyName.
1136-1165: TODO indicates incomplete multi-cluster support.The function correctly validates inputs and permissions. However, the TODO comment suggests multi-cluster support needs implementation.
Would you like me to create an issue to track the multi-cluster support implementation for Wckey functionality?
2394-2449: Robust default Wckey switching logic.The function correctly handles all cases of default Wckey updates with proper transactional consistency.
Consider adding an early return at line 2402 to avoid unnecessary operations when the Wckey is already the default:
if (m_wckey_map_[new_wckey_key] && m_wckey_map_[new_wckey_key]->is_def) { + CRANE_DEBUG("Wckey {} is already default for user {} in cluster {}", + new_def_wckey, user_name, cluster); return {}; }
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
protos/PublicDefs.proto(4 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- protos/PublicDefs.proto
- src/CraneCtld/CraneCtld.cpp
🧰 Additional context used
🧠 Learnings (13)
📚 Learning: in cranesched's accountmetacontainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe...
Learnt from: huerni
PR: PKUHPC/CraneSched#499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranesched project, the `m_allowed_certs_` is an `std::unordered_set` which is n...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in cranectld/cranedkeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guarant...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: the userhastask() method in accountmetacontainer does not attempt to acquire any locks, so there is ...
Learnt from: huerni
PR: PKUHPC/CraneSched#528
File: src/CraneCtld/AccountManager.cpp:190-192
Timestamp: 2025-06-18T07:13:23.464Z
Learning: The UserHasTask() method in AccountMetaContainer does not attempt to acquire any locks, so there is no deadlock risk when calling it while holding other locks like m_rw_user_mutex_ and m_rw_account_mutex_.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in the cgroupmanager's getjobbpfmapcgroupsv2 method, the developer has confirmed that cg_ino_job_id_...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in src/cranectld/cranedkeeper.h, the m_shutting_down_ member in cranedstub class is declared as std:...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranesched project, the `m_allowed_certs_` in `vaultclientwrapper` is implemented as a `phmap...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranectld codebase, the variables m_disconnected_ and m_registered_ in cranedstub class are a...
Learnt from: RileyWen
PR: PKUHPC/CraneSched#475
File: src/CraneCtld/CranedKeeper.cpp:601-602
Timestamp: 2025-05-02T07:05:26.012Z
Learning: In the CraneCtld codebase, the variables m_disconnected_ and m_registered_ in CranedStub class are already defined as std::atomic_bool, making them thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: the cranesched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps,...
Learnt from: huerni
PR: PKUHPC/CraneSched#499
File: src/CraneCtld/AccountMetaContainer.cpp:310-316
Timestamp: 2025-05-30T02:08:41.798Z
Learning: The CraneSched project uses thread-safe phmap (parallel hashmap) for user and account metadata maps, making individual container operations thread-safe without requiring explicit locking for simple operations like erase().
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: the configurecraned function in src/cranectld/cranedkeeper.cpp is called from a thread pool, so ther...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in taskmanager.cpp, the security model relies on administrator-configured command templates in parse...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: in the cranesched codebase, src/craned/craned.cpp guarantees that g_config.cranedres.contains(g_conf...
Learnt from: huerni
PR: PKUHPC/CraneSched#537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: the `atomichashmap::operator[]` implementation in the cranesched project does not create elements wh...
Learnt from: L-Xiafeng
PR: PKUHPC/CraneSched#461
File: src/Craned/JobManager.cpp:136-149
Timestamp: 2025-05-06T08:09:07.755Z
Learning: The `AtomicHashMap::operator[]` implementation in the CraneSched project does not create elements when keys don't exist, unlike standard C++ map behavior. It returns a `ValueExclusivePtr` that will be empty if the key is not found.
Applied to files:
src/CraneCtld/AccountManager.cpp
🔇 Additional comments (8)
src/CraneCtld/AccountManager.cpp (8)
169-198: LGTM! Proper thread-safe implementation.The function correctly validates user existence under write lock after permission checks, preventing race conditions. The handling of stale (deleted) Wckey entries is appropriate.
273-302: Good validation logic for default Wckey protection.The function appropriately prevents deletion of default Wckeys, ensuring data consistency. The error code
ERR_DEL_DEFAULT_WCKEYclearly indicates why the deletion failed.
375-389: Clean implementation with proper RAII.The function correctly uses
util::read_lock_guardfor automatic lock management and provides clear error codes for different failure scenarios.
443-466: Appropriate permission model for Wckey queries.The function correctly restricts Wckey queries to operators and above, maintaining consistency with the security model. The filtering of deleted entries is properly implemented.
1767-1797: Proper initialization of Wckey data structures.The function correctly loads Wckey data alongside other entities, using an appropriate composite key structure for the map.
1836-1857: Consistent implementation of Wckey lookup methods.The helper methods follow the established pattern of having separate functions for retrieving all entries vs. only non-deleted entries.
1970-2047: Well-implemented transactional Wckey addition.The function correctly handles the complexity of default Wckey management within a transaction, ensuring atomicity when switching defaults. The update of both Wckey and User documents maintains data consistency.
2253-2275: Consistent soft-delete implementation.The function properly implements soft deletion using the established pattern, with appropriate use of FilterFields for composite key operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
protos/PublicDefs.proto(4 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/CtldPublicDefs.h(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/CraneCtld/CraneCtld.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-30T02:09:41.625Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Applied to files:
src/CraneCtld/AccountManager.cpp
🧬 Code graph analysis (2)
src/CraneCtld/CtldPublicDefs.h (1)
src/CraneCtld/DbClient.h (2)
string(316-341)string(317-325)
src/CraneCtld/AccountManager.cpp (2)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
DeleteWckey(1177-1198)DeleteWckey(1177-1180)ModifyDefaultWckey(836-854)ModifyDefaultWckey(836-839)src/Craned/Core/CranedPublicDefs.h (1)
g_config(135-135)
🔇 Additional comments (18)
protos/PublicDefs.proto (4)
171-171: LGTM!The optional wckey field is correctly added to TaskToCtld with appropriate field numbering and type.
329-330: LGTM!The wckey fields are appropriately added to TaskInfo with sequential field numbering.
494-498: LGTM!The new error codes comprehensively cover wckey-related error scenarios with sequential numbering.
582-587: LGTM!The WckeyInfo message structure appropriately represents wckey data for gRPC communication.
src/CraneCtld/CtldPublicDefs.h (4)
169-170: LGTM!The configuration flags for wckey support are appropriately added with safe defaults maintaining backward compatibility.
388-388: LGTM!The wckey field is appropriately added to TaskInCtld in the submission-time fields section.
611-611: LGTM!The default_wckey_map appropriately supports per-cluster default wckey configuration for users.
617-629: LGTM!The Wckey struct is well-designed with appropriate fields and follows the established pattern for entity structs in the codebase.
src/CraneCtld/AccountManager.cpp (10)
169-197: LGTM!The AddUserWckey method correctly addresses the race condition from previous reviews by re-validating user existence under write lock.
273-301: LGTM!The DeleteWckey method correctly prevents deletion of the default wckey and follows proper locking patterns.
375-388: LGTM!The GetExistedDefaultWckeyName method correctly uses RAII locking, addressing the previous review concern.
390-402: LGTM!The GetExistedWckeyInfo method follows the established manual locking pattern consistent with other similar methods in this file (GetExistedUserInfo, GetExistedAccountInfo, etc.).
443-465: LGTM!The QueryAllWckeyInfo method follows the established pattern for query methods with appropriate privilege checks.
1136-1164: LGTM!The ModifyDefaultWckey method includes proper validation and locking. Note the TODO comment about multi-cluster support being a current limitation.
1767-1797: LGTM!The InitDataMap_ method is correctly updated to load wckey data from the database into the in-memory map.
1836-1856: LGTM!The wckey lookup methods correctly implement the deletion flag check, addressing the previous review concern.
1970-2046: LGTM!The AddWckey_ method correctly manages wckey creation and default wckey handling. Note that newly added wckeys always become the default for their cluster (line 1980), requiring ModifyDefaultWckey to change to a different default later.
2253-2274: LGTM!The DeleteWckey_ method correctly implements soft deletion consistent with other entity deletion methods in the codebase.
Nativu5
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rebase 一下吧。
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (3)
protos/Crane.proto (2)
417-427: Missing modification fields in ModifyDefaultWckeyRequest.The request only contains identifier fields (uid, name, cluster, user_name) but lacks explicit fields indicating what modification to perform. While the RPC name suggests setting a default wckey, making this explicit in the message (e.g.,
bool set_as_default = 5;) would improve clarity and allow future extensibility.Based on past review comments.
363-373: Inconsistent deletion API compared to other entity types.The
DeleteWckeyRequestaccepts individual fields (name, cluster, user_name) whileDeleteAccountRequest,DeleteUserRequest, andDeleteQosRequestaccept repeated lists, allowing batch operations. This inconsistency makes the API harder to use and less efficient for bulk deletions.Based on past review comments.
src/CraneCtld/AccountManager.cpp (1)
2212-2283: Use ofoperator[]onm_wckey_map_withstd::unique_ptrrisks null dereferenceBoth
AddWckey_andSetUserDefaultWckey_rely onoperator[]onm_wckey_map_and then immediately dereference the storedstd::unique_ptr<Wckey>:
- In
AddWckey_:
m_wckey_map_[{old_def_wckey, res_wckey.cluster, res_wckey.user_name}]->is_def = false;- In
SetUserDefaultWckey_:
if (m_wckey_map_[new_wckey_key] && m_wckey_map_[new_wckey_key]->is_def) ...m_wckey_map_[new_wckey_key]->is_def = true;m_wckey_map_[{old_def_wckey, cluster, user_name}]->is_def = false;
operator[]will insert a new entry with a default‑constructed (null)unique_ptrwhen the key does not exist, and subsequent->is_defdereferences will crash. Even if invariants should guarantee presence, the pattern is fragile and was previously flagged.Recommend switching to
find-based access and checking both iterator andunique_ptrbefore dereferencing, e.g.:@@ SetUserDefaultWckey_ - WckeyKey new_wckey_key = {.name = new_def_wckey, - .cluster = cluster, - .user_name = user_name}; - if (m_wckey_map_[new_wckey_key] && m_wckey_map_[new_wckey_key]->is_def) { - return {}; - } + WckeyKey new_wckey_key{.name = new_def_wckey, + .cluster = cluster, + .user_name = user_name}; + auto it_new = m_wckey_map_.find(new_wckey_key); + if (it_new != m_wckey_map_.end() && it_new->second && + it_new->second->is_def) { + return {}; + } @@ - m_wckey_map_[new_wckey_key]->is_def = true; + it_new = m_wckey_map_.find(new_wckey_key); + if (it_new != m_wckey_map_.end() && it_new->second) { + it_new->second->is_def = true; + } m_user_map_[user_name]->default_wckey_map[cluster] = new_def_wckey; if (need_update_table) { - m_wckey_map_[{old_def_wckey, cluster, user_name}]->is_def = false; + WckeyKey old_key{.name = old_def_wckey, + .cluster = cluster, + .user_name = user_name}; + auto it_old = m_wckey_map_.find(old_key); + if (it_old != m_wckey_map_.end() && it_old->second) { + it_old->second->is_def = false; + } }Similarly in
AddWckey_:- if (update_wckey) - m_wckey_map_[{old_def_wckey, res_wckey.cluster, res_wckey.user_name}] - ->is_def = false; + if (update_wckey) { + WckeyKey old_key{.name = old_def_wckey, + .cluster = res_wckey.cluster, + .user_name = res_wckey.user_name}; + auto it_old = m_wckey_map_.find(old_key); + if (it_old != m_wckey_map_.end() && it_old->second) { + it_old->second->is_def = false; + } + }This removes the possibility of accidentally inserting null entries and then dereferencing them.
Also applies to: 2579-2633
🧹 Nitpick comments (6)
src/CraneCtld/CraneCtld.cpp (1)
644-664: Consider case-insensitive matching for AccountingStorageEnforce.Line 651 checks for an exact match of "wckeys". Users might reasonably enter "wckey" (singular) or mixed-case variants, which would silently fail to enable wckey enforcement. Consider normalizing to lowercase (already done for TrackWCKey on line 655) and accepting both singular and plural forms.
Apply this diff to improve robustness:
std::unordered_set<std::string> items; for (absl::string_view item : absl::StrSplit(item_list, ',')) { item = absl::StripAsciiWhitespace(item); - if (!item.empty()) items.emplace(item); + if (!item.empty()) items.emplace(absl::AsciiStrToLower(item)); } - if (items.contains("wckeys")) g_config.MustNeedWckey = true; + if (items.contains("wckey") || items.contains("wckeys")) + g_config.MustNeedWckey = true; }src/CraneCtld/DbClient.h (1)
244-310: Consider refactoring verbose error logging.Lines 287-306 manually construct a filter string for error logging. While functional, this is verbose and duplicates formatting logic. Consider extracting this to a helper function if similar logging is needed elsewhere, or simplifying to log the BSON filter directly.
Example simplification:
if (!result || !result->modified_count()) { - std::string filter_str = "{"; - for (size_t i = 0; i < filter_fields.size(); ++i) { - filter_str += std::visit( - [&](auto&& arg) -> std::string { - if constexpr (std::is_same_v<std::decay_t<decltype(arg)>, bool>) { - return std::format("{}:{}", filter_fields[i].first, - arg ? "true" : "false"); - } else { - return std::format("{}:{}", filter_fields[i].first, arg); - } - }, - filter_fields[i].second); - if (i != filter_fields.size() - 1) filter_str += ", "; - } - filter_str += "}"; - CRANE_ERROR( - "Update date in database fail(filter:{}, opt:{}, key:{}, value:{}, " - "{})", - filter_str, opt, key, value, result ? result->modified_count() : -1); + "Update in database failed: opt={}, key={}, value={}, modified={}", + opt, key, value, result ? result->modified_count() : -1); return false;src/CraneCtld/AccountManager.cpp (1)
389-402: Minor behavior considerations for default Wckey lookup and QueryAllWckeyInfo
GetExistedDefaultWckeyNameandGetExistedWckeyInfoare straightforward and correctly respect deletion flags; thread‑safety via read locks andWckeyMutexSharedPtris fine.QueryAllWckeyInfo:
- Only operators/admins can list Wckeys; non‑privileged users always get
ERR_PERMISSION_USER. This is stricter thanQueryAllUserInfo, which returns a scoped subset. If normal users are expected to introspect their own Wckeys, you might want to relax this.- When
wckey_listis non‑empty, filtering is bywckey->nameonly. If name collisions across clusters/users are possible, callers may get more entries than intended.If those semantics are deliberate, no changes are needed; otherwise consider:
- Allowing non‑admin users to query only their own Wckeys.
- Extending the filter API to accept
(name, cluster, user)triples, or documenting that Wckey names are globally unique.Also applies to: 404-416, 457-487
src/CraneCtld/DbClient.cpp (2)
370-378: FetchJobRecords: wckey extraction is robust to missing/legacy documentsThe new
wckeyhandling:
- Safely checks both existence and
k_utf8type before reading.- Falls back to empty string when absent or non‑UTF‑8, which is compatible with older job documents that don’t carry wckey.
If you like, you could avoid two lookups by binding the element once (
auto ele = view["wckey"];), but behavior is already correct.Also applies to: 460-464
620-633: Wckey CRUD integration is correct; consider matching exception‑handling style in SelectAllWckey
InsertWckey/UpdateWckeyuse the sameWckeyToDocument_schema asViewToWckey_, and filter on(name, cluster, user_name), which matches how AccountManager looks up Wckeys.- These methods are only invoked inside
CommitTransactioncallbacks in the current code, so letting Mongo exceptions bubble towith_transactionis acceptable.SelectAllWckeymirrors otherSelectAll*methods but lacks atry/catcharound thefind/cursor loop;SelectAllUser,SelectAllAccount, andSelectAllQosalready guard those calls.To keep behavior consistent and avoid an uncaught exception during initialization, consider wrapping
SelectAllWckeyin atry/catchlike the others and logging errors viaCRANE_LOGGER_ERROR.Also applies to: 732-740, 791-809
src/CraneCtld/AccountManager.h (1)
43-44: Wckey public APIs and mutex alias are coherent and match implementation
WckeyMutexSharedPtrmirrors the existingUserMutexSharedPtr/AccountMutexSharedPtrpattern, enabling safe shared‑lock access.- Public methods
AddUserWckey,DeleteWckey,QueryAllWckeyInfo,GetExistedWckeyInfo,GetExistedDefaultWckeyName, andModifyDefaultWckeyalign with the corresponding definitions inAccountManager.cpp(same parameter sets and semantics).- Grouping
ModifyDefaultWckeywith other “Modify*” methods keeps the API surface consistent.Only minor nit: if
QueryAllWckeyInfois expected to accept large filter sets, you might eventually switchwckey_listtoconst std::unordered_set<std::string>&to avoid copies, but current signature is acceptable.Also applies to: 57-58, 66-74, 95-100, 161-163
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (15)
etc/config.yaml(1 hunks)protos/Crane.proto(5 hunks)protos/PublicDefs.proto(4 hunks)scripts/wipe_data.sh(2 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/AccountManager.h(10 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/CtldPreCompiledHeader.h(1 hunks)src/CraneCtld/CtldPublicDefs.cpp(1 hunks)src/CraneCtld/CtldPublicDefs.h(4 hunks)src/CraneCtld/DbClient.cpp(16 hunks)src/CraneCtld/DbClient.h(7 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(4 hunks)src/CraneCtld/RpcService/CtldGrpcServer.h(4 hunks)src/CraneCtld/TaskScheduler.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/CraneCtld/CtldPublicDefs.cpp
- src/CraneCtld/RpcService/CtldGrpcServer.h
- src/CraneCtld/TaskScheduler.cpp
- protos/PublicDefs.proto
- etc/config.yaml
🧰 Additional context used
🧠 Learnings (9)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cppsrc/CraneCtld/AccountManager.h
📚 Learning: 2025-05-30T02:09:41.625Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In CraneSched, `phmap::parallel_flat_hash_set` from the Parallel Hashmap library (`parallel_hashmap/phmap.h`) is used for thread-safe containers. This container implements internal sharding with separate locks for different parts of the hash table, making it inherently thread-safe for concurrent operations without requiring external synchronization.
Applied to files:
src/CraneCtld/AccountManager.h
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: The `phmap::parallel_flat_hash_set` used in CraneSched's `VaultClientWrapper` is from the Parallel Hashmap library, which provides thread-safe containers for concurrent access. It uses internal partitioning to reduce contention, making it safe for use in multithreaded environments without external synchronization.
Applied to files:
src/CraneCtld/AccountManager.h
🧬 Code graph analysis (6)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(51-59)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
src/CraneCtld/AccountManager.cpp (4)
ModifyDefaultWckey(1181-1209)ModifyDefaultWckey(1181-1183)DeleteWckey(287-315)DeleteWckey(287-290)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
string(70-72)string(384-426)string(385-394)
src/CraneCtld/AccountManager.cpp (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
DeleteWckey(1612-1633)DeleteWckey(1612-1615)ModifyDefaultWckey(1235-1253)ModifyDefaultWckey(1235-1238)
src/CraneCtld/CraneCtld.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
g_config(160-160)
src/CraneCtld/AccountManager.h (1)
src/CraneCtld/AccountManager.cpp (30)
AddUserWckey(179-207)AddUserWckey(179-180)DeleteUser(209-230)DeleteUser(209-211)DeleteAccount(232-259)DeleteAccount(232-233)DeleteQos(261-285)DeleteQos(261-262)DeleteWckey(287-315)DeleteWckey(287-290)QueryAllUserInfo(418-455)QueryAllUserInfo(418-419)QueryAllWckeyInfo(457-487)QueryAllWckeyInfo(457-458)GetExistedWckeyInfo(404-416)GetExistedWckeyInfo(404-406)GetExistedDefaultWckeyName(389-402)GetExistedDefaultWckeyName(389-390)ModifyDefaultWckey(1181-1209)ModifyDefaultWckey(1181-1183)GetExistedWckeyInfoNoLock_(1994-2000)GetExistedWckeyInfoNoLock_(1994-1996)GetWckeyInfoNoLock_(1983-1989)GetWckeyInfoNoLock_(1983-1985)AddWckey_(2212-2288)AddWckey_(2212-2214)DeleteWckey_(2425-2446)DeleteWckey_(2425-2427)SetUserDefaultWckey_(2579-2633)SetUserDefaultWckey_(2579-2581)
🔇 Additional comments (13)
scripts/wipe_data.sh (2)
3-3: ✓ Usage message accurately updated.The help text correctly reflects the expanded mode definitions, with wckey_table now included in modes 4 and 6.
53-56: ✓ wckey_table wiping correctly implemented.The wckey_table wipe is properly added alongside user_table, and the conditional logic (mode 4 || 5 || 6) ensures it is wiped:
- Mode 4: user_table + wckey_table
- Mode 5: all tables including wckey_table
- Mode 6: acct_table + qos_table + user_table + wckey_table
The placement with user_table is logical, and MongoDB's
deleteMany({})safely handles non-existent collections.src/CraneCtld/CtldPreCompiledHeader.h (1)
64-64: LGTM!The addition of
absl/hash/hash.happropriately enables Abseil hashing utilities for the new Wckey-related data structures introduced in this PR.src/CraneCtld/CtldPublicDefs.h (1)
199-200: LGTM!The Wckey-related additions to Config, TaskInCtld, User, and the new Wckey struct are well-structured and follow existing patterns in the codebase. The equality operator in Wckey correctly compares all relevant fields.
Also applies to: 710-710, 1019-1020, 1075-1087
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
1415-1447: LGTM!The
QueryWckeyInfoRPC correctly includes the authentication check and follows the established pattern of other query methods. The implementation properly delegates to AccountManager and populates the response fields.src/CraneCtld/DbClient.h (2)
224-228: LGTM!The explicit error handling for WCKEY in
UpdateEntityOneaddresses the past review comment correctly. This prevents silent failures and directs developers to useUpdateEntityOneByFieldsfor wckey updates.Based on past review comments.
104-104: LGTM!The Wckey-related additions to DbClient follow the established patterns for Account, User, and Qos entities. The EntityType enum extension, new public methods, and private helpers are well-structured and consistent with the existing codebase.
Also applies to: 131-131, 173-173, 313-313, 351-354, 385-385
src/CraneCtld/AccountManager.cpp (3)
179-207: AddUserWckey / ModifyDefaultWckey locking and privilege checks look consistent
- Privilege checks (
CheckIfUserHasHigherPrivThan_(..., User::None)) mirror AddQos/AddAccount patterns.- Re-validating the target user under
m_rw_user_mutex_write lock inAddUserWckeycloses the earlier race onuser_exist.- Write-lock acquisition order
m_rw_wckey_mutex_→m_rw_user_mutex_is consistent acrossAddUserWckey,DeleteWckey, andModifyDefaultWckey, avoiding deadlock with other Wckey paths.No functional or concurrency issues stand out here.
Also applies to: 1181-1209
287-315: Wckey delete flow correctly guards default keys and maintains in‑memory/DB consistency
DeleteWckeyrefuses to delete a Wckey if it is currently the user’s default for that cluster (ERR_DEL_DEFAULT_WCKEY), which matches the intended invariant.- User existence and wckey existence are revalidated under write locks before calling
DeleteWckey_.DeleteWckey_marks the Wckey as deleted both in MongoDB (viadeleted: true) and inm_wckey_map_, without touchingdefault_wckey_map(safe because default deletions are already blocked).This path looks correct and thread‑safe.
Also applies to: 2425-2446
1915-1940: InitDataMap_ and Wckey getters integrate cleanly with the new in‑memory map
InitDataMap_now acquiresm_rw_wckey_mutex_alongside other write locks and bulk‑loads all Wckeys intom_wckey_map_keyed by(name, cluster, user_name), matching the lookup key.GetWckeyInfoNoLock_/GetExistedWckeyInfoNoLock_usefind+deletedflag consistently, aligned with how users/accounts/QoS are handled.This initialization and lookup path looks correct and consistent with the rest of AccountManager.
Also applies to: 1980-2000
src/CraneCtld/DbClient.cpp (2)
1258-1268: User.default_wckey_map BSON mapping is symmetric and backward‑compatible
DocumentAppendItem_<std::unordered_map<std::string,std::string>>serializes the map as a nested document{ key: value, ... }, which is simple and efficient.UserToDocument_addsdefault_wckey_mapas a top‑level field, andViewToUser_:
- Clears the in‑memory map first.
- Checks that the BSON element exists and is a document.
- Populates only
k_utf8entries, which avoids crashes on malformed documents.- For legacy user documents lacking
default_wckey_map, the code leaves the field empty, so old data remains valid.This end‑to‑end mapping looks solid.
Also applies to: 1274-1298, 962-972
1300-1325: Wckey entity and task document schema are consistent across DB and RPC
WckeyToDocument_/ViewToWckey_round‑trip all essential fields:deleted,name,cluster,uid,user_name,is_def, matching theCtld::Wckeystruct used by AccountManager.TaskInEmbeddedDbToDocument_andTaskInCtldToDocument_:
- Extend the fields array to include
"wckey"as the final field.- Add the corresponding tuple element (
task_to_ctld.wckey()/task->wckey) with typestd::string.- Keep field names aligned with
FetchJobRecords, which now readsview["wckey"].This keeps the task schema consistent between embedded‑DB and live Ctld representations.
Also applies to: 1666-1702, 1770-1806
src/CraneCtld/AccountManager.h (1)
212-218: WckeyKey hashing, helpers, and storage integrate cleanly into AccountManager
- Private getters
GetWckeyInfoNoLock_/GetExistedWckeyInfoNoLock_and helpersAddWckey_,DeleteWckey_,SetUserDefaultWckey_are neatly declared here and match the cpp implementations.struct WckeyKeyencapsulates the composite key (name,cluster,user_name) with bothoperator==andAbslHashValue, which keeps equality and hashing consistent.struct WcKeyHashwraps the hash functor cleanly, andm_wckey_map_uses it together withutil::rw_mutex m_rw_wckey_mutex_, mirroring the patterns already used for users/accounts/QoS.The type and member design looks sound and consistent with the rest of the class.
Also applies to: 322-324, 338-341, 360-362, 446-465, 467-476
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (1)
protos/PublicDefs.proto (1)
473-474: Clarify the semantics of thewckey_validfield.The
wckey_validboolean field is added toTaskInfo, but its meaning is not documented. Please clarify:
- Does it indicate whether the
wckeyfield has been populated?- Does it mean the wckey was validated against the database?
- Does it indicate whether Wckey validation is enabled for this task?
- Should it reflect the
g_config.MustNeedWckeyconfiguration?Additionally, note that
TaskToCtldonly hasoptional string wckeywhileTaskInfohas bothbool wckey_validandstring wckey(not optional). This asymmetry suggests different use cases—consider adding a comment explaining the distinction.Consider adding a comment like:
+ // wckey_valid indicates whether wckey validation was enabled when the task was submitted bool wckey_valid = 41; + // wckey contains the workload key assigned to this task (may be empty if wckey_valid is false) string wckey = 42;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
protos/PublicDefs.proto(4 hunks)src/CraneCtld/AccountManager.cpp(11 hunks)src/CraneCtld/CtldPublicDefs.h(4 hunks)src/CraneCtld/DbClient.cpp(16 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(4 hunks)src/CraneCtld/TaskScheduler.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/CraneCtld/CtldPublicDefs.h
- src/CraneCtld/RpcService/CtldGrpcServer.cpp
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-30T02:09:41.625Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.
Applied to files:
src/CraneCtld/AccountManager.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.
Applied to files:
src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().
Applied to files:
src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/TaskScheduler.cpp
🧬 Code graph analysis (2)
src/CraneCtld/AccountManager.cpp (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
DeleteWckey(1615-1637)DeleteWckey(1615-1618)ModifyDefaultWckey(1236-1256)ModifyDefaultWckey(1236-1239)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
string(70-72)string(384-426)string(385-394)
🔇 Additional comments (4)
protos/PublicDefs.proto (2)
184-184: LGTM: Wckey field addition to TaskToCtld.The addition of the optional
wckeyfield follows protobuf3 conventions and is appropriately placed with other task submission parameters.
788-793: LGTM: WckeyInfo message structure is well-defined.The
WckeyInfomessage provides a clear structure for Wckey information with appropriate fields for name, cluster, user association, and default status. The field types and naming are consistent with the rest of the protobuf definitions.src/CraneCtld/AccountManager.cpp (1)
179-207: LGTM! Wckey management implementation looks solid.The Wckey integration in AccountManager follows established patterns and addresses all previously identified issues:
- Proper RAII lock usage throughout
- Race conditions eliminated by re-validating under write locks
- Consistent locking order (wckey → user when both needed)
- Deleted flag checks in place
- Safe map access using find() instead of operator[]
The implementation provides complete CRUD operations with appropriate permission checks and transactional database updates.
Also applies to: 287-315, 389-402, 404-416, 457-487, 1181-1209, 1915-1915, 1983-2000, 2212-2288, 2425-2446, 2579-2634
src/CraneCtld/DbClient.cpp (1)
460-464: LGTM! Wckey serialization and database schema updates are well-implemented.The remaining Wckey changes are solid:
- Safe type checking before accessing wckey field in FetchJobRecords
- Proper serialization/deserialization for default_wckey_map in User documents
- Clean integration of wckey field into task documents
- ViewToWckey_ includes appropriate exception handling
Also applies to: 962-972, 1258-1268, 1276-1298, 1300-1324, 1670-1741, 1774-1833
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/CraneCtld/DbClient.cpp (2)
1675-1750: Minor comment inconsistency.At lines 1696-1697, the comment says
// 40 - 44followed by just"wckey", but there's only one field at position 40. The comment should say// 40instead of// 40 - 44for clarity.Apply this diff to fix the comment:
"mem_alloc", "device_map", "meta_container", "has_job_info", "licenses_alloc", - // 40 - 44 + // 40 "wckey"
1779-1842: Minor comment inconsistency.At lines 1800-1801, the comment says
// 40 - 44followed by just"wckey", but there's only one field at position 40. The comment should say// 40instead of// 40 - 44for clarity.Apply this diff to fix the comment:
"mem_alloc", "device_map", "meta_container", "has_job_info", "licenses_alloc", - // 40 - 44 + // 40 "wckey"Additionally, line 1814 has a similar issue in the tuple comment:
int64_t, DeviceMap, std::optional<ContainerMetaInTask>, /*35-37*/ bool, std::unordered_map<std::string, uint32_t>, /*38-39*/ - std::string> /*40-44*/ + std::string> /*40*/
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
protos/PublicDefs.proto(4 hunks)src/CraneCtld/CtldPublicDefs.h(4 hunks)src/CraneCtld/DbClient.cpp(15 hunks)src/Utilities/PublicHeader/include/crane/PublicHeader.h(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/CraneCtld/CtldPublicDefs.h (1)
src/CraneCtld/DbClient.h (3)
string(70-72)string(384-426)string(385-394)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
string(70-72)string(384-426)string(385-394)
🔇 Additional comments (16)
src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)
262-262: LGTM! Good practice to add the trailing comma.Adding a trailing comma to the last array element makes future diffs cleaner.
src/CraneCtld/CtldPublicDefs.h (4)
199-200: LGTM! Config flags added correctly.The new boolean flags
MustNeedWckeyandWckeyValidprovide configuration control for wckey validation and requirement enforcement, with sensible defaults.
710-710: LGTM! Field addition is correct.The
wckeyfield inTaskInCtldaligns with the wckey support being added across the system.
1019-1020: LGTM! Field addition is correct.The
default_wckey_mapfield properly tracks default wckey per cluster for each user.
1075-1087: LGTM! Wckey struct is well-defined.The
Wckeystruct includes all necessary fields and provides an equality operator for comparison.protos/PublicDefs.proto (4)
184-184: LGTM! Proto field addition is correct.The
wckeyfield is properly defined as optional with the correct field number.
473-474: LGTM! Proto fields added correctly.The
wckey_validandwckeyfields enable both validation status and value tracking in TaskInfo.
788-793: LGTM! WckeyInfo message is well-defined.The message structure aligns with the C++
Wckeystruct and includes all necessary fields.
700-704: LGTM! Error codes added correctly.The five new wckey-related error codes are sequentially numbered and cover the expected error scenarios. The corresponding error string array syntax issue was already addressed in commit 03a3ba5.
src/CraneCtld/DbClient.cpp (7)
620-633: LGTM! Exception handling is present.The
InsertWckeymethod includes proper exception handling as addressed in commit 03a3ba5.
681-683: LGTM! Break statement is present.The
WCKEYcase includes the requiredbreakstatement as addressed in commits 172421f to 45005cc.
733-745: LGTM! Exception handling is present.The
SelectAllWckeymethod includes proper exception handling as addressed in commit 03a3ba5.
796-818: LGTM! Exception handling is present.The
UpdateWckeymethod includes proper exception handling as addressed in commit 03a3ba5.
971-981: LGTM! Template specialization is correct.The
DocumentAppendItem_template forstd::unordered_map<std::string, std::string>properly serializes the map to BSON.
1267-1306: LGTM! User serialization/deserialization updated correctly.The
ViewToUser_andUserToDocument_methods properly handle the newdefault_wckey_mapfield with appropriate type checking and defensive programming.
1309-1333: LGTM! Wckey serialization/deserialization is correct.The
WckeyToDocument_andViewToWckey_methods properly serialize and deserialize allWckeyfields with appropriate type conversions and exception handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)
262-269: Capitalize “Crane” consistently in new error string.Line 266 currently says
"The wckey already exists in crane". Other messages (e.g., line 182) capitalize the product name as “Crane”, so this should be adjusted for consistency and polish.- "The wckey already exists in crane", + "The wckey already exists in Crane",src/CraneCtld/DbClient.cpp (1)
452-464: Setwckey_validalongsidewckeywhen fetching jobs.Here you populate
TaskInfo.wckeyfrom the Mongo document, butTaskInfo.wckey_validis never set, so it stays at the defaultfalseeven when a valid wckey string exists. This makes it impossible for callers to distinguish “no wckey stored” from “wckey present and OK”.Recommend updating this block to drive both fields:
- if (view["wckey"] && view["wckey"].type() == bsoncxx::type::k_utf8) { - task->set_wckey(view["wckey"].get_string().value.data()); - } else { - task->set_wckey(""); - } + auto wckey_elem = view["wckey"]; + if (wckey_elem && wckey_elem.type() == bsoncxx::type::k_utf8) { + task->set_wckey_valid(true); + task->set_wckey(wckey_elem.get_string().value.data()); + } else { + task->set_wckey_valid(false); + task->set_wckey(""); + }This keeps behavior correct for legacy documents (no
wckeyfield) while making the new proto fields meaningful.
🧹 Nitpick comments (2)
src/CraneCtld/TaskScheduler.cpp (1)
3704-3731: Wckey acquisition flow looks solid; consider using const TaskToCtld accessor.The validation/marking logic for explicit vs default wckey (including the "*" convention and ignoring non‑essential default lookup errors) is consistent and avoids the extra DB round‑trip from the earlier version. The only small polish would be to prefer
task->TaskToCtld().has_wckey()/.wckey()overMutableTaskToCtld()here since these calls are read‑only.protos/PublicDefs.proto (1)
184-184: Proto schema changes for wckey are backward‑compatible and coherent.Appending
TaskToCtld.wckey,TaskInfo.wckey_valid/wckey, the wckey error codes (84–88), andWckeyInfoat the tail of their enums/messages preserves wire compatibility and lines up with the Ctld/DB changes. If you expect downstream consumers to distinguish “no wckey recorded” from “wckey string present but invalid”, a one‑line comment on thewckey_valid/wckeypair inTaskInfowould help future readers.Also applies to: 473-475, 700-704, 788-793
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
protos/PublicDefs.proto(4 hunks)src/CraneCtld/CtldPublicDefs.h(4 hunks)src/CraneCtld/DbClient.cpp(16 hunks)src/CraneCtld/TaskScheduler.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/PublicHeader.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/CraneCtld/CtldPublicDefs.h
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.
Applied to files:
src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().
Applied to files:
src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/TaskScheduler.cpp
🧬 Code graph analysis (1)
src/CraneCtld/TaskScheduler.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
g_config(160-160)
🔇 Additional comments (2)
src/CraneCtld/DbClient.cpp (2)
620-633: Wckey MongoDB CRUD and (de)serialization look consistent with existing patterns.
InsertWckey,SelectAllWckey,UpdateWckey,DeleteEntity(WCKEY), and theWckeyToDocument_/ViewToWckey_helpers mirror the established User/Account/Qos patterns (same collection naming, key fields, and exception handling). Filters on{name, cluster, user_name}for updates and deletions match the logical key for a wckey, and the schema (deleted,name,cluster,uid,user_name,is_default) aligns with the Ctld side.No functional issues spotted here.
Also applies to: 681-683, 733-745, 796-818, 1309-1333
971-981: Userdefault_wckey_mapand taskwckeypersistence are wired correctly.
- The new
DocumentAppendItem_<std::unordered_map<std::string, std::string>>specialization plus the extendedUserToDocument_/ViewToUser_handledefault_wckey_mapcleanly and tolerate old documents where the field is absent.- Both
TaskInEmbeddedDbToDocument_andTaskInCtldToDocument_now include awckeyfield in the job document layout, and comments at the top of each function have been updated to reflect the extra slot. This keeps the Mongo schema and the historical layout documentation in sync.Overall, the new serialized fields and their read/write paths look coherent with the rest of the client.
Also applies to: 1267-1277, 1285-1305, 1675-1711, 1780-1784, 1813-1815, 1842-1844
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/CraneCtld/DbClient.cpp (1)
460-464: Missing wckey_valid field assignment.The code sets
wckeybut does not setwckey_valid, leaving TaskInfo incomplete. Based on the proto definition, both fields must be set to properly indicate wckey presence.Apply this diff:
if (view["wckey"] && view["wckey"].type() == bsoncxx::type::k_utf8) { task->set_wckey(view["wckey"].get_string().value.data()); + task->set_wckey_valid(true); } else { task->set_wckey(""); + task->set_wckey_valid(false); }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
etc/config.yaml(1 hunks)protos/PublicDefs.proto(5 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/CtldPublicDefs.cpp(1 hunks)src/CraneCtld/CtldPublicDefs.h(4 hunks)src/CraneCtld/DbClient.cpp(16 hunks)src/CraneCtld/TaskScheduler.cpp(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/CraneCtld/CraneCtld.cpp
- src/CraneCtld/CtldPublicDefs.cpp
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.
Applied to files:
src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().
Applied to files:
src/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/TaskScheduler.cpp
🧬 Code graph analysis (2)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
string(70-72)string(384-426)string(385-394)
src/CraneCtld/TaskScheduler.cpp (3)
src/Craned/Core/CranedPublicDefs.h (1)
g_config(160-160)src/CraneCtld/CtldPublicDefs.cpp (1)
result(66-119)src/Utilities/PublicHeader/PublicHeader.cpp (1)
result(772-772)
🔇 Additional comments (18)
etc/config.yaml (1)
151-152: Clarify intent of commented Wckey configuration.These lines add a commented example for
AccountingStorageEnforce: wckeys, but a past reviewer (Nativu5) questioned whether this option has meaning in your system. Since these remain commented out and are non-functional, please clarify:
- Is this intended as documentation for future optional use?
- Or should these lines be removed if the option is not applicable to your system?
If this is meant to be example/reference documentation, consider adding an inline comment explaining when or why an operator might uncomment these lines.
src/CraneCtld/DbClient.cpp (7)
620-633: LGTM!InsertWckey follows the same pattern as InsertUser and InsertAccount, with proper exception handling and consistent document construction.
681-683: LGTM!The WCKEY case now includes a break statement, preventing unintended fall-through.
733-745: LGTM!SelectAllWckey now includes proper exception handling, consistent with SelectAllUser and SelectAllAccount.
796-818: LGTM!UpdateWckey now includes proper exception handling and correctly filters by the compound key (name, cluster, user_name).
971-981: LGTM!Template specialization correctly serializes
unordered_map<string, string>to BSON subdocument for the User's default_wckey_map field.
1267-1277: LGTM!User serialization/deserialization correctly handles the new default_wckey_map field with proper type checking and symmetric BSON mapping.
Also applies to: 1285-1306
1309-1333: LGTM!WckeyToDocument_ and ViewToWckey_ provide symmetric serialization/deserialization with proper type mapping and exception handling.
src/CraneCtld/TaskScheduler.cpp (1)
3704-3731: Well-structured Wckey handling implementation.The refactored logic correctly:
- Validates the provided wckey before any default lookup
- Only fetches the default wckey when needed for marking purposes (using the if-init pattern)
- Ignores
GetExistedDefaultWckeyNameerrors when the user's wckey was already validated (avoiding misleading error propagation)- Documents the "*" prefix convention via inline comments
The conditional structure cleanly separates the "wckey provided" vs "use default" paths.
src/CraneCtld/CtldPublicDefs.h (4)
199-199: LGTM!The
MustNeedWckeyconfig flag with a safe default offalseallows for backward-compatible opt-in of wckey enforcement.
709-709: LGTM!The
wckeyfield addition toTaskInCtldaligns with the protoTaskInfo.wckeyfield and enables wckey tracking throughout task lifecycle.
1018-1019: LGTM!The
default_wckey_mapkeyed by cluster name appropriately supports multi-cluster environments where users may have different default wckeys per cluster.
1074-1086: Verify intent ofoperator==forWckeycomparisons before removingis_default.The concern about inconsistent treatment of mutable state fields is valid: both
deletedandis_defaultare mutable fields, yet onlyis_defaultis included in the equality operator. However, the appropriate fix depends on the intended semantics:
- If
operator==should compare only identity (wckey name/cluster/user identity), remove bothis_defaultanddeleted.- If
operator==should compare complete state, include both fields for consistency.- Check how
Wckeyobjects are used: in collections as keys, in comparisons for updates, or other contexts that clarify whether identity or value semantics are needed.Without understanding the actual usage patterns and design intent in the codebase, removing
is_defaultwithout also addressingdeletedrisks creating subtle equality semantics bugs.protos/PublicDefs.proto (5)
184-184: LGTM!Using
optionalforwckeyinTaskToCtldcorrectly distinguishes between "not provided" (use default) and "explicitly set" cases at submission time.
473-473: LGTM!The
wckeyfield inTaskInfois non-optional since the task always has a resolved wckey value (either user-provided or default) by the time it's reported.
699-703: LGTM!The new wckey-related error codes follow the existing naming convention and are assigned sequential values (84-88) that don't conflict with existing codes.
732-733: LGTM!The
DefaultWckey = 40enum value inModifyFieldprovides a clear identifier for wckey modification operations, following the established pattern of grouping related fields with appropriate spacing (30s for qos, 40 for wckey).
789-794: LGTM!The
WckeyInfomessage properly exposes the public-facing wckey attributes needed for gRPC communication, appropriately excluding internal fields likeuidanddeletedthat are handled server-side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/en/command/cacctmgr.md(1 hunks)docs/zh/command/cacctmgr.md(1 hunks)
🧰 Additional context used
🪛 LanguageTool
docs/en/command/cacctmgr.md
[grammar] ~527-~527: Ensure spelling is correct
Context: ...action where Target=PKU ``` --- ## 5. Wckey Management ### 5.1 Add Wckey **Syntax...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
添加





删除
修改默认default-wckey
提交作业带参数
提交作业不带wckey,采用默认wckey
Summary by CodeRabbit
✏️ Tip: You can customize this high-level summary in your review settings.