Skip to content

Conversation

@edragain2nd
Copy link
Collaborator

@edragain2nd edragain2nd commented Nov 20, 2025

fixes #700

Summary by CodeRabbit

  • New Features

    • Batch modification: Submit multiple changes to accounts, users, and QoS in a single request.
    • Per-operation results: Responses include detailed per-change error entries for partial success visibility.
  • Improvements

    • Consistent transactional updates with better rollback/commit semantics across batched changes.
    • Improved handling of the force flag for batch operations.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Nov 20, 2025

📝 Walkthrough

Walkthrough

Refactors account, user, and QoS modify RPCs to accept batched ModifyFieldOperation vectors. Handlers and AccountManager now validate and apply multiple operations transactionally, aggregating per-operation rich errors in replies instead of single error codes. New helper and signature changes support batch semantics.

Changes

Cohort / File(s) Summary
Protocol Definition
protos/Crane.proto
Added ModifyFieldOperation message (modify_field, value_list, type). Replaced single-field/value fields in ModifyAccountRequest, ModifyUserRequest, ModifyQosRequest with repeated ModifyFieldOperation operations. ModifyQosReply now returns repeated RichError rich_error_list.
gRPC Service Handler
src/CraneCtld/RpcService/CtldGrpcServer.cpp
ModifyAccount, ModifyUser, ModifyQos refactored to iterate request->operations(), call batch AccountManager methods, and aggregate per-operation RichErrors into reply rich_error_list (removed early-return single-error flows).
Account Manager Implementation
src/CraneCtld/AccountManager.cpp
Added batch validation/apply methods (CheckModifyAccountOperations, ModifyAccount, CheckModifyUserOperations, ModifyUser, ModifyQos) returning std::vector<CraneExpectedRich<void>>. Converted many internal helpers to transaction-aware, batch-capable variants and changed error propagation to per-item rich results.
Account Manager Header
src/CraneCtld/AccountManager.h
Public API reshaped: ModifyAccount, ModifyUser, ModifyQos now accept vectors of ModifyFieldOperation and return vectors of CraneExpectedRich<void>. Many helper signatures updated to accept/return rich/batch types and to propagate results via referenced in-memory objects.
Public Definitions
src/CraneCtld/CtldPublicDefs.h
Added Qos::GetModifyFieldStr(const crane::grpc::ModifyField&) utility to map ModifyField enums to human-readable field name strings.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant GRPC as gRPC Handler
    participant AM as AccountManager
    participant DB as Database

    Client->>GRPC: ModifyXRequest (operations[])
    note over GRPC: Build operations vector\nand invoke batch method
    GRPC->>AM: ModifyX(uid, name, operations[], force?)
    rect rgb(240,248,255)
        note over AM: Validate operations\nCheckModifyXOperations(...)
        AM->>DB: Read current state
        DB-->>AM: Account/User/QoS data
    end
    rect rgb(240,255,240)
        note over AM: Apply operations transactionally\nAdd txn logs, update maps
        AM->>DB: Apply transaction/commit
        DB-->>AM: Success / per-item errors
    end
    alt some operations failed
        AM-->>GRPC: vector<CraneExpectedRich<void>> (rich errors)
    else all succeeded
        AM-->>GRPC: vector of ok results
    end
    GRPC->>Client: Reply (rich_error_list[])
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

bug

Suggested reviewers

  • L-Xiafeng
  • NamelessOIer

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 11.63% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title 'fix: duplicate modification message output' accurately summarizes the main objective of consolidating modification operations into fewer output messages.
Linked Issues check ✅ Passed The code refactoring consolidates multiple per-field modify operations into batch operations with unified error handling, directly addressing the cacctmgr output clarity issue.
Out of Scope Changes check ✅ Passed All changes in proto, RPC service, AccountManager, and related files are directly focused on implementing batch operation handling to reduce duplicate output messages.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e2bbcb0 and 2150b24.

📒 Files selected for processing (2)
  • protos/Crane.proto (3 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.

Applied to files:

  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.

Applied to files:

  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
🔇 Additional comments (3)
protos/Crane.proto (2)

344-348: Reusable ModifyFieldOperation message looks structurally sound

The shared ModifyFieldOperation wrapper cleanly models (field, values, op-type) and matches how the server iterates request->operations(); this should make extending modify RPCs easier and avoids ad‑hoc per‑RPC fields.


352-355: Confirm wire‑compatibility implications of field reindexing in modify RPCs

ModifyAccountRequest, ModifyUserRequest, and ModifyQosRequest now use operations plus reindexed fields (e.g., name = 2, operations = 3, etc.). Reusing old tag numbers for different semantics is not backward compatible if any older binaries (ctld, frontends, plugins) may still be running.

Please ensure:

  • All components that send/receive these messages are upgraded together (no mixed versions), and
  • Any external tooling depending on the old field layout is updated.

If version skew is possible, consider adding new fields without reusing old tag numbers, or gate behavior via versioning.

Also applies to: 364-368, 378-379

src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)

977-1026: Account multi‑operation handling and error aggregation look correct

The new ModifyAccount flow iterates request->operations() and:

  • Uses SetAccountAllowedPartition / SetAccountAllowedQos for Overwrite semantics (full set updates), and
  • Falls back to per‑value ModifyAccount for other operations.

This matches the new proto shape and should avoid duplicated error entries for bulk partition/QoS overwrites while preserving per‑value reporting for other fields.

@L-Xiafeng
Copy link
Collaborator

会有部分成功的可能,需要确保一条修改语句是原子的

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (4)
src/CraneCtld/AccountManager.cpp (3)

843-845: Stray semicolon creates a no-op statement.

Line 845 contains only a semicolon which is a no-op statement. This appears to be leftover from editing.

    } else if (operation.type() == crane::grpc::OperationType::Overwrite &&
               operation.modify_field() == crane::grpc::ModifyField::Qos) {
-      ;
-
       std::list<std::string> deleted_qos;

1130-1133: Redundant User copy objects created for transaction workflow.

Multiple User copies are created (partition_user, qos_user, default_qos_user) to track changes before committing. This pattern adds complexity and memory overhead. Additionally, if operations of different types occur in the same batch (e.g., both Partition and Qos modifications), only one of these copies will be used to update the final map state, potentially losing changes from others.

Consider consolidating to a single res_user object that accumulates all modifications.


3131-3134: Minor: Comment has unusual line wrapping.

The @note comment has been reformatted with a line break mid-sentence. This is a minor formatting issue.

 /**
- * @note need write lock(m_rw_account_mutex_) and write *
- * lock(m_rw_user_mutex_)
+ * @note need write lock(m_rw_account_mutex_) and write lock(m_rw_user_mutex_)
  */
src/CraneCtld/AccountManager.h (1)

95-98: ModifyUser takes operations by non-const reference.

The operations parameter is passed by non-const reference (std::vector<crane::grpc::ModifyFieldOperation>& operations), while ModifyAccount and ModifyQos take it by const reference. This inconsistency suggests ModifyUser might modify the operations vector, which could be unexpected for callers.

If the vector isn't modified, change to const reference for consistency:

   std::vector<CraneExpectedRich<void>> ModifyUser(
       uint32_t uid, const std::string& name, const std::string& account,
       const std::string& partition,
-      std::vector<crane::grpc::ModifyFieldOperation>& operations, bool force);
+      const std::vector<crane::grpc::ModifyFieldOperation>& operations, bool force);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2150b24 and a9548e7.

📒 Files selected for processing (7)
  • protos/Crane.proto (3 hunks)
  • src/CraneCtld/AccountManager.cpp (15 hunks)
  • src/CraneCtld/AccountManager.h (5 hunks)
  • src/CraneCtld/CtldPublicDefs.h (1 hunks)
  • src/CraneCtld/DbClient.cpp (1 hunks)
  • src/CraneCtld/DbClient.h (1 hunks)
  • src/CraneCtld/RpcService/CtldGrpcServer.cpp (4 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-08T07:45:15.739Z
Learnt from: MCKZX-llx
Repo: PKUHPC/CraneSched PR: 513
File: src/Utilities/EmbeddedDb/EmbeddedDb.cpp:411-411
Timestamp: 2025-06-08T07:45:15.739Z
Learning: In the CraneSched project's Berkeley DB implementation in src/Utilities/EmbeddedDb/EmbeddedDb.cpp, explicit deletion of DbTxn objects after commit() or abort() is not required, contrary to general Berkeley DB documentation. The project maintainer confirmed that the memory management is handled automatically in their specific implementation.

Applied to files:

  • src/CraneCtld/DbClient.h
  • src/CraneCtld/DbClient.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.

Applied to files:

  • src/CraneCtld/RpcService/CtldGrpcServer.cpp
🧬 Code graph analysis (2)
src/CraneCtld/DbClient.h (1)
src/CraneCtld/DbClient.cpp (6)
  • StartTransaction (869-881)
  • StartTransaction (869-869)
  • CommitTransactionWithoutCallback (883-891)
  • CommitTransactionWithoutCallback (883-883)
  • AbortTransaction (893-901)
  • AbortTransaction (893-893)
src/CraneCtld/AccountManager.cpp (3)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (6)
  • ModifyAccount (967-995)
  • ModifyAccount (967-970)
  • ModifyUser (997-1027)
  • ModifyUser (997-999)
  • ModifyQos (1029-1057)
  • ModifyQos (1029-1031)
src/CraneCtld/CtldPublicDefs.h (3)
  • CheckIfTimeLimitSecIsValid (1078-1080)
  • AdminLevelToString (1062-1075)
  • string (949-965)
src/Utilities/PublicHeader/String.cpp (2)
  • ConvertStringToInt64 (364-368)
  • ConvertStringToInt64 (364-364)
🔇 Additional comments (15)
protos/Crane.proto (2)

344-348: LGTM! Well-structured operation message.

The new ModifyFieldOperation message cleanly encapsulates field modification parameters, enabling batch processing of multiple operations in a single RPC call.


350-385: I'm encountering persistent issues cloning the repository, which prevents me from verifying the actual field number changes through git history or examining the codebase for compatibility mechanisms.

However, I can provide context on the concern raised. Let me search for protobuf best practices regarding field reordering:

protobuf field number reordering wire compatibility backward compatibility

src/CraneCtld/DbClient.h (1)

247-251: LGTM! Transaction API additions are well-designed.

The three new transaction methods (StartTransaction, CommitTransactionWithoutCallback, AbortTransaction) complement the existing callback-based CommitTransaction and enable explicit transaction control for the new batch modification workflows. The bool return type provides clear success/failure indication.

src/CraneCtld/DbClient.cpp (1)

869-901: LGTM! Transaction method implementations are robust.

All three transaction methods (StartTransaction, CommitTransactionWithoutCallback, AbortTransaction) follow consistent patterns:

  • Proper try-catch for mongocxx::exception
  • Clear error logging
  • Transaction options align with existing CommitTransaction method
  • Return bool for explicit success/failure indication

Based on learnings, memory management for MongoDB session objects is handled automatically in this implementation.

src/CraneCtld/RpcService/CtldGrpcServer.cpp (3)

977-994: LGTM! Clean refactoring to batch operation model.

The ModifyAccount handler now properly delegates operation processing to AccountManager::ModifyAccount(), which processes multiple operations atomically and returns aggregated errors. This addresses previous concerns about validation by centralizing it in the AccountManager layer.

Note: Operations are copied from the protobuf repeated field into a std::vector. While this incurs a copy overhead, it's acceptable for the expected usage patterns.


1007-1027: LGTM! Consistent with ModifyAccount pattern.

The ModifyUser handler follows the same clean pattern as ModifyAccount: delegates to AccountManager for batch processing and aggregates errors into rich_error_list. The consistency across handlers improves maintainability.


1038-1056: LGTM! Completes the consistent batch operation refactoring.

The ModifyQos handler completes the refactoring trilogy with the same pattern: batch operation processing via AccountManager with aggregated error reporting. The three handlers now share a consistent, maintainable design.

Previous review concerns about empty value_list access are resolved—validation is now centralized in the AccountManager layer where it can be properly tested and maintained.

src/CraneCtld/CtldPublicDefs.h (1)

949-965: Yes, please provide the exact proto enum definition for ModifyField from the current master version. This will help me verify whether all enum values are covered in the GetModifyFieldStr function.

src/CraneCtld/AccountManager.cpp (3)

529-654: Validation function design looks correct for batch operation checking.

The CheckModifyAccountOperations function properly validates all operations before any modifications occur, supporting the atomic batch semantics mentioned in PR comments. The function correctly handles both bulk overwrite operations (Partition/Qos with value lists) and individual Add/Delete operations.


2543-2566: Updated AddUserAllowedPartition_ correctly handles batch partitions.

The function now accepts a set of partitions and iterates through them to update the user's allowed partition QoS map. The DB update and transaction logging are performed correctly.


708-826: I'm unable to complete the verification due to persistent repository cloning failures in the sandbox environment. However, let me provide the rewritten review comment based on the available context:

Verify transaction rollback semantics: ensure helper functions do not modify in-memory state before commit.

The code starts a transaction at line 710 and calls helper functions like SetAccountAllowedPartition_, AddAccountAllowedQos_, etc. If these functions perform in-memory modifications to maps (m_account_map_, m_user_map_, m_qos_map_) before the transaction commits, a rollback would leave the database consistent but the in-memory state inconsistent. Verify that all helper functions called within this transaction block (lines 711-817) defer in-memory updates until after CommitTransactionWithoutCallback() succeeds, consistent with the pattern of a separate memory-update phase.

src/CraneCtld/AccountManager.h (4)

90-103: Public API changes for batch operations are well-structured.

The new signatures for ModifyAccount, ModifyUser, and ModifyQos correctly return std::vector<CraneExpectedRich<void>> to support batch error reporting. The use of ModifyFieldOperation from the proto aligns with the transactional batch processing approach.


161-169: New validation helper declarations are consistent with implementation.

The CheckModifyAccountOperations and CheckModifyUserOperations helper functions are properly declared as private members returning std::vector<CraneExpectedRich<void>> for aggregated error collection.


273-279: Helper function signatures updated with res_user out-parameter.

The updated signatures for AddUserAllowedPartition_ and AddUserAllowedQos_ now accept User& res_user as an out-parameter to accumulate changes before final commit. This supports the transactional workflow pattern.


335-337: DeleteAccountAllowedQos_ return type change is appropriate.

Changing the return type from CraneExpected<void> to CraneExpected<int> allows returning the reference count change value needed for proper QoS reference tracking in the caller.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (4)
src/CraneCtld/AccountManager.cpp (4)

700-701: Uninitialized variable delete_qos_change_num may be used before assignment.

This variable is declared on line 700 but is only assigned inside the Delete/Qos branch (line 803). However, it's used unconditionally in the memory update phase (line 926). If no Delete Qos operation occurs, the variable will contain garbage.

🔎 Proposed fix
-  int delete_qos_change_num;
+  int delete_qos_change_num = 0;

1239-1307: Inconsistent account parameter usage in memory update phase.

The code inconsistently uses account (input parameter, possibly empty), actual_account (resolved), and account_ptr->name across different branches:

  • Delete Partition/Qos branches (lines 1244-1256) use account
  • Partition/Qos modification branches (lines 1277-1290) use account_ptr->name
  • DefaultQos branch (lines 1292-1297) uses account

If account is empty (user's default account is used), these branches will access different map entries.

Ensure consistent usage of the resolved account name (actual_account or account_ptr->name) across all memory update branches.


1341-1361: Validation continues after conversion error, potentially causing undefined behavior.

When ConvertStringToInt64 fails (line 1346), the error is added to the list but execution continues to line 1352 where value_number (which is uninitialized on failure) is used in CheckIfTimeLimitSecIsValid.

Additionally, line 1342 accesses operation.value_list()[0] without checking if the list is empty.

🔎 Proposed fix
   for (const auto& operation : operations) {
+    if (operation.value_list().empty()) {
+      rich_error_list.emplace_back(std::unexpected{
+          FormatRichErr(CraneErrCode::ERR_INVALID_PARAM, "")});
+      continue;
+    }
     auto value = operation.value_list()[0];
     auto item = Qos::GetModifyFieldStr(operation.modify_field());
     int64_t value_number;
     if (item != Qos::FieldStringOfDescription()) {
       bool ok = util::ConvertStringToInt64(value, &value_number);
       if (!ok) {
         rich_error_list.emplace_back(std::unexpected{
             FormatRichErr(CraneErrCode::ERR_CONVERT_TO_INTEGER, value)});
+        continue;  // Skip further validation with invalid number
       }

1404-1406: Loop overwrites QoS object repeatedly, keeping only last version.

The loop iterates over qos_list and for each iteration, assigns the entire qos object to *m_qos_map_[name]. Since each iteration overwrites the previous value, only the final qos from the list is retained.

🔎 Proposed fix - assign only the final state
-  for (auto qos : qos_list) {
-    *m_qos_map_[name] = std::move(qos);
-  }
+  if (!qos_list.empty()) {
+    *m_qos_map_[name] = std::move(qos_list.back());
+  }
🧹 Nitpick comments (2)
src/CraneCtld/AccountManager.cpp (1)

1877-1896: Consider renaming "Check" functions that have side effects.

Functions like CheckSetUserAllowedQosNoLock_ modify res_user as a side effect (as noted in comments). This "check-and-prepare" pattern is valid for the batch flow, but the Check prefix typically implies a pure predicate. Consider renaming to ValidateAndPrepare* or similar to make the side effects explicit in the function name.

src/CraneCtld/AccountManager.h (1)

95-102: Inconsistent parameter passing for operations across Modify functions.

ModifyUser takes operations by non-const reference (line 98), while ModifyAccount (line 92) and ModifyQos (line 102) take it by const reference. This inconsistency is confusing and may indicate unintended modification of the input.

🔎 Proposed fix
   std::vector<CraneExpectedRich<void>> ModifyUser(
       uint32_t uid, const std::string& name, const std::string& account,
       const std::string& partition,
-      std::vector<crane::grpc::ModifyFieldOperation>& operations, bool force);
+      const std::vector<crane::grpc::ModifyFieldOperation>& operations, bool force);
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a9548e7 and 6a21bff.

📒 Files selected for processing (2)
  • src/CraneCtld/AccountManager.cpp (14 hunks)
  • src/CraneCtld/AccountManager.h (3 hunks)
🔇 Additional comments (5)
src/CraneCtld/AccountManager.cpp (3)

529-654: LGTM on the batch validation structure.

The CheckModifyAccountOperations function properly validates operations before execution, separating batch partition/QoS overwrite handling from per-value operations. The error aggregation pattern is appropriate for batch processing.


708-826: Transaction handling pattern is well-structured.

The validate-then-execute pattern with proper transaction boundaries ensures atomicity. The check phase (lines 703-706) runs before the transaction starts, and the apply phase properly handles rollback on any failure.


2827-2876: LGTM on SetAccountAllowedQos_ batch handling.

The function properly handles batch QoS updates with correct reference count management and default QoS handling.

src/CraneCtld/AccountManager.h (2)

161-186: LGTM on the batch validation helper signatures.

The new Check* helper signatures appropriately support the batch processing model with:

  • Vector returns for operations that can produce multiple errors
  • User& res_user output parameter for accumulating prepared changes
  • Consistent pattern between account and user operation validators

315-317: Return type change for DeleteAccountAllowedQos_ is appropriate.

Returning CraneExpected<int> (the reference count change) instead of void enables the caller to properly update in-memory reference counts during the memory update phase of batch operations.

@huerni
Copy link
Collaborator

huerni commented Dec 23, 2025

同样的,txn用一条就好了,因为只需要一个事务,只需要拼接修改的info就行

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (5)
src/CraneCtld/AccountManager.cpp (5)

700-701: Uninitialized variable delete_qos_change_num may cause undefined behavior.

This issue was flagged in a previous review but remains unfixed. The variable is declared on line 700 but only assigned in the Delete/Qos branch (line 803). When used on line 926, it will contain garbage if no Delete/Qos operation was executed.

🔎 Proposed fix
-  int delete_qos_change_num;
+  int delete_qos_change_num = 0;

982-992: Unchecked access to operation.value_list()[0] may cause undefined behavior.

This issue was flagged in a previous review but remains unfixed. Multiple locations access operation.value_list()[0] without verifying the list is non-empty. If a malformed request with an empty value list is received, this causes undefined behavior.

Affected locations: lines 984, 1057-1058, 1071, and 1187.

🔎 Proposed fix - add bounds check before each access
       case crane::grpc::ModifyField::AdminLevel: {
+        if (operation.value_list().empty()) {
+          rich_error_list.emplace_back(std::unexpected{
+              FormatRichErr(CraneErrCode::ERR_INVALID_PARAM, "")});
+          break;
+        }
         auto rich_result = CheckAndModifyUserAdminLevelNoLock_(
             op_user, res_user, operation.value_list()[0]);

Apply similar checks at lines 1057, 1071, and 1187.


1191-1202: Validation continues after conversion error, using potentially uninitialized value.

This issue was flagged in a previous review but remains unfixed. When ConvertStringToInt64 fails (line 1191-1195), the error is recorded but execution continues to line 1197-1201 where value_number (uninitialized on failure) is used in CheckIfTimeLimitSecIsValid. This causes undefined behavior.

🔎 Proposed fix
       if (!ok) {
         rich_error_list.emplace_back(std::unexpected{
             FormatRichErr(CraneErrCode::ERR_CONVERT_TO_INTEGER, value)});
+        continue;  // Skip further validation with invalid number
       }

1249-1251: Loop overwrites QoS object repeatedly, keeping only last version.

This issue was flagged in a previous review but remains unfixed. The loop iterates over qos_list and assigns the entire qos object to *m_qos_map_[name] on each iteration. Since each iteration overwrites the previous value, only the final qos from the list is retained.

🔎 Proposed fix - update once with final state
-  for (auto qos : qos_list) {
-    *m_qos_map_[name] = std::move(qos);
-  }
+  if (!qos_list.empty()) {
+    *m_qos_map_[name] = std::move(qos_list.back());
+  }

843-846: Remove stray semicolon (empty statement).

Line 845 contains an isolated semicolon that serves no purpose. This appears to be leftover from editing.

🔎 Proposed fix
     } else if (operation.type() == crane::grpc::OperationType::Overwrite &&
                operation.modify_field() == crane::grpc::ModifyField::Qos) {
-      ;
-
       std::list<std::string> deleted_qos;
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a21bff and 14998d1.

📒 Files selected for processing (2)
  • src/CraneCtld/AccountManager.cpp
  • src/CraneCtld/AccountManager.h
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-30T02:09:41.625Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 499
File: src/CraneCtld/AccountMetaContainer.cpp:249-276
Timestamp: 2025-05-30T02:09:41.625Z
Learning: In CraneSched's AccountMetaContainer class, m_user_meta_map_ and m_account_meta_map_ are thread-safe containers that do not require external synchronization for access.

Applied to files:

  • src/CraneCtld/AccountManager.h
🧬 Code graph analysis (2)
src/CraneCtld/AccountManager.cpp (1)
src/Utilities/PublicHeader/String.cpp (2)
  • ConvertStringToInt64 (364-368)
  • ConvertStringToInt64 (364-364)
src/CraneCtld/AccountManager.h (1)
src/CraneCtld/AccountManager.cpp (26)
  • ModifyAccount (656-942)
  • ModifyAccount (656-659)
  • ModifyUser (1083-1154)
  • ModifyUser (1083-1087)
  • ModifyQos (1156-1253)
  • ModifyQos (1156-1158)
  • CheckModifyAccountOperations (529-654)
  • CheckModifyAccountOperations (530-533)
  • CheckModifyUserOperations (944-1081)
  • CheckModifyUserOperations (944-948)
  • res_user (1128-1128)
  • res_user (2392-2392)
  • CheckSetUserAllowedQosNoLock_ (1687-1745)
  • CheckSetUserAllowedQosNoLock_ (1687-1690)
  • CheckSetUserDefaultQosNoLock_ (1747-1788)
  • CheckSetUserDefaultQosNoLock_ (1747-1749)
  • CheckAndDeleteUserAllowedPartitionNoLock_ (1790-1800)
  • CheckAndDeleteUserAllowedPartitionNoLock_ (1791-1792)
  • DeleteUserAllowedQos_ (2538-2567)
  • DeleteUserAllowedQos_ (2538-2540)
  • SetAccountAllowedQos_ (2671-2720)
  • SetAccountAllowedQos_ (2671-2674)
  • DeleteAccountAllowedPartition_ (2722-2732)
  • DeleteAccountAllowedPartition_ (2722-2724)
  • DeleteAccountAllowedQos_ (2734-2745)
  • DeleteAccountAllowedQos_ (2734-2736)
🔇 Additional comments (6)
src/CraneCtld/AccountManager.cpp (3)

529-654: Batch validation logic looks correct.

The CheckModifyAccountOperations function properly validates each operation type (Add/Overwrite/Delete) for Partition and QoS fields, collecting rich errors for each failed validation. The early return pattern for Overwrite Partition/Qos set operations is appropriate.


708-826: Transaction handling pattern looks correct.

The transaction flow properly:

  1. Starts transaction before database operations
  2. Rolls back on any failure (need_rollback flag)
  3. Only updates in-memory structures after successful commit

This ensures atomicity as requested in the PR objectives.


1128-1153: Good use of the res_user copy-modify-commit pattern.

The implementation follows the recommended pattern from past reviews:

  1. Creates a copy of the user (res_user)
  2. Validates and modifies the copy in CheckModifyUserOperations
  3. Commits the modified copy atomically to the database
  4. Updates in-memory structures only on success

This approach is cleaner and safer than iterating multiple times.

src/CraneCtld/AccountManager.h (3)

90-102: Public API declarations are consistent with batch operation semantics.

The refactored public API correctly returns std::vector<CraneExpectedRich<void>> to support per-operation error reporting. The signatures match the implementations in the .cpp file.


161-188: Private helper declarations properly updated for the new batch-oriented design.

The helper function signatures correctly:

  • Pass User* for in-place modifications (e.g., res_user)
  • Return std::vector<CraneExpectedRich<void>> where multiple errors are possible
  • Return single CraneExpectedRich<void> for single-result operations

282-312: Signature updates support the batch modification pattern.

  • DeleteUserAllowedQos_ now takes User* allowing in-place modification of the result user
  • DeleteAccountAllowedQos_ returns CraneExpected<int> providing the change count for reference tracking

Both changes align with the transaction-based batch processing model.

std::string log = "";
std::unordered_map<std::string, Account> account_map;
std::unordered_map<std::string, User> user_map;
std::unordered_map<std::string, Qos> qos_map;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这三个map 是不是有点太粗暴了 可以问下这三个map是为了什么吗


case crane::grpc::OperationType::Overwrite:
switch (modify_field) {
Qos res_qos(*m_qos_map_[name]);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

上面已经查出了p吧

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (4)
src/CraneCtld/AccountManager.cpp (4)

834-843: Missing break statement causes fallthrough to std::unreachable().

The Delete Qos case (lines 834-840) lacks a break statement. After processing QoS deletions, execution falls through to the default case at line 841, which calls std::unreachable(), causing undefined behavior (likely a crash) for every Delete Qos operation.

🔎 Proposed fix
       case crane::grpc::ModifyField::Qos: {
         for (const auto& qos : operation.value_list()) {
           DeleteUserAllowedQos_(res_user, qos, actual_account, partition);
           *log += fmt::format("Del: account: {}, partition: {}, qos: {}\n",
                               account_name, partition, qos);
         }
+        break;
       }
       default:
         std::unreachable();

1605-1625: Move iterators in loop invalidate source after first iteration.

Lines 1611-1612 use std::make_move_iterator to assign from qos_list inside a loop over partitions. After the first iteration, elements in qos_list are moved-from and invalid. Subsequent partitions will receive empty/invalid QoS lists, corrupting user data.

🔎 Proposed fix - copy instead of move in the loop
   if (partition.empty()) {
     // Set the qos of all partition
     for (auto& [par, pair] :
          user->account_to_attrs_map[account_name].allowed_partition_qos_map) {
       if (!ranges::contains(qos_list, pair.first))
         pair.first = qos_list.empty() ? "" : default_qos;
-      pair.second.assign(std::make_move_iterator(qos_list.begin()),
-                         std::make_move_iterator(qos_list.end()));
+      pair.second.assign(qos_list.begin(), qos_list.end());
     }
   } else {
     // Set the qos of a specified partition
     auto iter =
         user->account_to_attrs_map[account_name].allowed_partition_qos_map.find(
             partition);

     if (!ranges::contains(qos_list, iter->second.first))
       iter->second.first = qos_list.empty() ? "" : default_qos;

-    iter->second.second.assign(std::make_move_iterator(qos_list.begin()),
-                               std::make_move_iterator(qos_list.end()));
+    iter->second.second.assign(qos_list.begin(), qos_list.end());
   }

1052-1069: Validation continues after conversion error, potentially using uninitialized value.

When ConvertStringToInt64 fails (line 1057-1061), an error is added to the list but execution continues. At line 1064, value_number (which is uninitialized or contains garbage on failure) is used in CheckIfTimeLimitSecIsValid. This could lead to undefined behavior.

🔎 Proposed fix
       if (!ok) {
         rich_error_list.emplace_back(std::unexpected{
             FormatRichErr(CraneErrCode::ERR_CONVERT_TO_INTEGER, value)});
+        continue;  // Skip this operation and proceed to next
       }

       if (item == Qos::FieldStringOfMaxTimeLimitPerTask() &&

This ensures value_number is never accessed when conversion fails.


1672-1682: Erasing from wrong map - will corrupt user data.

Line 1680 attempts to erase partition from account_to_attrs_map, but this map is keyed by account names, not partition names. This either has no effect (partition name doesn't exist as key) or corrupts the account_to_attrs_map. The code should erase from allowed_partition_qos_map within the specific account.

🔎 Proposed fix
 CraneExpectedRich<void>
 AccountManager::CheckAndDeleteUserAllowedPartitionNoLock_(
     User* user, const std::string& account, const std::string& partition) {
   if (!user->account_to_attrs_map.at(account)
            .allowed_partition_qos_map.contains(partition))
     return std::unexpected{
         FormatRichErr(CraneErrCode::ERR_PARTITION_MISSING, partition)};
   // modify the user
-  user->account_to_attrs_map.erase(partition);
+  user->account_to_attrs_map[account].allowed_partition_qos_map.erase(partition);
   return {};
 }
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e12fab6 and 9e5af98.

📒 Files selected for processing (1)
  • src/CraneCtld/AccountManager.cpp

}
*log += fmt::format("Set: qos_list: {}\n", fmt::join(qos_list, ","));
} else {
for (const auto& value : operation.value_list()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add bounds check before accessing value_list()[0].

Multiple locations in this function access operation.value_list()[0] without first verifying that the list is non-empty (lines 579, 623, 635). A malformed request with an empty value list will cause undefined behavior.

🔎 Proposed fix

Add a check at the start of the value loop or before each access:

     } else {
       for (const auto& value : operation.value_list()) {
+        if (operation.value_list().empty()) {
+          rich_error_list.emplace_back(std::unexpected{
+              FormatRichErr(CraneErrCode::ERR_INVALID_PARAM, "")});
+          break;
+        }
         switch (operation.type()) {

Or add validation before the single-value cases.

Also applies to: 623-632, 635-645

🤖 Prompt for AI Agents
In src/CraneCtld/AccountManager.cpp around lines 579, 623 and 635, the code
accesses operation.value_list()[0] without ensuring value_list is non-empty; add
a bounds check (e.g., if (operation.value_list_size() == 0) { /* return
error/log and continue/abort */ }) before the loop or immediately before each
single-value access so you never index [0] on an empty list, and handle the
malformed-request case consistently (log and return an appropriate error or skip
processing).

Comment on lines +846 to +856
case crane::grpc::ModifyField::AdminLevel: {
auto rich_result = CheckAndModifyUserAdminLevelNoLock_(
op_user, res_user, operation.value_list()[0]);

const Account* account_ptr = GetExistedAccountInfoNoLock_(actual_account);
if (!account_ptr)
return std::unexpected(
FormatRichErr(CraneErrCode::ERR_INVALID_ACCOUNT, ""));
if (!rich_result) {
rich_error_list.emplace_back(rich_result);
} else {
*log += fmt::format("admin_level: {}\n", operation.value_list()[0]);
}

auto rich_result =
CheckSetUserAllowedPartitionNoLock_(account_ptr, partition_list);
if (!rich_result) return rich_result;
break;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Unchecked access to value_list()[0] may cause undefined behavior.

Multiple locations access operation.value_list()[0] without verifying the list is non-empty (lines 848, 853, 906, 922, 935). If a malformed request arrives with an empty value list, this will cause undefined behavior.

🔎 Proposed fix

Add bounds checking before accessing the first element:

       case crane::grpc::ModifyField::AdminLevel: {
+        if (operation.value_list().empty()) {
+          rich_error_list.emplace_back(std::unexpected{
+              FormatRichErr(CraneErrCode::ERR_INVALID_PARAM, "")});
+          break;
+        }
         auto rich_result = CheckAndModifyUserAdminLevelNoLock_(
             op_user, res_user, operation.value_list()[0]);

Apply similar checks at lines 906, 922, and 935.

Also applies to: 919-931, 933-937

🤖 Prompt for AI Agents
In src/CraneCtld/AccountManager.cpp around lines 846-856 (also review the
similar sites around 906, 919-931, 922 and 935), the code dereferences
operation.value_list()[0] without checking the list size; add a guard that
verifies !operation.value_list().empty() before accessing [0], and if empty
populate rich_error_list (or return/handle an appropriate error) and skip the
modification logic so no out-of-bounds access occurs; apply the same pattern to
the other referenced locations.

Comment on lines +1052 to +1053
for (const auto& operation : operations) {
auto value = operation.value_list()[0];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Missing bounds check on value_list()[0] throughout QoS modification.

Lines 1053 and 1079 access operation.value_list()[0] without verifying the list is non-empty. A malformed request will cause undefined behavior.

🔎 Proposed fix

Add validation at the start of the operation loop:

   for (const auto& operation : operations) {
+    if (operation.value_list().empty()) {
+      rich_error_list.emplace_back(std::unexpected{
+          FormatRichErr(CraneErrCode::ERR_INVALID_PARAM, "")});
+      continue;
+    }
     auto value = operation.value_list()[0];

Also applies to: 1078-1118

🤖 Prompt for AI Agents
In src/CraneCtld/AccountManager.cpp around lines 1052-1053 and throughout the
QoS modification loop (also covering 1078-1118), operations access
operation.value_list()[0] without verifying the list is non-empty; add an
explicit bounds check at the start of the loop (e.g., if
(operation.value_list().empty()) { log/error and skip/return appropriate error
}) before any access, and apply the same guard to every other place that uses
value_list()[0] in this block so malformed requests do not cause undefined
behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

cacctmgr输出不好理解

3 participants