-
Notifications
You must be signed in to change notification settings - Fork 32
Add creport cmd #684
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add creport cmd #684
Conversation
WalkthroughAdds protobuf messages and streaming RPCs for job summaries and manual aggregation; implements MongoDB hourly/day/month rollups with a background thread and startup DB init; extends task documents with Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Ctld_gRPC as Ctld gRPC
participant DBClient as MongodbClient
participant MongoDB
participant RollupTh as RollupThread
Client->>Ctld_gRPC: QueryJobSummary(request)
Ctld_gRPC->>DBClient: QueryJobSummary(request, stream)
loop per time-range chunk
DBClient->>MongoDB: Aggregate (union of hour/day/month collections)
MongoDB-->>DBClient: cursor/results
DBClient->>Ctld_gRPC: stream.Write(QueryJobSummaryReply batch)
end
Ctld_gRPC-->>Client: grpc::Status OK
Note over RollupTh,DBClient: Periodic background rollup
RollupTh->>DBClient: ClusterRollupUsage()
DBClient->>MongoDB: Aggregate task_table -> upsert hour_job_summary_table
DBClient->>MongoDB: Aggregate hour->day and day->month as scheduled
DBClient->>MongoDB: Update summary_time_table
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 10
🧹 Nitpick comments (11)
scripts/wipe_data.sh (2)
33-47: Add error handling to batch import functions.The new
batch_insert_collection()andbatch_insert_collection_process()functions executemongoimportwithout checking for errors or validating that the JSON file exists. This can silently fail or produce confusing output. Add explicit error handling and validation.function batch_insert_collection() { collection=$1 json_file=$2 + if [ ! -f "$json_file" ]; then + echo "Error: JSON file not found: $json_file" >&2 + return 1 + fi mongoimport --username "$username" --password "$password" --host "$host" --port "$port" \ --authenticationDatabase admin --db "$dbname" --collection "$collection" \ --file "$json_file" --jsonArray + if [ $? -ne 0 ]; then + echo "Error: mongoimport failed for $collection from $json_file" >&2 + return 1 + fi } function batch_insert_collection_process() { collection=$1 json_file=$2 + if [ ! -f "$json_file" ]; then + echo "Error: JSON file not found: $json_file" >&2 + return 1 + fi mongoimport --username "$username" --password "$password" --host "$host" --port "$port" \ --authenticationDatabase admin --db "$dbname" --collection "$collection" \ --file "$json_file" + if [ $? -ne 0 ]; then + echo "Error: mongoimport failed for $collection from $json_file" >&2 + return 1 + fi }
82-85: Use absolute path instead of relative path for batch import.Line 83 uses a relative path
./tasks_bulk.jsonwhich is fragile if the script is invoked from a different directory. Consider using an absolute path or resolving the path relative to the script's location.if [ "$mode" -eq 10 ]; then - batch_insert_collection_process "task_table" "./tasks_bulk.json" + script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" + batch_insert_collection_process "task_table" "$script_dir/tasks_bulk.json" exit 0 fisrc/Utilities/PublicHeader/include/crane/String.h (1)
42-46: Use an enum for TimeRange granularity instead of string literals.Strings ("hour"/"day"/"month") are error‑prone and add allocation/compare cost. Replace with a small enum class and convert at boundaries that need strings.
Example:
-struct TimeRange { - std::time_t start; - std::time_t end; - std::string type; // "month" / "day" / "hour" -}; +enum class TimeRangeType : uint8_t { Hour, Day, Month }; +struct TimeRange { + std::time_t start{}; + std::time_t end{}; + TimeRangeType type{}; +};src/Utilities/PublicHeader/String.cpp (1)
527-533: Minor: remove redundant inline and mark noexcept.This function lives in a .cpp; inline is unnecessary. Also consider noexcept.
-inline std::string FormatTime(std::time_t t) { +std::string FormatTime(std::time_t t) noexcept {src/CraneCtld/CtldPreCompiledHeader.h (1)
88-89: Include hygiene (optional).grpc++/grpc++.h already pulls in server APIs on most versions. Consider keeping to one style (grpcpp vs grpc++), preferably grpcpp, across the codebase to avoid subtle include path mismatches.
src/CraneCtld/RpcService/CtldGrpcServer.h (1)
379-389: Consider documenting auth/visibility for new reporting RPCs.Most Ctld RPCs enforce readiness and sometimes auth. Clarify intended access for QueryJobSummary/QueryJobSizeSummary (admin‑only vs public) in comments to guide clients and reviewers.
protos/PublicDefs.proto (2)
814-823: Field semantics: add units and aggregation notes (optional).Document whether total_cpu_time is seconds, hours, or core‑seconds, and what total_count counts (jobs? steps?) to avoid downstream misinterpretation.
824-831: Bucket key typing (optional).cpus_alloc as uint32 is fine; if buckets are used, consider naming cpus_alloc_bucket for clarity in clients.
protos/Crane.proto (3)
977-989: Define time‑range semantics and required invariants.Please document whether filter_end_time is exclusive or inclusive and require start < end to avoid ambiguous queries. Consider rejecting invalid ranges server‑side.
990-1006: Consistency nits: field naming and filters.
- filter_qoss: consider filter_qos or filter_qos_list for consistency with other protos.
- filter_nodename_list vs other filter_* naming: consider filter_nodes for symmetry.
1065-1067: Streaming RPCs: add brief comments about batching size and backpressure.Since replies are batched server‑side, noting batch limits in proto helps clients tune read loops.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
protos/Crane.proto(2 hunks)protos/PublicDefs.proto(1 hunks)scripts/wipe_data.sh(3 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/CtldPreCompiledHeader.h(1 hunks)src/CraneCtld/DbClient.cpp(11 hunks)src/CraneCtld/DbClient.h(5 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(1 hunks)src/CraneCtld/RpcService/CtldGrpcServer.h(1 hunks)src/Utilities/PublicHeader/String.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/String.h(2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/CtldPreCompiledHeader.hsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/CtldPreCompiledHeader.hsrc/CraneCtld/RpcService/CtldGrpcServer.hsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.hsrc/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/Utilities/PublicHeader/String.cppsrc/Utilities/PublicHeader/include/crane/String.hsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-09T01:54:39.465Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Applied to files:
src/CraneCtld/DbClient.h
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/DbClient.h
🧬 Code graph analysis (6)
src/CraneCtld/RpcService/CtldGrpcServer.h (2)
src/CraneCtld/DbClient.cpp (4)
QueryJobSummary(2680-2784)QueryJobSummary(2680-2682)QueryJobSizeSummary(2786-2911)QueryJobSizeSummary(2786-2788)src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
QueryJobSummary(2046-2052)QueryJobSummary(2046-2049)QueryJobSizeSummary(2054-2064)QueryJobSizeSummary(2054-2057)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
src/CraneCtld/DbClient.cpp (4)
QueryJobSummary(2680-2784)QueryJobSummary(2680-2682)QueryJobSizeSummary(2786-2911)QueryJobSizeSummary(2786-2788)
src/CraneCtld/DbClient.cpp (5)
src/Utilities/PublicHeader/PublicHeader.cpp (1)
result(772-772)src/Utilities/PublicHeader/String.cpp (6)
FormatTime(527-533)FormatTime(527-527)EfficientSplitTimeRange(618-636)EfficientSplitTimeRange(618-619)SetCurrentThreadName(354-362)SetCurrentThreadName(354-354)src/CraneCtld/DbClient.h (4)
string(71-73)string(375-419)string(377-393)MongodbClient(93-221)src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
QueryJobSummary(2046-2052)QueryJobSummary(2046-2049)QueryJobSizeSummary(2054-2064)QueryJobSizeSummary(2054-2057)src/CraneCtld/EmbeddedDbClient.cpp (6)
Init(25-50)Init(25-25)Init(237-296)Init(237-237)Init(503-572)Init(503-503)
src/Utilities/PublicHeader/include/crane/String.h (1)
src/Utilities/PublicHeader/String.cpp (10)
FormatTime(527-533)FormatTime(527-527)TryPushHours(598-615)TryPushHours(598-599)TryPushMonths(538-565)TryPushMonths(538-539)TryPushDays(568-595)TryPushDays(568-569)EfficientSplitTimeRange(618-636)EfficientSplitTimeRange(618-619)
src/CraneCtld/DbClient.h (2)
src/CraneCtld/DbClient.cpp (48)
MongodbClient(3025-3049)MongodbClient(3051-3054)Init(2977-3000)Init(2977-2977)InitTableIndexes(2940-2975)InitTableIndexes(2940-2940)CreateCollectionIndex(2046-2068)CreateCollectionIndex(2046-2047)RollupTypeToString(1866-1877)RollupTypeToString(1866-1866)UpdateSummaryLastSuccessTimeSec(1879-1913)UpdateSummaryLastSuccessTimeSec(1879-1880)GetSummaryLastSuccessTimeTm(1915-1954)GetSummaryLastSuccessTimeTm(1915-1916)NeedRollup(1956-1973)NeedRollup(1956-1957)RollupSummary(1975-2012)RollupSummary(1975-1975)ClusterRollupUsage(2014-2044)ClusterRollupUsage(2014-2014)GetCpusAlloc(2625-2634)GetCpusAlloc(2625-2625)GetTotalCpuTime(2636-2645)GetTotalCpuTime(2636-2636)GetTotalCount(2647-2654)GetTotalCount(2647-2647)WriteReply(2656-2678)WriteReply(2656-2659)QueryJobSizeSummaryByJobIds(2335-2488)QueryJobSizeSummaryByJobIds(2335-2337)JobQueryMatch(2490-2534)JobQueryMatch(2490-2492)QueryJobSummary(2680-2784)QueryJobSummary(2680-2682)JobSizeQueryMatch(2536-2600)JobSizeQueryMatch(2536-2538)QueryJobSizeSummary(2786-2911)QueryJobSizeSummary(2786-2788)AppendUnionWithRanges(2603-2623)AppendUnionWithRanges(2603-2606)AggregateJobSummaryByHour(2070-2191)AggregateJobSummaryByHour(2070-2072)AggregateJobSummaryByDayOrMonth(2193-2333)AggregateJobSummaryByDayOrMonth(2193-2196)MongoDbSumaryTh_(3002-3023)MongoDbSumaryTh_(3002-3003)MillisecondsToNextHour(2913-2938)MillisecondsToNextHour(2913-2913)src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
QueryJobSummary(2046-2052)QueryJobSummary(2046-2049)QueryJobSizeSummary(2054-2064)QueryJobSizeSummary(2054-2057)
🔇 Additional comments (2)
src/Utilities/PublicHeader/String.cpp (1)
617-636: Edge-case: ensure fallback interval stays “hour” only for sub-hour tail.Current fallback emits {cur, end, "hour"} even when cur is aligned to day/month start but end < next boundary (rare). With the alignment fixes above this remains consistent; please re-run unit tests for boundaries (e.g., 00:00:30 → 01:00:00).
Would you like a quick gtest snippet covering DST boundaries and minute/second offsets?
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
2046-2052: Guard with srv_ready like other RPCs.Align with existing pattern to return UNAVAILABLE until server ready.
grpc::Status CraneCtldServiceImpl::QueryJobSummary( ::grpc::ServerContext *context, const ::crane::grpc::QueryJobSummaryRequest *request, ::grpc::ServerWriter<::crane::grpc::QueryJobSummaryReply> *writer) { - g_db_client->QueryJobSummary(request, writer); + if (!g_runtime_status.srv_ready.load(std::memory_order_acquire)) + return {grpc::StatusCode::UNAVAILABLE, "CraneCtld Server is not ready"}; + g_db_client->QueryJobSummary(request, writer); return grpc::Status::OK; }⛔ Skipped due to learnings
Learnt from: RileyWen Repo: PKUHPC/CraneSched PR: 475 File: src/CraneCtld/CtldGrpcServer.cpp:113-118 Timestamp: 2025-05-02T07:06:36.103Z Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.Learnt from: L-Xiafeng Repo: PKUHPC/CraneSched PR: 520 File: src/Craned/CranedServer.cpp:50-51 Timestamp: 2025-06-23T07:53:30.513Z Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.Learnt from: L-Xiafeng Repo: PKUHPC/CraneSched PR: 496 File: src/CraneCtld/CranedKeeper.cpp:51-53 Timestamp: 2025-05-09T01:54:21.256Z Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.Learnt from: L-Xiafeng Repo: PKUHPC/CraneSched PR: 587 File: src/Craned/Supervisor/CforedClient.cpp:449-454 Timestamp: 2025-08-14T02:56:35.503Z Learning: In CforedClient::AsyncSendRecvThread_(), the guard `if (state <= State::Registering) { continue; }` in the TIMEOUT branch only prevents premature cleanup when stopping before registration completes, but it doesn't block normal gRPC event processing. The completion queue will still deliver Prepare/Write/Read events that advance the state machine normally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
src/Utilities/PublicHeader/String.cpp (3)
538-567: Critical granularity bug + missing mktime error handling (previously flagged).Two correctness issues remain unfixed from prior review:
- Line 542: Checking only
tm_hour == 0is insufficient—a timestamp like2024-01-01 00:30:00would incorrectly qualify as a month boundary. Must also verifytm_min == 0 && tm_sec == 0.- Line 553:
mktimereturns-1on failure (e.g., invalid DST transitions), but the code doesn't check for this, which could corrupt the time-range split.Apply this diff to fix both issues:
std::tm tm_cur{}; localtime_r(&cur, &tm_cur); - if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0) { + if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0 && tm_cur.tm_min == 0 && tm_cur.tm_sec == 0) { std::time_t month_start = cur; std::time_t month_cur = cur; while (true) { std::tm tm_next_month{}; localtime_r(&month_cur, &tm_next_month); tm_next_month.tm_mon += 1; tm_next_month.tm_mday = 1; tm_next_month.tm_hour = 0; tm_next_month.tm_min = 0; tm_next_month.tm_sec = 0; std::time_t next_month_first = mktime(&tm_next_month); + if (next_month_first == -1) break; if (next_month_first <= end) { month_cur = next_month_first; } else { break; } }Based on learnings
570-600: Critical granularity bug + missing mktime error handling (previously flagged).Same issues as
TryPushMonths:
- Line 574: Only checking
tm_hour == 0allows partial-day timestamps (e.g.,12:30:00) to be bucketed as full days, causing query mismatches with day-level aggregates.- Lines 582, 590: No error checks for
mktimereturning-1.Apply this diff:
std::tm tm_cur{}; localtime_r(&cur, &tm_cur); - if (tm_cur.tm_hour == 0) { + if (tm_cur.tm_hour == 0 && tm_cur.tm_min == 0 && tm_cur.tm_sec == 0) { std::time_t day_start = cur; std::tm tm_next_month = tm_cur; tm_next_month.tm_mon += 1; tm_next_month.tm_mday = 1; tm_next_month.tm_hour = 0; tm_next_month.tm_min = 0; tm_next_month.tm_sec = 0; std::time_t next_month_first = mktime(&tm_next_month); + if (next_month_first == -1) return false; std::tm tm_end{}; localtime_r(&end, &tm_end); tm_end.tm_hour = 0; tm_end.tm_min = 0; tm_end.tm_sec = 0; std::time_t end_trunc_to_day = mktime(&tm_end); + if (end_trunc_to_day == -1) return false;Based on learnings
603-621: Missing mktime error handling (previously flagged).Line 613: The
mktimecall can return-1on failure, but there's no error check.Apply this diff:
std::tm tm_next_day = tm_cur; tm_next_day.tm_mday += 1; tm_next_day.tm_hour = 0; tm_next_day.tm_min = 0; tm_next_day.tm_sec = 0; std::time_t next_day = mktime(&tm_next_day); + if (next_day == -1) return false; std::time_t hour_end = std::min(next_day, end);Based on learnings
🧹 Nitpick comments (2)
scripts/wipe_data.sh (2)
3-3: Fix usage message formatting for consistency.The usage string still has inconsistent spacing around pipe separators. After mode 6 and before mode 8, spacing is uneven ("6:...| 7:" vs "|8:"). Standardize the separators throughout for readability.
-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)" +echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table)"
33-47: Add error handling and input validation to batch import functions.The new
batch_insert_collectionandbatch_insert_collection_processfunctions invoke mongoimport without checking for failures. If the import fails, the script silently continues. Add error handling to match the safety pattern used inwipe_collection, and validate that the JSON file exists before attempting import.function batch_insert_collection() { collection=$1 json_file=$2 + if [ ! -f "$json_file" ]; then + echo "Error: JSON file not found: $json_file" + return 1 + fi mongoimport --username "$username" --password "$password" --host "$host" --port "$port" \ --authenticationDatabase admin --db "$dbname" --collection "$collection" \ - --file "$json_file" --jsonArray + --file "$json_file" --jsonArray || return 1 } function batch_insert_collection_process() { collection=$1 json_file=$2 + if [ ! -f "$json_file" ]; then + echo "Error: JSON file not found: $json_file" + return 1 + fi mongoimport --username "$username" --password "$password" --host "$host" --port "$port" \ --authenticationDatabase admin --db "$dbname" --collection "$collection" \ - --file "$json_file" + --file "$json_file" || return 1 }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
scripts/wipe_data.sh(3 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/DbClient.cpp(11 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(1 hunks)src/Utilities/PublicHeader/String.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/String.h(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- src/CraneCtld/CraneCtld.cpp
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/Utilities/PublicHeader/String.cppsrc/Utilities/PublicHeader/include/crane/String.h
📚 Learning: 2025-04-02T10:11:33.562Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Applied to files:
src/CraneCtld/DbClient.cpp
🧬 Code graph analysis (4)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (2)
src/CraneCtld/DbClient.cpp (4)
QueryJobSummary(2679-2783)QueryJobSummary(2679-2681)QueryJobSizeSummary(2785-2910)QueryJobSizeSummary(2785-2787)src/Craned/Core/CranedPublicDefs.h (1)
g_runtime_status(162-162)
src/CraneCtld/DbClient.cpp (3)
src/Utilities/PublicHeader/PublicHeader.cpp (1)
result(772-772)src/Utilities/PublicHeader/String.cpp (6)
FormatTime(527-533)FormatTime(527-527)EfficientSplitTimeRange(624-642)EfficientSplitTimeRange(624-625)SetCurrentThreadName(354-362)SetCurrentThreadName(354-354)src/CraneCtld/DbClient.h (4)
string(71-73)string(375-419)string(377-393)MongodbClient(93-221)
src/Utilities/PublicHeader/include/crane/String.h (1)
src/Utilities/PublicHeader/String.cpp (10)
FormatTime(527-533)FormatTime(527-527)TryPushHours(603-621)TryPushHours(603-604)TryPushMonths(538-567)TryPushMonths(538-539)TryPushDays(570-600)TryPushDays(570-571)EfficientSplitTimeRange(624-642)EfficientSplitTimeRange(624-625)
🔇 Additional comments (16)
scripts/wipe_data.sh (2)
72-80: ✓ Mode 3 correctly excluded from summary table wiping.The conditions now properly restrict summary table wiping to modes 5, 7, and 8, excluding mode 3 (task_table). This aligns with the documented behavior.
82-85: Document mode 10 in usage message and validate input file.Mode 10 is not listed in the usage message on line 3, making the script's full capability undiscoverable. Additionally, the hardcoded path
./tasks_bulk.jsonhas no validation; if the file doesn't exist, mongoimport will fail. Consider either updating the usage message and adding file existence checks, or making the path configurable via an environment variable or argument.[suggest_minor_issue]
-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table)" +echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table | 10:import_task_bulk)"And in the mode 10 block, add a file existence check:
if [ "$mode" -eq 10 ]; then + if [ ! -f "./tasks_bulk.json" ]; then + echo "Error: ./tasks_bulk.json not found" + exit 1 + fi batch_insert_collection_process "task_table" "./tasks_bulk.json" exit 0 fisrc/Utilities/PublicHeader/include/crane/String.h (2)
42-46: LGTM: Clean TimeRange struct definition.The struct is well-defined with clear field names and inline documentation of valid type values.
136-144: LGTM: Function declarations are properly structured.All time-range utility functions are correctly declared in the header for implementation in the .cpp file.
src/Utilities/PublicHeader/String.cpp (2)
527-533: LGTM: Thread-safe time formatting.Correctly uses
localtime_rfor thread safety and produces a standard timestamp format.
624-642: LGTM: Well-structured time-range splitting logic.The greedy strategy (months → days → hours) is sound, and edge cases (empty range, partial final hour) are handled correctly.
src/CraneCtld/RpcService/CtldGrpcServer.cpp (2)
2046-2054: LGTM - readiness check in place.The server readiness check is correctly implemented, and the handler properly delegates to the database client for streaming query results.
2056-2068: LGTM - readiness check and conditional routing look good.The handler correctly checks server readiness and appropriately routes to different query methods based on whether job IDs are specified.
src/CraneCtld/DbClient.cpp (8)
1535-1539: LGTM - nodename_list field properly integrated.The
nodename_listarray is correctly constructed from the craned IDs in bothTaskInEmbeddedDbToDocument_andTaskInCtldToDocument_, and the field count has been properly updated to 40.Also applies to: 1637-1641
1879-2044: Rollup orchestration is well-structured.The rollup logic properly:
- Uses
$maxoperator to ensure timestamps only advance- Protects concurrent rollup operations with a mutex
- Includes timing instrumentation for performance monitoring
- Handles errors by updating the last success timestamp to the last completed interval
2070-2191: Hourly aggregation pipeline is correct.The multiplication of
nodes_alloc * cpus_allocat lines 2100-2107 is intentional and correct (per previous discussion -cpus_allocin the raw table represents per-node allocation). The pipeline stages properly compute totals, group by dimensions, and merge results into the summary collection.
2452-2462: Bucketing logic correctly fixed.The bucketing logic now properly handles jobs with
cpus_allocsmaller than the first threshold by assigning them togrouping_list[0]rather than dropping them. This addresses the concern raised in previous reviews.Also applies to: 2892-2897
2624-2653: Type-safe helper functions for BSON field extraction.The helper functions correctly handle multiple BSON numeric types (
k_int32,k_int64,k_double) when extracting values, preventing type-related crashes.
2679-2910: Efficient multi-collection query with streaming responses.Both
QueryJobSummaryandQueryJobSizeSummaryimplement sophisticated multi-collection aggregation by:
- Classifying time ranges into hour/day/month segments
- Using
$unionWithto query appropriate summary tables- Streaming results in batches to handle large datasets
- Proper exception handling
2939-2999: Proper index creation and timer initialization.The initialization correctly:
- Creates indexes on time fields and composite keys for efficient querying
- Calculates the delay until the next hour boundary for the first rollup
- Sets up a repeating timer at hourly intervals
- Returns false if index creation fails
3001-3053: Thread lifecycle properly managed.The background thread and destructor correctly:
- Set a meaningful thread name for debugging
- Check the stop flag periodically via idle handler
- Walk and close all uvw handles before stopping the loop
- Join the thread in the destructor with joinable check
b89c440 to
c3d6614
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
src/Utilities/PublicHeader/String.cpp (1)
537-620: Require full-day/month alignment before widening time ranges
TryPushMonths/TryPushDaysstill accept timestamps where only the hour is zero; if the caller passes00:30, these helpers treat the interval as an entire day/month, so downstream aggregations double-count or mislabel partial ranges. We previously discussed this exact bug—please only bucket when hour, minute, and second are all zero (and consider using a shared safeToLocalTmhelper).- if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0) { + if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0 && tm_cur.tm_min == 0 && + tm_cur.tm_sec == 0) { @@ - if (tm_cur.tm_hour == 0) { + if (tm_cur.tm_hour == 0 && tm_cur.tm_min == 0 && tm_cur.tm_sec == 0) {
🧹 Nitpick comments (1)
scripts/wipe_data.sh (1)
3-3: Fix usage message formatting for consistency.The usage string still has inconsistent spacing around mode separators. For example, there's a space before
7:but the separator before8:is missing a space, and mode descriptions lack uniform punctuation.Apply this diff to improve consistency:
-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)" +echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table | 10:batch_insert_tasks)"Note: Added mode 10 to the usage message since the script now supports it (line 82).
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
scripts/wipe_data.sh(3 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/DbClient.cpp(11 hunks)src/CraneCtld/DbClient.h(5 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(1 hunks)src/Utilities/PublicHeader/String.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/String.h(2 hunks)
🧰 Additional context used
🧠 Learnings (10)
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-02T07:12:46.896Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/Craned/CranedServer.h:0-0
Timestamp: 2025-05-02T07:12:46.896Z
Learning: The initialization of `m_supervisor_recovered_` to `true` in the CranedServer class is intentional despite the comment saying "When supervisor ready, init with false". This is temporary until the supervisor functionality is fully implemented.
Applied to files:
src/CraneCtld/CraneCtld.cpp
📚 Learning: 2025-04-27T06:47:51.751Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 475
File: src/Craned/CtldClient.cpp:0-0
Timestamp: 2025-04-27T06:47:51.751Z
Learning: In CtldClient, when a CranedRegister RPC fails, the state is reset to NOTIFY_SENDING. This ensures that token will be refreshed by NotifyCranedConnected_() before any subsequent CranedRegister_() call, preventing null token access issues.
Applied to files:
src/CraneCtld/CraneCtld.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/CraneCtld.cpp
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/Utilities/PublicHeader/include/crane/String.hsrc/Utilities/PublicHeader/String.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-09T01:54:39.465Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Applied to files:
src/CraneCtld/DbClient.h
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/DbClient.h
📚 Learning: 2025-04-02T10:11:33.562Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Applied to files:
src/CraneCtld/DbClient.cpp
🧬 Code graph analysis (5)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (2)
src/CraneCtld/DbClient.cpp (4)
QueryJobSummary(2684-2788)QueryJobSummary(2684-2686)QueryJobSizeSummary(2790-2915)QueryJobSizeSummary(2790-2792)src/Craned/Core/CranedPublicDefs.h (1)
g_runtime_status(162-162)
src/Utilities/PublicHeader/include/crane/String.h (1)
src/Utilities/PublicHeader/String.cpp (10)
FormatTime(527-533)FormatTime(527-527)TryPushHours(603-621)TryPushHours(603-604)TryPushMonths(538-567)TryPushMonths(538-539)TryPushDays(570-600)TryPushDays(570-571)EfficientSplitTimeRange(624-642)EfficientSplitTimeRange(624-625)
src/CraneCtld/DbClient.h (1)
src/CraneCtld/DbClient.cpp (48)
MongodbClient(3029-3053)MongodbClient(3055-3058)Init(2981-3004)Init(2981-2981)InitTableIndexes(2944-2979)InitTableIndexes(2944-2944)CreateCollectionIndex(2051-2073)CreateCollectionIndex(2051-2052)RollupTypeToString(1866-1877)RollupTypeToString(1866-1866)UpdateSummaryLastSuccessTimeSec(1879-1913)UpdateSummaryLastSuccessTimeSec(1879-1880)GetSummaryLastSuccessTimeTm(1915-1962)GetSummaryLastSuccessTimeTm(1915-1916)NeedRollup(1964-1981)NeedRollup(1964-1965)RollupSummary(1983-2017)RollupSummary(1983-1983)ClusterRollupUsage(2019-2049)ClusterRollupUsage(2019-2019)GetCpusAlloc(2629-2638)GetCpusAlloc(2629-2629)GetTotalCpuTime(2640-2649)GetTotalCpuTime(2640-2640)GetTotalCount(2651-2658)GetTotalCount(2651-2651)WriteReply(2660-2682)WriteReply(2660-2663)QueryJobSizeSummaryByJobIds(2340-2492)QueryJobSizeSummaryByJobIds(2340-2342)JobQueryMatch(2494-2538)JobQueryMatch(2494-2496)QueryJobSummary(2684-2788)QueryJobSummary(2684-2686)JobSizeQueryMatch(2540-2604)JobSizeQueryMatch(2540-2542)QueryJobSizeSummary(2790-2915)QueryJobSizeSummary(2790-2792)AppendUnionWithRanges(2607-2627)AppendUnionWithRanges(2607-2610)AggregateJobSummaryByHour(2075-2196)AggregateJobSummaryByHour(2075-2077)AggregateJobSummaryByDayOrMonth(2198-2338)AggregateJobSummaryByDayOrMonth(2198-2201)MongoDbSumaryTh_(3006-3027)MongoDbSumaryTh_(3006-3007)MillisecondsToNextHour(2917-2942)MillisecondsToNextHour(2917-2917)
src/CraneCtld/DbClient.cpp (4)
src/Utilities/PublicHeader/PublicHeader.cpp (1)
result(772-772)src/Utilities/PublicHeader/String.cpp (6)
FormatTime(527-533)FormatTime(527-527)EfficientSplitTimeRange(624-642)EfficientSplitTimeRange(624-625)SetCurrentThreadName(354-362)SetCurrentThreadName(354-354)src/CraneCtld/DbClient.h (3)
string(71-73)string(377-419)MongodbClient(93-221)src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
QueryJobSummary(2046-2054)QueryJobSummary(2046-2049)QueryJobSizeSummary(2056-2068)QueryJobSizeSummary(2056-2059)
🔇 Additional comments (8)
scripts/wipe_data.sh (3)
33-47: New import functions look good.The
batch_insert_collectionfunction correctly uses--jsonArrayfor importing multiple documents within a single JSON array, whilebatch_insert_collection_processhandles standard newline-delimited JSON format. Both functions properly inherit authentication and database parameters from the script context.
72-80: ✓ Critical issue resolved: Mode 3 correctly excluded from summary table wipes.The logic now correctly isolates summary table wiping to modes 5, 7, and 8. Mode 3 (task_table) no longer triggers unintended deletion of derived summary data, fixing the critical inconsistency flagged in the prior review.
82-85: Verify relative path usage for mode 10 import.The hardcoded relative path
./tasks_bulk.jsonassumes the script is executed from a specific working directory. If invoked from elsewhere (e.g., via cron, systemd, or absolute path call), the import will fail silently or with an unclear error.Consider one of the following approaches:
- Use an absolute path (e.g.,
$base_dir/tasks_bulk.jsonif tasks are colocated with configs)- Make it configurable via an environment variable or command-line parameter
- Document the required working directory in a comment if the current relative path is intentional
Verify the intended usage pattern and update accordingly.
src/CraneCtld/DbClient.h (5)
22-22: LGTM!The include is necessary for the new gRPC streaming method signatures added to the MongodbClient class.
107-127: LGTM!The
RollupTypeenum and summary data structures are well-designed:
- Explicit underlying type for the enum ensures predictable size
JobSizeSummaryKeyproperly implementsAbslHashValueandoperator==for use in hash mapsJobSizeSummaryResultis a straightforward aggregate
129-131: LGTM!The lifecycle methods are appropriately declared. The separation of
Init()from the constructor is reasonable for complex initialization involving thread startup.
372-379: LGTM!The collection name constants follow the existing naming convention, and the batch size constant is appropriately defined as a compile-time constant with a clear explanatory comment.
388-392: LGTM!The thread synchronization members are properly designed:
cluster_rollup_mutex_protects the rollup critical sectionm_mongodb_sum_thread_manages the background rollup thread lifecyclem_thread_stop_usesstd::atomic_boolfor safe cross-thread signalingThe destructor properly coordinates thread shutdown by setting the flag and joining the thread.
fbe22d2 to
8b946f0
Compare
04e0fa1 to
3b8661e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/CraneCtld/CraneCtld.cpp (1)
776-789: Shut down Mongo rollup thread cleanly (reset g_db_client in Destroy).g_db_client owns a background uvw loop/thread; it’s never reset here, so the thread can outlive shutdown. Reset it to join the thread and close UV handles deterministically.
Apply:
void DestroyCtldGlobalVariables() { using namespace Ctld; + // Stop Mongo rollup thread and close UV handles before tearing down others. + g_db_client.reset(); + g_task_scheduler.reset(); g_craned_keeper.reset(); // In case that spdlog is destructed before g_embedded_db_client->Close() // in which log function is called. g_embedded_db_client.reset(); g_thread_pool->wait(); g_thread_pool.reset(); g_plugin_client.reset(); }src/CraneCtld/DbClient.cpp (1)
460-468: Bug: wrong field name in CheckTaskDbIdExistedFilter uses "job_db_inx", but documents store "task_db_id". This check will never match.
Apply:
-bool MongodbClient::CheckTaskDbIdExisted(int64_t task_db_id) { - document doc; - doc.append(kvp("job_db_inx", task_db_id)); +bool MongodbClient::CheckTaskDbIdExisted(int64_t task_db_id) { + document doc; + doc.append(kvp("task_db_id", task_db_id));
♻️ Duplicate comments (5)
scripts/wipe_data.sh (1)
3-3: Fix usage message formatting inconsistency.The usage string still has uneven spacing around pipes:
user_table| 7:lacks a space before|, whilemonth_table |8:has a space but inconsistent placement. Normalize to use consistent formatting throughout.-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)" +echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table)"src/Utilities/PublicHeader/String.cpp (3)
538-567: Critical granularity bug and missing error checks remain unaddressed.A previous review flagged these issues, but they have not been fully resolved:
- Granularity bug (Line 542): Checking only
tm_hour == 0treats times like00:30:00as month boundaries, causing incorrect aggregation into month collections.- Missing error check (Line 553):
mktimecan return-1on error, but this is not checked.Apply the fixes from the previous review:
- std::tm tm_cur{}; - localtime_r(&cur, &tm_cur); - if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0) { + std::tm tm_cur{}; + localtime_r(&cur, &tm_cur); + if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0 && tm_cur.tm_min == 0 && tm_cur.tm_sec == 0) { std::time_t month_start = cur; std::time_t month_cur = cur; while (true) { std::tm tm_next_month{}; localtime_r(&month_cur, &tm_next_month); tm_next_month.tm_mon += 1; tm_next_month.tm_mday = 1; tm_next_month.tm_hour = 0; tm_next_month.tm_min = 0; tm_next_month.tm_sec = 0; std::time_t next_month_first = mktime(&tm_next_month); + if (next_month_first == -1) break; if (next_month_first <= end) {
570-598: Critical granularity bug and missing error checks remain unaddressed.The same issues flagged in the previous review apply here:
- Granularity bug (Line 574): Checking only
tm_hour == 0treats times like00:30:00as day boundaries, causing incorrect day aggregation.- Missing error checks (Lines 582, 589):
mktimecan return-1on error, but this is not checked.Apply the fixes:
- std::tm tm_cur{}; - localtime_r(&cur, &tm_cur); - if (tm_cur.tm_hour == 0) { + std::tm tm_cur{}; + localtime_r(&cur, &tm_cur); + if (tm_cur.tm_hour == 0 && tm_cur.tm_min == 0 && tm_cur.tm_sec == 0) { std::time_t day_start = cur; std::tm tm_next_month = tm_cur; tm_next_month.tm_mon += 1; tm_next_month.tm_mday = 1; tm_next_month.tm_hour = 0; tm_next_month.tm_min = 0; tm_next_month.tm_sec = 0; std::time_t next_month_first = mktime(&tm_next_month); + if (next_month_first == -1) return false; std::tm tm_end{}; localtime_r(&end, &tm_end); tm_end.tm_hour = 0; tm_end.tm_min = 0; tm_end.tm_sec = 0; std::time_t end_trunc_to_day = mktime(&tm_end); + if (end_trunc_to_day == -1) return false;
601-618: Missing error check for mktime.Line 611:
mktimecan return-1on error, but this is not checked, as flagged in the previous review.Apply the fix:
tm_next_day.tm_mday += 1; tm_next_day.tm_hour = 0; tm_next_day.tm_min = 0; tm_next_day.tm_sec = 0; std::time_t next_day = mktime(&tm_next_day); + if (next_day == -1) return false; std::time_t hour_end = std::min(next_day, end);src/CraneCtld/DbClient.h (1)
158-203: Address previous review feedback.The issues flagged in the previous review are still present:
Typo on line 202:
MongoDbSumaryTh_should beMongoDbSummaryTh_(missing 'm' in "Summary").Visibility concerns: Many methods in this section (lines 158-203) expose implementation details and should be moved to the private section. Only
QueryJobSummary(lines 183-185) andQueryJobSizeSummary(lines 189-191) are called fromCtldGrpcServerand need to remain public.
🧹 Nitpick comments (4)
protos/Crane.proto (1)
977-1006: Clarify time-bound semantics in doc commentsIn summaries, end time is treated as exclusive ([start, end)); in per-ID path it’s inclusive ($lte). Document the intended contract to avoid boundary mismatches for clients.
Would you like me to add brief proto comments indicating whether filter_end_time is inclusive or exclusive for each RPC?
src/CraneCtld/DbClient.cpp (2)
2942-2977: Add raw-table indexes for new filters (perf)QueryJobSizeSummaryByJobIds now filters on account/user/qos/partition/wckey/nodename_list/time_start/time_end without supporting indexes on the raw job table. Add them to avoid collection scans at scale.
Apply:
bool MongodbClient::InitTableIndexes() { try { // Create index for the raw task table auto raw_table = (*GetClient_())[m_db_name_][m_task_collection_name_]; - CreateCollectionIndex(raw_table, {"time_end"}); + CreateCollectionIndex(raw_table, {"time_end"}); + CreateCollectionIndex(raw_table, {"time_start"}); + CreateCollectionIndex(raw_table, {"account"}); + CreateCollectionIndex(raw_table, {"username"}); + CreateCollectionIndex(raw_table, {"qos"}); + CreateCollectionIndex(raw_table, {"partition_name"}); + CreateCollectionIndex(raw_table, {"wckey"}); + // nodename_list is an array -> multi-key index for $in on node names + CreateCollectionIndex(raw_table, {"nodename_list"});
2758-2762: Bound Mongo aggregation duration with maxTimeMSLong-running aggregations can block CQ threads. Set a reasonable max_time to fail fast under load spikes.
Apply:
- mongocxx::options::aggregate agg_opts; + mongocxx::options::aggregate agg_opts; agg_opts.allow_disk_use(true); + agg_opts.max_time(std::chrono::minutes(2));Do this in both QueryJobSummary and QueryJobSizeSummary aggregate paths.
Also applies to: 2875-2878
src/CraneCtld/DbClient.h (1)
388-393: Consider removing redundant access specifier.Line 391 contains a redundant
private:label, as the private section already began at line 317. While not incorrect, removing it would improve clarity.Apply this diff to remove the redundancy:
std::mutex cluster_rollup_mutex_; std::thread m_mongodb_sum_thread_; - private: std::atomic_bool m_thread_stop_{false}; std::shared_ptr<uvw::timer_handle> mongodb_task_timer_handle;
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
protos/Crane.proto(2 hunks)scripts/wipe_data.sh(3 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/DbClient.cpp(11 hunks)src/CraneCtld/DbClient.h(5 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(1 hunks)src/CraneCtld/RpcService/CtldGrpcServer.h(1 hunks)src/Utilities/PublicHeader/String.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/String.h(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- src/Utilities/PublicHeader/include/crane/String.h
- src/CraneCtld/RpcService/CtldGrpcServer.h
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/Utilities/PublicHeader/String.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-02T07:12:46.896Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/Craned/CranedServer.h:0-0
Timestamp: 2025-05-02T07:12:46.896Z
Learning: The initialization of `m_supervisor_recovered_` to `true` in the CranedServer class is intentional despite the comment saying "When supervisor ready, init with false". This is temporary until the supervisor functionality is fully implemented.
Applied to files:
src/CraneCtld/CraneCtld.cpp
📚 Learning: 2025-04-27T06:47:51.751Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 475
File: src/Craned/CtldClient.cpp:0-0
Timestamp: 2025-04-27T06:47:51.751Z
Learning: In CtldClient, when a CranedRegister RPC fails, the state is reset to NOTIFY_SENDING. This ensures that token will be refreshed by NotifyCranedConnected_() before any subsequent CranedRegister_() call, preventing null token access issues.
Applied to files:
src/CraneCtld/CraneCtld.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/CraneCtld.cpp
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/DbClient.h
📚 Learning: 2025-04-02T10:11:33.562Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Applied to files:
src/CraneCtld/DbClient.cpp
📚 Learning: 2025-05-08T09:35:39.809Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/Pmix/PmixCollRing.cpp:0-0
Timestamp: 2025-05-08T09:35:39.809Z
Learning: In the PMIx implementation for CraneSched, objects referenced in asynchronous gRPC callbacks (like `coll_ring_ctx`) remain valid as long as the parent object (`this`) is not destroyed. The `Coll` class uses shared_ptr management to ensure its lifetime extends through all pending callbacks.
Applied to files:
src/CraneCtld/DbClient.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-09T01:54:39.465Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Applied to files:
src/CraneCtld/DbClient.h
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/DbClient.h
🧬 Code graph analysis (4)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
src/CraneCtld/DbClient.cpp (3)
src/Utilities/PublicHeader/String.cpp (4)
FormatTime(527-533)FormatTime(527-527)EfficientSplitTimeRange(621-639)EfficientSplitTimeRange(621-622)src/CraneCtld/DbClient.h (3)
string(71-73)string(377-420)MongodbClient(93-221)src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
QueryJobSummary(2046-2054)QueryJobSummary(2046-2049)QueryJobSizeSummary(2056-2068)QueryJobSizeSummary(2056-2059)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (2)
src/CraneCtld/DbClient.cpp (4)
QueryJobSummary(2682-2786)QueryJobSummary(2682-2684)QueryJobSizeSummary(2788-2913)QueryJobSizeSummary(2788-2790)src/Craned/Core/CranedPublicDefs.h (1)
g_runtime_status(162-162)
src/CraneCtld/DbClient.h (2)
src/CraneCtld/DbClient.cpp (48)
MongodbClient(3027-3051)MongodbClient(3053-3056)Init(2979-3002)Init(2979-2979)InitTableIndexes(2942-2977)InitTableIndexes(2942-2942)CreateCollectionIndex(2051-2073)CreateCollectionIndex(2051-2052)RollupTypeToString(1866-1877)RollupTypeToString(1866-1866)UpdateSummaryLastSuccessTimeSec(1879-1913)UpdateSummaryLastSuccessTimeSec(1879-1880)GetSummaryLastSuccessTimeTm(1915-1962)GetSummaryLastSuccessTimeTm(1915-1916)NeedRollup(1964-1981)NeedRollup(1964-1965)RollupSummary(1983-2017)RollupSummary(1983-1983)ClusterRollupUsage(2019-2049)ClusterRollupUsage(2019-2019)GetCpusAlloc(2627-2636)GetCpusAlloc(2627-2627)GetTotalCpuTime(2638-2647)GetTotalCpuTime(2638-2638)GetTotalCount(2649-2656)GetTotalCount(2649-2649)WriteReply(2658-2680)WriteReply(2658-2661)QueryJobSizeSummaryByJobIds(2338-2490)QueryJobSizeSummaryByJobIds(2338-2340)JobQueryMatch(2492-2536)JobQueryMatch(2492-2494)QueryJobSummary(2682-2786)QueryJobSummary(2682-2684)JobSizeQueryMatch(2538-2602)JobSizeQueryMatch(2538-2540)QueryJobSizeSummary(2788-2913)QueryJobSizeSummary(2788-2790)AppendUnionWithRanges(2605-2625)AppendUnionWithRanges(2605-2608)AggregateJobSummaryByHour(2075-2195)AggregateJobSummaryByHour(2075-2077)AggregateJobSummaryByDayOrMonth(2197-2336)AggregateJobSummaryByDayOrMonth(2197-2200)MongoDbSumaryTh_(3004-3025)MongoDbSumaryTh_(3004-3005)MillisecondsToNextHour(2915-2940)MillisecondsToNextHour(2915-2915)src/CraneCtld/RpcService/CtldGrpcServer.cpp (4)
QueryJobSummary(2046-2054)QueryJobSummary(2046-2049)QueryJobSizeSummary(2056-2068)QueryJobSizeSummary(2056-2059)
🔇 Additional comments (12)
scripts/wipe_data.sh (3)
33-47: New batch insertion functions look good.The two new helper functions cleanly separate array-based imports (
--jsonArray) from single-document imports. Clear naming and consistent parameter handling.
72-76: Mode 3 correctly removed from summary table wiping.The summary table wiping conditions now only trigger for modes 5 and 7, aligning with documented behavior.
78-80: Mode 3 correctly removed from summary_time_table wiping.The summary_time_table wiping now only triggers for modes 5 and 8, as intended.
src/Utilities/PublicHeader/String.cpp (2)
527-533: LGTM!The function correctly uses
localtime_r(thread-safe on POSIX) andstd::strftimefor formatting.
621-639: Logic is sound, assuming helper functions are fixed.The orchestration logic correctly tries months, then days, then hours, with a fallback for partial intervals. Once the critical bugs in
TryPushMonths,TryPushDays, andTryPushHoursare addressed, this function should work correctly.src/CraneCtld/CraneCtld.cpp (1)
884-889: Good: gate readiness on DB Init()Init() result is now checked before setting srv_ready. This prevents RPCs against a half-initialized DB client.
src/CraneCtld/DbClient.cpp (3)
1535-1567: Persist nodename_list in job docsCapturing craned_ids into nodename_list and carrying it into summaries looks correct and enables node-based filters.
Also applies to: 1578-1615
2051-2185: Note on summary time window and fieldsThe hour rollup matches jobs whose time_end ∈ [hour, hour+3600) and groups on cpus_alloc, partition_name, and nodename_list. This is consistent; just ensure clients expect “bucketed by end time”.
Do you want me to add a short comment in code to lock in this invariant?
2984-3001: Timer lifetime fix looks goodStoring uvw timer in a member keeps the handle alive past Init(); destructor joins the thread — safe cleanup path.
src/CraneCtld/DbClient.h (3)
107-127: LGTM!The
RollupTypeenum and data structures are well-defined. TheJobSizeSummaryKeystruct correctly implements both hash function (viaAbslHashValue) and equality operator for use in hash-based containers, andJobSizeSummaryResultproperly initializes its members.
129-131: LGTM!Constructor, destructor, and
Init()method declarations are appropriately placed in the public interface.
372-379: LGTM!The collection name constants and
MaxJobSummaryBatchSizeare properly defined and follow the existing naming conventions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (1)
docs/zh/command/creport.md (1)
30-30: Minor: Chinese character consistency in account terminology.Lines 30 both use "帐户" (variant for "account"), while standard modern Chinese documentation typically uses "账户" consistently. Consider standardizing to "账户" throughout for consistency with other documentation files.
This is a minor style preference issue. Both forms are technically valid, but "账户" is the more commonly standardized form in modern Chinese technical documentation.
Also applies to: 30-30
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
docs/images/creport/accountutilizationbyqos.pngis excluded by!**/*.pngdocs/images/creport/accountutilizationbyuser.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccount.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccountandwckey.pngis excluded by!**/*.pngdocs/images/creport/sizesbywckey.pngis excluded by!**/*.pngdocs/images/creport/topusr.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbyaccount.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbywckey.pngis excluded by!**/*.pngdocs/images/creport/utilization.pngis excluded by!**/*.pngdocs/images/creport/wckeyutilizationbyuser.pngis excluded by!**/*.png
📒 Files selected for processing (13)
docs/en/command/cacct.md(1 hunks)docs/en/command/cbatch.md(1 hunks)docs/en/command/ccancel.md(1 hunks)docs/en/command/ceff.md(1 hunks)docs/en/command/cqueue.md(1 hunks)docs/en/command/creport.md(1 hunks)docs/zh/command/cacct.md(1 hunks)docs/zh/command/cbatch.md(1 hunks)docs/zh/command/ccancel.md(1 hunks)docs/zh/command/ceff.md(1 hunks)docs/zh/command/cqueue.md(1 hunks)docs/zh/command/creport.md(1 hunks)mkdocs.yaml(1 hunks)
✅ Files skipped from review due to trivial changes (4)
- docs/en/command/cqueue.md
- docs/en/command/cbatch.md
- docs/zh/command/cacct.md
- docs/zh/command/cqueue.md
🧰 Additional context used
🪛 LanguageTool
docs/en/command/creport.md
[grammar] ~20-~20: Ensure spelling is correct
Context: ...ER**: Cluster name - LOGIN: Username - PORPER NAME: Full Linux system name - **ACCO...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~54-~54: Ensure spelling is correct
Context: ...NT**: Account name - LOGIN: Username - PORPER NAME: Full Linux system name - **USED...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~85-~85: Ensure spelling is correct
Context: ...ER**: Cluster name - LOGIN: Username - PORPER NAME: Full Linux system name - **ACCO...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~117-~117: Ensure spelling is correct
Context: ...ER**: Cluster name - LOGIN: Username - PORPER NAME: Full Linux system name - **WCKE...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
[grammar] ~148-~148: Ensure spelling is correct
Context: ...WCKEY**:wckey name - LOGIN: Username - PORPER NAME: Full Linux system name - **USED...
(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)
docs/zh/command/creport.md
[uncategorized] ~30-~30: 您的意思是“"账"户”?
Context: ...定多个用户时用逗号隔开 - --group bool: 将每个用户的所有帐户分组在一起(默认为每个用户和帐户引用的单独条目) - **-t, --time...
(ZHANG7_ZHANG8)
[uncategorized] ~30-~30: 您的意思是“"账"户”?
Context: ...--group bool**: 将每个用户的所有帐户分组在一起(默认为每个用户和帐户引用的单独条目) - -t, --time string: 指定输出作...
(ZHANG7_ZHANG8)
🔇 Additional comments (8)
docs/en/command/ccancel.md (1)
212-212: LGTM!The addition of the creport reference is consistent with the "See Also" section structure and maintains documentation parity with other command references.
docs/zh/command/ceff.md (1)
95-95: LGTM!The addition of the creport reference is consistent with the "相关命令" (Related Commands) section and maintains documentation parity with other entries.
docs/en/command/cacct.md (1)
277-277: LGTM!The addition of the creport reference is consistent with the "See Also" section and properly positioned alongside other command references.
docs/zh/command/ccancel.md (1)
212-212: LGTM!The addition of the creport reference is consistent with the "另请参阅" (See Also) section structure and maintains documentation parity with other command references.
mkdocs.yaml (1)
167-167: LGTM!The creport navigation entry is properly formatted, correctly placed under the "Job Monitoring" section, and follows YAML syntax conventions established in the file.
docs/zh/command/cbatch.md (1)
338-338: LGTM!The addition of the creport reference is consistent with the "另请参阅" (See Also) section and maintains documentation parity with other related command references.
docs/en/command/ceff.md (1)
95-95: LGTM!The addition of the creport reference is consistent with the "Related Commands" section and properly positioned with other command references.
docs/zh/command/creport.md (1)
1-361: Comprehensive documentation structure approved.The new creport documentation is well-organized, covering multiple subcommands with consistent output fields, options, and usage examples. Documentation structure aligns with existing command docs.
0e4afaa to
d26ab58
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
docs/images/creport/accountutilizationbyqos.pngis excluded by!**/*.pngdocs/images/creport/accountutilizationbyuser.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccount.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccountandwckey.pngis excluded by!**/*.pngdocs/images/creport/sizesbywckey.pngis excluded by!**/*.pngdocs/images/creport/topusr.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbyaccount.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbywckey.pngis excluded by!**/*.pngdocs/images/creport/utilization.pngis excluded by!**/*.pngdocs/images/creport/wckeyutilizationbyuser.pngis excluded by!**/*.png
📒 Files selected for processing (13)
docs/en/command/cacct.md(1 hunks)docs/en/command/cbatch.md(1 hunks)docs/en/command/ccancel.md(1 hunks)docs/en/command/ceff.md(1 hunks)docs/en/command/cqueue.md(1 hunks)docs/en/command/creport.md(1 hunks)docs/zh/command/cacct.md(1 hunks)docs/zh/command/cbatch.md(1 hunks)docs/zh/command/ccancel.md(1 hunks)docs/zh/command/ceff.md(1 hunks)docs/zh/command/cqueue.md(1 hunks)docs/zh/command/creport.md(1 hunks)mkdocs.yaml(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/en/command/cqueue.md
🚧 Files skipped from review as they are similar to previous changes (6)
- docs/en/command/cacct.md
- docs/zh/command/cacct.md
- docs/en/command/creport.md
- docs/en/command/cbatch.md
- docs/zh/command/cqueue.md
- docs/zh/command/ccancel.md
🧰 Additional context used
🪛 LanguageTool
docs/zh/command/creport.md
[uncategorized] ~30-~30: 您的意思是“"账"户”?
Context: ...定多个用户时用逗号隔开 - --group bool: 将每个用户的所有帐户分组在一起(默认为每个用户和帐户引用的单独条目) - **-t, --time...
(ZHANG7_ZHANG8)
[uncategorized] ~30-~30: 您的意思是“"账"户”?
Context: ...--group bool**: 将每个用户的所有帐户分组在一起(默认为每个用户和帐户引用的单独条目) - -t, --time string: 指定输出作...
(ZHANG7_ZHANG8)
🪛 markdownlint-cli2 (0.18.1)
docs/zh/command/creport.md
13-13: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
42-42: Multiple headings with the same content
(MD024, no-duplicate-heading)
46-46: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
78-78: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
110-110: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
140-140: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
170-170: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
201-201: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
230-230: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
270-270: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
310-310: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
349-349: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (5)
docs/en/command/ccancel.md (1)
212-212: LGTM!The cross-reference to creport is properly placed in the "See Also" section and follows the existing documentation pattern.
docs/zh/command/ceff.md (1)
95-95: LGTM!The creport reference is properly added to the "相关命令" section, maintaining consistency with the English documentation and existing link patterns.
docs/en/command/ceff.md (1)
95-95: LGTM!The creport reference is properly added to the "Related Commands" section, maintaining consistency with the documentation structure and link patterns.
mkdocs.yaml (1)
167-167: LGTM!The creport navigation entry is correctly placed under the "Job Monitoring" section with proper YAML syntax and indentation. The file path is consistent with the documentation structure.
docs/zh/command/cbatch.md (1)
338-338: LGTM!The creport reference is properly added to the "另请参阅" section, maintaining consistency with the cbatch documentation structure and link patterns.
cebb70e to
14bd0f3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
♻️ Duplicate comments (1)
scripts/wipe_data.sh (1)
3-3: Fix usage message formatting (formatting issue persists from previous review).The usage string still has inconsistent spacing and formatting despite being marked as addressed in commit fbe22d2:
- After mode 6, missing space before pipe:
user_table| 7:→user_table | 7:- Before mode 8, incorrect spacing:
|8: summary→| 8:summaryApply this diff to achieve consistent formatting across all modes:
-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)" +echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table)"
🧹 Nitpick comments (1)
scripts/wipe_data.sh (1)
33-39: Remove unusedbatch_insert_collection()function.This function is defined but never called in the script. Line 83 uses
batch_insert_collection_process()instead. If this function is intended for future use, document it; otherwise, remove it to avoid dead code.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
docs/images/creport/accountutilizationbyqos.pngis excluded by!**/*.pngdocs/images/creport/accountutilizationbyuser.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccount.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccountandwckey.pngis excluded by!**/*.pngdocs/images/creport/sizesbywckey.pngis excluded by!**/*.pngdocs/images/creport/topusr.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbyaccount.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbywckey.pngis excluded by!**/*.pngdocs/images/creport/utilization.pngis excluded by!**/*.pngdocs/images/creport/wckeyutilizationbyuser.pngis excluded by!**/*.png
📒 Files selected for processing (14)
docs/en/command/cacct.md(1 hunks)docs/en/command/cbatch.md(1 hunks)docs/en/command/ccancel.md(1 hunks)docs/en/command/ceff.md(1 hunks)docs/en/command/cqueue.md(1 hunks)docs/en/command/creport.md(1 hunks)docs/zh/command/cacct.md(1 hunks)docs/zh/command/cbatch.md(1 hunks)docs/zh/command/ccancel.md(1 hunks)docs/zh/command/ceff.md(1 hunks)docs/zh/command/cqueue.md(1 hunks)docs/zh/command/creport.md(1 hunks)mkdocs.yaml(1 hunks)scripts/wipe_data.sh(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- mkdocs.yaml
🚧 Files skipped from review as they are similar to previous changes (8)
- docs/en/command/cacct.md
- docs/en/command/ccancel.md
- docs/en/command/ceff.md
- docs/zh/command/cbatch.md
- docs/zh/command/ceff.md
- docs/zh/command/ccancel.md
- docs/en/command/cbatch.md
- docs/en/command/creport.md
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
🪛 markdownlint-cli2 (0.18.1)
docs/zh/command/creport.md
349-349: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (7)
scripts/wipe_data.sh (4)
41-47:batch_insert_collection_process()implementation looks correct.The function properly imports newline-delimited JSON (omits
--jsonArray), distinct from the unused array variant above. The credential passing and database selection are correct.
72-76: Mode 7 summary table wiping logic is correct.Conditions properly restrict to modes 5 and 7 only, eliminating the unintended wipe of summary tables when mode 3 (task_table) is selected. This correctly addresses the previous critical issue.
78-80: Mode 8 summary_time_table wiping logic is correct.Conditions properly restrict to modes 5 and 8 only. The fix from the previous review that removed mode 3 is correctly in place.
82-89: Mode 10 batch insert with error handling is properly implemented.The script now correctly captures the exit status of
batch_insert_collection_process, logs an error on failure, and exits with status 1. This addresses the previous critical issue where import failures were silently masked.docs/en/command/cqueue.md (1)
213-213: LGTM! The new creport link follows the established pattern and correctly references the related command documentation.docs/zh/command/cqueue.md (1)
213-213: LGTM! The new creport link follows the established pattern and correctly references the related command documentation.docs/zh/command/cacct.md (1)
277-277: LGTM! The new creport link follows the established pattern and correctly references the related command documentation.
534afe7 to
656b876
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
scripts/wipe_data.sh (1)
33-47: Clarify or consolidate nearly-identical batch insert functions.Both functions are duplicates except for the
--jsonArrayflag. The naming distinction (_processsuffix) doesn't clearly indicate that one expects an array JSON file and the other expects line-delimited JSON (JSONL). Additionally,batch_insert_collection(lines 33–39) appears unused—onlybatch_insert_collection_processis called in mode 10.Consider renaming for clarity (e.g.,
batch_insert_json_arrayvs.batch_insert_jsonl), or consolidate them into a single function with an optional parameter if only one format is needed. Ifbatch_insert_collectionis genuinely unused, remove it to avoid maintenance overhead.docs/zh/command/creport.md (1)
289-289: Add missing space after colon in list items.Lines 289 and 330 lack a space between the colon and description text, breaking the formatting pattern established throughout the file. Compare with line 329 which correctly has the space:
-- **-w --wckeys string**:指定查询的wckeys,指定多个wckey时用逗号隔开 +- **-w --wckeys string**: 指定查询的wckeys,指定多个wckey时用逗号隔开Apply the same fix to both line 289 and 330.
Also applies to: 330-330
docs/en/command/creport.md (1)
289-289: Add missing space after colon in list items.Lines 289 and 330 lack a space between the colon and description text, breaking the formatting pattern established throughout the file. Compare with surrounding lines which correctly have the space:
-- **-w --wckeys string**:Specify WCKEYs to query, use commas to separate multiple WCKEYs +- **-w --wckeys string**: Specify WCKEYs to query, use commas to separate multiple WCKEYsApply the same fix to both line 289 and 330 for consistency.
Also applies to: 330-330
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
docs/images/creport/accountutilizationbyqos.pngis excluded by!**/*.pngdocs/images/creport/accountutilizationbyuser.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccount.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccountandwckey.pngis excluded by!**/*.pngdocs/images/creport/sizesbywckey.pngis excluded by!**/*.pngdocs/images/creport/topusr.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbyaccount.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbywckey.pngis excluded by!**/*.pngdocs/images/creport/utilization.pngis excluded by!**/*.pngdocs/images/creport/wckeyutilizationbyuser.pngis excluded by!**/*.png
📒 Files selected for processing (14)
docs/en/command/cacct.md(1 hunks)docs/en/command/cbatch.md(1 hunks)docs/en/command/ccancel.md(1 hunks)docs/en/command/ceff.md(1 hunks)docs/en/command/cqueue.md(1 hunks)docs/en/command/creport.md(1 hunks)docs/zh/command/cacct.md(1 hunks)docs/zh/command/cbatch.md(1 hunks)docs/zh/command/ccancel.md(1 hunks)docs/zh/command/ceff.md(1 hunks)docs/zh/command/cqueue.md(1 hunks)docs/zh/command/creport.md(1 hunks)mkdocs.yaml(1 hunks)scripts/wipe_data.sh(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/zh/command/cbatch.md
🚧 Files skipped from review as they are similar to previous changes (5)
- docs/en/command/cacct.md
- docs/zh/command/ccancel.md
- mkdocs.yaml
- docs/zh/command/cqueue.md
- docs/en/command/cqueue.md
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
🪛 markdownlint-cli2 (0.18.1)
docs/zh/command/creport.md
349-349: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (7)
scripts/wipe_data.sh (2)
72-80: Correct application of mode-specific summary table wiping logic.The removal of mode 3 from the summary table conditions (lines 72 and 78) correctly aligns the implementation with the documented behavior in the usage message. Mode 3 now exclusively wipes
task_tableas documented, and modes 7 and 8 properly isolate their respective summary table subsets.
82-89: Error handling for mode 10 import is properly implemented.The script now correctly captures the exit status of
batch_insert_collection_process, logs an informative error message on failure, and exits with a non-zero status. This properly propagates import failures to callers and CI systems, resolving the earlier flagged issue.docs/en/command/ccancel.md (1)
212-212: LGTM!Cross-reference addition is correctly formatted and placed.
docs/en/command/ceff.md (1)
95-95: LGTM!Cross-reference addition is correctly formatted and placed.
docs/zh/command/cacct.md (1)
277-277: LGTM!Cross-reference addition is correctly formatted and placed.
docs/zh/command/ceff.md (1)
95-95: LGTM!Cross-reference addition is correctly formatted and placed.
docs/en/command/cbatch.md (1)
338-338: LGTM!Cross-reference addition is correctly formatted and placed.
2ea5ffb to
5a0f4e7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Nitpick comments (2)
scripts/wipe_data.sh (2)
33-47: Consolidate duplicate batch insert functions.
batch_insert_collection()andbatch_insert_collection_process()are nearly identical, differing only by the--jsonArrayflag. Consolidate into one function with a parameter to control this behavior:-function batch_insert_collection() { - collection=$1 - json_file=$2 - mongoimport --username "$username" --password "$password" --host "$host" --port "$port" \ - --authenticationDatabase admin --db "$dbname" --collection "$collection" \ - --file "$json_file" --jsonArray -} - -function batch_insert_collection_process() { - collection=$1 - json_file=$2 - mongoimport --username "$username" --password "$password" --host "$host" --port "$port" \ - --authenticationDatabase admin --db "$dbname" --collection "$collection" \ - --file "$json_file" -} +function batch_insert_collection() { + collection=$1 + json_file=$2 + is_array_format=${3:-0} # Default to 0 (JSONL format) + + local extra_args="" + if [ "$is_array_format" -eq 1 ]; then + extra_args="--jsonArray" + fi + + mongoimport --username "$username" --password "$password" --host "$host" --port "$port" \ + --authenticationDatabase admin --db "$dbname" --collection "$collection" \ + --file "$json_file" $extra_args +}Then update line 83 to
batch_insert_collection "task_table" "./tasks_bulk.json" 0.
82-89: Verify hardcoded path portability and consider making it configurable.Line 83 uses a hardcoded path
./tasks_bulk.json. This may not work reliably depending on the working directory from which the script is invoked. Consider making it a parameter or deriving it from a configuration variable (e.g.,$base_dir):if [ "$mode" -eq 10 ]; then - batch_insert_collection_process "task_table" "./tasks_bulk.json" + tasks_file="${1:-.}/tasks_bulk.json" # Allow override via second arg, default to ./ + batch_insert_collection_process "task_table" "$tasks_file"Alternatively, document the expected working directory or file location in the usage message.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
docs/images/creport/accountutilizationbyqos.pngis excluded by!**/*.pngdocs/images/creport/accountutilizationbyuser.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccount.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccountandwckey.pngis excluded by!**/*.pngdocs/images/creport/sizesbywckey.pngis excluded by!**/*.pngdocs/images/creport/topusr.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbyaccount.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbywckey.pngis excluded by!**/*.pngdocs/images/creport/utilization.pngis excluded by!**/*.pngdocs/images/creport/wckeyutilizationbyuser.pngis excluded by!**/*.png
📒 Files selected for processing (14)
docs/en/command/cacct.md(1 hunks)docs/en/command/cbatch.md(1 hunks)docs/en/command/ccancel.md(1 hunks)docs/en/command/ceff.md(1 hunks)docs/en/command/cqueue.md(1 hunks)docs/en/command/creport.md(1 hunks)docs/zh/command/cacct.md(1 hunks)docs/zh/command/cbatch.md(1 hunks)docs/zh/command/ccancel.md(1 hunks)docs/zh/command/ceff.md(1 hunks)docs/zh/command/cqueue.md(1 hunks)docs/zh/command/creport.md(1 hunks)mkdocs.yaml(1 hunks)scripts/wipe_data.sh(3 hunks)
✅ Files skipped from review due to trivial changes (1)
- docs/en/command/ceff.md
🚧 Files skipped from review as they are similar to previous changes (6)
- docs/en/command/cacct.md
- docs/en/command/cqueue.md
- docs/en/command/ccancel.md
- docs/zh/command/ccancel.md
- docs/zh/command/cqueue.md
- docs/en/command/cbatch.md
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(49-57)
🪛 markdownlint-cli2 (0.18.1)
docs/zh/command/creport.md
349-349: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
🔇 Additional comments (5)
docs/zh/command/cacct.md (1)
277-277: LGTM!Cross-reference link is properly formatted and contextually appropriate within the "另请参阅" section.
docs/zh/command/ceff.md (1)
95-95: LGTM!Cross-reference link is properly formatted and contextually appropriate within the "相关命令" section.
docs/zh/command/cbatch.md (1)
338-338: LGTM!Cross-reference link is properly formatted and contextually appropriate within the "另请参阅" section.
mkdocs.yaml (1)
167-167: LGTM!Navigation entry correctly placed under "Job Monitoring" section with consistent formatting and file path structure.
docs/zh/command/creport.md (1)
1-361: LGTM!The Chinese documentation is comprehensive and well-structured. All major formatting issues from previous reviews (MD036 headings, terminology consistency, markdown list formatting) have been properly addressed. The content clearly documents all creport subcommands with appropriate sections for usage, options, and examples.
| #!/bin/bash | ||
|
|
||
| echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table)" | ||
| echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion | 🟠 Major
Update usage message: fix formatting and document mode 10.
The usage message has two issues:
- Inconsistent spacing around pipes and colons (compare
| 6:...| 7:vs.|8: summary_time_table) - Mode 10 (task import) is implemented but undocumented in the usage string
-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)"
+echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table | 10:import_tasks)"🤖 Prompt for AI Agents
In scripts/wipe_data.sh around line 3, the usage message has inconsistent
spacing around pipes and colons and omits the implemented mode 10; update the
echo string to use consistent spacing (space before and after each pipe and
after each colon) and add a clear entry for "10: task_import" (or the exact mode
name used in the script) so the usage lists modes 1 through 10 consistently and
readably.
e1b19b6 to
caf1ce1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (3)
scripts/wipe_data.sh (1)
3-3: Document mode 10 in the usage message.The usage message lists modes 1-8, but mode 10 (task import) is implemented at lines 82-89 and is not documented here. Users won't know this mode exists.
-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)" +echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table | 10:import_tasks)"src/CraneCtld/DbClient.h (1)
202-202: Typo in method name.
MongoDbSumaryTh_should beMongoDbSummaryTh_(missing 'm' in "Summary").Apply this diff:
- void MongoDbSumaryTh_(const std::shared_ptr<uvw::loop>& uvw_loop); + void MongoDbSummaryTh_(const std::shared_ptr<uvw::loop>& uvw_loop);Remember to update the implementation in DbClient.cpp as well.
src/Utilities/PublicHeader/String.cpp (1)
544-656: Granularity alignment for month/day ranges is still off at 00:mm:ss starts.The switch to
localtime_ris good for thread‑safety, but the earlier “granularity alignment” issue remains: you still treat any time attm_hour == 0(andtm_mday == 1for months) as a full day/month, regardless of minutes/seconds.Concretely, for ranges like:
start = 2025‑05‑01 00:05,end = 2025‑07‑03 12:34
TryPushMonthswill currently accept this as a"month"segment starting at 00:05 ifendis far enough ahead.start = 2025‑05‑01 00:05,end = 2025‑05‑03 00:00
TryPushDayswill produce a"day"segment from00:05on the 1st through00:00on the 3rd.In both cases, the first partial day/hour is labeled as a
"day"/"month"interval even though it doesn’t start exactly at midnight. This is risky if you’re using these labels to decide which MongoDB rollup collection (month/day/hour) to query; you can end up querying a coarser collection that assumes [00:00, 00:00) alignment and miscounting the leading minutes/seconds.Suggested fix (same spirit as the previous review):
- Tighten the preconditions before merging months/days, e.g.:
- if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0) { + if (tm_cur.tm_mday == 1 && tm_cur.tm_hour == 0 && + tm_cur.tm_min == 0 && tm_cur.tm_sec == 0) {- if (tm_cur.tm_hour == 0) { + if (tm_cur.tm_hour == 0 && tm_cur.tm_min == 0 && tm_cur.tm_sec == 0) {
- Optionally also check return values from
localtime_r/mktimeand early‑returnfalseif they fail, to avoid undefined behavior on out‑of‑range timestamps or pathological DST transitions.This keeps the “month”/“day” buckets strictly aligned to exact local midnights, and any non‑aligned range start will naturally fall back to
"hour"segments.
🧹 Nitpick comments (1)
src/Utilities/PublicHeader/include/crane/String.h (1)
42-47: Narrow public API surface for time ranges; optionally strengthen type.
TimeRangeandEfficientSplitTimeRangemake sense as public utilities, but theTryPushMonths/Days/Hourshelpers look like purely internal orchestration details that are only used inString.cpp. Exposing them in the header grows the public API surface and makes it harder to change the splitting strategy later.Consider:
- Keeping only
TimeRangeandEfficientSplitTimeRangein the header, and movingTryPushMonths/Days/Hoursto an anonymous namespace inString.cpp(or making themstatic), unless you explicitly intend them to be part of the stable API.- Optionally replacing the
std::string typediscriminator ("month"/"day"/"hour") with anenum class(and converting to string only at the boundary where you serialize or log) to avoid magic strings and improve type safety.- Optionally including
<vector>and<ctime>here so the header does not rely on transitive includes forstd::vectorandstd::time_t.Also applies to: 138-146
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (10)
docs/images/creport/accountutilizationbyqos.pngis excluded by!**/*.pngdocs/images/creport/accountutilizationbyuser.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccount.pngis excluded by!**/*.pngdocs/images/creport/sizesbyaccountandwckey.pngis excluded by!**/*.pngdocs/images/creport/sizesbywckey.pngis excluded by!**/*.pngdocs/images/creport/topusr.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbyaccount.pngis excluded by!**/*.pngdocs/images/creport/userutilizationbywckey.pngis excluded by!**/*.pngdocs/images/creport/utilization.pngis excluded by!**/*.pngdocs/images/creport/wckeyutilizationbyuser.pngis excluded by!**/*.png
📒 Files selected for processing (24)
docs/en/command/cacct.md(1 hunks)docs/en/command/cbatch.md(1 hunks)docs/en/command/ccancel.md(1 hunks)docs/en/command/ceff.md(1 hunks)docs/en/command/cqueue.md(1 hunks)docs/en/command/creport.md(1 hunks)docs/zh/command/cacct.md(1 hunks)docs/zh/command/cbatch.md(1 hunks)docs/zh/command/ccancel.md(1 hunks)docs/zh/command/ceff.md(1 hunks)docs/zh/command/cqueue.md(1 hunks)docs/zh/command/creport.md(1 hunks)mkdocs.yaml(1 hunks)protos/Crane.proto(2 hunks)protos/PublicDefs.proto(1 hunks)scripts/wipe_data.sh(3 hunks)src/CraneCtld/CraneCtld.cpp(1 hunks)src/CraneCtld/CtldPreCompiledHeader.h(1 hunks)src/CraneCtld/DbClient.cpp(11 hunks)src/CraneCtld/DbClient.h(5 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(1 hunks)src/CraneCtld/RpcService/CtldGrpcServer.h(1 hunks)src/Utilities/PublicHeader/String.cpp(1 hunks)src/Utilities/PublicHeader/include/crane/String.h(2 hunks)
✅ Files skipped from review due to trivial changes (2)
- docs/en/command/cqueue.md
- docs/zh/command/ccancel.md
🚧 Files skipped from review as they are similar to previous changes (9)
- docs/en/command/creport.md
- docs/zh/command/cbatch.md
- protos/PublicDefs.proto
- docs/zh/command/ceff.md
- src/CraneCtld/CtldPreCompiledHeader.h
- docs/zh/command/cacct.md
- docs/en/command/ceff.md
- src/CraneCtld/CraneCtld.cpp
- docs/zh/command/creport.md
🧰 Additional context used
🧠 Learnings (8)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/RpcService/CtldGrpcServer.hsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/RpcService/CtldGrpcServer.h
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/RpcService/CtldGrpcServer.hsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/Utilities/PublicHeader/String.cppsrc/Utilities/PublicHeader/include/crane/String.hsrc/CraneCtld/DbClient.h
📚 Learning: 2025-05-09T01:54:39.465Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Applied to files:
src/CraneCtld/DbClient.h
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/DbClient.h
📚 Learning: 2025-04-02T10:11:33.562Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Applied to files:
src/CraneCtld/DbClient.cpp
📚 Learning: 2025-05-08T09:35:39.809Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/Pmix/PmixCollRing.cpp:0-0
Timestamp: 2025-05-08T09:35:39.809Z
Learning: In the PMIx implementation for CraneSched, objects referenced in asynchronous gRPC callbacks (like `coll_ring_ctx`) remain valid as long as the parent object (`this`) is not destroyed. The `Coll` class uses shared_ptr management to ensure its lifetime extends through all pending callbacks.
Applied to files:
src/CraneCtld/DbClient.cpp
🧬 Code graph analysis (6)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (2)
src/CraneCtld/DbClient.cpp (4)
QueryJobSummary(2712-2816)QueryJobSummary(2712-2714)QueryJobSizeSummary(2818-2943)QueryJobSizeSummary(2818-2820)src/Craned/Core/CranedPublicDefs.h (1)
g_runtime_status(166-166)
src/Utilities/PublicHeader/include/crane/String.h (1)
src/Utilities/PublicHeader/String.cpp (10)
FormatTime(544-550)FormatTime(544-544)TryPushHours(618-635)TryPushHours(618-619)TryPushMonths(555-584)TryPushMonths(555-556)TryPushDays(587-615)TryPushDays(587-588)EfficientSplitTimeRange(638-656)EfficientSplitTimeRange(638-639)
src/CraneCtld/RpcService/CtldGrpcServer.h (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (6)
QueryJobSummary(2058-2066)QueryJobSummary(2058-2061)QueryJobSizeSummary(2068-2080)QueryJobSizeSummary(2068-2071)ActiveAggregationManually(2082-2097)ActiveAggregationManually(2082-2085)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(51-59)
src/CraneCtld/DbClient.h (1)
src/CraneCtld/DbClient.cpp (48)
MongodbClient(3057-3081)MongodbClient(3083-3086)Init(3009-3032)Init(3009-3009)InitTableIndexes(2972-3007)InitTableIndexes(2972-2972)CreateCollectionIndex(2081-2103)CreateCollectionIndex(2081-2082)RollupTypeToString(1896-1907)RollupTypeToString(1896-1896)UpdateSummaryLastSuccessTimeSec(1909-1943)UpdateSummaryLastSuccessTimeSec(1909-1910)GetSummaryLastSuccessTimeTm(1945-1992)GetSummaryLastSuccessTimeTm(1945-1946)NeedRollup(1994-2011)NeedRollup(1994-1995)RollupSummary(2013-2047)RollupSummary(2013-2013)ClusterRollupUsage(2049-2079)ClusterRollupUsage(2049-2049)GetCpusAlloc(2657-2666)GetCpusAlloc(2657-2657)GetTotalCpuTime(2668-2677)GetTotalCpuTime(2668-2668)GetTotalCount(2679-2686)GetTotalCount(2679-2679)WriteReply(2688-2710)WriteReply(2688-2691)QueryJobSizeSummaryByJobIds(2368-2520)QueryJobSizeSummaryByJobIds(2368-2370)JobQueryMatch(2522-2566)JobQueryMatch(2522-2524)QueryJobSummary(2712-2816)QueryJobSummary(2712-2714)JobSizeQueryMatch(2568-2632)JobSizeQueryMatch(2568-2570)QueryJobSizeSummary(2818-2943)QueryJobSizeSummary(2818-2820)AppendUnionWithRanges(2635-2655)AppendUnionWithRanges(2635-2638)AggregateJobSummaryByHour(2105-2225)AggregateJobSummaryByHour(2105-2107)AggregateJobSummaryByDayOrMonth(2227-2366)AggregateJobSummaryByDayOrMonth(2227-2230)MongoDbSumaryTh_(3034-3055)MongoDbSumaryTh_(3034-3035)MillisecondsToNextHour(2945-2970)MillisecondsToNextHour(2945-2945)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
string(71-73)string(377-426)MongodbClient(93-221)
🪛 YAMLlint (1.37.1)
mkdocs.yaml
[error] 190-190: duplication of key "nav" in mapping
(key-duplicates)
🔇 Additional comments (27)
docs/en/command/ccancel.md (1)
212-212: LGTM! Documentation cross-reference added correctly.The addition of the creport reference in the See Also section is consistent with the PR's objective to introduce creport documentation and integrate it across command references.
scripts/wipe_data.sh (2)
33-47: LGTM! Batch import functions implemented correctly.The two batch import functions properly distinguish between array-format JSON (
batch_insert_collectionusing--jsonArray) and line-delimited JSON (batch_insert_collection_process), which aligns with standardmongoimportusage patterns.
82-89: LGTM! Mode 10 implementation includes proper error handling.The batch import for mode 10 correctly checks the exit status and propagates failures, addressing previous review concerns about silent failures.
docs/en/command/cacct.md (1)
277-277: LGTM! Documentation cross-reference added.docs/en/command/cbatch.md (1)
339-339: LGTM! Documentation cross-reference added.src/CraneCtld/RpcService/CtldGrpcServer.h (1)
384-397: LGTM! RPC method declarations are correctly defined.The three new streaming and unary RPC methods follow the established patterns in this service implementation:
- Proper use of
ServerWriter<>for streaming responses- Consistent parameter naming and types
- Correct
overridespecifierssrc/CraneCtld/RpcService/CtldGrpcServer.cpp (3)
2058-2066: LGTM! QueryJobSummary correctly implements readiness check and delegation.The handler properly checks server readiness and delegates to the database client. Per CraneSched conventions, returning
Status::OKwith errors communicated in the payload is correct.If reporting should be restricted to admin users (similar to cacctmgr operations), consider adding
CheckCertAndUIDAllowed_andCheckUidIsAdminchecks. However, if job summaries are intended to be accessible to all authenticated users, the current implementation is appropriate.
2068-2080: LGTM! QueryJobSizeSummary correctly handles readiness and conditional delegation.The handler includes the readiness guard and properly dispatches to different query methods based on whether specific job IDs are filtered.
Same optional consideration as QueryJobSummary regarding access control—verify whether this should be admin-only or available to all authenticated users.
2082-2097: Fix authorization bypass in ActiveAggregationManually.The current implementation has a critical security flaw: when
uid != 0, it setsok=falseand an error reason (lines 2089-2091), but then continues to executeClusterRollupUsage()(line 2093) and unconditionally overwrites the response withok=true(line 2094). This allows unauthorized users to trigger expensive cluster-wide rollups while receiving a success response.Based on learnings, align with the established admin RPC pattern:
grpc::Status CraneCtldServiceImpl::ActiveAggregationManually( ::grpc::ServerContext *context, const ::crane::grpc::ActiveAggregationManuallyRequest *request, ::crane::grpc::ActiveAggregationManuallyReply *response) { if (!g_runtime_status.srv_ready.load(std::memory_order_acquire)) return {grpc::StatusCode::UNAVAILABLE, "CraneCtld Server is not ready"}; - if (request->uid() != 0) { + + // Verify TLS certificate matches UID + if (auto msg = CheckCertAndUIDAllowed_(context, request->uid()); msg) + return {grpc::StatusCode::UNAUTHENTICATED, msg.value()}; + + // Require admin privileges + auto res = g_account_manager->CheckUidIsAdmin(request->uid()); + if (!res) { response->set_ok(false); - response->set_reason("Only the root user can perform this operation"); + response->set_reason(CraneErrStr(res.error())); + return grpc::Status::OK; } + + // Authorized: perform manual rollup g_db_client->ClusterRollupUsage(); response->set_ok(true); - + response->clear_reason(); return grpc::Status::OK; }This ensures:
- TLS certificate is validated and matches the claimed UID
- Admin privilege check is enforced via
CheckUidIsAdmin- Rollup only executes when authorized
- Error responses are not overwritten
- Consistent with other admin RPCs in this service
Based on learnings
⛔ Skipped due to learnings
Learnt from: RileyWen Repo: PKUHPC/CraneSched PR: 475 File: src/CraneCtld/CtldGrpcServer.cpp:113-118 Timestamp: 2025-05-02T07:06:36.103Z Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.docs/zh/command/cqueue.md (1)
208-213: The review comment is incorrect and should be rejected.The actual file structure shows that NO
.zh.mdfiles exist anywhere in the documentation — the naming convention is simply.mdfor all files. The current state ofdocs/zh/command/cqueue.mdcontains:
- Lines 208-212: References to non-existent files (e.g.,
cbatch.zh.md,crun.zh.md, etc.)- Line 213: Correct reference to
creport.md(the file that actually exists)The review suggests changing line 213 from
creport.mdtocreport.zh.md, but this would create a seventh broken link rather than fix an inconsistency. The actual issue is that lines 208-212 contain broken links that should be corrected to use.mdinstead of.zh.md.Line 213 is already correct and should not be modified per the review's suggestion.
Likely an incorrect or invalid review comment.
protos/Crane.proto (4)
987-998: LGTM!The QueryJobSummary message definitions are well-structured with appropriate filter fields and support for streaming responses via repeated items.
1000-1015: LGTM!The QueryJobSizeSummary messages support comprehensive filtering including the new nodename_list field, enabling fine-grained queries by node allocation.
1017-1024: LGTM!The ActiveAggregationManually messages correctly include the uid field for authentication, addressing the past review concern.
1078-1080: LGTM!The new streaming RPCs are appropriately defined with server-side streaming for queries and unary call for manual aggregation trigger.
src/CraneCtld/DbClient.cpp (9)
1555-1558: LGTM!The nodename_list field is correctly integrated into the task document schema, maintaining consistency between the field array, value tuple, and documentation comments.
Also applies to: 1567-1589, 1600-1640
1663-1666: LGTM!The nodename_list integration in TaskInCtldToDocument_ mirrors the EmbeddedDb variant correctly.
Also applies to: 1679-1698, 1709-1739
1896-2079: LGTM!The rollup infrastructure is well-architected with proper:
- Timestamp tracking using MongoDB $max operator to prevent backward movement
- Graceful fallback to default time when no prior rollup exists
- Correct time comparison logic for hour/day/month boundaries
- Thread-safe execution with mutex protection
2105-2225: LGTM!The aggregation pipelines correctly:
- Multiply per-node cpus_alloc by nodes_alloc to compute total allocated CPUs (confirmed correct in past reviews)
- Calculate cpus_time as total CPUs × duration
- Include nodename_list in grouping for node-level granularity
- Upsert results with replace semantics to avoid duplicates
Also applies to: 2227-2366
2368-2520: LGTM!The QueryJobSizeSummaryByJobIds implementation correctly:
- Multiplies per-node cpus_alloc by nodes_alloc for raw task table (confirmed correct in past reviews)
- Handles bucket assignment including jobs smaller than the first threshold (past review issue addressed at line 2492)
- Filters by nodename_list when specified
2522-2710: LGTM!The query helpers demonstrate good engineering practices:
- Type-safe BSON field extraction with fallback defaults
- Flexible template-based pipeline construction
- Batched streaming to prevent large message sizes
2712-2943: LGTM!The QueryJobSummary and QueryJobSizeSummary implementations demonstrate excellent design:
- Efficient time range splitting to minimize data access
- Seamless querying across hour/day/month collections via $unionWith
- Consistent bucket handling (past review issue addressed at line 2930)
- Streaming responses with batching for scalability
2945-3032: LGTM!The initialization sequence is correct:
- MillisecondsToNextHour properly calculates delay to next hour boundary
- InitTableIndexes creates comprehensive indexes for efficient queries
- Init() correctly assigns the timer handle to the class member (line 3015), ensuring it remains alive for the lifetime of the client
3034-3055: LGTM!The background thread and cleanup logic are thread-safe:
- Idle loop properly checks stop flag before closing handles
- Destructor correctly signals stop and joins thread
Also applies to: 3083-3086
src/CraneCtld/DbClient.h (4)
22-22: LGTM!The new include and type definitions are well-structured:
- Crane.grpc.pb.h provides necessary gRPC types
- RollupType enum cleanly represents aggregation granularities
- JobSizeSummaryKey implements proper hash and equality for AbslHash compatibility
Also applies to: 107-127
129-131: LGTM!Standard lifecycle methods with proper Init() for two-phase initialization.
372-379: LGTM!The new collection name constants follow a consistent naming convention, and the batch size of 5000 is reasonable for streaming responses.
388-393: LGTM!The new private members properly support the rollup infrastructure:
- Mutex for thread-safe rollup execution
- Background thread for periodic aggregation
- Atomic stop flag for safe shutdown
- Timer handle stored as class member (correctly addresses past review)
| nav: | ||
| - Home: index.md | ||
| - Deployment: | ||
| - Overview: deployment/index.md | ||
| - Frontend: | ||
| - Deployment: deployment/frontend/frontend.md | ||
| - Plugins: deployment/frontend/plugins.md | ||
| - Backend: | ||
| - Rocky Linux 9: deployment/backend/Rocky9.md | ||
| - Ubuntu: deployment/backend/Ubuntu.md | ||
| - CentOS 7: deployment/backend/CentOS7.md | ||
| - Packaging Guide: deployment/backend/packaging.md | ||
| - eBPF Support: deployment/backend/eBPF.md | ||
| - Configuration: | ||
| - Cluster Configuration: deployment/configuration/config.md | ||
| - Database Configuration: deployment/configuration/database.md | ||
| - PAM Module: deployment/configuration/pam.md | ||
| - Multi-node Deployment: deployment/configuration/multi-node.md | ||
| - Commands: | ||
| - Job Submission: | ||
| - cbatch: command/cbatch.md | ||
| - crun: command/crun.md | ||
| - calloc: command/calloc.md | ||
| - Job Monitoring: | ||
| - cqueue: command/cqueue.md | ||
| - cinfo: command/cinfo.md | ||
| - ceff: command/ceff.md | ||
| - cacct: command/cacct.md | ||
| - creport: command/creport.md | ||
| - Job Control: | ||
| - ccancel: command/ccancel.md | ||
| - ccontrol: command/ccontrol.md | ||
| - Account Management: | ||
| - cacctmgr: command/cacctmgr.md | ||
| - Reference: | ||
| - Exit Codes: reference/exit_code.md | ||
| - Use Cases: | ||
| - Overview: reference/use_cases.md | ||
| - Multi-Node Parallel Job: reference/use_case_multi_node.md | ||
| - GPU-Accelerated Training: reference/use_case_gpu.md | ||
| # - Parameter Sweep with Job Arrays: reference/use_case_param_sweep.md | ||
| # - Chained Jobs with Dependencies: reference/use_case_pipeline.md | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove duplicate nav key definition.
YAML does not allow duplicate keys at the same level. The second nav: block (lines 190-232) will override the first one (lines 48-92), causing the loss of the original navigation structure. This is a critical configuration error flagged by YAMLlint.
The second block appears to be a copy-paste with the intent to add creport at line 218. Instead, add only the creport entry to the original nav block.
# Around line 75 in the FIRST nav block (after cacct):
- Job Monitoring:
- cqueue: command/cqueue.md
- cinfo: command/cinfo.md
- ceff: command/ceff.md
- cacct: command/cacct.md
+ - creport: command/creport.md
- Job Control:Then delete lines 189-232 (the entire duplicate nav block including the empty line 189).
Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 190-190: duplication of key "nav" in mapping
(key-duplicates)
🤖 Prompt for AI Agents
In mkdocs.yaml around lines 190 to 232 there is a duplicate top-level "nav"
block that overrides the original navigation; remove the entire duplicate block
(lines 189-232) and instead insert the missing "creport: command/creport.md"
entry into the original nav's "Job Monitoring" section so only one "nav:" exists
and the creport page is preserved.
e164202 to
85add56
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
♻️ Duplicate comments (3)
docs/en/command/creport.md (1)
146-147: Fix typos and spacing in field descriptions.A few small doc issues remain:
- Line 147: add a space after the colon for consistency with other fields:
**WCKEY**: wckey name.- Lines 240, 280, 320: add a space after the colon in
**250-499 CPUs**: CPU minutes in the 250-499 CPUs range.- Line 210:
PLANEEDlooks like a typo; change the label toPLANNED(or another intended word) to avoid confusion.Also applies to: 210-212, 236-243, 276-283, 318-323
scripts/wipe_data.sh (1)
3-3: Update usage string to match implemented modes and fix spacing.The usage message is still inconsistent and omits the documented import mode. Please align it with the actual modes and spacing, e.g.:
-echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table| 7:hour_table+day_table+month_table |8: summary_time_table)" +echo "Usage: $0 mode(1:acct_table | 2:qos_table | 3:task_table | 4:user_table | 5:all | 6:acct_table+qos_table+user_table | 7:hour_table+day_table+month_table | 8:summary_time_table | 10:import_tasks)"mkdocs.yaml (1)
191-233: Remove duplicate top-levelnavblock (YAML key conflict).There are two top-level
nav:mappings in this file; the second one (lines 191–233) overrides the first and breaks MkDocs/YAML parsing. Since the first nav already containscreport: command/creport.md, delete the entire secondnav:block and keep only the primary one.If you need any items unique to the second block, merge them into the first instead of defining
navtwice.
🧹 Nitpick comments (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
2058-2100: New creport RPC handlers look correct and consistent with existing patterns.
QueryJobSummary/QueryJobSizeSummarycorrectly gate onsrv_readyand stream via the DbClient helpers.ActiveAggregationManuallynow does TLS+UID binding and enforcesuid == 0before callingClusterRollupUsage, with clearok/reasonin the reply and early return for non-root callers.If you later introduce non-root admin roles for rollups, consider switching the root check to
g_account_manager->CheckUidIsAdmin(...), but the current behavior matches the “root-only” docs.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
docs/en/command/creport.md(1 hunks)mkdocs.yaml(2 hunks)scripts/wipe_data.sh(3 hunks)src/CraneCtld/DbClient.cpp(11 hunks)src/CraneCtld/RpcService/CtldGrpcServer.cpp(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-04-02T10:11:33.562Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Applied to files:
src/CraneCtld/DbClient.cpp
📚 Learning: 2025-05-08T09:35:39.809Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/Pmix/PmixCollRing.cpp:0-0
Timestamp: 2025-05-08T09:35:39.809Z
Learning: In the PMIx implementation for CraneSched, objects referenced in asynchronous gRPC callbacks (like `coll_ring_ctx`) remain valid as long as the parent object (`this`) is not destroyed. The `Coll` class uses shared_ptr management to ensure its lifetime extends through all pending callbacks.
Applied to files:
src/CraneCtld/DbClient.cpp
🧬 Code graph analysis (3)
scripts/wipe_data.sh (1)
scripts/wipe_data.py (1)
wipe_collection(51-59)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (2)
src/CraneCtld/DbClient.cpp (4)
QueryJobSummary(2713-2817)QueryJobSummary(2713-2715)QueryJobSizeSummary(2819-2944)QueryJobSizeSummary(2819-2821)src/Craned/Core/CranedPublicDefs.h (1)
g_runtime_status(166-166)
src/CraneCtld/DbClient.cpp (1)
src/CraneCtld/DbClient.h (3)
string(71-73)string(377-426)MongodbClient(93-221)
🪛 YAMLlint (1.37.1)
mkdocs.yaml
[error] 191-191: duplication of key "nav" in mapping
(key-duplicates)
🔇 Additional comments (1)
src/CraneCtld/DbClient.cpp (1)
1555-1644: Rollup, nodename_list, and summary query implementation look solid.
nodename_listis consistently added to task documents and carried through hourly/day/month summaries, with matching filter support and composite indexes; this aligns with the new CLI filters.- The rollup workflow (
RollupTypeToString,Get/UpdateSummaryLastSuccessTime*,NeedRollup,RollupSummary,ClusterRollupUsage,AggregateJobSummaryByHour/DayOrMonth) correctly advances the watermark only after full intervals and uses$mergewithwhenMatched: "replace"to keep summaries idempotent.QueryJobSummaryand both job-size summary paths (QueryJobSizeSummaryByJobIdsandQueryJobSizeSummary) correctly split time ranges across hour/day/month collections, use union-with pipelines, and stream batched replies with safe numeric extraction (GetCpusAlloc,GetTotalCpuTime,GetTotalCount,WriteReply).- The uvw-based background timer in
InitplusMongoDbSumaryTh_and the destructor cleanly manage the rollup thread and event loop lifetime.I don’t see correctness or concurrency issues in these additions; they’re ready to ship.
Also applies to: 1897-2080, 2230-2367, 2369-2521, 2693-2944, 2946-3033, 3035-3087
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/CraneCtld/DbClient.cpp (1)
1571-1590: Persistwckeyinto task documents so rollups and filters can work.I don’t see a
"wckey"field in eitherTaskInEmbeddedDbToDocument_orTaskInCtldToDocument_, but the rollup pipelines and query helpers expect it:
- Hourly rollup does
$addFields: { wckey: { $ifNull: ["$wckey", ""] } }.- Day/month rollups group and project
wckey.QueryJobSummary,QueryJobSizeSummary, andQueryJobSizeSummaryByJobIdsall filter and/or group bywckey.Without actually persisting
wckeyon the job documents, all summaries will end up withwckey == "", and any filtering by wckey will be ineffective. Please either:
- Add
wckeyinto the task document field arrays/tuples and populate it from the source RPC structs, or- Adjust the rollup/query code to derive
wckeyfrom whatever field currently carries that information (e.g., insideextra_attr) and store it explicitly before aggregation.Also applies to: 1680-1699, 2145-2151, 2173-2178, 2303-2315, 2456-2464, 2559-2565, 2604-2610
🧹 Nitpick comments (2)
src/CraneCtld/DbClient.cpp (2)
2523-2567: Consider supporting partition/nodename filters forQueryJobSummaryas well.
JobSizeQueryMatchsupportspartition_nameandnodename_list, butJobQueryMatchonly honors account/user/qos/wckey. If creport’s job-summary API is expected to filter by partition or nodename too, you may want to mirror those predicates here for consistency; otherwise this asymmetry is fine.Also applies to: 2569-2633, 2635-2656
2946-2971: Timer/loop lifetime is fixed; double‑check uvw threading assumptions.Persisting
mongodb_task_timer_handleas a member and shutting the loop down fromMongoDbSumaryTh_resolves the earlier issue where the timer died whenInit()returned.One thing to verify: the timer handle is created and
start()is called on the thread that runsInit(), butuvw_loop->run()executes onm_mongodb_sum_thread_. If uvw/libuv require handles to be created/started on the same thread as the running loop, you may want to move timer creation intoMongoDbSumaryTh_(or otherwise ensure all loop interactions happen on the loop thread).Also applies to: 3010-3033, 3035-3056, 3084-3087
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/CraneCtld/DbClient.cpp(11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-04-02T10:11:33.562Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 461
File: src/Craned/CgroupManager.cpp:685-685
Timestamp: 2025-04-02T10:11:33.562Z
Learning: In the CgroupManager's GetJobBpfMapCgroupsV2 method, the developer has confirmed that cg_ino_job_id_map will always contain the key->cgroup_id element, making the CRANE_ASSERT check appropriate rather than requiring additional error handling.
Applied to files:
src/CraneCtld/DbClient.cpp
🔇 Additional comments (4)
src/CraneCtld/DbClient.cpp (4)
1555-1644: Nodename_list field is correctly plumbed through task documents and summaries.The array construction and added field/tuple entries line up, and downstream rollups/indexes already consume
nodename_list, so this extension looks sound.Also applies to: 1664-1742
1897-2012: Rollup watermarking and orchestration logic look correct.
last_success_timeis kept monotonic with$max, failed intervals are retried (sincecur_startdoesn’t advance on exceptions), andNeedRollupgates work on the intended granularity. No functional issues spotted here.Also applies to: 2014-2080
2106-2367: Hourly/day/month aggregation and index wiring are internally consistent.Windowing, grouping keys (including
nodename_list), and the created indexes all match the way summaries and queries use these collections; the watermark update at the end of each rollup uses the correctcur_start.Also applies to: 2973-3008
2658-2711: Summary query and bucketing logic are solid.Time-range splitting,
$unionWithusage, numeric extraction helpers, and the revised CPU-size bucketing all behave as expected, avoiding the earlier “small jobs dropped” issue and producing stable streaming batches.Also applies to: 2713-2817, 2819-2944
| bool MongodbClient::QueryJobSizeSummaryByJobIds( | ||
| const crane::grpc::QueryJobSizeSummaryRequest* request, | ||
| grpc::ServerWriter<::crane::grpc::QueryJobSizeSummaryReply>* stream) { | ||
| document filter; | ||
| auto grouping_list = request->filter_grouping_list(); | ||
|
|
||
| if (request->has_filter_start_time()) { | ||
| int64_t start_time_sec = request->filter_start_time().seconds(); | ||
| filter.append(kvp("time_start", [&](sub_document time_start_doc) { | ||
| time_start_doc.append(kvp("$gte", start_time_sec)); | ||
| })); | ||
| } | ||
|
|
||
| if (request->has_filter_end_time()) { | ||
| int64_t end_time_sec = request->filter_end_time().seconds(); | ||
| filter.append(kvp("time_end", [&](sub_document time_end_doc) { | ||
| time_end_doc.append(kvp("$lte", end_time_sec)); | ||
| })); | ||
| } | ||
|
|
||
| bool has_accounts_constraint = !request->filter_accounts().empty(); | ||
| if (has_accounts_constraint) { | ||
| filter.append(kvp("account", [&request](sub_document account_doc) { | ||
| array account_array; | ||
| for (const auto& account : request->filter_accounts()) { | ||
| account_array.append(account); | ||
| } | ||
| account_doc.append(kvp("$in", account_array)); | ||
| })); | ||
| } | ||
|
|
||
| bool has_users_constraint = !request->filter_users().empty(); | ||
| if (has_users_constraint) { | ||
| filter.append(kvp("username", [&request](sub_document user_doc) { | ||
| array user_array; | ||
| for (const auto& user : request->filter_users()) { | ||
| user_array.append(user); | ||
| } | ||
| user_doc.append(kvp("$in", user_array)); | ||
| })); | ||
| } | ||
|
|
||
| bool has_qos_constraint = !request->filter_qoss().empty(); | ||
| if (has_qos_constraint) { | ||
| filter.append(kvp("qos", [&request](sub_document qos_doc) { | ||
| array qos_array; | ||
| for (const auto& qos : request->filter_qoss()) { | ||
| qos_array.append(qos); | ||
| } | ||
| qos_doc.append(kvp("$in", qos_array)); | ||
| })); | ||
| } | ||
|
|
||
| bool has_task_ids_constraint = !request->filter_job_ids().empty(); | ||
| if (has_task_ids_constraint) { | ||
| filter.append(kvp("task_id", [&request](sub_document task_id_doc) { | ||
| array task_id_array; | ||
| for (const auto& task_id : request->filter_job_ids()) { | ||
| task_id_array.append(static_cast<std::int32_t>(task_id)); | ||
| } | ||
| task_id_doc.append(kvp("$in", task_id_array)); | ||
| })); | ||
| } | ||
|
|
||
| bool has_partitions_constraint = !request->filter_partitions().empty(); | ||
| if (has_partitions_constraint) { | ||
| filter.append(kvp("partition_name", [&request](sub_document partition_doc) { | ||
| array partition_array; | ||
| for (const auto& partition : request->filter_partitions()) { | ||
| partition_array.append(partition); | ||
| } | ||
| partition_doc.append(kvp("$in", partition_array)); | ||
| })); | ||
| } | ||
|
|
||
| bool has_nodename_list_constraint = !request->filter_nodename_list().empty(); | ||
| if (has_nodename_list_constraint) { | ||
| filter.append( | ||
| kvp("nodename_list", [&request](sub_document nodename_list_doc) { | ||
| array nodename_list_array; | ||
| for (const auto& nodename : request->filter_nodename_list()) { | ||
| nodename_list_array.append(nodename); | ||
| } | ||
| nodename_list_doc.append(kvp("$in", nodename_list_array)); | ||
| })); | ||
| } | ||
|
|
||
| bool has_wckeys_constraint = !request->filter_wckeys().empty(); | ||
| if (has_wckeys_constraint) { | ||
| filter.append(kvp("wckey", [&request](sub_document wckey_doc) { | ||
| array wckey_array; | ||
| for (const auto& wckey : request->filter_wckeys()) { | ||
| wckey_array.append(wckey); | ||
| } | ||
| wckey_doc.append(kvp("$in", wckey_array)); | ||
| })); | ||
| } | ||
|
|
||
| absl::flat_hash_map<JobSizeSummaryKey, JobSizeSummaryResult> agg_map; | ||
| try { | ||
| mongocxx::cursor cursor = | ||
| (*GetClient_())[m_db_name_][m_task_collection_name_].find( | ||
| filter.view()); | ||
| for (auto view : cursor) { | ||
| std::string acc = std::string(view["account"].get_string().value); | ||
| std::string wk = | ||
| view["wckey"] ? std::string(view["wckey"].get_string().value) : ""; | ||
| uint32_t cpus_alloc = | ||
| static_cast<uint32_t>(view["cpus_alloc"].get_double().value * | ||
| view["nodes_alloc"].get_int32().value); | ||
| int64_t time_start = view["time_start"].get_int64().value; | ||
| int64_t time_end = view["time_end"].get_int64().value; | ||
| double total_cpu_time = (time_end - time_start) * cpus_alloc; | ||
| if (grouping_list.empty()) { | ||
| agg_map[{acc, wk, cpus_alloc}].total_cpu_time += total_cpu_time; | ||
| agg_map[{acc, wk, cpus_alloc}].total_count += 1; | ||
| } else { | ||
| int group_index = 0; | ||
| for (const auto group : grouping_list) { | ||
| if (cpus_alloc < group) { | ||
| break; | ||
| } | ||
| group_index++; | ||
| } | ||
| uint32_t bucket = grouping_list[group_index == 0 ? 0 : group_index - 1]; | ||
| agg_map[{acc, wk, bucket}].total_cpu_time += total_cpu_time; | ||
| agg_map[{acc, wk, bucket}].total_count += 1; | ||
| } | ||
| } | ||
| } catch (const bsoncxx::exception& e) { | ||
| CRANE_LOGGER_ERROR(m_logger_, e.what()); | ||
| } | ||
|
|
||
| crane::grpc::QueryJobSizeSummaryReply reply; | ||
| for (const auto& kv : agg_map) { | ||
| crane::grpc::JobSizeSummaryItem item; | ||
| item.set_cluster(g_config.CraneClusterName); | ||
| item.set_account(kv.first.account); | ||
| item.set_wckey(kv.first.wckey); | ||
| item.set_cpus_alloc(kv.first.cpus_alloc); | ||
| item.set_total_cpu_time(kv.second.total_cpu_time); | ||
| item.set_total_count(kv.second.total_count); | ||
| reply.add_item_list()->CopyFrom(item); | ||
| if (reply.item_list_size() >= MaxJobSummaryBatchSize) { | ||
| stream->Write(reply); | ||
| reply.clear_item_list(); | ||
| } | ||
| } | ||
| if (reply.item_list_size() > 0) { | ||
| stream->Write(reply); | ||
| } | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filter out step‑only documents in QueryJobSizeSummaryByJobIds.
This query runs on the raw task collection but doesn’t filter on has_job_info. Step‑only documents inserted by InsertSteps lack fields like account, cpus_alloc, time_start, etc., so accessing them here can throw and be caught, but you’ll get log noise and silently missing jobs.
To align with FetchJobRecords and ensure only fully populated jobs are aggregated, add has_job_info == true to the filter:
bool MongodbClient::QueryJobSizeSummaryByJobIds(
@@
- bool has_wckeys_constraint = !request->filter_wckeys().empty();
+ bool has_wckeys_constraint = !request->filter_wckeys().empty();
if (has_wckeys_constraint) {
@@
wckey_doc.append(kvp("$in", wckey_array));
}));
}
+
+ // Only aggregate documents with complete job information.
+ filter.append(kvp("has_job_info", true));🤖 Prompt for AI Agents
In src/CraneCtld/DbClient.cpp around lines 2369 to 2521, the query builds a
Mongo filter but does not exclude "step-only" documents lacking job fields,
causing exceptions and noisy logs; add a filter clause requiring has_job_info ==
true (same approach as other filters: append a kvp("has_job_info",
[&](sub_document d){ d.append(kvp("$eq", true)); }) or equivalent boolean
equality) before performing the find so only fully populated job documents are
aggregated.
| ``` | ||
| ### Sample Output | ||
|
|
||
|  |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
将图片的文字内容用代码块格式展示,不再上传原始图片
| - Use Cases: | ||
| - Overview: reference/use_cases.md | ||
| - Multi-Node Parallel Job: reference/use_case_multi_node.md | ||
| - GPU-Accelerated Training: reference/use_case_gpu.md |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这一段是为什么
| return std::string(Internal::CraneStepStatusStrArr[static_cast<int>(status)]); | ||
| } | ||
|
|
||
| std::string FormatTime(std::time_t t) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
为什么用time_t?不用chrono?
|
|
||
| struct TimeRange { | ||
| std::time_t start; | ||
| std::time_t end; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
用chrono
|
|
||
| ## Display User Resource Consumption Ranking | ||
| ```bash | ||
| creport user topusage [--start-time=...] [--end-time=...] [--account=...] ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
学习下ccontorl的文档结构吧
| } | ||
|
|
||
| // Try to merge consecutive full-day intervals until the next month's first day. | ||
| bool TryPushDays(std::vector<TimeRange> &result, std::time_t &cur, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个东西和string有什么关系?为什么放这里面,而且这个东西不会在其他地方用到吧?
| item.set_total_count(GetTotalCount(doc)); | ||
| item.set_total_cpu_time(GetTotalCpuTime(doc)); | ||
|
|
||
| reply.add_item_list()->CopyFrom(item); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里直接Swap就行,
| std::time_t end); | ||
| bool TryPushDays(std::vector<TimeRange> &result, std::time_t &cur, | ||
| std::time_t end); | ||
| std::vector<TimeRange> EfficientSplitTimeRange(std::time_t start, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
注释下吧,
Convert a time duration to minimal Month/Day/Hour time ranges.
| } | ||
| } | ||
|
|
||
| void MongodbClient::QueryJobSizeSummary( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个没有过滤?和QueryJobSizeSummaryByJobIds有什么区别
| e.what()); | ||
| } | ||
|
|
||
| UpdateSummaryLastSuccessTimeSec(RollupType::HOUR, cur_start); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个应该成功才调用?,其他的聚合同理
Summary by CodeRabbit
New Features
Chores
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.