-
Notifications
You must be signed in to change notification settings - Fork 32
feat:sol lua #783
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat:sol lua #783
Conversation
📝 WalkthroughWalkthroughAdds optional Lua-based job-submit/modify hooks: CMake build switches and sol2 fetch, a thread-pooled Lua environment, LuaJobHandler exposing CTLD types to scripts, integration points in scheduler and RPC flows to run async Lua checks, protobuf extensions (task env map and Lua error), and docs/config examples. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant RPC as CtldGrpcServer
participant Scheduler
participant LuaPool
participant LuaEnv as LuaEnvironment
participant Script as Lua Script
Client->>RPC: SubmitTask(s)
RPC->>Scheduler: JobSubmitLuaCheck(task)
alt JobSubmitLuaScript configured
Scheduler->>LuaPool: ExecuteLuaScript(callback -> uses LuaEnv + script)
LuaPool->>LuaEnv: Init(script) / LoadLuaScript()
LuaEnv->>Script: register types/functions & call crane_job_submit(job_desc, part_list)
Script-->>LuaEnv: return success / error
LuaEnv-->>LuaPool: CraneRichError (future)
LuaPool-->>Scheduler: future<CraneRichError>
Scheduler->>RPC: Lua result (ready)
else No script configured
Scheduler-->>RPC: std::nullopt (skip Lua)
end
alt Lua validation success
RPC->>Scheduler: proceed with scheduling
Scheduler-->>RPC: job accepted (job_id)
RPC-->>Client: Success response
else Lua validation failed
RPC-->>Client: Error response with reason (ERR_LUA_FAILED / message)
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
637-747: ModifyTask now silently drops IDs that aren’t in Pd/Rn queues when Lua is enabled.The new Lua gating logic in
ModifyTask:std::list<task_id_t> task_ids; std::list<std::pair<task_id_t, std::future<CraneRichError>>> futures; for (const auto task_id : request->task_ids()) { auto fut = g_task_scheduler->JobModifyLuaCheck(task_id); if (fut) futures.emplace_back(task_id, std::move(fut.value())); } for (auto& [task_id, fut] : futures) { auto rich_err = fut.get(); if (rich_err.code() != CraneErrCode::SUCCESS) { response->add_not_modified_tasks(task_id); ... // add reason } else { task_ids.emplace_back(task_id); } }Only task IDs that both:
- Are present in Pd/Rn queues (so
JobModifyLuaCheckreturns a future), and- Pass the Lua check,
end up in
task_idsand are later processed by the time-limit / priority / hold branches.Any requested
task_idthat is not in the pending or running maps causesJobModifyLuaCheckto returnstd::nulloptand is never:
- Added to
futures,- Added to
task_ids, or- Added to either
modified_tasksornot_modified_tasks.Previously,
ChangeTaskTimeLimit/ChangeTaskPriority/HoldReleaseTaskAsyncwould returnERR_NON_EXISTENT, and the RPC would report those jobs explicitly asnot_modifiedwith a reason. Now, such IDs can effectively disappear from the response, which is confusing for callers.Coupled with
JobModifyLuaCheck’s use ofstd::nulloptto mean both “Lua disabled” and “job not found”, this makes the behavior hard to reason about.Suggestion:
- Treat a
std::nulloptreturn fromJobModifyLuaCheckas “task not found in pending/running queues” and immediately push:
task_idintonot_modified_tasks, and- A standardized reason like “Task #X was not found in running or pending queue.”
- Or, change
JobModifyLuaCheck’s signature to explicitly return a status (e.g.,std::expected<std::optional<std::future<CraneRichError>>, CraneErrCode>) soModifyTaskcan distinguish “Lua disabled” from “job not found”.This keeps the ModifyTask API behavior consistent with pre-Lua semantics and avoids silently ignoring some requested IDs.
🧹 Nitpick comments (7)
CMakeLists.txt (1)
276-283: Simplify redundant cache variable assignments.Lines 279-280 assign
LUA_INCLUDE_DIRandLUA_LIBRARIESto themselves in the cache, which is redundant sincefind_package(Lua)already sets these as cache variables. Consider removing these lines:🔎 Suggested simplification
find_package(Lua QUIET) if(LUA_FOUND) add_definitions(-DHAVE_LUA) - set(LUA_INCLUDE_DIR ${LUA_INCLUDE_DIR} CACHE PATH "Lua include dir") - set(LUA_LIBRARIES ${LUA_LIBRARIES} CACHE STRING "Lua libraries") message(STATUS "LUA_INCLUDE_DIR = ${LUA_INCLUDE_DIR}") message(STATUS "LUA_LIBRARIES = ${LUA_LIBRARIES}") endif()src/Utilities/PublicHeader/CMakeLists.txt (1)
13-16: Make Lua include directory a public usage requirement (or ensure it comes fromsol2).
crane/Lua.his listed in the PUBLIC headers ofUtility_PublicHeader, and it’s expected to include Lua / sol2 headers internally. Currently:target_include_directories(Utility_PublicHeader PUBLIC include) if (${LUA_FOUND}) target_include_directories(Utility_PublicHeader PRIVATE ${LUA_INCLUDE_DIR}) endif ()If
crane/Lua.h(or anything it includes) directly includes Lua headers (e.g.lua.hpp) and you rely onUtility_PublicHeaderas a PUBLIC dependency in other targets, those consumers may not see${LUA_INCLUDE_DIR}because it’s only a PRIVATE include dir here.Unless
sol2already exposes${LUA_INCLUDE_DIR}transitively, it would be safer to make this PUBLIC:-if (${LUA_FOUND}) - target_include_directories(Utility_PublicHeader PRIVATE ${LUA_INCLUDE_DIR}) -endif () +if (LUA_FOUND) + target_include_directories(Utility_PublicHeader PUBLIC ${LUA_INCLUDE_DIR}) +endif ()(Also, using
if (LUA_FOUND)instead ofif (${LUA_FOUND})avoids accidental string expansion issues.)Please double-check that a consumer target which only does
target_link_libraries(foo PUBLIC Utility_PublicHeader)can compile code including"crane/Lua.h"without manually adding${LUA_INCLUDE_DIR}.Also applies to: 17-26, 28-30
src/CraneCtld/Lua/LuaJobHandler.h (1)
32-60: Consider namespacing and centralizing theg_lua_poolglobal.
LuaJobHandleritself looks fine, but the global pool:inline std::unique_ptr<crane::LuaPool> g_lua_pool;is:
- Defined in a header (inline, so one instance per process) and
- Placed in the global namespace, separate from other Ctld-wide globals.
This works technically, but for clarity and encapsulation it would be cleaner to:
- Move
g_lua_poolinto a namespace (e.g.namespace Ctld { ... }), and/or- Declare it as
externin the header and define it in a single.cpp(e.g. alongside other Ctld globals inCtldPublicDefs.cpp), keeping global state concentrated.That makes initialization and lifetime easier to audit and keeps the global namespace tidy.
Also applies to: 64-64
src/Utilities/PublicHeader/Lua.cpp (2)
29-29: Redundant temporary construction.
std::make_unique<sol::state>(sol::state())creates an unnecessary temporary. Use direct construction instead.Suggested fix
- m_lua_state_ptr_ = std::make_unique<sol::state>(sol::state()); + m_lua_state_ptr_ = std::make_unique<sol::state>();
46-59: Code duplication withInitmethod.The Lua state initialization logic (lines 52-58) duplicates the code in
Init(lines 29-35). Consider extracting a private helper or callingInitwith the stored script path to avoid duplication.Also, line 52 has the same redundant construction pattern.
Suggested approach
bool LuaEnvironment::LoadLuaScript(const std::vector<std::string>& req_funcs) { if (m_lua_state_ptr_ == nullptr) { CRANE_DEBUG( "Lua state (m_lua_state_) is null when loading script '{}'. " "This usually indicates Lua VM initialization failed.", m_lua_script_); - m_lua_state_ptr_ = std::make_unique<sol::state>(sol::state()); - m_lua_state_ptr_->open_libraries(sol::lib::base, sol::lib::package, - sol::lib::table, sol::lib::string); - m_crane_table_ = (*m_lua_state_ptr_)["crane"].valid() - ? (*m_lua_state_ptr_)["crane"] - : (*m_lua_state_ptr_).create_named_table("crane"); - RegisterFunctions_(); + if (!Init(m_lua_script_)) { + return false; + } }src/Utilities/PublicHeader/include/crane/Lua.h (1)
44-44: Minor: Extra space in return statement.- sol::state& GetLuaState() const { return *m_lua_state_ptr_; } + sol::state& GetLuaState() const { return *m_lua_state_ptr_; }src/CraneCtld/Lua/LuaJobHandler.cpp (1)
32-139: Consider extracting common Lua setup into a helper.
JobSubmitandJobModifyshare significant setup code: Lua environment initialization, registering global functions, types, and variables, and loading the script. Extracting this into a private helper would reduce duplication and improve maintainability.Suggested approach
// Private helper std::unique_ptr<crane::LuaEnvironment> LuaJobHandler::SetupLuaEnv_( const std::string& lua_script, CraneRichError* error) { auto lua_env = std::make_unique<crane::LuaEnvironment>(); if (!lua_env->Init(lua_script)) { *error = FormatRichErr(CraneErrCode::ERR_LUA_FAILED, "Failed to init lua environment"); return nullptr; } RegisterGlobalFunctions_(*lua_env); RegisterTypes_(*lua_env); RegisterGlobalVariables_(*lua_env); if (!lua_env->LoadLuaScript(kReqFxns)) { *error = FormatRichErr(CraneErrCode::ERR_LUA_FAILED, "Failed to load lua script"); return nullptr; } return lua_env; }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
CMakeLists.txtdependencies/cmake/CMakeLists.txtdependencies/cmake/lua_sol/CMakeLists.txtdocs/en/reference/lua_guide.mddocs/zh/reference/lua_guide.mdetc/config.yamlmkdocs.yamlprotos/Crane.protoprotos/PublicDefs.protosrc/CraneCtld/CMakeLists.txtsrc/CraneCtld/CraneCtld.cppsrc/CraneCtld/CtldPublicDefs.cppsrc/CraneCtld/CtldPublicDefs.hsrc/CraneCtld/Lua/LuaJobHandler.cppsrc/CraneCtld/Lua/LuaJobHandler.hsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/TaskScheduler.cppsrc/CraneCtld/TaskScheduler.hsrc/Utilities/PublicHeader/CMakeLists.txtsrc/Utilities/PublicHeader/Lua.cppsrc/Utilities/PublicHeader/String.cppsrc/Utilities/PublicHeader/include/crane/Lua.hsrc/Utilities/PublicHeader/include/crane/PublicHeader.hsrc/Utilities/PublicHeader/include/crane/String.h
🧰 Additional context used
🧠 Learnings (14)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/Utilities/PublicHeader/include/crane/Lua.hsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/Lua/LuaJobHandler.cppsrc/CraneCtld/CraneCtld.cppsrc/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/CraneCtld.cppsrc/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-09T01:54:39.465Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-12-08T08:11:40.332Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 727
File: src/Craned/Supervisor/TaskManager.cpp:1401-1407
Timestamp: 2025-12-08T08:11:40.332Z
Learning: In src/Craned/Supervisor/TaskManager.cpp, StepInstance::Prepare() is always called before any task's Spawn() method in the execution flow (via EvGrpcExecuteTaskCb_). If Prepare() fails, tasks are immediately finished and Spawn() is never invoked. For Crun tasks, StepInstance::Prepare() guarantees that x11_meta is set (even if x11 is false), so accessing x11_meta.value() in Spawn() when both IsCrun() and x11 are true is safe without additional has_value() checks.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.
Applied to files:
src/CraneCtld/TaskScheduler.hsrc/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().
Applied to files:
src/CraneCtld/TaskScheduler.hsrc/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-06-23T07:53:30.513Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/Craned/CranedServer.cpp:50-51
Timestamp: 2025-06-23T07:53:30.513Z
Learning: In the CraneSched codebase, `g_ctld_client` is a guaranteed global variable that is always initialized before any gRPC service methods are called, so null pointer checks are not necessary when calling methods on it.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/TaskScheduler.cpp
📚 Learning: 2025-08-12T08:58:39.772Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 577
File: src/Craned/Supervisor/TaskManager.cpp:124-130
Timestamp: 2025-08-12T08:58:39.772Z
Learning: In the CraneSched project using C++23, Class Template Argument Deduction (CTAD) allows std::unique_ptr declarations without explicit template parameters when the type can be deduced from the initializer, such as `std::unique_ptr task = std::move(m_task_map_.at(task_id))` where the template parameter is deduced from the move operation.
Applied to files:
src/CraneCtld/TaskScheduler.cpp
🧬 Code graph analysis (6)
src/Utilities/PublicHeader/include/crane/String.h (1)
src/Utilities/PublicHeader/String.cpp (2)
TimeStr2Mins(544-590)TimeStr2Mins(544-544)
src/CraneCtld/TaskScheduler.h (1)
src/CraneCtld/TaskScheduler.cpp (4)
JobSubmitLuaCheck(1563-1568)JobSubmitLuaCheck(1563-1563)JobModifyLuaCheck(1570-1592)JobModifyLuaCheck(1570-1570)
src/CraneCtld/Lua/LuaJobHandler.cpp (3)
src/Craned/Supervisor/TaskManager.h (1)
job_id(51-51)src/Craned/Core/SupervisorKeeper.cpp (1)
pair(123-183)src/CraneCtld/CranedMetaContainer.h (1)
vector(174-177)
src/CraneCtld/CraneCtld.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
g_config(160-160)
src/Utilities/PublicHeader/Lua.cpp (1)
src/Utilities/PublicHeader/String.cpp (2)
TimeStr2Mins(544-590)TimeStr2Mins(544-544)
src/CraneCtld/TaskScheduler.cpp (2)
src/Craned/Core/CranedPublicDefs.h (1)
g_config(160-160)src/CraneCtld/Lua/LuaJobHandler.cpp (4)
JobSubmit(32-93)JobSubmit(32-33)JobModify(95-139)JobModify(95-96)
🪛 LanguageTool
docs/en/reference/lua_guide.md
[style] ~35-~35: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...b submission rejected due to error. - crane.* (Crane error codes) — Reject submi...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (27)
protos/Crane.proto (1)
90-90: LGTM! Backward-compatible field addition.The
reasonfield is correctly placed outside theoneof payloadblock, allowing it to provide explanatory messages for both successful and failed submissions.etc/config.yaml (1)
89-90: LGTM! Clear documentation for optional Lua script configuration.The commented example provides users with guidance on enabling Lua-based job submission validation.
mkdocs.yaml (1)
86-86: LGTM! Documentation navigation updated correctly.The Lua Guide is properly integrated into both English and Chinese navigation sections, maintaining the bilingual documentation structure.
Also applies to: 124-124
src/CraneCtld/CtldPublicDefs.h (1)
186-187: LGTM! Configuration field addition is straightforward.The
JobSubmitLuaScriptfield appropriately usesstd::stringfor the Lua script path. An empty string will naturally indicate that Lua validation is disabled.src/CraneCtld/CMakeLists.txt (2)
22-24: LGTM! Lua job handler sources added.The new source files are correctly integrated into the cranectld executable build.
70-72: LGTM! Conditional Lua linking is correctly configured.The conditional linking and include directories properly use
LUA_FOUNDto enable optional Lua support. Note thatsol2is always linked while the Lua runtime libraries are conditional, which is appropriate for header-only sol2 usage.Also applies to: 94-96
dependencies/cmake/CMakeLists.txt (1)
37-37: LGTM! Sol2 subdirectory correctly integrated.The
lua_solsubdirectory is properly added to the third-party dependencies build, enabling the FetchContent-based sol2 library integration.dependencies/cmake/lua_sol/CMakeLists.txt (1)
3-7: sol2 version v3.5.0 is current and appropriate for use.v3.5.0 was released on March 29, 2025, making it the latest stable version. No known security vulnerabilities were identified for this version. The library is distributed under the MIT License and has no known vulnerabilities.
src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)
262-264: Protobuf ErrCode enum correctly includes ERR_LUA_FAILED = 85.The error message array is properly aligned. "Lua script validation failed" at array index 84 (line 264) correctly corresponds to the
ERR_LUA_FAILED = 85entry in the protobufErrCodeenum (protos/PublicDefs.proto:697). The comment "// 85-89" confirms the index alignment.src/CraneCtld/CtldPublicDefs.cpp (2)
1176-1178: LGTM - Username populated from valid password entry.The code correctly checks if
password_entryis valid before callingSetUsername. This ensures the username field is populated from the password database when available.
1322-1327: LGTM - Environment variables propagated to TaskInfo.The code correctly copies all environment variables from the task's
envmap to the protobufTaskInfo.envfield, enabling Lua scripts and other components to access task environment data.src/Utilities/PublicHeader/include/crane/String.h (1)
210-211: LGTM - New time parsing utility function declared.The new
TimeStr2Minsfunction provides time string parsing capability. The signature is appropriate withstd::string_viewfor efficient string handling.src/CraneCtld/TaskScheduler.h (1)
781-784: LGTM - Lua pre-check entry points declared.The new
JobSubmitLuaCheckandJobModifyLuaCheckdeclarations enable optional Lua-based validation. The return typestd::optional<std::future<CraneRichError>>appropriately conveys both optionality (when Lua is disabled) and asynchronous execution.protos/PublicDefs.proto (2)
471-471: LGTM - Environment map field added to TaskInfo.The new
envfield enables per-task environment variables to be carried inTaskInfo, which is essential for Lua-based validation that needs access to task environment data.
696-697: LGTM - New error code for Lua failures.The
ERR_LUA_FAILEDerror code (85) properly extends the error classification to handle Lua script execution failures in the job submission/modification workflow.docs/zh/reference/lua_guide.md (1)
1-321: LGTM - Comprehensive Lua integration guide in Chinese.The documentation thoroughly covers Lua script configuration including required functions, available APIs, data structures, and configuration examples. The sample code demonstrates proper usage of the Lua integration.
src/Utilities/PublicHeader/String.cpp (1)
544-590: LGTM - Robust time string parsing implementation.The
TimeStr2Minsfunction correctly handles multiple time format variations:
- Special values: "-1", "INFINITE", "UNLIMITED" → -1
- Optional day prefix with dash separator
- Hours:minutes:seconds, minutes:seconds, or minutes-only formats
- Rounds up by 1 minute when seconds > 0
The implementation validates input characters and returns -1 for malformed input, providing consistent error handling.
src/CraneCtld/CraneCtld.cpp (3)
36-36: LGTM - Lua job handler included.The include for
Lua/LuaJobHandler.henables Lua-based job submission validation when configured.
253-254: LGTM - Lua script path configuration.The
JobSubmitLuaScriptconfiguration field is properly read with an empty string default, allowing optional Lua integration.
923-924: LGTM - Conditional Lua pool initialization.The Lua pool is correctly initialized only when
JobSubmitLuaScriptis configured, avoiding unnecessary resource allocation when Lua support is not needed. The initialization order (after task scheduler, before server start) is appropriate.src/Utilities/PublicHeader/Lua.cpp (2)
175-182: LGTM!Clean use of protobuf reflection to dynamically register error codes into the Lua environment.
184-194: LGTM!Proper validation of required Lua functions with informative error messages.
src/Utilities/PublicHeader/include/crane/Lua.h (1)
93-106: LGTM!Clean use of C++20 concepts to constrain the callback type, and proper forwarding of the callback.
src/CraneCtld/Lua/LuaJobHandler.cpp (4)
190-205: Verify lifetime of capturedlua_envreference in lambdas.Several lambdas capture
lua_envby reference (via[&]) and are stored in usertype definitions. These lambdas calllua_env.GetLuaState().create_table(). If thelua_envreference outlives the referenced object, this would cause use-after-free.The current code appears safe since
lua_envis created at the start ofJobSubmit/JobModifyand lives until those functions return, after Lua script execution completes. However, consider capturinglua_env.GetLuaState()reference directly to make the dependency clearer and more resilient to refactoring.
143-178: LGTM!Well-structured global function registration with proper use of
std::optionalfor nullable returns and appropriate error logging.
286-394: LGTM!Comprehensive and well-typed property bindings for
TaskInfo, with proper conversions for timestamps, durations, and container types.
487-506: LGTM!Clean implementation of lazy-loading for reservations using Lua metatable
__pairs, ensuring fresh data is queried on each iteration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
662-747: ModifyTask Lua gating breaks behavior when Lua is disabled and hides non‑existent tasks; also incorrectadd_not_modified_tasks(false)The new Lua gating in
ModifyTaskhas a few correctness issues:
When Lua is disabled (
JobSubmitLuaScriptempty), no tasks are modified at all
JobModifyLuaCheckreturnsstd::nulloptwhen no Lua script is configured.- Your first loop builds
futuresonly whenfutis non‑nullopt, andtask_idsis populated only from successful Lua results in the second loop.- With Lua disabled,
futuresandtask_idsremain empty, so the later loops overtask_idsnever touch any of the requested task IDs. The RPC returns with all lists empty, effectively becoming a no‑op.You probably want the old behavior (no Lua gating) when no script is configured. One simple pattern:
// Start with all requested ids std::list<task_id_t> task_ids(request->task_ids().begin(), request->task_ids().end()); std::list<std::pair<task_id_t, std::future<CraneRichError>>> futures; for (auto task_id : task_ids) { auto fut = g_task_scheduler->JobModifyLuaCheck(task_id); if (fut) futures.emplace_back(task_id, std::move(*fut)); } for (auto& [task_id, fut] : futures) { auto rich_err = fut.get(); if (rich_err.code() != CraneErrCode::SUCCESS) { response->add_not_modified_tasks(task_id); response->add_not_modified_reasons( rich_err.description().empty() ? CraneErrStr(rich_err.code()) : rich_err.description()); task_ids.remove(task_id); // or track a separate allowed set } }This way:
- If Lua is not configured (nullopt), the task still flows through and is handled as before.
- If Lua rejects a task, it is reported in
not_modified_*and excluded from the modification loops.Non‑existent tasks now drop out silently
JobModifyLuaCheckalso returnsstd::nulloptwhen the task_id isn’t found in the pending/running maps. With the new logic, such IDs never make it intotask_idsand never get an error recorded, whereas previouslyChangeTaskTimeLimit/ChangeTaskPriority/HoldReleaseTaskAsyncwould returnERR_NON_EXISTENTand that would be surfaced to the client. The restructuring above (starting from all requested IDs and only removing those explicitly rejected by Lua) preserves the prior error-reporting behavior.Typo:
add_not_modified_tasks(false)In the Hold branch:
} else { response->add_not_modified_tasks(false); response->add_not_modified_reasons( fmt::format("Failed to hold/release job: {}.", CraneErrStr(err))); }should use
task_idrather thanfalse, otherwise you’re appending0as a bogus “not modified” task id:
response->add_not_modified_tasks(false);
response->add_not_modified_tasks(task_id);Addressing these points should restore the original ModifyTask semantics when Lua is off, add proper reporting for Lua‑rejected tasks, and fix the incorrect task id in the error case.
♻️ Duplicate comments (5)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
512-552: Multi‑submit (SubmitBatchTasks) still bypasses Lua validation and can bypass policyYou’ve correctly wired
JobSubmitLuaCheckinto:
- Interactive submissions via
CforedStream(TASK_REQUEST), and- Single batch submissions via
SubmitBatchTask.However,
SubmitBatchTasks(thecount> 1 API) still goes directly toSubmitTaskToSchedulerwithout any Lua check:for (int i = 0; i < task_count; i++) { auto task = std::make_unique<TaskInCtld>(); task->SetFieldsByTaskToCtld(task_to_ctld); auto result = g_task_scheduler->SubmitTaskToScheduler(std::move(task)); results.emplace_back(std::move(result)); }This allows callers to bypass any site policies enforced in Lua by using the multi-submit API instead of the single-task API.
I’d recommend:
- Running
JobSubmitLuaCheck(task.get())for each task in the loop, and- If Lua returns a non‑SUCCESS
CraneRichError, append an appropriate error code tocode_list(and a sentinel like0intask_id_listif needed), mirroring the single-submit behavior, otherwise enqueue viaSubmitTaskToScheduler.That keeps all submission entry points consistent with respect to Lua-based validation.
src/Utilities/PublicHeader/include/crane/Lua.h (1)
73-90: Fix LuaPool string‑based ExecuteLuaScript: early returns and success resultThe
ExecuteLuaScript(const std::string& lua_script)overload still has the issues previously noted:
- If
Init(lua_script)fails, the lambda continues and callsLoadLuaScripton a not‑properly‑initialized environment.- If
LoadLuaScriptfails, the lambda continues; only the last failure “wins”.- On success,
resultremains default-constructed instead of being explicitly set to a SUCCESSCraneRichError.This risks undefined behavior on init failure and makes the success path ambiguous. Consider restructuring as:
m_thread_pool_->detach_task([lua_script, promise]() { - CraneRichError result; - auto lua_env = std::make_unique<crane::LuaEnvironment>(); - if (!lua_env->Init(lua_script)) - result = FormatRichErr(CraneErrCode::ERR_LUA_FAILED, - "Failed to init lua environment"); - - if (!lua_env->LoadLuaScript({})) - result = FormatRichErr(CraneErrCode::ERR_LUA_FAILED, - "Failed to load lua script"); - - promise->set_value(result); + auto lua_env = std::make_unique<crane::LuaEnvironment>(); + if (!lua_env->Init(lua_script)) { + promise->set_value(FormatRichErr( + CraneErrCode::ERR_LUA_FAILED, "Failed to init lua environment")); + return; + } + + if (!lua_env->LoadLuaScript({})) { + promise->set_value(FormatRichErr( + CraneErrCode::ERR_LUA_FAILED, "Failed to load lua script")); + return; + } + + promise->set_value(FormatRichErr(CraneErrCode::SUCCESS, "")); });This ensures the promise is set exactly once with a well-defined
CraneRichErrorin all paths.docs/en/reference/lua_guide.md (1)
73-79: Fix minor inconsistencies and typos in the English Lua guideThe guide is very detailed; a few small fixes will make it match the actual API and examples:
- In the function table, change the logging row to consistently use
crane.log_info/crane.log_debug(not barelog_info/log_debug), matching the example script.- In the
job_descattributes table,included_nodes/excluded_nodesare declared asstring, but the example treats them as arrays (table.concat(job_desc.included_nodes, ", ")), so the documented type should betable(list of strings).- In the
crane.reservationstable, fix the typopartiton→partition.- The meta example uses
job_desc.bash_meta; this almost certainly should bebatch_metato match the underlying concept.- As in the Chinese doc, the documented signature for
crane.set_job_env_fielddoes not match the example call; please align the argument order with the real binding.Also applies to: 86-112, 175-191, 339-345, 272-279
src/CraneCtld/Lua/LuaJobHandler.cpp (1)
519-529: Potentialstd::out_of_rangefrom.at()call.This issue was previously flagged. Line 521 uses
.at(actual_account)which throws if the key doesn't exist.src/Utilities/PublicHeader/Lua.cpp (1)
97-144: Missing validation for empty variadic arguments.This issue was previously flagged. The logging functions access
va[0]without checking ifvais empty, causing undefined behavior when called with no arguments.
🧹 Nitpick comments (9)
CMakeLists.txt (1)
89-90: Lua CMake option wiring looks fine; optionally surface missing-Lua as a warningThe
CRANE_WITH_LUAoption andfind_package(Lua QUIET)block are sane and keep Lua fully optional. You may want to emit amessage(WARNING ...)whenCRANE_WITH_LUAisONbutLUA_FOUNDis false so users notice that their Lua config is being ignored instead of silently disabling Lua features.Also applies to: 279-288
src/Utilities/PublicHeader/String.cpp (1)
544-593: TimeStr2Mins logic is reasonable; consider clarifying 2‑field format and rejecting negative partsImplementation looks correct for the supported forms (
D-HH:MM:SS,HH:MM:SS,MM:SS,MM) and is defensive about malformed input by returning-1. Two small optional refinements:
- The 2‑field case is interpreted as
minutes:seconds, nothours:minutes. If that’s intentional, it’d be good to document it at the call sites / docs; otherwise you might want to swap the mapping.- Currently any
'-'is allowed in the time part, so inputs like"10:-1:00"will parse with a negative component. If you want to treat such inputs as invalid, you could either disallow'-'after the day prefix, or explicitly reject negative values afterSimpleAtoi.src/CraneCtld/CraneCtld.cpp (1)
36-36: Lua hook wiring into ctld init/config looks good; consider explicit cleanupThe new
JobSubmitLuaScriptconfig field and conditional initialization ofg_lua_poolare wired in cleanly and fail fast onInit()failure, which is appropriate for a core control daemon.For symmetry with other globals, you might optionally add
g_lua_pool.reset();inDestroyCtldGlobalVariables()so the Lua thread pool is explicitly torn down alongsideg_thread_pooland the DB clients, instead of relying solely on static destruction at process exit.Also applies to: 253-255, 883-886
src/CraneCtld/Lua/LuaJobHandler.h (3)
20-21: Misleading comment about include order.The comment "Precompiled header comes first!" appears after
CtldPublicDefs.his included. IfCtldPublicDefs.his the precompiled header, the comment should precede it. Otherwise, remove or relocate this comment.
34-34:explicitis unnecessary on default constructor.The
explicitkeyword has no effect on a default constructor since there's no parameter to prevent implicit conversions from.Suggested fix
- explicit LuaJobHandler() = default; + LuaJobHandler() = default;
64-64: Globalg_lua_pooldeclared outside namespace.The global
g_lua_poolis declared in the global namespace while related code is inCtld. Consider moving it inside theCtldnamespace for consistency, or document why it needs global scope.src/CraneCtld/Lua/LuaJobHandler.cpp (1)
44-61: Querying all tasks on eachpairs(crane.jobs)iteration.Every time Lua iterates over
crane.jobs,QueryTasksInRamfetches all tasks. For large job counts, this could be expensive. Consider caching or documenting this behavior for script authors.src/Utilities/PublicHeader/Lua.cpp (2)
28-28: Redundant temporary inmake_unique.
std::make_unique<sol::state>(sol::state())creates a temporarysol::stateand then move-constructs into the unique_ptr. Usestd::make_unique<sol::state>()for direct construction.Suggested fix
- m_lua_state_ptr_ = std::make_unique<sol::state>(sol::state()); + m_lua_state_ptr_ = std::make_unique<sol::state>();Apply the same fix at line 46.
41-53: Duplicated initialization logic.Lines 41-53 duplicate the initialization logic from
Init(). Extract a private helper to avoid repetition and ensure consistency.Suggested refactor
+void LuaEnvironment::InitState_() { + m_lua_state_ptr_ = std::make_unique<sol::state>(); + m_lua_state_ptr_->open_libraries(sol::lib::base, sol::lib::package, + sol::lib::table, sol::lib::string); + m_crane_table_ = (*m_lua_state_ptr_)["crane"].valid() + ? (*m_lua_state_ptr_)["crane"] + : (*m_lua_state_ptr_).create_named_table("crane"); + RegisterFunctions_(); +} + bool LuaEnvironment::Init(const std::string& script) { #ifdef HAVE_LUA m_lua_script_ = script; - m_lua_state_ptr_ = std::make_unique<sol::state>(sol::state()); - m_lua_state_ptr_->open_libraries(sol::lib::base, sol::lib::package, - sol::lib::table, sol::lib::string); - m_crane_table_ = (*m_lua_state_ptr_)["crane"].valid() - ? (*m_lua_state_ptr_)["crane"] - : (*m_lua_state_ptr_).create_named_table("crane"); - RegisterFunctions_(); + InitState_(); #endif return true; }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (12)
CMakeLists.txtdocs/en/reference/lua_guide.mddocs/zh/reference/lua_guide.mdsrc/CraneCtld/CraneCtld.cppsrc/CraneCtld/Lua/LuaJobHandler.cppsrc/CraneCtld/Lua/LuaJobHandler.hsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/TaskScheduler.cppsrc/CraneCtld/TaskScheduler.hsrc/Utilities/PublicHeader/Lua.cppsrc/Utilities/PublicHeader/String.cppsrc/Utilities/PublicHeader/include/crane/Lua.h
🚧 Files skipped from review as they are similar to previous changes (1)
- src/CraneCtld/TaskScheduler.cpp
🧰 Additional context used
🧠 Learnings (12)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/Lua/LuaJobHandler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-07-01T08:00:05.383Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-09T01:54:39.465Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/Lua/LuaJobHandler.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/TaskScheduler.h
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method after PMIx registration, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup resources. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly() before returning from the function.
Applied to files:
src/CraneCtld/TaskScheduler.h
📚 Learning: 2025-05-08T07:39:29.207Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 469
File: src/Craned/TaskManager.cpp:1162-1175
Timestamp: 2025-05-08T07:39:29.207Z
Learning: In CraneSched, when a task fails in the LaunchTaskInstanceMt_ method, ActivateTaskStatusChangeAsync_ does not automatically release PMIx resources or cgroup. Explicit cleanup is required by calling g_pmix_server->DeregisterTask() and g_cg_mgr->ReleaseCgroupByTaskIdOnly().
Applied to files:
src/CraneCtld/TaskScheduler.h
🧬 Code graph analysis (4)
src/CraneCtld/CraneCtld.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
g_config(160-160)
src/Utilities/PublicHeader/include/crane/Lua.h (1)
src/Utilities/PublicHeader/String.cpp (2)
SetCurrentThreadName(371-379)SetCurrentThreadName(371-371)
src/CraneCtld/TaskScheduler.h (1)
src/CraneCtld/TaskScheduler.cpp (4)
JobSubmitLuaCheck(1563-1569)JobSubmitLuaCheck(1563-1564)JobModifyLuaCheck(1571-1596)JobModifyLuaCheck(1571-1572)
src/Utilities/PublicHeader/Lua.cpp (2)
src/Utilities/PublicHeader/include/crane/Lua.h (1)
Init(62-71)src/Utilities/PublicHeader/String.cpp (2)
TimeStr2Mins(544-593)TimeStr2Mins(544-544)
🪛 LanguageTool
docs/en/reference/lua_guide.md
[style] ~35-~35: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...b submission rejected due to error. - crane.* (Crane error codes) — Reject submi...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
🔇 Additional comments (6)
src/CraneCtld/Lua/LuaJobHandler.cpp (4)
73-78: Verify Lua function signature expectations.The
crane_job_submitLua function is called with(task, part_list, task->uid). Ensure the Lua script contract and documentation match this signature. If the Lua function expects different arguments, runtime errors will occur.
116-124:task_infois unused afterSetFieldsOfTaskInfo.
task_infois populated at line 117 and passed tomodify()at line 124, but the Lua function receives bothtask(the mutableTaskInCtld*) andtask_info(a copy). Verify this is intentional—if Lua modifiestask_info, those changes are lost.
181-486: LGTM on type registrations.The sol2 usertype bindings for
ResourceView,TaskInCtld,TaskInfo,PartitionInfo, andReservationInfoare well-structured and expose the necessary properties to Lua.
191-206: Capturelua_envby value or store a reference to the underlying state.The lambdas capture
lua_envby reference at lines 191-206, 246, 272, 335, and 390. While theLuaEnvironmentobject remains valid (held byunique_ptr), the reference parameter goes out of scope whenRegisterTypes_returns. When sol2 invokes these lambdas during Lua property access (synchronously duringLoadLuaScript()and Lua function calls), they access a dangling reference parameter—undefined behavior even though the target survives.Either:
- Capture
lua_envby value:[lua_env]instead of[&]- Or store and reuse
lua_state: capture it directly and avoid the parameter reference entirelyLikely an incorrect or invalid review comment.
src/Utilities/PublicHeader/Lua.cpp (2)
174-181: LGTM on error table registration.The
RegisterOutputErrTab_function correctly iterates over the protobuf enum descriptor and exposes all error codes to Lua.
183-193: LGTM on function validation.The
CheckLuaScriptFunctions_correctly validates that required functions exist and are callable in the Lua state.
| std::optional<std::future<CraneRichError>> JobSubmitLuaCheck( | ||
| TaskInCtld* task); | ||
|
|
||
| std::optional<std::future<CraneRichError>> JobModifyLuaCheck( | ||
| task_id_t task_id); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
JobModifyLuaCheck implementation risks use‑after‑free on internal task maps
The public API shape (std::optional<std::future<CraneRichError>>) is fine, but based on the TaskScheduler.cpp snippet, JobModifyLuaCheck currently does:
- Lock pending/running map mutex,
- Find
task_id, - Capture the map iterator by value in a lambda submitted to
g_lua_pool->ExecuteLuaScript, - Release the mutex before the async lambda runs, and inside the lambda call
pd_iter->second.get()/rn_iter->second.get().
If the underlying m_pending_task_map_ or m_running_task_map_ entry is erased or the map rehashes before the Lua task runs, that iterator (and the TaskInCtld* obtained from it) become invalid, leading to undefined behavior.
It’d be safer to:
- Capture a stable object for Lua (e.g., a copy/struct projection of just the fields needed by Lua), or
- Capture a
TaskInCtld*plus some lifetime guarantee (e.g., keep the job in a separate holder or delay erasure until Lua completes), or - Rework the API so Lua validation is done synchronously under the same lock scope.
🤖 Prompt for AI Agents
In src/CraneCtld/TaskScheduler.h around lines 781-786 the current
JobModifyLuaCheck implementation (in TaskScheduler.cpp) captures a map iterator
and releases the map mutex before the async Lua lambda runs, risking
use-after-free if the map entry is erased or the map rehashes; fix this by
avoiding capturing iterators: either (1) construct a small POD struct containing
only the fields Lua needs and capture that copy in the lambda, or (2) change the
map value to a shared ownership type (e.g., std::shared_ptr<TaskInCtld>) and
capture a shared_ptr to the task in the lambda so the task lifetime is extended
until Lua completes, or (3) perform the Lua validation synchronously under the
lock; implement one of these options, remove any direct iterator capture from
the async lambda, and ensure the mutex is released only after a safe copy or
ownership transfer has been captured.
| #include <BS_thread_pool.hpp> | ||
| #include <memory> | ||
|
|
||
| #include "crane/Logger.h" | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Include header for util::SetCurrentThreadName to make LuaPool self-contained
LuaPool::Init uses util::SetCurrentThreadName("LuaThreadPool") in the thread-pool worker lambda, but this header only includes BS_thread_pool.hpp and crane/Logger.h. Unless util::SetCurrentThreadName is guaranteed to be visible via transitive includes, this will fail to compile.
Given SetCurrentThreadName is defined in util (see src/Utilities/PublicHeader/String.cpp), it’s safer for this header to include its declaration explicitly:
#include "crane/Logger.h"
#include "crane/String.h" // for util::SetCurrentThreadNameor to move the SetCurrentThreadName usage into a .cpp that already includes crane/String.h.
Also applies to: 62-71, 108-109
| bool LuaEnvironment::Init(const std::string& script) { | ||
| #ifdef HAVE_LUA | ||
| m_lua_script_ = script; | ||
| m_lua_state_ptr_ = std::make_unique<sol::state>(sol::state()); | ||
| m_lua_state_ptr_->open_libraries(sol::lib::base, sol::lib::package, | ||
| sol::lib::table, sol::lib::string); | ||
| m_crane_table_ = (*m_lua_state_ptr_)["crane"].valid() | ||
| ? (*m_lua_state_ptr_)["crane"] | ||
| : (*m_lua_state_ptr_).create_named_table("crane"); | ||
| RegisterFunctions_(); | ||
| #endif | ||
| return true; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Init() always returns true even when HAVE_LUA is undefined.
When HAVE_LUA is not defined, Init() does nothing but returns true. This could mislead callers into thinking Lua is available. Consider returning false or logging a warning when Lua support is disabled.
Suggested fix
bool LuaEnvironment::Init(const std::string& script) {
+#ifndef HAVE_LUA
+ CRANE_WARN("Lua support is not enabled");
+ return false;
+#else
-#ifdef HAVE_LUA
m_lua_script_ = script;
m_lua_state_ptr_ = std::make_unique<sol::state>();
m_lua_state_ptr_->open_libraries(sol::lib::base, sol::lib::package,
sol::lib::table, sol::lib::string);
m_crane_table_ = (*m_lua_state_ptr_)["crane"].valid()
? (*m_lua_state_ptr_)["crane"]
: (*m_lua_state_ptr_).create_named_table("crane");
RegisterFunctions_();
-#endif
return true;
+#endif
}🤖 Prompt for AI Agents
In src/Utilities/PublicHeader/Lua.cpp around lines 25 to 37, Init() currently
always returns true even when HAVE_LUA is undefined; update the function so that
when HAVE_LUA is not defined it returns false (and optionally logs a warning
about Lua support being disabled) instead of indicating success, and keep the
existing initialization path under the HAVE_LUA block returning true on success.
|
|
||
| if (!g_config.JobSubmitLuaScript.empty()) { | ||
| g_lua_pool = std::make_unique<crane::LuaPool>(); | ||
| if (!g_lua_pool->Init()) std::exit(1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
加下日志吧
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
里面有日志
| } | ||
|
|
||
| std::list<task_id_t> task_ids; | ||
| std::list<std::pair<task_id_t, std::future<CraneRichError>>> futures; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这些地方是要求用list吗?最好换成vector,这里都知道size吧
src/CraneCtld/Lua/LuaJobHandler.cpp
Outdated
|
|
||
| crane::grpc::TaskInfo task_info; | ||
| task->SetFieldsOfTaskInfo(&task_info); | ||
| std::list<crane::grpc::PartitionInfo> part_list; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
以及这里的list
|
可以先提测 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
662-747: FixModifyTaskLua gating and incorrect task id reporting.Two problems in the new
ModifyTaskLua integration:
Tasks with
JobModifyLuaCheckreturning empty optional are silently dropped.
You currently do:std::list<task_id_t> task_ids; std::vector<std::pair<task_id_t, std::future<CraneRichError>>> futures; for (const auto task_id : request->task_ids()) { auto fut = g_task_scheduler->JobModifyLuaCheck(task_id); if (fut) futures.emplace_back(task_id, std::move(fut.value())); } for (auto& [task_id, fut] : futures) { auto rich_err = fut.get(); if (rich_err.code() != CraneErrCode::SUCCESS) { // push into not_modified_* } else { task_ids.emplace_back(task_id); } }If
JobModifyLuaCheckever returnsstd::nullopt(e.g., Lua disabled or job not found), thattask_idis neither added totask_idsnor recorded innot_modified_tasks/not_modified_reasons; subsequent attribute handlers iterate onlytask_ids. This makes some requested task IDs disappear from the reply.Either guarantee
JobModifyLuaCheckalways returns a future, or (more robustly) treatstd::nulloptas “no Lua check; allow modification” by inserting such IDs directly intotask_ids.Hold attribute branch reports
falseinstead of the actual task id.
In theHoldbranch:} else { response->add_not_modified_tasks(false); response->add_not_modified_reasons( fmt::format("Failed to hold/release job: {}.", CraneErrStr(err))); }
falseis converted to0, so clients see a bogus task id rather than the realtask_id.A possible fix that addresses both issues:
Proposed fix
- std::list<task_id_t> task_ids; - std::vector<std::pair<task_id_t, std::future<CraneRichError>>> futures; - futures.reserve(request->task_ids().size()); - for (const auto task_id : request->task_ids()) { - auto fut = g_task_scheduler->JobModifyLuaCheck(task_id); - if (fut) futures.emplace_back(task_id, std::move(fut.value())); - } - - for (auto& [task_id, fut] : futures) { - auto rich_err = fut.get(); - if (rich_err.code() != CraneErrCode::SUCCESS) { - response->add_not_modified_tasks(task_id); - if (rich_err.description().empty()) - response->add_not_modified_reasons(CraneErrStr(rich_err.code())); - else - response->add_not_modified_reasons(rich_err.description()); - } else { - task_ids.emplace_back(task_id); - } - } + std::list<task_id_t> task_ids; + std::vector<std::pair<task_id_t, std::future<CraneRichError>>> futures; + futures.reserve(request->task_ids().size()); + + for (const auto task_id : request->task_ids()) { + auto fut = g_task_scheduler->JobModifyLuaCheck(task_id); + if (!fut) { + // No Lua check (e.g. Lua disabled) – allow the modification path. + task_ids.emplace_back(task_id); + continue; + } + futures.emplace_back(task_id, std::move(fut.value())); + } + + for (auto& [task_id, fut] : futures) { + auto rich_err = fut.get(); + if (rich_err.code() != CraneErrCode::SUCCESS) { + response->add_not_modified_tasks(task_id); + if (rich_err.description().empty()) + response->add_not_modified_reasons(CraneErrStr(rich_err.code())); + else + response->add_not_modified_reasons(rich_err.description()); + } else { + task_ids.emplace_back(task_id); + } + } @@ } else if (request->attribute() == ModifyTaskRequest::Hold) { @@ - for (auto& [task_id, res] : results) { + for (auto& [task_id, res] : results) { err = res.get(); if (err == CraneErrCode::SUCCESS) { response->add_modified_tasks(task_id); @@ } else { - response->add_not_modified_tasks(false); + response->add_not_modified_tasks(task_id); response->add_not_modified_reasons( fmt::format("Failed to hold/release job: {}.", CraneErrStr(err))); }
♻️ Duplicate comments (2)
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
512-549: Multi-submit (SubmitBatchTasks) still bypasses Lua validation.This path still enqueues
countcopies oftask_to_ctlddirectly intoSubmitTaskToSchedulerwithout callingJobSubmitLuaCheckper instance:for (int i = 0; i < task_count; i++) { auto task = std::make_unique<TaskInCtld>(); task->SetFieldsByTaskToCtld(task_to_ctld); auto result = g_task_scheduler->SubmitTaskToScheduler(std::move(task)); results.emplace_back(std::move(result)); }If sites rely on Lua scripts to enforce submit‑time policy, callers can continue to circumvent those policies by using this multi-submit RPC instead of the single-submit APIs.
Align this code path with
SubmitBatchTaskby runningJobSubmitLuaCheckfor each synthetic task and, on failure, returning an appropriate per‑task error entry (and a sentinel ID) while skipping scheduler submission for that instance.docs/en/reference/lua_guide.md (1)
73-79: Synchronize function names, types, and typos with the actual Lua bindings.Several inconsistencies remain between this guide and the bindings / examples:
- The logging row shows
crane.log_error("msg") / log_info("msg") / log_debug("msg"); the latter two should becrane.log_info/crane.log_debug.- In
job_descattributes,included_nodesandexcluded_nodesare documented asstring, but the example uses them as arrays (table.concat(job_desc.included_nodes, ", ")etc.). The type should reflect this (e.g.tableortable(string list)).- In
crane.reservations,partitonis misspelled; should bepartition.- The “meta (batch_meta / interactive_meta)” snippet uses
job_desc.bash_metainstead ofbatch_meta.Aligning these with the real API will make the doc less confusing for users relying on it as a reference.
Also applies to: 86-112, 175-191, 339-345
🧹 Nitpick comments (2)
src/CraneCtld/CraneCtld.cpp (1)
36-36: Lua pool initialization path is sound; consider explicit teardown for symmetry.The flow
- parses
JobSubmitLuaScriptintog_config.JobSubmitLuaScript, and- initializes
g_lua_poolonly when the script path is non-empty, failing fast ifInit()returns false,which is correct and keeps Lua completely disabled by default.
For cleaner lifetime symmetry with other globals in
DestroyCtldGlobalVariables(), you may optionally add an explicitg_lua_pool.reset();there so Lua resources are released alongside the scheduler, DB client, etc., rather than relying solely on static destruction.Also applies to: 253-255, 896-899
src/CraneCtld/Lua/LuaJobHandler.h (1)
32-64: LuaJobHandler API is fine; consider relocatingg_lua_poolout of this header.The
LuaJobHandlerinterface (staticJobSubmit/JobModifyplus private registration helpers underHAVE_LUA) looks good.For
inline std::unique_ptr<crane::LuaPool> g_lua_pool;, you might consider moving this global into a more central “globals” header or a.cppwith anexterndeclaration here, to keep this header focused on the handler API and avoid exposing yet another global from a feature-specific header. Not required functionally, but it will make ownership of global state easier to track.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (24)
CMakeLists.txtdependencies/cmake/CMakeLists.txtdependencies/cmake/lua_sol/CMakeLists.txtdocs/en/reference/lua_guide.mddocs/zh/reference/lua_guide.mdetc/config.yamlmkdocs.yamlprotos/Crane.protoprotos/PublicDefs.protosrc/CraneCtld/CMakeLists.txtsrc/CraneCtld/CraneCtld.cppsrc/CraneCtld/CtldPublicDefs.cppsrc/CraneCtld/CtldPublicDefs.hsrc/CraneCtld/Lua/LuaJobHandler.cppsrc/CraneCtld/Lua/LuaJobHandler.hsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/TaskScheduler.cppsrc/CraneCtld/TaskScheduler.hsrc/Utilities/PublicHeader/CMakeLists.txtsrc/Utilities/PublicHeader/Lua.cppsrc/Utilities/PublicHeader/String.cppsrc/Utilities/PublicHeader/include/crane/Lua.hsrc/Utilities/PublicHeader/include/crane/PublicHeader.hsrc/Utilities/PublicHeader/include/crane/String.h
🚧 Files skipped from review as they are similar to previous changes (13)
- mkdocs.yaml
- src/CraneCtld/CtldPublicDefs.cpp
- protos/Crane.proto
- src/CraneCtld/TaskScheduler.h
- protos/PublicDefs.proto
- src/Utilities/PublicHeader/CMakeLists.txt
- dependencies/cmake/lua_sol/CMakeLists.txt
- src/CraneCtld/TaskScheduler.cpp
- etc/config.yaml
- docs/zh/reference/lua_guide.md
- CMakeLists.txt
- src/Utilities/PublicHeader/include/crane/Lua.h
- dependencies/cmake/CMakeLists.txt
🧰 Additional context used
🧠 Learnings (11)
📚 Learning: 2025-05-09T01:54:21.256Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/CraneCtld/CranedKeeper.cpp:51-53
Timestamp: 2025-05-09T01:54:21.256Z
Learning: The ConfigureCraned function in src/CraneCtld/CranedKeeper.cpp is called from a thread pool, so there's no need to worry about it blocking the gRPC completion queue thread.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/Lua/LuaJobHandler.cpp
📚 Learning: 2025-06-30T08:43:44.470Z
Learnt from: huerni
Repo: PKUHPC/CraneSched PR: 537
File: src/Craned/TaskManager.cpp:145-147
Timestamp: 2025-06-30T08:43:44.470Z
Learning: In the CraneSched codebase, src/Craned/Craned.cpp guarantees that g_config.CranedRes.contains(g_config.CranedIdOfThisNode) through explicit validation during startup. The code checks if g_config.CranedRes.contains(g_config.Hostname) and exits the process if not found, then sets g_config.CranedIdOfThisNode = g_config.Hostname. TaskManager constructor is called after this validation, so g_config.CranedRes[g_config.CranedIdOfThisNode] is guaranteed to be valid.
Applied to files:
src/CraneCtld/CraneCtld.cppsrc/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-07-01T08:00:05.383Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Supervisor/TaskManager.cpp:1-1809
Timestamp: 2025-07-01T08:00:05.383Z
Learning: In TaskManager.cpp, the security model relies on administrator-configured command templates in ParseOCICmdPattern_ and system-provided usernames in ParseFilePathPattern_, with file permissions using the user's own uid/gid for privilege isolation. The user L-Xiafeng considers this security boundary sufficient and chooses not to fix job name-related path issues at this location.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/Lua/LuaJobHandler.cpp
📚 Learning: 2025-06-07T10:47:59.071Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 520
File: src/CraneCtld/CranedKeeper.cpp:416-417
Timestamp: 2025-06-07T10:47:59.071Z
Learning: In src/CraneCtld/CranedKeeper.h, the m_shutting_down_ member in CranedStub class is declared as std::atomic_bool, making it thread-safe for concurrent access without additional synchronization.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-09T01:54:39.465Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 496
File: src/Craned/TaskManager.cpp:1509-1517
Timestamp: 2025-05-09T01:54:39.465Z
Learning: The CraneSched project uses C++23, which supports Class Template Argument Deduction (CTAD) for standard containers like std::set and includes ranges support, making std::ranges::views::keys valid without additional headers.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:05:49.032Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CranedKeeper.cpp:503-506
Timestamp: 2025-05-02T07:05:49.032Z
Learning: In CraneCtld/CranedKeeper.cpp, using m_unavail_craned_set_.at() is intentional as the key is guaranteed to exist by design, and crashing on missing key is preferred to silently handling it (fail-fast approach).
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cppsrc/CraneCtld/Lua/LuaJobHandler.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` in `VaultClientWrapper` is implemented as a `phmap::parallel_flat_hash_set<std::string>`, which is a thread-safe container designed for concurrent access, making it safe to use in multithreaded contexts without additional synchronization.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-26T11:00:54.563Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 399
File: src/Craned/Craned/SupervisorKeeper.cpp:174-178
Timestamp: 2025-05-26T11:00:54.563Z
Learning: The CraneSched project uses C++23 standard, allowing the use of modern C++ features like std::ranges::to and other C++23 language features and library components.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-03-31T09:29:40.388Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 456
File: src/CraneCtld/Tls/VaultClientWrapper.cpp:116-137
Timestamp: 2025-03-31T09:29:40.388Z
Learning: In the CraneSched project, the `m_allowed_certs_` is an `std::unordered_set<std::string>` which is not inherently thread-safe. The `IsCertAllowed` method in `VaultClientWrapper` involves both read and write operations on this container and could cause race conditions when called from multiple threads.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-12-08T08:11:40.332Z
Learnt from: L-Xiafeng
Repo: PKUHPC/CraneSched PR: 727
File: src/Craned/Supervisor/TaskManager.cpp:1401-1407
Timestamp: 2025-12-08T08:11:40.332Z
Learning: In src/Craned/Supervisor/TaskManager.cpp, StepInstance::Prepare() is always called before any task's Spawn() method in the execution flow (via EvGrpcExecuteTaskCb_). If Prepare() fails, tasks are immediately finished and Spawn() is never invoked. For Crun tasks, StepInstance::Prepare() guarantees that x11_meta is set (even if x11 is false), so accessing x11_meta.value() in Spawn() when both IsCrun() and x11 are true is safe without additional has_value() checks.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
📚 Learning: 2025-05-02T07:06:36.103Z
Learnt from: RileyWen
Repo: PKUHPC/CraneSched PR: 475
File: src/CraneCtld/CtldGrpcServer.cpp:113-118
Timestamp: 2025-05-02T07:06:36.103Z
Learning: In CraneSched, gRPC methods should generally return Status::OK even when handling error conditions, as non-OK statuses cause the connection to terminate. Error information should be communicated within the RPC response payload instead.
Applied to files:
src/CraneCtld/RpcService/CtldGrpcServer.cpp
🧬 Code graph analysis (3)
src/CraneCtld/CraneCtld.cpp (1)
src/Craned/Core/CranedPublicDefs.h (1)
g_config(160-160)
src/Utilities/PublicHeader/Lua.cpp (2)
src/Utilities/PublicHeader/include/crane/Lua.h (1)
Init(62-71)src/Utilities/PublicHeader/String.cpp (2)
TimeStr2Mins(544-593)TimeStr2Mins(544-544)
src/CraneCtld/Lua/LuaJobHandler.cpp (1)
src/CraneCtld/CtldPublicDefs.h (1)
node_num(505-505)
🪛 LanguageTool
docs/en/reference/lua_guide.md
[style] ~35-~35: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...b submission rejected due to error. - crane.* (Crane error codes) — Reject submi...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: build
🔇 Additional comments (5)
src/Utilities/PublicHeader/include/crane/PublicHeader.h (1)
264-264: LGTM! Trailing commas improve maintainability.Adding trailing commas to array entries makes future additions cleaner and reduces diff noise.
Also applies to: 271-271
src/Utilities/PublicHeader/String.cpp (1)
544-593: Time string parsing logic looks consistent with SLURM-style formats.The implementation correctly handles
D-HH:MM:SS/HH:MM:SS/MM:SS/MMand the “infinite” sentinels, with simple validation and minute rounding viasecs > 0 ? 1 : 0. No issues with the core parsing logic; see the header comment for the declaration/definition type alignment.src/CraneCtld/CtldPublicDefs.h (1)
186-187: JobSubmitLuaScript config field is well-placed.Adding
JobSubmitLuaScriptas a plainstd::stringtoggle for Lua integration fits the existing config structure and defaults safely to “disabled” when empty.src/CraneCtld/CMakeLists.txt (1)
22-23: CMake wiring for Lua integration is reasonable.Including
Lua/LuaJobHandler.*incranectldand guarding Lua libraries and headers withLUA_FOUNDmatches the intended optional-Lua design; this should not affect non-Lua builds.Also applies to: 69-72, 94-96
src/CraneCtld/RpcService/CtldGrpcServer.cpp (1)
28-29: Interactive-task Lua pre-check integration looks correct.The
CforedStreamTASK_REQUESTpath now callsJobSubmitLuaCheckand only skips scheduling when Lua returns a non‑SUCCESSCraneRichError, while still submitting as before when Lua is disabled or returns SUCCESS. The result wiring intoWriteTaskIdReplypreserves previous behavior for non-Lua clusters.Also applies to: 320-347
Summary by CodeRabbit
New Features
Utilities
Documentation
Configuration
✏️ Tip: You can customize this high-level summary in your review settings.