Conversation

@huerni
Collaborator

@huerni huerni commented Oct 20, 2025

Summary by CodeRabbit

  • New Features
    • Added optional --prolog, --epilog, --task-prolog and --task-epilog options.
    • Global prolog runs during connection setup; global epilog runs after task acknowledgement.
    • Per-task prolog/epilog can be supplied for individual tasks.
    • External commands now support timeouts, signal handling, and capture stdout/stderr.
    • Non-zero command exits are surfaced as backend errors; successes are logged.
    • Prolog/epilog settings are configurable via the application config.


@coderabbitai
Contributor

coderabbitai bot commented Oct 20, 2025

📝 Walkthrough

Adds configurable prolog/epilog command execution: new CLI flags (global and per-task), config fields, protobuf task/step fields, StateMachineOfCrun fields, and a RunCommand method that executes external programs with timeout and signal handling. The prolog runs at connect and the epilog runs after task ack.

Changes

Cohort / File(s) | Summary
Command-line flags (internal/crun/cmd.go) | Added exported string variables FlagProlog, FlagEpilog, FlagTaskProlog, FlagTaskEpilog and bound them to CLI flags (--prolog, --epilog, --task-prolog, --task-epilog).
Runtime / state machine (internal/crun/crun.go) | Added RunCommandArgs type and RunCommand method on StateMachineOfCrun; added public CrunProlog and CrunEpilog fields; RunCommand implements context timeout, signal handling (interrupt/TERM), Setpgid, stdout/stderr capture and exit-code mapping; prolog invoked at StateConnectCfored, epilog invoked at StateWaitAck; task submission now carries TaskProlog/TaskEpilog.
Protobuf messages (protos/PublicDefs.proto) | Added string task_prolog = 43; and string task_epilog = 44; to TaskToCtld; added string task_prolog = 27; and string task_epilog = 28; to StepToCtld.
Configuration (internal/util/util.go) | Added CrunProlog and CrunEpilog fields to Config (yaml:"CrunProlog", yaml:"CrunEpilog"); minor reordering of neighboring fields.

Sequence Diagram(s)

sequenceDiagram
    participant CLI as CLI
    participant SM as StateMachineOfCrun
    participant Prolog as PrologCmd
    participant Cfored as cfored
    participant Epilog as EpilogCmd

    CLI->>SM: start / connect
    note over SM: StateConnectCfored
    SM->>Prolog: RunCommand(CrunProlog)
    Prolog-->>SM: exitCode / output
    alt non-zero exit
        SM-->>CLI: backend error (terminate)
    end

    SM->>Cfored: connect
    Cfored-->>SM: connected

    CLI->>SM: submit task (includes task_prolog/task_epilog)
    SM->>Cfored: send task
    Cfored-->>SM: task ack

    note over SM: StateWaitAck
    SM->>Epilog: RunCommand(CrunEpilog or TaskEpilog)
    Epilog-->>SM: exitCode / output
    alt non-zero exit
        SM-->>CLI: backend error (terminate)
    end

    SM-->>CLI: continue / success

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • L-Xiafeng
  • RileyWen

Poem

🐰 I nibble prologs before the run,
I mind the timeout, dodge each stun,
I hop to epilogs when tasks are done,
Exit codes counted, carrots won —
A twitch, a cheer — the job's begun.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'feat: Crunprolog/epilog' directly relates to the main changes: introducing prolog/epilog execution support for crun tasks across multiple files (cmd.go, crun.go, util.go, and PublicDefs.proto).
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch dev/Prolog

📜 Recent review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3e3a181 and d06a667.

📒 Files selected for processing (4)
  • internal/crun/cmd.go
  • internal/crun/crun.go
  • internal/util/util.go
  • protos/PublicDefs.proto
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/crun/cmd.go
  • internal/util/util.go
🔇 Additional comments (4)
protos/PublicDefs.proto (2)

184-186: LGTM!

The new task_prolog and task_epilog fields are properly added to TaskToCtld with unique sequential tags. The string type is appropriate for command paths.


251-252: LGTM!

The new task_prolog and task_epilog fields are properly added to StepToCtld with unique sequential tags, mirroring the task-level fields.

internal/crun/crun.go (2)

116-118: LGTM!

The CrunProlog and CrunEpilog fields are appropriately added to StateMachineOfCrun to store the prolog/epilog command paths.


1212-1214: LGTM!

The task and step creation properly initializes Env and wires TaskProlog/TaskEpilog from the corresponding CLI flags.

Also applies to: 1234-1236


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (5)
internal/crun/cmd.go (2)

63-65: Clarify prolog/epilog semantics; consider argument support.

Flags hold a single string and RunCommand is invoked with Args=nil, so users can’t pass arguments or quotes. Either:

  • Document these as “path to executable/script” (no args), or
  • Support args/shell execution (see RunCommand refactor below).

Do you intend these to accept arguments (e.g., --prolog "/opt/prolog.sh --init A")?
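
If argument support is wanted, a minimal sketch of the simplest option follows. It assumes whitespace-separated arguments only; `splitCommand` is a hypothetical helper, not part of this PR, and it does not understand quoting (that would need `sh -c` or a shell-words parser).

```go
package main

import (
	"fmt"
	"strings"
)

// splitCommand is a hypothetical helper that splits a flag value such as
// "/opt/prolog.sh --init A" on whitespace into a program and its arguments.
// It does NOT handle quotes or escapes.
func splitCommand(s string) (string, []string) {
	fields := strings.Fields(s)
	if len(fields) == 0 {
		return "", nil
	}
	return fields[0], fields[1:]
}

func main() {
	program, args := splitCommand("/opt/prolog.sh --init A")
	fmt.Println(program, args)
}
```

The two return values would map onto the existing Program and Args fields of RunCommandArgs, so Args would no longer need to be nil.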


121-123: Improve help text and user expectations.

Help should state precedence (flag overrides config), local execution timing, and default timeout (300s). Example:

- RootCmd.Flags().StringVarP(&FlagProlog, "prolog", "", "", "Prolog of the job")
- RootCmd.Flags().StringVarP(&FlagEpilog, "epilog", "", "", "Epilog of the job")
+ RootCmd.Flags().StringVarP(&FlagProlog, "prolog", "", "", "Local prolog executable/script run before connecting (overrides config CrunProlog). 300s timeout.")
+ RootCmd.Flags().StringVarP(&FlagEpilog, "epilog", "", "", "Local epilog executable/script run after task ack (overrides config CrunEpilog). 300s timeout.")
internal/util/util.go (1)

34-43: Config fields added — add omitempty and brief docs.

To avoid emitting empty fields on marshal and to clarify usage:

- CrunProlog string `yaml:"CrunProlog"`
- CrunEpilog string `yaml:"CrunEpilog"`
+ // Path to local executable/script (arguments optional per CLI behavior).
+ CrunProlog string `yaml:"CrunProlog,omitempty"`
+ CrunEpilog string `yaml:"CrunEpilog,omitempty"`

Also consider validating existence/executability at startup for clearer errors.

Please confirm whether config values may include arguments or should be absolute paths only.
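
A rough sketch of the startup validation suggested above, assuming the configured value is a plain path with no arguments. `checkExecutable` is a hypothetical name, and the permission check only looks at the mode bits, not at whether the current user may actually execute the file.

```go
package main

import (
	"fmt"
	"os"
)

// checkExecutable is a hypothetical startup check for a prolog/epilog path.
// It reports a descriptive error if the path is missing, is a directory,
// or has no execute bit set for anyone.
func checkExecutable(path string) error {
	info, err := os.Stat(path)
	if err != nil {
		return fmt.Errorf("hook %q: %w", path, err)
	}
	if info.IsDir() {
		return fmt.Errorf("hook %q is a directory", path)
	}
	if info.Mode().Perm()&0o111 == 0 {
		return fmt.Errorf("hook %q is not executable", path)
	}
	return nil
}

func main() {
	fmt.Println(checkExecutable("/nonexistent/prolog.sh"))
}
```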

internal/crun/crun.go (2)

114-116: Limit API surface: make fields private unless external use is required.

StateMachineOfCrun.CrunProlog/CrunEpilog look internal; exporting them expands API unnecessarily. Prefer crunProlog/crunEpilog.

Are these accessed by other packages within the module?


122-133: Extend RunCommandArgs to support working dir and optional shell execution.

This enables honoring --chdir/task.Cwd and (optionally) parsing full command strings.

 type RunCommandArgs struct {
   Program    string
   Args       []string
   Envs       map[string]string
   TimeoutSec int
+  // Optional working directory for the subprocess.
+  Dir        string
+  // If true, run via 'sh -c Program' to allow quoted/complex commands.
+  Shell      bool
 }
 
 type RunCommandResult struct {
   ExitCode int
   Output   string
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between babf743 and 1510fe4.

📒 Files selected for processing (3)
  • internal/crun/cmd.go (3 hunks)
  • internal/crun/crun.go (5 hunks)
  • internal/util/util.go (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/crun/crun.go (2)
internal/util/err.go (2)
  • ExitCode (30-30)
  • ErrorBackend (38-38)
internal/crun/cmd.go (2)
  • FlagProlog (63-63)
  • FlagEpilog (64-64)
🔇 Additional comments (2)
internal/crun/cmd.go (1)

23-23: No functional change in imports.

internal/crun/crun.go (1)

26-30: No functional change beyond added imports.

@huerni huerni marked this pull request as draft October 20, 2025 07:09
@huerni huerni marked this pull request as ready for review October 20, 2025 07:34
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (3)
internal/crun/crun.go (3)

210-222: Previous review suggestions not yet addressed.

Earlier reviews recommended:

  • Setting working directory to task.Cwd to align with --chdir
  • Reducing log verbosity to avoid leaking sensitive data
  • Making timeout configurable rather than hardcoded to 300 seconds

These suggestions remain valid and should be implemented.


632-644: Previous review suggestions not yet addressed.

The same concerns raised for prolog execution (working directory, log verbosity, timeout configurability) apply to epilog execution as well.


1051-1110: Previous review suggestions for RunCommand not yet addressed.

A prior comprehensive review identified several robustness issues that remain unaddressed:

  1. Environment variable merging (lines 1067-1073): Current implementation appends duplicates rather than overriding. Should build a map from os.Environ(), apply overrides from runCommandArgs.Envs, then reconstruct.

  2. Working directory: No support for setting cmd.Dir, which is needed for prolog/epilog to honor task.Cwd.

  3. Exit code conventions: Timeouts and signals should map to conventional exit codes (124 for timeout, 128+N for signal N).

  4. Optional shell mode: Consider supporting sh -c execution for full command strings.

  5. Unbounded output: The outBuf can grow without limit, risking OOM. Consider capping output (e.g., 1 MiB).

These improvements would make the command execution more robust and align with standard practices.
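
For item 5, one possible shape of a bounded capture buffer is sketched below. `cappedBuffer` is a hypothetical type, not code from this PR; the assumption is that it would be assigned to cmd.Stdout and cmd.Stderr in place of the unbounded outBuf.

```go
package main

import (
	"bytes"
	"fmt"
)

// cappedBuffer is a hypothetical io.Writer that keeps at most limit bytes
// and discards the rest, so a chatty hook cannot exhaust memory.
type cappedBuffer struct {
	buf   bytes.Buffer
	limit int
}

// Write stores as much of p as still fits and always reports len(p),
// so the writer side never sees a short-write error.
func (c *cappedBuffer) Write(p []byte) (int, error) {
	if remaining := c.limit - c.buf.Len(); remaining > 0 {
		if len(p) > remaining {
			c.buf.Write(p[:remaining])
		} else {
			c.buf.Write(p)
		}
	}
	return len(p), nil
}

func main() {
	out := &cappedBuffer{limit: 8}
	fmt.Fprint(out, "0123456789abcdef")
	fmt.Println(out.buf.String())
}
```

Reporting the full length even when bytes are dropped is a deliberate choice here: returning a short count would make os/exec treat the copy as failed.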

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1510fe4 and 35fef83.

📒 Files selected for processing (1)
  • internal/crun/crun.go (5 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/crun/crun.go (1)
internal/crun/cmd.go (2)
  • FlagProlog (63-63)
  • FlagEpilog (64-64)
🪛 GitHub Actions: Go Code Quality Check
internal/crun/crun.go

[error] 223-223: go vet: github.com/sirupsen/logrus.Tracef call needs 1 arg but has 2 args

🪛 GitHub Check: go-check
internal/crun/crun.go

[failure] 223-223:
github.com/sirupsen/logrus.Tracef call needs 1 arg but has 2 args

🔇 Additional comments (3)
internal/crun/crun.go (3)

26-26: LGTM: Necessary imports for command execution.

The bytes and os/exec imports support the new RunCommand functionality.

Also applies to: 29-29


114-115: LGTM: Prolog/epilog fields added.

The public CrunProlog and CrunEpilog fields appropriately store the command paths for pre/post execution hooks.


645-645: Fix compilation error: incorrect number of arguments to log.Tracef.

Same issue as Line 223: the format string contains one placeholder (%s) but two values are provided.

Apply this diff:

-		log.Tracef("Epilog '%s' finished successfully.", m.CrunEpilog)
+		log.Tracef("Epilog '%s' finished successfully.", m.CrunEpilog)

Note: If the current code passes ExitCode as a second argument (like Line 223), remove it or add %d to the format string.

Likely an incorrect or invalid review comment.

@huerni huerni force-pushed the dev/Prolog branch 2 times, most recently from a900f8f to 769c54f Compare November 11, 2025 01:58
@huerni huerni force-pushed the dev/Prolog branch 2 times, most recently from ea5568f to 7d5e4eb Compare December 9, 2025 07:07
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (2)
internal/crun/crun.go (2)

120-125: Add Dir field to support working directory control.

Past reviews requested a Dir field to allow prolog/epilog to execute in the correct working directory (particularly task.Cwd). Without this, commands run in the current working directory of the crun process, which may not match the user's intended context.

Apply this diff:

 type RunCommandArgs struct {
 	Program    string
 	Args       []string
 	Envs       map[string]string
 	TimeoutSec int
+	Dir        string
 }

Based on learnings from past reviews.


1045-1101: Improve environment variable handling and exit code extraction.

The current implementation has two notable issues:

  1. Environment merge creates duplicates (lines 1061-1067): Appending to os.Environ() can result in duplicate keys (e.g., PATH=/a and PATH=/b both present), leading to undefined behavior. Build a map to ensure deterministic override semantics.

  2. Exit codes not extracted in all paths:

    • Timeout case (line 1080): ExitCode remains 127 instead of conventional 124
    • Signal case (line 1083): ExitCode remains 127 instead of 128+signal_number
    • Error case (lines 1086-1089): Exit code should be extracted from exec.ExitError if available

Address the environment merge issue:

 	if len(runCommandArgs.Envs) > 0 {
-		envs := os.Environ()
+		envMap := make(map[string]string)
+		for _, kv := range os.Environ() {
+			if idx := strings.IndexByte(kv, '='); idx > 0 {
+				envMap[kv[:idx]] = kv[idx+1:]
+			}
+		}
 		for k, v := range runCommandArgs.Envs {
-			envs = append(envs, fmt.Sprintf("%s=%s", k, v))
+			envMap[k] = v
+		}
+		envs := make([]string, 0, len(envMap))
+		for k, v := range envMap {
+			envs = append(envs, fmt.Sprintf("%s=%s", k, v))
+		}
 		cmd.Env = envs
 	}

For better exit code handling, consider extracting codes from exec.ExitError in the error case and using conventional codes (124 for timeout, 128+N for signals).

Based on learnings from past reviews.

🧹 Nitpick comments (2)
internal/crun/crun.go (2)

208-223: Consider adding working directory and configurable timeout for prolog execution.

Once the Dir field is added to RunCommandArgs, set it to m.task.Cwd to ensure prolog runs in the user's intended directory. Additionally, the hardcoded 300-second timeout could be made configurable via a flag or config field.

Example adjustment (after Dir field is added):

 		ExitCode := m.RunCommand(RunCommandArgs{
 			Program:    m.CrunProlog,
 			Args:       nil,
 			Envs:       m.task.Env,
 			TimeoutSec: 300,
+			Dir:        m.task.Cwd,
 		})

Based on learnings from past reviews.
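
One way the timeout could be made configurable while keeping 300 seconds as the default is sketched below. `hookTimeout` and `defaultHookTimeoutSec` are hypothetical names; where the configured value comes from (flag or config field) is left open.

```go
package main

import (
	"fmt"
	"time"
)

// defaultHookTimeoutSec mirrors the 300-second value currently hardcoded
// at the prolog/epilog call sites.
const defaultHookTimeoutSec = 300

// hookTimeout is a hypothetical helper: it returns the configured timeout,
// falling back to the default when the value is unset or not positive.
func hookTimeout(configuredSec int) time.Duration {
	if configuredSec <= 0 {
		configuredSec = defaultHookTimeoutSec
	}
	return time.Duration(configuredSec) * time.Second
}

func main() {
	fmt.Println(hookTimeout(0))
	fmt.Println(hookTimeout(45))
}
```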


626-640: Apply the same Dir and timeout improvements to epilog execution.

The epilog execution has the same considerations as the prolog: add Dir: m.task.Cwd once the field is available, and consider making the timeout configurable.

Based on learnings from past reviews.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 769c54f and ea5568f.

📒 Files selected for processing (4)
  • internal/crun/cmd.go (2 hunks)
  • internal/crun/crun.go (6 hunks)
  • internal/util/util.go (1 hunks)
  • protos/PublicDefs.proto (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • internal/crun/cmd.go
  • internal/util/util.go
🧰 Additional context used
🧬 Code graph analysis (1)
internal/crun/crun.go (2)
internal/crun/cmd.go (4)
  • FlagProlog (65-65)
  • FlagEpilog (66-66)
  • FlagTaskProlog (67-67)
  • FlagTaskEpilog (68-68)
internal/util/err.go (2)
  • ExitCode (30-30)
  • ErrorBackend (38-38)
🔇 Additional comments (3)
protos/PublicDefs.proto (1)

183-185: LGTM! Clean protobuf field additions.

The new task_prolog and task_epilog fields are properly numbered and follow protobuf conventions. They align with the prolog/epilog execution support added in internal/crun/crun.go.

internal/crun/crun.go (2)

112-113: LGTM! Struct fields for prolog/epilog commands.

The CrunProlog and CrunEpilog fields are appropriately public and will be populated from config and CLI flags.


1141-1143: LGTM! Task-level prolog/epilog flags properly propagated.

The TaskProlog and TaskEpilog fields are correctly set from CLI flags and will be included in the task submission to the controller.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (4)
internal/crun/crun.go (4)

208-223: Prolog execution lacks working directory; hardcoded timeout.

The prolog is executed without setting a working directory, which may cause issues if the prolog script expects to run in task.Cwd. The 300-second timeout is also hardcoded—consider making it configurable via flag or config.

Once Dir is added to RunCommandArgs, update the call:

 	if len(m.CrunProlog) > 0 {
 		ExitCode := m.RunCommand(RunCommandArgs{
 			Program:    m.CrunProlog,
 			Args:       nil,
 			Envs:       m.task.Env,
 			TimeoutSec: 300,
+			Dir:        m.task.Cwd,
 		})

626-640: Epilog execution: same concerns as prolog.

Apply the same adjustments as prolog—set Dir to m.task.Cwd and consider making the timeout configurable.


120-125: Add Dir field to support working directory control.

The RunCommandArgs struct should include a Dir field to allow callers to specify the working directory, particularly for prolog/epilog execution in the context of task.Cwd.

Apply this diff:

 type RunCommandArgs struct {
 	Program    string
 	Args       []string
 	Envs       map[string]string
 	TimeoutSec int
+	Dir        string
 }

1045-1101: Process not reaped after timeout/signal; exit code may be stale.

After ctx.Done() or sigCh triggers, the code kills the process group but doesn't wait for the done channel. This can result in:

  1. Zombie processes if the process hasn't terminated yet
  2. ProcessState being nil or incomplete when checking exit status

Additionally, the environment merge (lines 1061-1066) appends new values without overriding existing keys, leading to duplicate env vars with non-deterministic behavior.

Apply this diff to ensure the process is reaped and env vars are merged correctly:

 func (m *StateMachineOfCrun) RunCommand(runCommandArgs RunCommandArgs) int {
 	ExitCode := 127
 
 	ctx := context.Background()
 	if runCommandArgs.TimeoutSec > 0 {
 		var cancel context.CancelFunc
 		ctx, cancel = context.WithTimeout(ctx, time.Duration(runCommandArgs.TimeoutSec)*time.Second)
 		defer cancel()
 	}
 
 	sigCh := make(chan os.Signal, 1)
 	signal.Notify(sigCh, os.Interrupt, syscall.SIGTERM)
 	defer signal.Stop(sigCh)
 
 	cmd := exec.CommandContext(ctx, runCommandArgs.Program, runCommandArgs.Args...)
 
 	if len(runCommandArgs.Envs) > 0 {
-		envs := os.Environ()
+		envMap := make(map[string]string)
+		for _, kv := range os.Environ() {
+			if idx := strings.IndexByte(kv, '='); idx > 0 {
+				envMap[kv[:idx]] = kv[idx+1:]
+			}
+		}
 		for k, v := range runCommandArgs.Envs {
-			envs = append(envs, fmt.Sprintf("%s=%s", k, v))
+			envMap[k] = v
+		}
+		envs := make([]string, 0, len(envMap))
+		for k, v := range envMap {
+			envs = append(envs, fmt.Sprintf("%s=%s", k, v))
 		}
 		cmd.Env = envs
 	}
 
+	if runCommandArgs.Dir != "" {
+		cmd.Dir = runCommandArgs.Dir
+	}
+
 	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
 
 	if err := cmd.Start(); err != nil {
 		log.Errorf("Failed to start command: %v", err.Error())
 		return -1
 	}
 
 	done := make(chan error, 1)
 	go func() { done <- cmd.Wait() }()
 
 	select {
 	case <-ctx.Done():
 		_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
+		<-done // Wait for process to be reaped
+		ExitCode = 124
 
 	case <-sigCh:
 		_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGTERM)
+		<-done // Wait for process to be reaped
 
 	case err := <-done:
 		if err != nil {
 			log.Errorf("Failed to execute command: %v", err.Error())
 		}
 	}
 
 	if cmd.ProcessState != nil {
 		if status, ok := cmd.ProcessState.Sys().(syscall.WaitStatus); ok {
 			if status.Exited() {
 				ExitCode = status.ExitStatus()
+			} else if status.Signaled() {
+				ExitCode = 128 + int(status.Signal())
 			}
 		}
 	}
 
 	return ExitCode
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ea5568f and 7d5e4eb.

📒 Files selected for processing (3)
  • internal/crun/cmd.go (2 hunks)
  • internal/crun/crun.go (6 hunks)
  • protos/PublicDefs.proto (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
internal/crun/crun.go (2)
internal/crun/cmd.go (4)
  • FlagProlog (65-65)
  • FlagEpilog (66-66)
  • FlagTaskProlog (67-67)
  • FlagTaskEpilog (68-68)
internal/util/err.go (2)
  • ExitCode (30-30)
  • ErrorBackend (38-38)
🔇 Additional comments (5)
protos/PublicDefs.proto (1)

183-185: LGTM!

The new task_prolog and task_epilog fields are correctly added with sequential tag numbers and appropriate string type for command paths.

internal/crun/cmd.go (1)

65-69: LGTM!

Flag variable declarations are clear and follow the existing naming conventions.

internal/crun/crun.go (3)

1141-1143: LGTM!

The TaskProlog and TaskEpilog fields are correctly wired from CLI flags to the TaskToCtld struct for propagation to the backend.


111-113: LGTM!

The CrunProlog and CrunEpilog fields are appropriately added to the state machine struct.


198-206: LGTM on priority handling.

The CLI flag correctly overrides config values, establishing proper precedence (CLI > config).
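
For reference, the precedence rule amounts to something like the following sketch. `resolveHook` is a hypothetical helper used only for illustration; the actual code at lines 198-206 may be structured differently.

```go
package main

import "fmt"

// resolveHook is a hypothetical helper: it returns the CLI flag value when
// one was given and falls back to the config value otherwise.
func resolveHook(flagValue, configValue string) string {
	if flagValue != "" {
		return flagValue
	}
	return configValue
}

func main() {
	// Flag set: the flag wins over the config value.
	fmt.Println(resolveHook("/opt/cli-prolog.sh", "/etc/prolog.sh"))
	// Flag empty: the config value is used.
	fmt.Println(resolveHook("", "/etc/prolog.sh"))
}
```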

@huerni huerni force-pushed the dev/Prolog branch 2 times, most recently from db4bdeb to 3e3a181 Compare December 25, 2025 02:39
m.err = util.ErrorBackend
}

if len(m.CrunEpilog) > 0 {
Collaborator

Is this triggered only on a normal exit?

Collaborator Author

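Once the step has started, it is triggered no matter how the step exits. If the crun front end itself fails, it is not triggered. Is StateWaitAck only reached when the task exits normally?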

3 participants