1 change: 1 addition & 0 deletions docs/en/command/calloc.md
@@ -34,6 +34,7 @@ calloc must be started on a node where `cfored` is running. When the task starts
- **--export string**: Propagate environment variables

### Scheduling Options
- **-d, --dependency string**: Job dependency. Format: `<type>:<job_id>[+<delay>][:<job_id>[+<delay>]]...`, with multiple conditions joined by `,` (all must be satisfied) or `?` (any one must be satisfied), but not both. Supported types: `after`, `afterok`, `afternotok`, `afterany`. **Note**: Write `<delay>` with time units (e.g., `10s`, `5m`, `2h`); do NOT use `HH:MM:SS` format, because `:` is the job ID separator. See [Job Dependency](../reference/job_dependency.md) for details
- **--exclusive**: Request exclusive node resources
- **-H, --hold**: Submit job in held state
- **-r, --reservation string**: Use reserved resources
1 change: 1 addition & 0 deletions docs/en/command/cbatch.md
@@ -69,6 +69,7 @@ cbatch cbatch_test.sh

### Scheduling Options
- **--begin string**: Start time for the job. Format: `YYYY-MM-DDTHH:MM:SS`
- **-d, --dependency string**: Job dependency. Format: `<type>:<job_id>[+<delay>][:<job_id>[+<delay>]]...`, with multiple conditions joined by `,` (all must be satisfied) or `?` (any one must be satisfied), but not both. Supported types: `after`, `afterok`, `afternotok`, `afterany`. **Note**: Write `<delay>` with time units (e.g., `10s`, `5m`, `2h`); do NOT use `HH:MM:SS` format, because `:` is the job ID separator. See [Job Dependency](../reference/job_dependency.md) for details
- **--exclusive**: Request exclusive node resources
- **-H, --hold**: Submit job in held state
- **-r, --reservation string**: Use reserved resources
1 change: 1 addition & 0 deletions docs/en/command/crun.md
@@ -12,6 +12,7 @@ crun only supports request parameters via command line. Supported command-line o
- **-C/--config string**: Path to configuration file (default: "/etc/crane/config.yaml")
- **-c/--cpus-per-task float**: Number of CPUs required per task (default: 1)
- **--comment string**: Comment for the job
- **-d/--dependency string**: Job dependency. Format: `<type>:<job_id>[+<delay>][:<job_id>[+<delay>]]...`, with multiple conditions joined by `,` (all must be satisfied) or `?` (any one must be satisfied), but not both. Supported types: `after`, `afterok`, `afternotok`, `afterany`. **Note**: Write `<delay>` with time units (e.g., `10s`, `5m`, `2h`); do NOT use `HH:MM:SS` format, because `:` is the job ID separator. See [Job Dependency](../reference/job_dependency.md) for details
- **--debug-level string**: Available debug levels: trace, debug, info (default: "info")
- **-x/--exclude string**: Exclude specific nodes from allocation (comma-separated list)
- **--exclusive**: Exclusive node resources
233 changes: 233 additions & 0 deletions docs/en/reference/job_dependency.md
@@ -0,0 +1,233 @@
# Job Dependency

## Overview

The dependency feature in CraneSched-FrontEnd lets a job defer its execution until other jobs reach a specified state. By chaining dependencies, you can build complex workflows in which jobs run in the correct order.

## Supported Commands

The dependency feature is available in the following commands:

- `cbatch` - Batch job submission
- `calloc` - Interactive resource allocation
- `crun` - Interactive job execution

## Command Line Parameter

```bash
--dependency, -d <dependency_string>
```

Use the `--dependency` or `-d` parameter when submitting a job to specify dependency relationships.

## Dependency String Format

### Basic Syntax

```text
<type>:<job_id>[+<delay>][:<job_id>[+<delay>]]...
```

### Dependency Types

| Type | Description | Trigger Condition |
|------|-------------|-------------------|
| `after` | Start after the specified job starts or is cancelled | The specified job leaves the Pending state |
| `afterok` | Start after the specified job succeeds | The specified job completes with exit code 0 |
| `afternotok` | Start after the specified job fails | The specified job ends with a non-zero exit code (including timeout, node errors, etc.) |
| `afterany` | Start after the specified job ends | The specified job ends, regardless of success or failure |
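
The trigger conditions in the table can be modelled as a small predicate. This is a hypothetical sketch, not CraneSched code: the `dependency_met` function and the `pending`/`running`/`ended` phase labels are illustrative names, and abnormal endings (cancellation, timeout, node errors) are assumed to surface as a non-zero exit code.

```bash
# Hypothetical model of the dependency trigger conditions (not CraneSched code).
# Arguments: <type> <phase> [exit_code]
#   phase: pending | running | ended   (illustrative labels only)
# A job that ends abnormally (cancelled, timed out, node error) is modelled
# as phase "ended" with a non-zero exit code.
dependency_met() {
  local dtype="$1" phase="$2" code="${3:-}"
  case "$dtype" in
    after)      [[ "$phase" != pending ]] ;;   # left the Pending state
    afterok)    [[ "$phase" == ended && "$code" == 0 ]] ;;
    afternotok) [[ "$phase" == ended && -n "$code" && "$code" != 0 ]] ;;
    afterany)   [[ "$phase" == ended ]] ;;     # success or failure
    *)          return 2 ;;                    # unsupported type
  esac
}
```

For example, `dependency_met afterok ended 0` returns success, while `dependency_met afterok ended 1` does not.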

### Delay Time

The optional `+<delay>` suffix postpones the job's start by the given amount after the dependency condition is met. It accepts the following formats:

- **Plain numbers** (default unit is minutes)
    - Example: `10` = 10 minutes
- **Time with units**
    - `s`, `sec`, `second`, `seconds` - seconds
    - `m`, `min`, `minute`, `minutes` - minutes
    - `h`, `hour`, `hours` - hours
    - `d`, `day`, `days` - days
    - `w`, `week`, `weeks` - weeks

!!! warning "Unsupported Format"
    **Do NOT use** `HH:MM:SS` or `D-HH:MM:SS` format (e.g., `01:30:00` or `1-01:30:00`). The colon `:` is reserved as the job ID separator, so such values are parsed as additional job IDs rather than as a delay. Depending on the values, this either fails with a "duplicate task" error or is silently accepted with the wrong dependency behavior. Always use time units instead (e.g., `90m` or `1h30m`).
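
The delay rules above can be sketched as a conversion to seconds. This is a hypothetical helper, not CraneSched's parser: `delay_to_seconds` is an illustrative name, it accepts a single `<number><unit>` token only (compound forms such as `1h30m` are not handled here), and it rejects any token containing `:`.

```bash
# Hypothetical helper: convert a single delay token to seconds.
# Assumes the documented rules: a plain number means minutes; otherwise one
# of the listed unit spellings follows the number. Compound forms such as
# 1h30m are NOT handled by this sketch.
delay_to_seconds() {
  local token="$1"
  if [[ "$token" == *:* ]]; then
    echo "error: ':' is a job ID separator, not part of a delay" >&2
    return 1
  fi
  if [[ ! "$token" =~ ^([0-9]+)([a-z]*)$ ]]; then
    echo "error: invalid delay '$token'" >&2
    return 1
  fi
  local n="${BASH_REMATCH[1]}" unit="${BASH_REMATCH[2]}" mult
  case "$unit" in
    "")                   mult=60 ;;       # plain number: minutes
    s|sec|second|seconds) mult=1 ;;
    m|min|minute|minutes) mult=60 ;;
    h|hour|hours)         mult=3600 ;;
    d|day|days)           mult=86400 ;;
    w|week|weeks)         mult=604800 ;;
    *) echo "error: unknown unit '$unit'" >&2; return 1 ;;
  esac
  # 10# forces base 10 so values like "08" are not read as octal.
  echo $(( 10#$n * mult ))
}
```

For example, `delay_to_seconds 90m` prints `5400`, and `delay_to_seconds 01:30:00` is rejected.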

### Multiple Dependency Combinations

#### AND Logic (all conditions must be satisfied)

Use `,` to separate different dependency conditions:

```bash
after:100,afterok:101
```

The job will wait for job 100 to start **and** job 101 to complete successfully.

#### OR Logic (any condition satisfied)

Use `?` to separate different dependency conditions:

```bash
afterok:100?afterok:101
```

The job will start after job 100 **or** job 101 completes successfully.

!!! warning "Note"
    You cannot mix `,` and `?` in the same dependency string. The system will return an error.
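
The separator rules can be illustrated by splitting a dependency string into its conditions. This is a hypothetical teaching sketch, not CraneSched's actual parser: `split_dependency` is an illustrative name, it only checks the separators, the type, and that at least one job ID is present, and it leaves any `+<delay>` attached to its job ID without validating it.

```bash
# Hypothetical sketch: split a dependency string into conditions following
# the documented separator rules. Not CraneSched's parser; delays are not
# validated and stay attached to their job ID.
split_dependency() {
  local dep="$1" mode sep
  if [[ "$dep" == *,* && "$dep" == *\?* ]]; then
    echo "error: cannot mix ',' and '?'" >&2
    return 1
  elif [[ "$dep" == *\?* ]]; then
    mode="ANY"; sep='?'     # OR logic
  else
    mode="ALL"; sep=','     # AND logic (also a single condition)
  fi
  echo "mode=$mode"
  local -a conds ids
  local cond dtype rest
  IFS="$sep" read -r -a conds <<< "$dep"
  for cond in "${conds[@]}"; do
    dtype="${cond%%:*}"     # text before the first ':'
    rest="${cond#*:}"       # job IDs (with optional +delay)
    case "$dtype" in
      after|afterok|afternotok|afterany) ;;
      *) echo "error: bad condition '$cond'" >&2; return 1 ;;
    esac
    if [[ "$cond" != *:* || -z "$rest" ]]; then
      echo "error: bad condition '$cond'" >&2; return 1
    fi
    IFS=':' read -r -a ids <<< "$rest"
    echo "$dtype ${ids[*]}"
  done
}
```

For example, `split_dependency 'after:100,afterok:101:102'` prints `mode=ALL`, then `after 100`, then `afterok 101 102`.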

## Usage Examples

### 1. Basic Dependencies

```bash
# Wait for job 100 to start before running
cbatch --dependency after:100 my_script.sh

# Wait for job 100 to complete successfully before running
cbatch --dependency afterok:100 my_script.sh

# Wait for job 100 to fail before running
cbatch --dependency afternotok:100 my_script.sh

# Wait for job 100 to complete before running (regardless of success or failure)
cbatch --dependency afterany:100 my_script.sh
```

### 2. Dependencies with Delays

```bash
# Wait for job 100 to complete successfully, then delay 30 minutes before running
cbatch --dependency afterok:100+30 my_script.sh

# Wait for job 100 to complete successfully, then delay 10 seconds before running
cbatch --dependency afterok:100+10s my_script.sh

# Wait for job 100 to complete successfully, then delay 1 hour 30 minutes (written as 90m, not 01:30:00)
cbatch --dependency afterok:100+90m my_script.sh
```

### 3. Multiple Dependencies

```bash
# Wait for job 100 to start AND jobs 101, 102 to both complete successfully
cbatch --dependency after:100,afterok:101:102 my_script.sh

# Wait until 10 minutes after job 100 starts AND 30 minutes after job 101 completes successfully
cbatch --dependency after:100+10m,afterok:101+30m my_script.sh

# Wait for job 100 to succeed OR job 101 to fail
cbatch --dependency afterok:100?afternotok:101 my_script.sh

# Wait for jobs 100 and 101 to both succeed (with a 2-hour delay attached to job 101), OR for job 102 to start
cbatch --dependency afterok:100:101+2h?after:102 my_script.sh
```
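
When job IDs come from a script variable, a small helper can assemble one condition. This is a hypothetical sketch: `build_dependency` is an illustrative name, not a CraneSched tool. It joins job IDs (each optionally carrying its own `+<delay>`) with `:` after checking the type.

```bash
# Hypothetical helper: build "<type>:<job_id>[:<job_id>]..." from a type and
# a list of job IDs. Each ID may carry its own +<delay>, passed through as-is.
build_dependency() {
  local dtype="$1"; shift
  case "$dtype" in
    after|afterok|afternotok|afterany) ;;
    *) echo "error: unsupported type '$dtype'" >&2; return 1 ;;
  esac
  if (( $# == 0 )); then
    echo "error: at least one job ID is required" >&2; return 1
  fi
  local IFS=':'          # "$*" joins arguments with the first IFS character
  echo "$dtype:$*"
}
```

For example, `cbatch --dependency "$(build_dependency afterok 101 102)" my_script.sh` is equivalent to `cbatch --dependency afterok:101:102 my_script.sh`.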

### 4. Using in Batch Scripts

You can also use the `#CBATCH` directive in batch scripts:

```bash
#!/bin/bash
#CBATCH --dependency afterok:100
#CBATCH --nodes 2
#CBATCH --time 1:00:00
#CBATCH --output job-%j.out

echo "This job starts after job 100 completes successfully"
# Your job code
```

### 5. Using in Interactive Commands

```bash
# Using dependency with calloc
calloc --dependency afterok:100 -n 4 -N 2

# Using dependency with crun
crun --dependency after:100 -n 1 hostname
```

## Viewing Dependency Status

Use the `ccontrol show job <job_id>` command to view job dependency status:

```bash
ccontrol show job 105
```

### Output Example

```text
JobId=105
...
Dependency=PendingDependencies=afterok:100+01:00:00 Status=WaitForAll
```
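
To use the dependency fields in a script, you could extract them from this output. This is a hypothetical sketch that assumes the single-line `Dependency=PendingDependencies=... Status=...` layout shown above; `dependency_status` is an illustrative name, and other output layouts may need a different pattern.

```bash
# Hypothetical sketch: extract the pending dependencies and status from
# `ccontrol show job` output, assuming the one-line layout shown above.
dependency_status() {
  local line
  line="$(grep -m1 'PendingDependencies=')" || return 1
  local pending="${line#*PendingDependencies=}"
  pending="${pending%% *}"              # stop at the next space
  local status="${line##*Status=}"      # text after the last "Status="
  echo "pending=$pending status=$status"
}
```

For example: `ccontrol show job 105 | dependency_status`.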

### Dependency Status Field Descriptions

| Field | Description |
|-------|-------------|
| `PendingDependencies` | Dependencies not yet triggered |
| `DependencyStatus` | Dependency satisfaction status (see table below) |

### Dependency Status Values

| Status | Description |
|--------|-------------|
| `WaitForAll` | Waiting for all dependencies to be satisfied (AND logic) |
| `WaitForAny` | Waiting for any dependency to be satisfied (OR logic) |
| `ReadyAfter <time>` | Will be ready after the specified time |
| `SomeFailed` | Some dependencies failed (AND logic, cannot be satisfied) |
| `AllFailed` | All dependencies failed (OR logic, cannot be satisfied) |

## Error Handling

The system will return errors in the following situations:

| Error Condition | Description | Example |
|----------------|-------------|---------|
| Mixed separators | Cannot use `,` and `?` together | `afterok:100,afterok:101?afterok:102` |
| Format error | Dependency string doesn't conform to syntax | `afterok:` or `after100` |
| Invalid delay format | Delay time format is incorrect | `afterok:100+invalid` |
| Duplicate dependency | Same job ID appears multiple times | `afterok:100:100` |
| Job ID does not exist or has ended | The specified job does not exist or has already ended (runtime check) | `afterok:99999` |
| Unsupported time format | Using `:` in delay (misinterpreted as job IDs) | `after:1+00:00:01` or `after:1+00:00:02` (parsed as multiple job IDs, may or may not error) |

### Error Examples

```bash
# Error: Mixed AND and OR separators
$ cbatch --dependency afterok:100,afterok:101?afterok:102 job.sh

# Error: Invalid delay format
$ cbatch --dependency afterok:100+invalid job.sh

# Error: Duplicate job ID
$ cbatch --dependency afterok:100,afternotok:100 job.sh

# Error: Using colon in delay time (misinterpreted as job ID separator)
$ cbatch --dependency after:1+00:00:01 job.sh
# Parsed as: after job 1 (delay 00), job 00, and job 01
# Job 01 is job 1 again, so this fails with: "duplicate task 1 in dependencies"

$ cbatch --dependency after:1+00:00:02 job.sh
# Parsed as: after job 1 (delay 00), job 00, and job 02
# No duplicate, so it may be accepted, but it also waits for jobs 00 and 02 and drops the intended 2-second delay

# Correct: Always use time units
$ cbatch --dependency after:1+1s job.sh
$ cbatch --dependency after:1+2s job.sh
```
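
The duplicate-ID errors above can be reproduced with a small check. This is a hypothetical sketch, not CraneSched's validator: `find_duplicate_job` is an illustrative name, and it assumes that `:` splits job IDs, that `+<delay>` is stripped from each token, and that IDs are numeric and compared as numbers (so `01` and `1` are the same job). The message mirrors the one quoted above. It requires bash 4 or later for associative arrays.

```bash
# Hypothetical sketch of the duplicate job ID check (bash 4+).
# Assumes ':' splits job IDs, '+<delay>' is stripped, and IDs are numeric.
find_duplicate_job() {
  local -a conds tokens
  local cond tok id
  local -A seen=()
  IFS=',?' read -r -a conds <<< "$1"      # split on either separator
  for cond in "${conds[@]}"; do
    IFS=':' read -r -a tokens <<< "${cond#*:}"
    for tok in "${tokens[@]}"; do
      id=$(( 10#${tok%%+*} ))             # drop +delay; "01" becomes 1
      if [[ -n "${seen[$id]:-}" ]]; then
        echo "duplicate task $id in dependencies"
        return 1
      fi
      seen[$id]=1
    done
  done
  echo "ok"
}
```

With this model, `after:1+00:00:01` yields job IDs 1, 0 and 1 and is rejected, while `after:1+00:00:02` yields 1, 0 and 2 and passes, matching the two cases described above.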

## Related Commands

- [cbatch](../command/cbatch.md) - Submit batch jobs
- [crun](../command/crun.md) - Run interactive tasks
- [calloc](../command/calloc.md) - Allocate resources and create interactive shell
- [cqueue](../command/cqueue.md) - View job queue status
- [ccontrol](../command/ccontrol.md) - Control jobs and system resources
- [ccancel](../command/ccancel.md) - Cancel jobs
54 changes: 54 additions & 0 deletions docs/en/reference/pending_reason.md
@@ -0,0 +1,54 @@
# Job Pending Reasons

## Overview

When a job is in the PENDING (queued) state, the system reports why it cannot run yet. Use the `cqueue` or `ccontrol show job` command to view this pending reason.

## Viewing Pending Reasons

### Using cqueue

```bash
cqueue
```

Example output:
```text
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
101 CPU job1 user1 PD 0:00 2 (Priority)
102 CPU job2 user1 PD 0:00 4 (Resource)
103 GPU job3 user2 PD 0:00 1 (Dependency)
104 CPU job4 user1 PD 0:00 2 (Held)
```
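
To summarize a long queue, you could tally pending jobs by reason. This is a hypothetical sketch that assumes `cqueue` output shaped like the example above (a header row, `PD` in the fifth column, and the reason in parentheses at the end of the line); `count_pending_reasons` is an illustrative name.

```bash
# Hypothetical sketch: count pending jobs per reason from cqueue-style output.
# Assumes a header row, ST in column 5, and "(Reason)" at the end of the line.
count_pending_reasons() {
  awk '
    NR > 1 && $5 == "PD" && match($0, /\([^)]*\)[ \t]*$/) {
      reason = substr($0, RSTART + 1, RLENGTH - 1)   # drop the "("
      sub(/\)[ \t]*$/, "", reason)                   # drop ")" and spaces
      count[reason]++
    }
    END { for (r in count) print r ": " count[r] }
  ' | sort
}
```

For example: `cqueue | count_pending_reasons`. Matching the whole parenthesized text, rather than the last whitespace-separated field, keeps reasons that contain spaces (such as `Resource changed`) intact.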

### Using ccontrol show job

```bash
ccontrol show job 101
```

Example output:
```text
JobId=101
...
State=PENDING
Reason=Priority
```

Comment on lines +15 to +37 (Contributor)

⚠️ Potential issue | 🟡 Minor

Add fenced-code languages for example outputs (markdownlint MD040)

The two example output blocks currently use bare code fences, which triggers MD040 and is inconsistent with the rest of the page.

You can fix this by tagging them as plain text: change the opening fence of the `JOBID    PARTITION ...` table and of the `JobId=101 ...` block from `` ``` `` to `` ```text ``, leaving their contents unchanged.

markdownlint-cli2 (0.18.1), lines 16 and 31: Fenced code blocks should have a language specified (MD040, fenced-code-language)

## Pending Reason Descriptions

The reasons below are listed in the order in which they are evaluated, from top to bottom. If a job meets several conditions at once, only the first matching reason is displayed.

| Reason | Description | When It Appears |
|--------|-------------|-----------------|
| `Held` | Job is held | Job was submitted in held state or was later set to held; requires manual release |
| `BeginTime` | Start time not reached | Job has a delayed start time (`--begin` parameter) and is waiting for that time |
| `DependencyNeverSatisfied` | Dependency can never be satisfied | A dependency can no longer be met, e.g., the job requires another job to succeed but that job failed |
| `Dependency` | Waiting for dependency | Job dependencies are not yet satisfied (e.g., the specified jobs have not started or completed) |
| `Resource changed` | Resource configuration changed | Node resources changed during scheduling; waiting for rescheduling |
| `Reservation deleted` | Reservation was deleted | The reservation originally allocated to the job has been deleted |
| `Reservation changed` | Reservation was changed | The reservation changed during scheduling; waiting for rescheduling |
| `License` | Insufficient licenses | Not enough of the licenses requested by the job are currently available |
| `Resource` | Insufficient resources | The cluster does not have enough resources (CPU, memory, GPU, etc.) to satisfy the job's requirements |
| `Resource Reserved` | Resources are reserved | Resources the job needs are held by other reservations in upcoming time periods |
| `Priority` | Insufficient priority | Job priority is lower than that of other queued jobs, or a concurrent job limit has been reached |
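
The first-match rule can be sketched as a lookup over the table order. This is a hypothetical illustration, not scheduler code: `first_pending_reason` is an illustrative name, and it takes as arguments the reason names whose conditions currently hold for a job.

```bash
# Hypothetical sketch of the first-match rule: given the reasons whose
# conditions hold for a job, print the one checked first in table order.
first_pending_reason() {
  local -a order=(
    "Held" "BeginTime" "DependencyNeverSatisfied" "Dependency"
    "Resource changed" "Reservation deleted" "Reservation changed"
    "License" "Resource" "Resource Reserved" "Priority"
  )
  local reason cond
  for reason in "${order[@]}"; do
    for cond in "$@"; do
      if [[ "$cond" == "$reason" ]]; then
        echo "$reason"
        return 0
      fi
    done
  done
  return 1   # no known reason applies
}
```

For example, a job that is both waiting on a dependency and outranked by other jobs is shown as `Dependency`: `first_pending_reason Priority Dependency` prints `Dependency`.
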
1 change: 1 addition & 0 deletions docs/zh/command/calloc.md
@@ -34,6 +34,7 @@ calloc must be started on a node where `cfored` is running. When the task starts
- **--export string**: Propagate environment variables

### Scheduling Options
- **-d, --dependency string**: Job dependency. Format: `<type>:<job_id>[+<delay>][:<job_id>[+<delay>]]...`, with multiple conditions joined by `,` (all must be satisfied) or `?` (any one must be satisfied), but not both. Supported types: `after`, `afterok`, `afternotok`, `afterany`. **Note**: Write `<delay>` with time units (e.g., `10s`, `5m`, `2h`); do NOT use `HH:MM:SS` format, because `:` is the job ID separator. See [Job Dependency](../reference/job_dependency.md) for details
- **--exclusive**: Request exclusive node resources
- **-H, --hold**: Submit job in held state
- **-r, --reservation string**: Use reserved resources
1 change: 1 addition & 0 deletions docs/zh/command/cbatch.md
@@ -69,6 +69,7 @@ cbatch cbatch_test.sh

### Scheduling Options
- **--begin string**: Start time for the job. Format: `YYYY-MM-DDTHH:MM:SS`
- **-d, --dependency string**: Job dependency. Format: `<type>:<job_id>[+<delay>][:<job_id>[+<delay>]]...`, with multiple conditions joined by `,` (all must be satisfied) or `?` (any one must be satisfied), but not both. Supported types: `after`, `afterok`, `afternotok`, `afterany`. **Note**: Write `<delay>` with time units (e.g., `10s`, `5m`, `2h`); do NOT use `HH:MM:SS` format, because `:` is the job ID separator. See [Job Dependency](../reference/job_dependency.md) for details
- **--exclusive**: Request exclusive node resources
- **-H, --hold**: Submit job in held state
- **-r, --reservation string**: Use reserved resources
1 change: 1 addition & 0 deletions docs/zh/command/crun.md
@@ -12,6 +12,7 @@ crun only supports specifying request parameters via the command line. Supported command-line options:
- **-C/--config string**: Path to configuration file (default: "/etc/crane/config.yaml")
- **-c/--cpus-per-task float**: Number of CPUs required per task (default: 1)
- **--comment string**: Comment for the job
- **-d/--dependency string**: Job dependency. Format: `<type>:<job_id>[+<delay>][:<job_id>[+<delay>]]...`, with multiple conditions joined by `,` (all must be satisfied) or `?` (any one must be satisfied), but not both. Supported types: `after`, `afterok`, `afternotok`, `afterany`. **Note**: Write `<delay>` with time units (e.g., `10s`, `5m`, `2h`); do NOT use `HH:MM:SS` format, because `:` is the job ID separator. See [Job Dependency](../reference/job_dependency.md) for details
- **--debug-level string**: Available debug levels: trace, debug, info (default: "info")
- **-x/--exclude string**: Exclude specific nodes from allocation (comma-separated list)
- **--exclusive**: Exclusive node resources