feat: dependency #742
Open: NamelessOIer wants to merge 3 commits into master from dev/dependency_re
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,233 @@ | ||
| # Job Dependency | ||
|
|
||
| ## Overview | ||
|
|
||
| The dependency feature in CraneSched-FrontEnd lets a job wait on the status of other jobs before it starts. By chaining dependency relationships, you can build complex workflows in which jobs execute in the correct order. | ||
|
|
||
| ## Supported Commands | ||
|
|
||
| The dependency feature is available in the following commands: | ||
|
|
||
| - `cbatch` - Batch job submission | ||
| - `calloc` - Interactive resource allocation | ||
| - `crun` - Interactive job execution | ||
|
|
||
| ## Command Line Parameter | ||
|
|
||
| ```bash | ||
| --dependency, -d <dependency_string> | ||
| ``` | ||
|
|
||
| Use the `--dependency` or `-d` parameter when submitting a job to specify dependency relationships. | ||
|
|
||
| ## Dependency String Format | ||
|
|
||
| ### Basic Syntax | ||
|
|
||
| ``` | ||
| <type>:<job_id>[+<delay>][:<job_id>[+<delay>]]... | ||
| ``` | ||
|
|
||
| ### Dependency Types | ||
|
|
||
| | Type | Description | Trigger Condition | | ||
| |------|-------------|-------------------| | ||
| | `after` | Start after specified job begins or is cancelled | Specified job leaves Pending state | | ||
| | `afterok` | Start after specified job succeeds | Specified job completes with exit code 0 | | ||
| | `afternotok` | Start after specified job fails | Specified job completes with non-zero exit code (including timeout, node errors, etc.) | | ||
| | `afterany` | Start after specified job completes | Specified job ends (regardless of success or failure) | | ||
|
|
||
| ### Delay Time | ||
|
|
||
| Optional delay parameter, supporting the following formats: | ||
|
|
||
| - **Plain numbers** (default unit is minutes) | ||
| - Example: `10` = 10 minutes | ||
|
|
||
| - **Time with units** | ||
| - `s`, `sec`, `second`, `seconds` - seconds | ||
| - `m`, `min`, `minute`, `minutes` - minutes | ||
| - `h`, `hour`, `hours` - hours | ||
| - `d`, `day`, `days` - days | ||
| - `w`, `week`, `weeks` - weeks | ||
|
|
||
| !!! warning "Unsupported Format" | ||
| **Do NOT use** `HH:MM:SS` or `D-HH:MM:SS` format (e.g., `01:30:00` or `1-01:30:00`). The colon `:` character is reserved as the job ID separator, so such formats will be misinterpreted as multiple job IDs instead of delay time. This may either cause "duplicate task" errors or silently succeed with completely wrong dependency behavior. Always use time units instead (e.g., `90m` or `1h30m`). | ||
|
|
||
| ### Multiple Dependency Combinations | ||
|
|
||
| #### AND Logic (all conditions must be satisfied) | ||
|
|
||
| Use `,` to separate different dependency conditions: | ||
|
|
||
| ```bash | ||
| after:100,afterok:101 | ||
| ``` | ||
|
|
||
| The job will wait for job 100 to start **and** job 101 to complete successfully. | ||
|
|
||
| #### OR Logic (any condition satisfied) | ||
|
|
||
| Use `?` to separate different dependency conditions: | ||
|
|
||
| ```bash | ||
| afterok:100?afterok:101 | ||
| ``` | ||
|
|
||
| The job will start after job 100 **or** job 101 completes successfully. | ||
|
|
||
| !!! warning "Note" | ||
| You cannot mix `,` and `?` in the same dependency string. The system will return an error. | ||
|
|
||
| ## Usage Examples | ||
|
|
||
| ### 1. Basic Dependencies | ||
|
|
||
| ```bash | ||
| # Wait for job 100 to start before running | ||
| cbatch --dependency after:100 my_script.sh | ||
|
|
||
| # Wait for job 100 to complete successfully before running | ||
| cbatch --dependency afterok:100 my_script.sh | ||
|
|
||
| # Wait for job 100 to fail before running | ||
| cbatch --dependency afternotok:100 my_script.sh | ||
|
|
||
| # Wait for job 100 to complete before running (regardless of success or failure) | ||
| cbatch --dependency afterany:100 my_script.sh | ||
| ``` | ||
|
|
||
| ### 2. Dependencies with Delays | ||
|
|
||
| ```bash | ||
| # Wait for job 100 to complete successfully, then delay 30 minutes before running | ||
| cbatch --dependency afterok:100+30 my_script.sh | ||
|
|
||
| # Wait for job 100 to complete successfully, then delay 10 seconds before running | ||
| cbatch --dependency afterok:100+10s my_script.sh | ||
|
|
||
| # Delay for 1 hour 30 minutes (use unit-based format) | ||
| cbatch --dependency afterok:100+90m my_script.sh | ||
| ``` | ||
|
|
||
| ### 3. Multiple Dependencies | ||
|
|
||
| ```bash | ||
| # Wait for job 100 to start AND jobs 101, 102 to both complete successfully | ||
| cbatch --dependency after:100,afterok:101:102 my_script.sh | ||
|
|
||
| # Wait until 10 minutes after job 100 starts AND 30 minutes after job 101 completes successfully | ||
| cbatch --dependency after:100+10m,afterok:101+30m my_script.sh | ||
|
|
||
| # Wait for job 100 to succeed OR job 101 to fail | ||
| cbatch --dependency afterok:100?afternotok:101 my_script.sh | ||
|
|
||
| # Wait for jobs 100 and 101 to both succeed (the 2-hour delay is attached to job 101), OR for job 102 to start | ||
| cbatch --dependency afterok:100:101+2h?after:102 my_script.sh | ||
| ``` | ||
|
|
||
| ### 4. Using in Batch Scripts | ||
|
|
||
| You can also use the `#CBATCH` directive in batch scripts: | ||
|
|
||
| ```bash | ||
| #!/bin/bash | ||
| #CBATCH --dependency afterok:100 | ||
| #CBATCH --nodes 2 | ||
| #CBATCH --time 1:00:00 | ||
| #CBATCH --output job-%j.out | ||
|
|
||
| echo "This job starts after job 100 completes successfully" | ||
| # Your job code | ||
| ``` | ||
|
|
||
| ### 5. Using in Interactive Commands | ||
|
|
||
| ```bash | ||
| # Using dependency with calloc | ||
| calloc --dependency afterok:100 -n 4 -N 2 | ||
|
|
||
| # Using dependency with crun | ||
| crun --dependency after:100 -n 1 hostname | ||
| ``` | ||
|
|
||
| ## Viewing Dependency Status | ||
|
|
||
| Use the `ccontrol show job <job_id>` command to view job dependency status: | ||
|
|
||
| ```bash | ||
| ccontrol show job 105 | ||
| ``` | ||
|
|
||
| ### Output Example | ||
|
|
||
| ``` | ||
| JobId=105 | ||
| ... | ||
| Dependency=PendingDependencies=afterok:100+01:00:00 Status=WaitForAll | ||
| ``` | ||
|
|
||
| ### Dependency Status Field Descriptions | ||
|
|
||
| | Field | Description | | ||
| |-------|-------------| | ||
| | `PendingDependencies` | Dependencies not yet triggered | | ||
| | `DependencyStatus` | Dependency satisfaction status (see table below) | | ||
|
|
||
| ### Dependency Status Values | ||
|
|
||
| | Status | Description | | ||
| |--------|-------------| | ||
| | `WaitForAll` | Waiting for all dependencies to be satisfied (AND logic) | | ||
| | `WaitForAny` | Waiting for any dependency to be satisfied (OR logic) | | ||
| | `ReadyAfter <time>` | Will be ready after the specified time | | ||
| | `SomeFailed` | Some dependencies failed (AND logic, cannot be satisfied) | | ||
| | `AllFailed` | All dependencies failed (OR logic, cannot be satisfied) | | ||
|
|
||
| ## Error Handling | ||
|
|
||
| The system will return errors in the following situations: | ||
|
|
||
| | Error Condition | Description | Example | | ||
| |----------------|-------------|---------| | ||
| | Mixed separators | Cannot use `,` and `?` together | `afterok:100,afterok:101?afterok:102` | | ||
| | Format error | Dependency string doesn't conform to syntax | `afterok:` or `after100` | | ||
| | Invalid delay format | Delay time format is incorrect | `afterok:100+invalid` | | ||
| | Duplicate dependency | Same job ID appears multiple times | `afterok:100:100` | | ||
| | Job ID doesn't exist or ended | Referenced job doesn't exist or has already ended (runtime check) | `afterok:99999` | | ||
| | Unsupported time format | Using `:` in delay (misinterpreted as job IDs) | `after:1+00:00:01` or `after:1+00:00:02` (parsed as multiple job IDs, may or may not error) | | ||
|
|
||
| ### Error Examples | ||
|
|
||
| ```bash | ||
| # Error: Mixed AND and OR separators | ||
| $ cbatch --dependency afterok:100,afterok:101?afterok:102 job.sh | ||
|
|
||
| # Error: Invalid delay format | ||
| $ cbatch --dependency afterok:100+invalid job.sh | ||
|
|
||
| # Error: Duplicate job ID | ||
| $ cbatch --dependency afterok:100,afternotok:100 job.sh | ||
|
|
||
| # Error: Using colon in delay time (misinterpreted as job ID separator) | ||
| $ cbatch --dependency after:1+00:00:01 job.sh | ||
| # This will be parsed as: after jobs 1, 00, 00, 01 | ||
| # Error message: "duplicate task 1 in dependencies" (job 1 appears twice) | ||
|
|
||
| $ cbatch --dependency after:1+00:00:02 job.sh | ||
| # This will be parsed as: after jobs 1, 00, 00, 02 | ||
| # May succeed but with wrong behavior (waits for jobs 00 and 02, ignores the intended 2-second delay) | ||
|
|
||
| # Correct: Always use time units | ||
| $ cbatch --dependency after:1+1s job.sh | ||
| $ cbatch --dependency after:1+2s job.sh | ||
| ``` | ||
|
|
||
| ## Related Commands | ||
|
|
||
| - [cbatch](../command/cbatch.md) - Submit batch jobs | ||
| - [crun](../command/crun.md) - Run interactive tasks | ||
| - [calloc](../command/calloc.md) - Allocate resources and create interactive shell | ||
| - [cqueue](../command/cqueue.md) - View job queue status | ||
| - [ccontrol](../command/ccontrol.md) - Control jobs and system resources | ||
| - [ccancel](../command/ccancel.md) - Cancel jobs |
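
As a review aid, the syntax rules documented above can be sketched as a client-side pre-check. This is a minimal Bash sketch under stated assumptions, not CraneSched's actual parser: the function name `check_dependency`, its messages, and the leading-zero heuristic for spotting an `HH:MM:SS` delay are all hypothetical. The heuristic is deliberately crude and would miss a delay such as `10:30:15`.

```bash
#!/usr/bin/env bash
# Hypothetical pre-check for a dependency string; not part of CraneSched.
# Prints "ok", or a short reason and returns non-zero. Requires Bash 4+.
check_dependency() {
  local dep="$1"
  # Unit names taken from the "Delay Time" section of the document.
  local unit='(s|sec|second|seconds|m|min|minute|minutes|h|hour|hours|d|day|days|w|week|weeks)'
  local delay="\\+([0-9]+${unit}?)+"
  local cond="(after|afterok|afternotok|afterany)(:[0-9]+(${delay})?)+"
  local re="^${cond}([,?]${cond})*\$"

  # ',' (AND) and '?' (OR) must not appear in the same string.
  if [[ "$dep" == *,* && "$dep" == *\?* ]]; then
    echo "mixed separators"; return 1
  fi
  if [[ ! "$dep" =~ $re ]]; then
    echo "format error"; return 1
  fi

  local -A seen=()   # job IDs already encountered
  local -a conds ids
  local c item id
  IFS=',?' read -r -a conds <<< "$dep"
  for c in "${conds[@]}"; do
    IFS=':' read -r -a ids <<< "${c#*:}"   # drop the type, split the job list
    for item in "${ids[@]}"; do
      id="${item%%+*}"                     # strip any +delay suffix
      # Assumed heuristic: a multi-digit ID with a leading zero suggests
      # that an HH:MM:SS delay was split on ':'.
      if [[ "$id" == 0?* ]]; then
        echo "suspicious job id $id (colon in delay?)"; return 1
      fi
      if [[ -n "${seen[$id]:-}" ]]; then
        echo "duplicate job id $id"; return 1
      fi
      seen[$id]=1
    done
  done
  echo "ok"
}
```

With this sketch, `check_dependency 'after:1+00:00:02'` reports the suspicious ID `00` instead of letting the string through, which is the silent-failure case the documentation warns about.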
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,54 @@ | ||
| # Job Pending Reasons | ||
|
|
||
| ## Overview | ||
|
|
||
| When a job is in the PENDING (queued) state, the system reports why it cannot run immediately. You can view this pending reason with the `cqueue` or `ccontrol show job` command. | ||
|
|
||
| ## Viewing Pending Reasons | ||
|
|
||
| ### Using cqueue | ||
|
|
||
| ```bash | ||
| cqueue | ||
| ``` | ||
|
|
||
| Example output: | ||
| ``` | ||
| JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON) | ||
| 101 CPU job1 user1 PD 0:00 2 (Priority) | ||
| 102 CPU job2 user1 PD 0:00 4 (Resource) | ||
| 103 GPU job3 user2 PD 0:00 1 (Dependency) | ||
| 104 CPU job4 user1 PD 0:00 2 (Held) | ||
| ``` | ||
|
|
||
| ### Using ccontrol show job | ||
|
|
||
| ```bash | ||
| ccontrol show job 101 | ||
| ``` | ||
|
|
||
| Example output: | ||
| ``` | ||
| JobId=101 | ||
| ... | ||
| State=PENDING | ||
| Reason=Priority | ||
| ``` | ||
|
|
||
| ## Pending Reason Descriptions | ||
|
|
||
| Pending reasons are listed in evaluation order, from top to bottom. If a job satisfies multiple conditions at once, the reason that appears first is displayed. | ||
|
|
||
| | Reason | Description | When It Appears | | ||
| |--------|-------------|-----------------| | ||
| | `Held` | Job is held | Job was submitted in held state or set to held, requires manual release | | ||
| | `BeginTime` | Start time not reached | Job has a delayed start time (`--begin` parameter), waiting for specified time | | ||
| | `DependencyNeverSatisfied` | Dependency can never be satisfied | A dependency required a job to succeed, but that job failed, so the dependency condition can never be met | | ||
| | `Dependency` | Waiting for dependency | Job dependencies have not been satisfied (dependent jobs not completed, not started, etc.) | | ||
| | `Resource changed` | Resource configuration changed | Node resources changed during job scheduling, waiting for rescheduling | | ||
| | `Reservation deleted` | Reservation was deleted | Reservation originally allocated to the job has been deleted | | ||
| | `Reservation changed` | Reservation was changed | Reservation changed during scheduling, waiting for rescheduling | | ||
| | `License` | Insufficient licenses | The license resources requested by the job are currently insufficient | | ||
| | `Resource` | Insufficient resources | Cluster does not have enough resources (CPU, memory, GPU, etc.) to satisfy job requirements | | ||
| | `Resource Reserved` | Resources are reserved | Resources needed by the job are reserved by other reservations in future time periods | | ||
| | `Priority` | Insufficient priority | Job priority is lower than other queued jobs, or concurrent job limit reached | | ||
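
The "first matching reason wins" rule described above can be illustrated with a short Bash sketch. The array `PENDING_REASON_ORDER` and the function `displayed_reason` are hypothetical names used only for this illustration; the order is copied from the table, and nothing here reflects how the scheduler itself is implemented.

```bash
#!/usr/bin/env bash
# Reasons in the order given in the table (highest precedence first).
PENDING_REASON_ORDER=(
  "Held" "BeginTime" "DependencyNeverSatisfied" "Dependency"
  "Resource changed" "Reservation deleted" "Reservation changed"
  "License" "Resource" "Resource Reserved" "Priority"
)

# Hypothetical helper: given every reason that applies to a job,
# print the one that would be displayed (the first in table order).
displayed_reason() {
  local reason candidate
  for reason in "${PENDING_REASON_ORDER[@]}"; do
    for candidate in "$@"; do
      if [[ "$candidate" == "$reason" ]]; then
        echo "$reason"
        return 0
      fi
    done
  done
  return 1   # none of the arguments is a known pending reason
}
```

For example, `displayed_reason Priority Dependency Resource` prints `Dependency`, because `Dependency` sits above both `Resource` and `Priority` in the table.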
Add fenced-code languages for example outputs (markdownlint MD040)
The two example output blocks currently use bare code fences, which triggers MD040 and is inconsistent with the rest of the page.
You can fix this by tagging them as plain text, e.g.:
    @@
    -```
    +```text
     JobId=101
     ...
     State=PENDING
     Reason=Priority
🧰 Tools
🪛 markdownlint-cli2 (0.18.1)
16-16: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
31-31: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
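
To locate such fences without running the linter, a rough check can be sketched in Bash with `awk`. This is an assumption-laden sketch, not markdownlint's implementation: the function name `find_bare_fences` is hypothetical, and it only handles unindented backtick fences (no `~~~` fences, no indented or nested blocks).

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the line number of every opening ``` fence
# that has no language tag. Closing fences are skipped by tracking state.
find_bare_fences() {
  awk '
    /^```/ {
      if (!in_block) {
        # Opening fence: report it when nothing follows the backticks.
        if ($0 ~ /^```[[:space:]]*$/) print NR
        in_block = 1
      } else {
        in_block = 0   # closing fence
      }
    }
  ' "$1"
}
```

Run against the pending-reasons page in this PR, a check like this would be expected to flag the same two opening fences that the linter reports at lines 16 and 31.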