armadaproject · sudiptob2 · Feb 12, 2026 · Feb 9, 2026 · Feb 9, 2026 · Feb 10, 2026
diff --git a/.claude/commands/build.md b/.claude/commands/build.md
@@ -0,0 +1,17 @@
+Build the project.
+
+If argument is "quick" or "fast", skip tests:
+```
+mvn clean package -DskipTests
+```
+
+Otherwise run the full build with tests:
+```
+mvn clean package
+```
+
+Rules:
+1. Before building, run `mvn spotless:check`; if formatting fails, run `mvn spotless:apply` and report which files were fixed
+2. If the build fails, read the error output and provide a clear summary of what went wrong
+3. Do NOT automatically fix build errors — report them and let the user decide
+4. Show the final artifact path on success (target/*.jar)
diff --git a/.claude/commands/ci-local.md b/.claude/commands/ci-local.md
@@ -0,0 +1,26 @@
+Run the full CI pipeline locally to verify everything passes before pushing.
+
+This mirrors what GitHub Actions runs. Execute these steps in order, stopping on the first failure:
+
+1. **Lint**: `mvn spotless:check`
+   - If it fails, ask the user if they want to auto-fix with `mvn spotless:apply`
+
+2. **Compile**: `mvn compile`
+   - Report any compilation errors clearly
+
+3. **Test**: `mvn test`
+   - Summarize test results (total, passed, failed, skipped)
+
+4. **Package**: `mvn package -DskipTests`
+   - Confirm the JAR was built successfully
+
+Report a final summary:
+```
+CI Local Results:
+  Lint:    PASS/FAIL
+  Compile: PASS/FAIL
+  Test:    PASS/FAIL (X passed, Y failed)
+  Package: PASS/FAIL
+```
+
+Do NOT continue to the next step if any step fails — report the failure and stop.
diff --git a/.claude/commands/commit.md b/.claude/commands/commit.md
@@ -0,0 +1,29 @@
+Create a git commit for the current staged/unstaged changes.
+
+Rules:
+1. Run `git status` and `git diff` to understand what changed
+2. Stage relevant files (prefer specific files over `git add -A`)
+3. Write a SHORT commit message (max 50 chars for subject line, imperative mood)
+4. Use `--signoff` to sign off using the committer's git config (do NOT hardcode any name/email)
+5. Do NOT add Co-Authored-By or any other trailers
+6. If there are no changes, say so and stop
+
+Commit format:
+```
+git commit --signoff -m "short imperative message"
+```
+
+The `--signoff` flag automatically uses the name and email from `git config user.name` and `git config user.email`, so each collaborator's own identity is used.
+
+Examples of good messages:
+- "add user and permission models"
+- "implement executor allocation logic"
+- "fix gang scheduling annotation prefix"
+- "add table-driven tests for config parsing"
+
+Do NOT:
+- Use long descriptive messages
+- Add Co-Authored-By trailers
+- Use past tense ("added", "fixed")
+- Prefix with type tags ("feat:", "fix:") unless asked
+- Hardcode any author name or email
diff --git a/.claude/commands/lint.md b/.claude/commands/lint.md
@@ -0,0 +1,11 @@
+Check and fix code formatting using Spotless/Scalafmt.
+
+Steps:
+1. Run `mvn spotless:check` to see if there are formatting violations
+2. If violations are found, run `mvn spotless:apply` to auto-fix them
+3. After applying, run `git diff --stat` to show what files were reformatted
+4. Summarize the changes (which files, what kind of formatting was fixed)
+
+If no violations are found, say so and stop.
+
+Do NOT commit the formatting changes — just apply and report.
diff --git a/.claude/commands/summary.md b/.claude/commands/summary.md
@@ -0,0 +1,27 @@
+Generate a concise implementation summary for a PR description.
+
+Steps:
+1. Run `git diff master...HEAD --stat` and `git log master..HEAD --oneline` to understand all changes on this branch
+2. Read the changed files to understand what was implemented and why
+3. Write a summary suitable for a GitHub PR description
+
+Summary format rules:
+- Start with a one-line "What" statement explaining the change
+- Follow with a "Why" section (2-3 sentences max) explaining the motivation
+- List the key changes as plain bullet points (no nested bullets)
+- If there are new tests, mention what they cover in one line
+- End with a "How to verify" section with concrete steps if applicable
+- Keep the total summary under 30 lines
+- Use plain text with minimal markdown (no tables, no headers larger than ##, no code blocks unless showing a command)
+- Do not repeat file paths or class names unnecessarily
+- Focus on behavior changes, not implementation details
+- Write in present tense, active voice
+
+Do NOT:
+- Use heavy markdown formatting (no bold, no tables, no badges)
+- List every single file changed
+- Include generic boilerplate like "This PR adds..."
+- Add emojis
+- Over-explain things that are obvious from the diff
+
+Output the summary directly so the user can copy-paste it into the PR description.
diff --git a/.claude/hooks/spotless-apply.sh b/.claude/hooks/spotless-apply.sh
@@ -0,0 +1,18 @@
+#!/bin/bash
+# Auto-format Scala files after Edit/Write using Spotless
+INPUT=$(cat)
+FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
+
+# Only run for Scala source files
+if [[ "$FILE_PATH" != *.scala ]]; then
+  exit 0
+fi
+
+RESULT=$(cd "$CLAUDE_PROJECT_DIR" && mvn spotless:apply -q 2>&1)
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -eq 0 ]; then
+  echo "{\"systemMessage\": \"Spotless formatting applied successfully\"}"
+else
+  jq -cn --arg msg "Spotless formatting failed: $RESULT" '{systemMessage: $msg}'
+fi
diff --git a/.claude/hooks/verify-build.sh b/.claude/hooks/verify-build.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+# Verify the project compiles before Claude stops
+INPUT=$(cat)
+STOP_HOOK_ACTIVE=$(echo "$INPUT" | jq -r '.stop_hook_active // false')
+
+# Prevent infinite loops - skip if already triggered by a Stop hook
+if [ "$STOP_HOOK_ACTIVE" = "true" ]; then
+  echo '{"systemMessage": "Skipped build verification (stop hook already active)"}'
+  exit 0
+fi
+
+cd "$CLAUDE_PROJECT_DIR" || exit 1
+RESULT=$(mvn compile -q 2>&1)
+EXIT_CODE=$?
+
+if [ $EXIT_CODE -ne 0 ]; then
+  MESSAGE=$(echo "$RESULT" | head -20 | jq -Rs '"Build failed. Fix compilation errors before finishing: " + .')
+  echo "{\"systemMessage\": ${MESSAGE}}" >&2
+  exit 2
+fi
+
+echo '{"systemMessage": "Build verification passed"}'
+exit 0
diff --git a/.claude/settings.json b/.claude/settings.json
@@ -0,0 +1,93 @@
+{
+  "permissions": {
+    "allow": [
+      "Read",
+      "Edit",
+      "Write",
+      "Glob",
+      "Grep",
+      "Bash(mvn *)",
+      "Bash(mvn)",
+      "Bash(./scripts/*)",
+      "Bash(git status)",
+      "Bash(git status *)",
+      "Bash(git diff)",
+      "Bash(git diff *)",
+      "Bash(git log *)",
+      "Bash(git log)",
+      "Bash(git add *)",
+      "Bash(git commit *)",
+      "Bash(git checkout *)",
+      "Bash(git branch *)",
+      "Bash(git branch)",
+      "Bash(git switch *)",
+      "Bash(git merge *)",
+      "Bash(git stash *)",
+      "Bash(git stash)",
+      "Bash(git tag *)",
+      "Bash(git push *)",
+      "Bash(git push)",
+      "Bash(git pull *)",
+      "Bash(git pull)",
+      "Bash(git fetch *)",
+      "Bash(git fetch)",
+      "Bash(git remote *)",
+      "Bash(git config *)",
+      "Bash(gh *)",
+      "Bash(ls *)",
+      "Bash(ls)",
+      "Bash(mkdir *)",
+      "Bash(which *)",
+      "Bash(java *)",
+      "Bash(scala *)",
+      "Bash(docker *)"
+    ],
+    "deny": [
+      "Bash(rm -rf *)",
+      "Bash(git reset --hard *)",
+      "Bash(git push --force *)",
+      "Bash(git push -f *)",
+      "Bash(git clean -f *)"
+    ]
+  },
+  "hooks": {
+    "PostToolUse": [
+      {
+        "matcher": "Edit|Write",
+        "hooks": [
+          {
+            "type": "command",
+            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/spotless-apply.sh",
+            "timeout": 120,
+            "statusMessage": "Formatting with Spotless...",
+            "async": true
+          }
+        ]
+      }
+    ],
+    "Stop": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "\"$CLAUDE_PROJECT_DIR\"/.claude/hooks/verify-build.sh",
+            "timeout": 300,
+            "statusMessage": "Verifying build compiles..."
+          }
+        ]
+      }
+    ]
+  },
+  "enabledPlugins": {
+    "backend-development@claude-code-workflows": true,
+    "code-review@claude-plugins-official": true,
+    "code-simplifier@claude-plugins-official": true,
+    "cicd-automation@claude-code-workflows": true,
+    "code-documentation@claude-code-workflows": true,
+    "documentation-generation@claude-code-workflows": true,
+    "unit-testing@claude-code-workflows": true,
+    "debugging-toolkit@claude-code-workflows": true,
+    "error-debugging@claude-code-workflows": true,
+    "dependency-management@claude-code-workflows": true
+  }
+}
diff --git a/.gitignore b/.gitignore
@@ -83,3 +83,6 @@ scripts/.tmp/
 
 # Jupyter
 example/jupyter/workspace/
+
+# Claude Code local settings (personal per-user config)
+.claude/settings.local.json
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -0,0 +1,132 @@
+# CLAUDE.md - armada-spark
+
+## Project Overview
+
+Apache Spark plugin that integrates with [Armada](https://armadaproject.io/), a multi-cluster Kubernetes batch scheduler. Implements Spark's `ExternalClusterManager` SPI to submit and manage Spark jobs via Armada's gRPC API.
+
+## Build & Run
+
+```bash
+# Build
+mvn clean package
+
+# Run tests
+mvn test
+
+# Lint check / auto-fix
+mvn spotless:check
+mvn spotless:apply
+
+# Set Spark/Scala versions (e.g., Spark 3.5.5, Scala 2.13.8)
+./scripts/set-version.sh 3 5 5 2 13 8
+```
+
+**Stack:** Scala 2.13 | Maven | Spark 3.5 | Java 17 | Fabric8 Kubernetes Client | gRPC/Protobuf (via armada-scala-client)
+
+## Project Structure
+
+```
+src/main/scala/org/apache/spark/
+├── deploy/armada/              # Configuration & job submission
+│   ├── Config.scala            # All spark.armada.* config entries (ConfigBuilder API)
+│   ├── DeploymentModeHelper.scala
+│   ├── submit/                 # Job submission pipeline
+│   │   ├── ArmadaClientApplication.scala   # Main submission logic
+│   │   ├── PodSpecConverter.scala          # Fabric8 <-> Protobuf conversion
+│   │   ├── PodMerger.scala                 # JSON deep merge for pod specs
+│   │   └── ...
+│   └── validators/K8sValidator.scala
+└── scheduler/cluster/armada/   # Cluster manager & scheduling
+    ├── ArmadaClusterManager.scala          # ExternalClusterManager SPI entry point
+    ├── ArmadaClusterManagerBackend.scala    # Executor lifecycle management
+    ├── ArmadaEventWatcher.scala            # gRPC event stream processing
+    └── ArmadaExecutorAllocator.scala       # Dynamic allocation
+```
+
+Version-specific sources live in `src/main/scala-spark-{version}/`.
+
+## Code Style
+
+- **Formatter:** Scalafmt 3.9.5 (enforced by Spotless Maven plugin)
+- **Max line length:** 100 columns
+- **Alignment:** `align.preset = more`
+- **Dialect:** scala213
+- Always run `mvn spotless:apply` before committing
+
+### Naming Conventions
+
+- Classes/traits: `PascalCase` (e.g., `ArmadaClusterManager`)
+- Methods/variables: `camelCase`
+- Config constants: `UPPER_SNAKE_CASE` (e.g., `ARMADA_JOB_QUEUE`)
+- Test files: `{ClassName}Suite.scala`
+
+### Scala Patterns Used
+
+- **Scoped visibility:** `private[spark]` for package-private classes, `private[submit]` / `private[armada]` for internal APIs
+- **Case classes** for data types (e.g., `ClientArguments`, `CLIConfig`, `ResourceConfig`)
+- **Companion objects** for factory methods and constants
+- **Option/Try monads** over null/exceptions; `NonFatal` for catch blocks
+- **For-comprehensions** for chained Option/Try operations
+- **Call-by-name parameters** (`=> Option[T]`) for lazy evaluation
+- **`scala.jdk.CollectionConverters._`** for Java/Scala interop (`.asScala` / `.asJava`)
+- **Spark's `Logging` trait** for all logging (`logInfo`, `logWarning`, `logDebug`)
+- **Spark's `ConfigBuilder` API** for all configuration entries in `Config.scala`
+
+### Import Order
+
+1. Java/javax imports
+2. Scala stdlib imports
+3. Third-party imports (io.armadaproject, io.fabric8, com.fasterxml)
+4. Spark imports (org.apache.spark)
+
+### License Header
+
+All source files must include the Apache 2.0 license header (see any existing file).
+
+## Testing Standards
+
+- **Framework:** ScalaTest 3.2.16 (`AnyFunSuite` style exclusively)
+- **Mocking:** Mockito 5.12 (`mock(classOf[...])`, `when(...).thenReturn(...)`)
+- **Assertions:** ScalaTest matchers (`shouldBe`, `shouldEqual`, `should contain`)
+
+### Test Patterns
+
+```scala
+// Standard test class structure
+class FooSuite extends AnyFunSuite with BeforeAndAfter with Matchers {
+  before { /* setup */ }
+  after  { /* cleanup */ }
+  test("description of behavior") { /* assertions */ }
+}
+
+// Table-driven property tests (preferred for parameterized cases)
+class BarSuite extends AnyFunSuite with TableDrivenPropertyChecks with Matchers {
+  test("validates multiple inputs") {
+    val testCases = Table(("input", "expected"), ("a", true), ("", false))
+    forAll(testCases) { (input, expected) =>
+      validate(input) shouldBe expected
+    }
+  }
+}
+```
+
+- Use `BeforeAndAfter` or `BeforeAndAfterEach` for fixtures (temp files, mocks)
+- Use `TableDrivenPropertyChecks` for parameterized/data-driven tests
+- Mock SparkContext/SparkConf rather than creating real Spark sessions
+- Clean up temp files in `after` blocks
+- No shared base test class; use trait composition
+- E2E tests tagged with custom `E2ETest` ScalaTest tag (excluded from `mvn test`)
+
+## Agent Workflow
+
+**The main agent must act as an orchestrator.** Never do work inline that can be delegated to a subagent.
+
+- **Delegate everything:** Use the Task tool with specialized subagents for all research, code exploration, code writing, testing, and analysis. The main agent should plan, coordinate, and summarize — not do the work itself.
+- **Maximize parallelism:** Launch multiple subagents concurrently whenever their tasks are independent. For example, when exploring code patterns AND analyzing tests AND checking dependencies, spawn all three agents in a single message rather than sequentially. Always send independent Task calls in a **single message** with multiple tool-use blocks.
+- **Use the right agent type:** Pick `Explore` for codebase search/understanding, `Plan` for architecture decisions, `Bash` for commands, and specialized agents (e.g., `code-reviewer`, `test-automator`, `debugger`) when they match the task.
+- **Keep the main context clean:** Offload large file reads, multi-file searches, and deep analysis to subagents so the main conversation stays focused on coordination and user communication.
+- **Hooks run automatically — use subagents to respond:** When a hook (Spotless, build verification, code review, or simplification) reports an issue, delegate the fix to a subagent rather than doing it inline. If multiple hooks fail simultaneously, spawn parallel subagents to address each issue concurrently.
+
+## CI/CD
+
+GitHub Actions with matrix builds across Spark 3.3/3.5/4.1 and Scala 2.12/2.13. Pipeline: lint -> build -> e2e tests.