-
Notifications
You must be signed in to change notification settings - Fork 5
docs: rewrite README, add CONTRIBUTING with conventional commits #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,29 +1,51 @@ | ||
| Create a git commit for the current staged/unstaged changes. | ||
| Create a git commit for the current staged/unstaged changes using Conventional Commits. | ||
|
|
||
| Rules: | ||
| 1. Run `git status` and `git diff` to understand what changed | ||
| 2. Stage relevant files (prefer specific files over `git add -A`) | ||
| 3. Write a SHORT commit message (max 50 chars for subject line, imperative mood) | ||
| 3. Write a commit message following Conventional Commits format | ||
| 4. Use `--signoff` to sign off using the committer's git config (do NOT hardcode any name/email) | ||
| 5. Do NOT add Co-Authored-By or any other trailers | ||
| 5. Do NOT add Co-Authored-By or any other trailers beyond Signed-off-by | ||
| 6. If there are no changes, say so and stop | ||
|
|
||
| Commit format: | ||
| ``` | ||
| git commit --signoff -m "short imperative message" | ||
| <type>[optional scope]: <description> | ||
| ``` | ||
|
|
||
| The `--signoff` flag automatically uses the name and email from `git config user.name` and `git config user.email`, so each collaborator's own identity is used. | ||
| Types: feat, fix, docs, style, refactor, perf, test, build, ci, chore | ||
|
|
||
| Scope (optional): a noun describing the affected area in parentheses. | ||
| Common scopes: config, submit, allocator, event-watcher, e2e, docker, ci. | ||
|
|
||
| Subject line rules: | ||
| - Imperative mood ("add", not "added" or "adds") | ||
| - Do NOT capitalize the first letter after the type prefix | ||
| - Do NOT end with a period | ||
| - Keep under 72 characters total | ||
|
|
||
| For breaking changes, add `!` after the type/scope: | ||
| ``` | ||
| feat(config)!: rename queue config key | ||
| ``` | ||
|
|
||
| Examples of good messages: | ||
| - "add user and permission models" | ||
| - "implement executor allocation logic" | ||
| - "fix gang scheduling annotation prefix" | ||
| - "add table-driven tests for config parsing" | ||
| - "feat(allocator): support configurable batch size" | ||
| - "fix(event-watcher): handle reconnection on stream error" | ||
| - "refactor(submit): extract TLS channel builder" | ||
| - "test(backend): add table-driven tests for state transitions" | ||
| - "docs: update architecture diagram in README" | ||
| - "build: bump armada-scala-client to 0.5.0" | ||
| - "chore: remove unused import in Config.scala" | ||
|
|
||
| Commit command: | ||
| ``` | ||
| git commit --signoff -m "<type>[scope]: <description>" | ||
| ``` | ||
|
|
||
| Do NOT: | ||
| - Use long descriptive messages | ||
| - Use long multi-line messages for simple changes | ||
| - Add Co-Authored-By trailers | ||
| - Use past tense ("added", "fixed") | ||
| - Prefix with type tags ("feat:", "fix:") unless asked | ||
| - Hardcode any author name or email | ||
| - Omit the type prefix |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| Create a GitHub issue markdown file from the details provided by the user. | ||
|
|
||
| The user will describe a bug, feature request, or discussion context (e.g. Slack conversations, error logs, reproduction steps). Your job is to turn that into a well-structured issue and save it to `plans/`. | ||
|
|
||
| Steps: | ||
| 1. Read the user's input to understand the problem, who reported it, and any logs or reproduction steps provided | ||
| 2. Investigate the codebase to identify relevant code paths, pinpoint where the issue likely originates, and gather context that would help a contributor understand the problem | ||
| 3. Write a concise GitHub issue markdown file and save it to `plans/issue-<short-slug>.md` | ||
|
|
||
| Issue format: | ||
| - Start with `## <title>` as the first line (this becomes the GitHub issue title) | ||
| - `### Problem` — 2-3 sentences explaining the issue and its impact | ||
| - `### Steps to Reproduce` — numbered list (if applicable) | ||
| - `### Expected Behavior` — 1-2 sentences | ||
| - `### Actual Behavior` — include relevant log snippets in code blocks, keep them short (trim stack traces to the key lines) | ||
| - `### Potential Root Cause` — based on your codebase investigation, explain where the issue likely lives. Link to source code using upstream GitHub URLs: `https://github.com/armadaproject/armada-spark/blob/master/...#L<start>-L<end>`. For Apache Spark references, link to the `apache/spark` repo at the relevant tag. | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. the master branch changes a lot. A "sha" based url like this: would be more stable/accurate.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I tried, but Claude occasionally generates the wrong SHA, causing broken links. It needs some investigation, but should be doable. |
||
| - `### Affected Files` — table with file links and one-line role descriptions | ||
| - `### Additional Context` — bullet points for version info, related code paths, or anything else useful | ||
|
|
||
| Style rules: | ||
| - Keep it concise — the whole issue should be under 80 lines | ||
| - Use code blocks sparingly — only for key log lines and small code snippets | ||
| - Do not propose fixes or implementation approaches | ||
| - Do not add Labels, Description, or Title as separate sections — the `##` heading is the title | ||
| - Write for a contributor who knows the project but hasn't seen this specific bug | ||
| - No emojis | ||
|
|
||
| $ARGUMENTS | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # Contributing to armada-spark | ||
|
|
||
| ## Development Setup | ||
|
|
||
| **Prerequisites:** Java 17, Maven 3.9.6+, Scala 2.13.8 | ||
|
|
||
| ```bash | ||
| mvn clean package # Build with tests | ||
| mvn test # Run unit tests only | ||
| mvn spotless:check # Check formatting | ||
| mvn spotless:apply # Auto-fix formatting | ||
| scripts/dev-e2e.sh # Run E2E tests (requires a running Armada cluster) | ||
|
|
||
| # Target a different Spark/Scala version (e.g., Spark 3.3.4, Scala 2.12.15) | ||
| ./scripts/set-version.sh 3.3.4 2.12.15 | ||
| ``` | ||
|
|
||
| ## Commit Conventions | ||
|
|
||
| This project follows [Conventional Commits v1.0.0](https://www.conventionalcommits.org/en/v1.0.0/). All commits must be signed off: | ||
|
|
||
| ```bash | ||
| git commit --signoff -m "feat(allocator): support configurable batch size" | ||
| ``` | ||
|
|
||
| Common scopes: `config`, `submit`, `allocator`, `event-watcher`, `e2e`, `docker`, `ci`. | ||
|
|
||
| ## Pull Requests | ||
|
|
||
| 1. Branch from `master` (`git checkout -b feat/my-feature master`) | ||
| 2. Format code (`mvn spotless:apply`) and run tests (`mvn test`) | ||
| 3. Commit using conventional commits with `--signoff` | ||
| 4. Open a PR against `master` — the PR title should also follow conventional commit format | ||
|
|
||
| ### PR Checklist | ||
|
|
||
| - [ ] `mvn clean package` compiles | ||
| - [ ] `mvn test` passes | ||
| - [ ] `mvn spotless:check` passes | ||
| - [ ] Commits follow [Conventional Commits](https://www.conventionalcommits.org/en/v1.0.0/) | ||
| - [ ] New code includes tests | ||
|
|
||
| CI runs lint, build (Spark 3.3/3.5 x Scala 2.12/2.13), snapshots, and E2E tests on every PR. All checks must pass. | ||
|
|
||
| ## Coding Standards | ||
|
|
||
| - **Scalafmt** enforced (100-col limit) — Apache 2.0 license header required on all source files | ||
| - **Naming:** `PascalCase` classes, `camelCase` methods, `UPPER_SNAKE_CASE` config constants, `{ClassName}Suite` test files | ||
| - **Patterns:** `Option`/`Try` over null/exceptions, `private[spark]` scoped visibility, Spark's `Logging` trait | ||
| - **Imports:** Java/javax → Scala stdlib → Third-party → Spark | ||
| - **Testing:** ScalaTest `AnyFunSuite` + Mockito (`mock(classOf[Type])`), `TableDrivenPropertyChecks` for parameterized tests | ||
| - **Version-specific code** lives in `src/main/scala-spark-{3.3,3.5,4.1}/` — changes to submission logic likely need updates in all version directories | ||
|
|
||
| ## Working with Claude Code (Optional) | ||
|
|
||
| This project includes a [Claude Code](https://docs.anthropic.com/en/docs/claude-code) setup. Shared config (`.claude/settings.json`, `.claude/commands/`, `CLAUDE.md`) is checked into git. Hooks auto-format code and verify builds. | ||
|
|
||
| | Command | What it does | | ||
| |--------------|-------------------------------------------------| | ||
| | `/build` | Build the project (`/build fast` to skip tests) | | ||
| | `/lint` | Check and auto-fix formatting | | ||
| | `/commit` | Create a conventional commit | | ||
| | `/ci-local` | Run the full CI pipeline locally | | ||
| | `/summary` | Generate a PR description from your branch | | ||
| | `/issue` | Draft a GitHub issue from bug details/logs | | ||
|
|
||
| **Optional plugins:** Copy `.claude/settings.local.example.json` to `.claude/settings.local.json` and restart. Source: https://github.com/wshobson/agents | ||
|
|
||
| ## Getting Help | ||
|
|
||
| - Open a GitHub issue for bugs or feature requests | ||
| - For Armada questions, see [armadaproject.io](https://armadaproject.io/) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason not to use the "gh" command to create the actual issue in https://github.com/G-Research/spark/issues rather than just a markdown file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I considered that as well. However, since we create issues across different boards/projects, sometimes in armada-spark and sometimes in gr/spark, I kept it simple for now and generated a markdown file instead.