docs: document AI contribution policy and agent guidelines#27
SamBarker wants to merge 11 commits into kroxylicious:main
Sets out the project's position on AI-assisted contributions: contributors may use AI tools, but they own what they submit, must understand it, and must disclose significant AI usage. Also introduces the concept of AGENTS.md files in repositories.

Closes kroxylicious#26

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Adds guidance on using an Assisted-by trailer in commit messages to identify the AI tool and model used. The trailer is intended to be populated by the tooling itself, with AGENTS.md providing tool-specific configuration details.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Adds a brief 'About the Project' section noting Kroxylicious is a Java project built with Apache Maven.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Provides AI coding tools with process expectations including DCO sign-off, Assisted-by trailers, commit discipline, and pull request review requirements. Clarifies that human committer review and merge decisions are not substituted by AI reviews.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Adds language to both CONTRIBUTING.md and AGENTS.md clarifying that AI-assisted reviews supplement but do not substitute for Committer review, and that merge decisions follow the project's decision making framework.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Adds guidance on commit messages (why not what), cohesive PRs, PR descriptions focused on problems and trade-offs, and naming conventions that prefer intent over encoded logic.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
@k-wall's issue gave the following reasons for wanting an AI contribution policy:
I think there are broadly two ways we can look at those things:
Finally, it's not clear to me that

I like the suggestions that @tombentley is making under the 2) bullet.
Explicitly requires that AI-generated content not reproduce copyrighted material and that contributors enable available controls to reduce that risk. Adds that PRs may be closed where the contributor does not appear to understand their submission. Adds a matching copyright instruction to AGENTS.md.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Adds a conciseness requirement to both CONTRIBUTING.md and AGENTS.md. Adds PR review guidance that unfocused or oversized PRs may be closed and the contributor asked to break them down. These apply to all contributions regardless of how they were produced.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Routine IDE-like AI features (code completion, spelling) do not require disclosure. Disclosure is expected when AI generates substantial content such as functions, tests, or documentation.

Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
CONTRIBUTING.md
Commits should include an `Assisted-by` trailer identifying the tool and model used (e.g. `Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>`).
Most AI coding tools can be configured to add this automatically; see the repository's `AGENTS.md` for details.
Use of AI features in the same way you would use an IDE (code completion, spelling, and the like) does not require disclosure.
Disclosure is expected when AI tools are used to generate substantial content such as functions, tests, documentation, or design approaches.
Open question: AI-assisted thinking vs AI-assisted production
One scenario worth considering: a contributor discusses design options with an AI tool but then writes the code and PR themselves, without the AI being directly involved in producing the contribution.
Under this policy, we don't think this requires disclosure. The contributor understood the problem, evaluated the options, and wrote the code — the AI influenced their thinking in much the same way that reading a blog post, discussing ideas with a colleague, or whiteboarding a design would. The policy is concerned with AI tools producing the content of a contribution, not with how a contributor arrived at their ideas.
This also helps clarify the intent behind "played a significant role in producing a contribution" — it's about the production of the submitted content, not about the contributor's broader learning or decision-making process.
Does this reading match others' expectations, or should the policy say something explicit about this distinction?
Does this reading match others' expectations, or should the policy say something explicit about this distinction?
It matches my expectations. No need to say anything explicit.
AGENTS.md
### Assisted-by Trailer

Commits produced with AI assistance must include an `Assisted-by` trailer identifying the tool and model.
The trailer should be added to the commit message body, after the sign-off:

Suggested change:
- The trailer should be added to the commit message body, after the sign-off:
+ Add the trailer to the commit message body after the sign-off:
Do we need to say that the project requires that this format is followed exactly?
I think this is specific enough for the LLM, but influencing LLMs is a bit tricky, so I'm always open to suggestions.
<commit message>

Signed-off-by: Name <email>
Assisted-by: <Tool and model> <noreply@example.com>
Is it clear which email address to use for the tool? Would people know what to put there beyond a placeholder?
The AGENTS.md file is for the LLM, and I think it will know what to put there.
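The template above puts the DCO sign-off first and the `Assisted-by` trailer last, both in the final paragraph of the message. The following is a minimal sketch of that layout, written in Java because Kroxylicious is a Java project. `CommitMessageBuilder` and `build` are hypothetical names and are not part of any existing Kroxylicious or Git tooling.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Kroxylicious): composes a commit message whose
 * final paragraph holds the DCO sign-off followed by an optional Assisted-by trailer.
 */
final class CommitMessageBuilder {

    private CommitMessageBuilder() {
    }

    static String build(String body, String signedOffBy, String assistedBy) {
        List<String> trailers = new ArrayList<>();
        // The human contributor's DCO sign-off comes first.
        trailers.add("Signed-off-by: " + signedOffBy);
        // The tool disclosure follows it, and only when AI assistance was used.
        if (assistedBy != null && !assistedBy.isBlank()) {
            trailers.add("Assisted-by: " + assistedBy);
        }
        // Git reads trailers from the last paragraph, so separate it with a blank line.
        return body.strip() + "\n\n" + String.join("\n", trailers) + "\n";
    }

    public static void main(String[] args) {
        System.out.print(build(
                "Clarify DCO sign-off guidance",
                "Jane Doe <jane@example.com>",
                "Claude Opus 4.6 <noreply@anthropic.com>"));
    }
}
```

In practice, `git commit -s` already adds the sign-off. The sketch only shows where the disclosure line would sit relative to it.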
Address all suggestion-style comments from @PaulRMellor: tighten wording, prefer active/imperative voice, and remove vague phrasing. Also make the Maven build line in AGENTS.md actionable, and clarify the CONTRIBUTING.md project intro to focus on human readers.

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>

Apply wording improvements suggested by @PaulRMellor:

- Clarify "fully understand" in contribution requirement
- Use present tense "play" instead of "played" for AI disclosure
- Strengthen "should" to "must" for Assisted-by trailer
- Simplify IDE comparison sentence
- Strengthen "expected" to "required" for disclosure threshold
- Rework licensing compliance sentence for clarity
- "text" -> "content" and "verbosity" -> "detail" in conciseness bullet
- "may" -> "can" for AGENTS.md repositories
- Rephrase AGENTS.md access sentence, "norms" -> "conventions"

Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Signed-off-by: Sam Barker <sam@quadrocket.co.uk>
Automated or AI-assisted reviews, such as security or style checks, can supplement review but do not replace review by a Committer.
Committers make merge decisions following the project's [decision-making](./GOVERNANCE.md#decision-making) framework.
Pull requests must focus on a single goal and be sized for effective review.
We may close pull requests that are unfocused or too large to review effectively, and ask the contributor to break them into smaller, more reviewable changes.
"Too large" is a bit vague. It would be more helpful to the reader if we could explain what we mean in more concrete terms.
Contributors can use AI tools, such as LLMs and code assistants, when preparing contributions to Kroxylicious.
As with any tool, the contributor is responsible for the quality of the result and for understanding what they submit.

You are responsible for understanding your contribution and ensuring that it meets project standards, regardless of the tools used.
"You" is assumed to be a human. We're basically assuming that there's a human in the loop for opening the PR. That might be what we want, or not. But either way we should be explicit about what our expectations are.
* **You are the contributor.** When you sign off the [DCO](./DCO.txt), you certify the contribution as your own.
AI-generated or AI-assisted content does not change this obligation.
I think we can be a bit clearer on this point: the project requires DCO sign-off on all commits. The DCO requires that sign-off is done by a legal person. An AI is not a legal entity, so it cannot sign off. When a person signs off on a commit, they take responsibility for any AI assistance included in that commit.
* **You are the contributor.** When you sign off the [DCO](./DCO.txt), you certify the contribution as your own.
AI-generated or AI-assisted content does not change this obligation.
* **Understand your contribution.** You must have a clear understanding of what your contribution does and why.
Do not submit code, documentation, or other content that you do not fully understand.

Suggested change:
- Do not submit code, documentation, or other content that you do not fully understand.
+ Do not submit code, documentation, or other content that you do not fully understand.
+ You should be able to answer reviewers' questions yourself, without recourse to AI.
+ In particular, do not waste the time of other contributors by being a proxy between reviewers and an AI.
* **Understand your contribution.** You must have a clear understanding of what your contribution does and why.
Do not submit code, documentation, or other content that you do not fully understand.
* **Disclose AI usage.** If AI tools play a significant role in a contribution, note this in the pull request description.
Commits must include an `Assisted-by` trailer that identifies the tool and model used (for example, `Assisted-by: Claude Opus 4.6 <noreply@anthropic.com>`).
TBH I'm in two minds on this one. I see that by requiring this you're putting the onus on the human to own what they're pushing. But the fact is that some commits are going to be entirely generated by AI, and Generated-by or Co-authored-by actually seem to me to be more appropriate.
@@ -0,0 +1,68 @@
# Kroxylicious AI Agent Guidelines

This file provides guidance for AI coding tools, such as GitHub Copilot and Claude Code, when working in Kroxylicious repositories.
I'm confused. Is this file intended to be read by the AI, by the human, or both?
The AI tools.
READMEs and developer guides are for humans; AGENTS.md is for the AI tools (so the content might overlap, but it's intended for different audiences).
My understanding of a .github repo is that it serves as the default for other repos in the org. This is most relevant for those files which GitHub treats specially (like CONTRIBUTING.md). AFAIK AGENTS.md is not on that list (but maybe it will be added in the future).
So I don't know quite how to interpret this file:
- Is it really intended to apply to all our other repos? Probably yes, but it's insufficient on its own to be useful.
- If yes, do we know that agents will actually find it (especially if the user has not checked out this repo)?
- But if another repo has its own AGENTS.md file (which, by the first point, is necessary), won't that override this file (at least from a GitHub PoV)?
It all just seems confusing how the various parties (AI, GitHub, Humans) are supposed to construct the full AGENTS.md context.
### DCO Sign-off

All commits must be signed off with the Developer Certificate of Origin (DCO).
Use `git commit -s` to add the sign-off automatically.
Why repeat this here when it's mentioned in the CONTRIBUTING.md?
We need to think about what this means. My understanding is that an AI cannot do the DCO sign-off, because it's not a legal entity. I would also contend that it should not sign off on behalf of the person we're assuming to be in the loop.
So the instruction should be that a coding agent or AI must not sign off. Then we need a complementary instruction in CONTRIBUTING.md telling the human that they need to review and sign off on their genAI's commits.
I agree that a human sign-off is necessary. From a legal perspective, it provides a much safer layer of accountability.
### Commit Discipline

- Each commit must be atomic and represent a single logical change.
What does 'atomic' mean in this context? It sounds like it has a technical meaning, but you don't explain what it is.
### Commit Discipline

- Each commit must be atomic and represent a single logical change.
- Keep commits small enough to be reviewed in a few minutes.
Do we do this ourselves? I've been seeing some humongous PRs recently (over 5k LOC).
We are not great at it, no, but the more nudges we have in the right direction, the better.
IMO PR size is not a good indicator of the time it takes to review.
Suggestion:
Focus on making commits small enough to be reviewed quickly, though keep in mind that logical complexity matters more than line count.
### Pull Requests

- A pull request should address a single cohesive goal. Do not bundle unrelated changes together; each PR should tell a clear story that a reviewer can follow from start to finish.
- Submit all changes as pull requests.
This feels like it should be the first item in the list. Or I think it could be taken as read.
This PR, not being in the kroxylicious repo, might not have crossed the radar of some people involved in the project. So I think it's worth getting wider community engagement on this PR via an email to the dev mailing list plus a mention on Slack.
- At least one human [Committer](./COMMITTERS.md) must review and approve a pull request before it is merged.
Automated or AI-assisted reviews, such as security or style checks, can supplement human review but do not replace it.
The decision to merge is always made by human committers following the project's [decision making](./GOVERNANCE.md#decision-making) framework.
This is really a policy we're applying to committers. It's not relevant for non-committer contributors whether/how AI might be used in code review, nor the gating requirements. So I think we can take this out.
Do not reproduce copyrighted material in generated code, documentation, or other content.
If you are aware of controls or configuration that reduce or remove the risk of reproducing copyrighted content, ensure they are active.
All contributions must be compatible with the project's [license](./LICENSE).
We should also say something about the acceptable licenses of dependencies.
### Copyright and Licensing

Do not reproduce copyrighted material in generated code, documentation, or other content.
I think this is actually subtle, and it also highlights a way that our own IP handling could be better.
I'm not a lawyer, but my understanding is that the problem arises not from the reproduction of copyrighted code, but from how that code is licensed. For example, I think it would be perfectly fine to copy code verbatim from Apache Kafka, because its license allows that, subject to things like preserving copyright information.
The improvement we could make concerns how we enforce source code copyright headers. Currently we insist that our header is used, and that header says "copyright Kroxylicious Authors" and carries an ASL license. By insisting on that header we aren't able to copy ASL-licensed code from other projects. But this problem comes from our own requirement to have that header, rather than from anything that's actually legally necessary. IIRC Richard Fontana has said that copyright headers are not necessary, and that copyright exists without them.
AGENTS.md
The decision to merge is always made by human Committer(s) following the project's [decision making](./GOVERNANCE.md#decision-making) framework.
- PR descriptions should focus on the problem being addressed, the approach taken, and any trade-offs or alternatives considered. Note any AI tool involvement.
Automated or AI-assisted reviews, such as security or style checks, can supplement human review but do not replace it.
The decision to merge is always made by human committers following the project's [decision making](./GOVERNANCE.md#decision-making) framework.
- PR descriptions must focus on the problem being addressed, the approach taken, and any trade-offs or alternatives considered. They must also note any AI tool involvement.
"They must also note any AI tool involvement."
Firstly, there's duplication here with the Assisted-by header of the commit message. So do we really need both? What are we trying to achieve by having AI usage called out in the PR too? Is it just informational for the reviewers? Or is it record keeping?
If we decide we want AI usage called out on the PR, we should guide what we actually expect the contributors to do.
Would a tick box suffice?
[ ] If AI is used, I've added the Assisted-by header to my commit, stating the model used
Or do you want something more narrative?
Alternatively, we could use a workflow to detect the "Assisted-by" header on the commit and flag it on the PR in some systematic way (a label? the description?).
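The workflow idea above could start from something like the following minimal sketch. It is hypothetical: `AssistedByDetector` is not part of Kroxylicious or GitHub Actions, and the sketch assumes a CI job has already fetched the PR's commit messages. It looks for an `Assisted-by` trailer in each message's final paragraph and reports whether any commit declares AI assistance. A workflow could then turn that result into a label.

```java
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical check (not an existing Kroxylicious or GitHub tool) that a CI job could
 * run over each commit message in a PR. Simplified: it treats the last paragraph of a
 * multi-paragraph message as the trailer block and ignores the finer rules of
 * git-interpret-trailers (continuation lines, custom separators, and so on).
 */
final class AssistedByDetector {

    private AssistedByDetector() {
    }

    /** Lines of the final paragraph, or an empty list if the message has only one paragraph. */
    static List<String> trailerLines(String message) {
        String[] paragraphs = message.strip().split("\\n\\s*\\n");
        if (paragraphs.length < 2) {
            return List.of(); // a subject line alone carries no trailers
        }
        return List.of(paragraphs[paragraphs.length - 1].split("\\n"));
    }

    /** Value of the first Assisted-by trailer, matching the key case-insensitively. */
    static Optional<String> assistedBy(String message) {
        for (String line : trailerLines(message)) {
            int colon = line.indexOf(':');
            if (colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase("Assisted-by")) {
                return Optional.of(line.substring(colon + 1).trim());
            }
        }
        return Optional.empty();
    }

    /** True if any commit in the PR declares AI assistance, e.g. to decide whether to add a label. */
    static boolean anyAssisted(List<String> commitMessages) {
        return commitMessages.stream().anyMatch(m -> assistedBy(m).isPresent());
    }
}
```

A production check would probably delegate trailer parsing to `git interpret-trailers --parse` rather than reimplementing it.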
This file provides guidance for AI coding tools, such as GitHub Copilot and Claude Code, when working in Kroxylicious repositories.

Contributors using AI tools must ensure the tools can access this file and any repository-specific `AGENTS.md` file.
So if I want to work on repo X I also need to go and manually copy this AGENTS.md file content from this repo? Not the best experience IMO and something people will probably skip.
### Assisted-by Trailer

Commits created with AI assistance must include an `Assisted-by` trailer identifying the tool and model.
Add the trailer to the commit message body after the sign-off:
The commits in this PR do not follow the policy, as Assisted-by is added before Signed-off-by. Anyway, why do we need to specify the exact ordering?
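If the ordering rule stays, a tool could check it mechanically. The sketch below is hypothetical (`TrailerOrderCheck` is not an existing Kroxylicious check) and simplifies trailer parsing to "the last paragraph of a multi-paragraph message". It passes when every `Assisted-by` trailer follows the last `Signed-off-by` trailer. The commit messages quoted at the top of this PR list `Assisted-by` first, so they would fail it. One plausible cause, worth confirming, is that `git commit -s` appends the sign-off after any trailers already present in the message.

```java
import java.util.List;

/**
 * Hypothetical lint (not an existing Kroxylicious check) for the ordering rule in the
 * draft AGENTS.md: no Assisted-by trailer may appear before a Signed-off-by trailer.
 * Trailer parsing is simplified to "the last paragraph of a multi-paragraph message".
 */
final class TrailerOrderCheck {

    private TrailerOrderCheck() {
    }

    /** True if the line is a "Key: value" trailer whose key matches, ignoring case. */
    private static boolean hasKey(String line, String key) {
        int colon = line.indexOf(':');
        return colon > 0 && line.substring(0, colon).trim().equalsIgnoreCase(key);
    }

    /** Returns true when every Assisted-by trailer comes after the last Signed-off-by trailer. */
    static boolean assistedByAfterSignOff(String message) {
        String[] paragraphs = message.strip().split("\\n\\s*\\n");
        if (paragraphs.length < 2) {
            return true; // subject line only: there is no trailer block to check
        }
        List<String> trailers = List.of(paragraphs[paragraphs.length - 1].split("\\n"));
        int lastSignOff = -1;
        int firstAssisted = Integer.MAX_VALUE;
        for (int i = 0; i < trailers.size(); i++) {
            String line = trailers.get(i);
            if (hasKey(line, "Signed-off-by")) {
                lastSignOff = i;
            } else if (hasKey(line, "Assisted-by")) {
                firstAssisted = Math.min(firstAssisted, i);
            }
        }
        // With no Assisted-by trailer, firstAssisted stays at MAX_VALUE and the check passes.
        return firstAssisted > lastSignOff;
    }
}
```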
### Pull Requests

- A pull request should address a single cohesive goal. Do not bundle unrelated changes together; each PR should tell a clear story that a reviewer can follow from start to finish.
Do we also want to say something about refactoring? Should refactorings always be separate PRs, or only when they obscure the actual changes?
Summary
Addresses #26 — documenting how AI may be used when crafting contributions to the project.
- CONTRIBUTING.md establishing the project's position: AI tools are permitted, but the contributor owns what they submit, must understand it, and must disclose significant AI usage.
- CONTRIBUTING.md noting the Java/Maven foundation.
- AGENTS.md providing AI coding tools with process expectations (DCO, commit discipline, PR standards, naming conventions). Individual repositories can add their own AGENTS.md with repo-specific technical details.

Why Assisted-by rather than Apache's Generated-by

Apache's Generated-by trailer is primarily about provenance tracking: an audit trail so the foundation can later query "which artifacts did model X generate?" if licensing concerns emerge around a model's training data. The focus is on the output's origin.

Kroxylicious's Assisted-by trailer is primarily about contributor responsibility. The policy's core message is "you are the contributor"; the trailer reinforces that the human is in the driving seat and the tool assisted them, rather than implying the tool produced the output and the human accepted it. The DCO sign-off already establishes legal accountability; Assisted-by extends that spirit to tooling disclosure.

In practice, both provide the same audit trail if needed. The difference is philosophical: Generated-by frames the tool as the actor, Assisted-by frames the contributor as the actor. The latter is more consistent with this project's emphasis on contributor ownership and understanding.

References consulted

Test plan

- CONTRIBUTING.md for tone, completeness, and consistency with governance model
- AGENTS.md for clarity as instructions to AI tools
- GOVERNANCE.md#decision-making, DCO.txt, and LICENSE resolve correctly
- Assisted-by trailer format meets the project's needs

🤖 Generated with Claude Code