Introduce an initial LLM Usage Policy#2318

Open
sirosen wants to merge 3 commits into jazzband:main from
sirosen:initial-llm-usage-policy

Conversation

@sirosen
Member

@sirosen sirosen commented Jan 31, 2026

Driven by our prior discussion, this lays out an initial policy which is meant to be simple to understand.

After consideration, and in particular looking at the current pip contribution policy[1], I have taken us back to the original two "columns" I suggested for our policy: "Disclosure" and "Ownership".

The policy is stated as meant for "LLM Generated Contributions". Although during earlier discussion I suggested that we avoid singling out these tools, on review (especially with some recent PRs), I am not sure that is wise. I would like it to be very clear to LLM users that we have some additional standards for them -- which I view as offsetting the ease with which they can spam projects and do harm.

The policy states that it is "to protect our maintainers as well as our contributors"; hopefully this is a clear hint that the maintainers even need some level of protection, and will help new contributors understand why we have a policy.

Echoing some prior discussion about "Don't let AI speak for you" / "Don't let AI think for you", there's a line included that draws a distinction between "typing" and "thinking".

To give us a clear out, in case we have truly problematic github users show up, the policy calls out "extreme cases" as spam/slop.

Finally, the policy itself links back to the original discussion as an open invitation for anyone who wants to advocate for us refining this policy.

Footnotes

  1. It's very short. See: https://pip.pypa.io/en/stable/development/contributing/

    While contributors may use whatever tools they like when developing a pull request, it is the contributor’s responsibility to ensure that submitted code meets the project requirements, and that they understand the submitted code well enough to respond to review comments.

    In particular, we will mark LLM-generated slop as spam without additional discussion.

Member

@webknjaz webknjaz left a comment

Have you seen https://github.com/chaoss/wg-ai-alignment/tree/main/moderation#readme? It's got the links from my gist and some more. It's a centralized effort worth watching periodically.

I've also found Sebastián's framing interesting: https://bsky.app/profile/tiangolo.com/post/3mc6mjosfa22s / https://fastapi.tiangolo.com/contributing/#automated-code-and-ai. It's a bit less formal.

This one is quirky: https://curl.se/.well-known/security.txt


Do you think we could add some informal tone to the policy?

CONTRIBUTING.md Outdated
Member

Suggested change
- 1. **Disclosure**: contributors should indicate when they have use an LLM to generate a part of their work.
+ 1. **Disclosure**: contributors should indicate when they have used an LLM to generate a part of their work.

Member

Also: should or must?

@samdoran

My thinking on LLMs as a tool is evolving. I am doing my best to mentally bucket them with other productivity enhancing tools such as an IDE or a language server. I understand LLMs are vastly different, but at the end of the day it is still a tool that requires human thought, direction, judgment, and taste to guide toward successful outcomes. It may even be a new class of tool in our career field that requires more training before use, similar to operating dangerous heavy machinery.

A secondary concern is something I am working on personally, which is that I find myself negatively biased towards code under review if it was LLM generated. That is both good and bad. Good because it causes me to be more vigilant and careful in the review. Bad in that I am looking down on the code as "lesser" and tend to be more critical.

If I use an LLM to rubber-duck some code or make a rough prototype, then hammer it into its final shape until only 10-20% of the original LLM-generated code remains, I have enough sweat equity in it to call it "my code" and would not want to taint it with the "LLM" label.

If code is majority LLM-generated but carefully reviewed and understood by the author, that seems worthy of full disclosure.

Wholly LLM-generated code that is not understood by the submitter is not worth considering.

There are degrees to this problem. At this point it is probably ok to require any LLM use to be disclosed but we may want to reevaluate that in the future.

Member Author

I 99% agree. If the submitter reworks it (line by line or even less, but in significant measure), the code is probably going to be fine. If they didn't work on it at all after it got vibe-coded for them, it's a waste product and should be discarded.

    A secondary concern is something I am working on personally, which is that I find myself negatively biased towards code under review if it was LLM generated. That is both good and bad. Good because it causes me to be more vigilant and careful in the review. Bad in that I am looking down on the code as "lesser" and tend to be more critical.

I've experienced this as well, but not only with LLM-generated code. I've also experienced it a bit with junior engineers -- or a lot when working with people who have a track record of sloppiness.

When I review code from DeveloperA, I know that she always writes solid unit tests, defines clear internal APIs, and thinks particularly hard about performance. So when I review her code, I mostly scan for anything unusual and read the unit tests to help me understand subtle behaviors.

When I review code from DeveloperB, I often see important untested cases or bugs which indicate that the code never ran. I read every line using the tiny Python interpreter in my brain (🫠) because my trust has been broken.

In this framing, what kind of developer is an LLM? Is it one that has earned your trust, and we can mostly ignore and skim the details? Or is it one that has broken your trust, that needs to always be watched for logical errors?

A lot of ink has been spilled over the ways in which LLMs trip up our heuristics for quality. But I hold that it behaves much more like an inexperienced or sloppy developer.

So yeah, I read every line of LLM-generated code much more closely. And I don't consider that something we should work on changing in ourselves. We should read it more closely. "Looking down on it" would be a problem, but not trusting the code is the correct choice.

CONTRIBUTING.md Outdated
Member

Let's make the link detached like the others.

@webknjaz
Member

webknjaz commented Feb 2, 2026

I feel like (1) we might not want to use the word "policy" but rather something along the lines of "expectations", plus maybe (2) it'd be good to have guidelines for reviewers, and to have some explanatory wording of "this is how we'll perceive submissions/interactions and here's why".

Here's some more on what I like in other policy examples:


On a related note, I like CPython's triage process explanations (https://devguide.python.org/triage/triaging/ / https://devguide.python.org/triage/triage-team/) and think that it's a good source to take into account and steer people towards in terms of showing that it's important to get familiar with the community and the code base. Something similar to what you @sirosen wanted to write last year regarding contributing to CPython.

@sirosen
Member Author

sirosen commented Feb 2, 2026

I haven't had time in the past couple of days to circle back and apply changes (and it's late in my local time today), but I wanted to drop a quick note.

I'm 👍 on all of the small changes suggested, but I want to review some of these other contrib docs before making more edits; not all of them are familiar to me. For example, I just read Sebastián's policy for the FastAPI projects, and I like it. It strikes a really good balance of brevity and explanation.

My tone was more formal, but I'll reconsider that. I'll need to try a few different versions of this out to see what works best. Possibly I'll post some samples of different possible text when I work through it, but if I really like a result I might just update the PR.

@webknjaz
Member

webknjaz commented Feb 2, 2026

Oh, and I forgot one more thing: I think we should explicitly call out that things marked as a "good first issue" are best solved w/o LLMs, explaining that they are likely to be a good learning experience and generated submissions will probably harm this process.

@sirosen sirosen modified the milestones: 7.5.3, 7.5.4 Feb 6, 2026
@webknjaz
Member

@sirosen so over on the pytest discord server, @0cjs shared this:

  1. I like the disclosure policy. In theory, it doesn't matter if the contributor used an LLM or not. In practice, if this gives them pause when they have to disclose this and check their work better, that's probably a good thing.
  2. I think perhaps telling folks you expect them to be able to justify every character changed might help; even if you don't actually ask them to, the fact that they should be able to will make them think about what they're changing.
  3. Possibly related to #2: make the commit sequence tell a story? I won't explain that in detail here unless asked; I suspect you get it.
  4. Actually, probably subsuming #2 and #3: design your commits for review.

And looking at that Claude summary, I'm now feeling that perhaps some faster way to figure out if a submission isn't living up to #4 above is a good place to start. I might, for example, start with the output of

git log --oneline --no-decorate --reverse \
    main@{u}..origin/dev/submitter/2025-02-15/some-stuff

And say, "Sorry, I'm not seeing from this the overall story of what's being changed. Rewrite this so that we get a decent overview from that, and then we can move on with more detailed review." (This is basically just the first step of the review procedure I describe here.)

@webknjaz
Member

@sirosen so this is just crazy; it was just shared in one of the Discords I'm on: matplotlib/matplotlib#31132 (comment) / https://sethmlarson.dev/automated-public-shaming-of-open-source-maintainers 🤯

h/t @savannahostrowski and @sethmlarson

@webknjaz
Member

Another opinion from private spaces:

One thing I think we as maintainers of open projects have to do is to alert our management that if we have to deal with AI slop bug reports or patches then this will make us less productive, and automating responses to that will not help. If we have to face actual abuse or attacks from AI agents in blocking or removing their AI slop PRs, then that will impact our mental health.


@samdoran samdoran left a comment

Adding some of my thoughts. I think it's great that this is being discussed.

CONTRIBUTING.md Outdated

I think this is very good guidance, almost worthy of inclusion in an issue template, because the prompt actually reveals the thinking behind the code. The code from the LLM may or may not reflect the author's original intention, because it is an imperfect tool. But having a better understanding of the thinking behind the code is always the most beneficial thing that comes out of code review.

Member

Yeah, I actually started suggesting asking for the LLM name and the prompt in issue templates in ansible and aio-libs sometime last year, but didn't get any support, as people thought this might be alienating. I'd still like that, though.

Member Author

We should definitely think about changing the PR and issue templates, and I agree that some good form for "what tools did you use?" could be the path forward.

But for now I want to focus on just the contrib doc piece, and just setting down the rules we can agree upon! 🫶

@webknjaz
Member

Looks like GH is noticing the slop storm even more now: https://github.blog/open-source/maintainers/welcome-to-the-eternal-september-of-open-source-heres-what-we-plan-to-do-for-maintainers/

Member Author

@sirosen sirosen left a comment

Thanks much for the reviews and feedback! I've just posted a new version of the document here.

I could lean in harder on making the text structured (bulleted lists, tables, etc), but I just didn't like it when I tried. So I'm sharing, as I think I secretly knew I would, only one new revision, as an update to the PR.

I added another section in this draft, for the good first issue label. Intentionally, it's not nested under the LLM guidelines, but is the next-sibling after it in the docs.

Switch to a more casual tone and expand out the disclosure and ownership
sections into paragraphs, rather than a bulleted list.

A new note is added, inspired by the FastAPI contrib doc and others, to
suggest that contributors think about what they are adding *beyond* just
prompting an LLM.

A new section is added on the "good first issue" label to explain that it
should not be fed into LLMs.

The policy discussion link is moved to be a detached link.

A new doc fragment links to the contrib one.
@sirosen sirosen force-pushed the initial-llm-usage-policy branch from fe3eeb0 to cddc504 on February 14, 2026 21:06
@0cjs

0cjs commented Feb 19, 2026

Here's the TLDR of what I want to see in a policy; justification of it is below.

  1. Any change you make includes any refactorings reasonable to decrease code size and complexity. If your change touches code that should be refactored and doesn't refactor it, we will not accept it until you add those refactorings.
  2. You understand and can justify every line of code in your change. Look at every line you add or change and, if you don't know why it's added or changed, remove it. (If your change doesn't work after that, now you can figure out why it's necessary.)
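As a concrete (purely illustrative) sketch of #2, here is one way a contributor might walk their own change before submitting, assuming a feature branch based on main:

    # Review the full change line by line against the base branch;
    # anything you can't justify should be removed or reworked.
    git diff main...HEAD

    # Check that the commit sequence reads as a story before asking for review.
    git log --oneline --reverse main..HEAD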

My current consulting job is essentially, "Help rescue a company that's written hundreds of thousands of lines of code with an LLM so that they can continue operating." Perhaps ironically, I'm mainly a Haskell and Python guy, but they're using JS, so I'm relying heavily on that LLM myself to help me with coding.

The biggest problem I've found is simply that LLMs produce exponential code increases: they produce "working well enough" code, albeit too much of it, and every fix for any problem adds more code. (That of course adds more, though often more subtle, problems, requiring more code to fix, and you can easily see where that goes.) TLDR: They never refactor. (If that did not make you sit up and scream, read on; if it did, you already know what's coming.) I am not the only one to point this out.

There are claims that just now, finally (literally days after the release of Opus 4.6), LLMs are able to write code to the level where good programmers will no longer be needed. (This is not the first time this claim has been made in the last year or two, or the last decade, or even in the last century.) But I'm not buying it, not only because of all this past experience but because I just upgraded the Claude I've been using for a while to that exact same model, with all the extras turned on, and I can, as an experienced programmer, easily point out where it's failing. Let me give an example. (This is simple code, but not simplified: it's actual production code that I am using every day and maintaining for the rest of my time at this company.)

Today (again with Opus 4.6 and "extra thinking" turned on) I asked it to stop a whole load of spew from a tofu plan command, which was producing dozens of bold (in my terminal) aws_instance.foo: Refreshing state... [id=i-123456789abcdef] lines. It suggested updating the script to add 2>&1 | grep -v ... after that command (you can fill in the general idea for the ...). This is, to an experienced programmer, clearly stupid: it introduces two new problems for an imperfect fix of the old one. A simple prompt of "isn't there a command line option to suppress those messages" immediately made it come back with adding just -concise to the command line.

(For those who do not understand the huge difference between the first and second solutions, in terms of long-term maintainability: the first combines stderr with stdout which means that callers of the script using this can no longer suppress stdout for other reasons without making error messages disappear, and the grep is a guess at what we should match that may randomly suppress or allow other messages. These may seem small and unimportant, but they are exactly the sort of subtle "oh, could that ever be a problem, really?" things that will come back to bite you in the ass in the future. Yes, only one in a hundred of them, but see below.)
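To make the contrast concrete, here is a minimal sketch of the two fixes (the exact filter pattern is an assumption for illustration; -concise is the option the model eventually suggested):

    # First suggestion: merge stderr into stdout and guess at a filter pattern.
    # Callers can no longer suppress stdout without losing error messages, and
    # the pattern may silently drop (or let through) other messages.
    tofu plan 2>&1 | grep -v 'Refreshing state'

    # Second suggestion, after one follow-up prompt: use the purpose-built option.
    tofu plan -concise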

Those of us who build large systems know well that everybody says every single week, "oh, little things like this are no big deal," and two years later you're crushed under thousands of "little things" like this that have hemmed in the directions in which you can progress (removing only 1 degree here, 0.5 degrees there) such that there's no direction in which you can move any more. This is exactly what I'm dealing with right now: for example, there are literally several thousand catch clauses that issue a 500 Internal Server Error HTTP response, which I'm sure do only about a dozen different things in total, but refactoring this to a dozen instances instead of several thousand is a lot of work. (Assuming it's a dozen; who knows. Had they clarified their intentions at the time and refactored to that, the problem wouldn't exist.)

My feeling is that it really doesn't matter much at all if LLMs are ten times better at reading and dealing with large code bases than I am: if they're expanding the code bases such that you need someone a thousand times better than I am, they'll be just as lost as me, and sink pretty much as quickly into "every time I fix a bug I add a new bug."

Dijkstra explained all this decades ago: after a certain (very early) point programming is not a problem of being able to generate code: it's a problem of being able to simplify your intentions so you can control the complexity. And no amount of "I can create an AI to handle more complex stuff" is going to compete against, "if I do random shit I can increase complexity faster than the entire universe can handle."

All of which is just long-winded justification for my suggestion above, in another form: "how did you make our program/project less complex, or no more complex, for the additional functionality?"
