Introduce an initial LLM Usage Policy#2318

Open
sirosen wants to merge 3 commits into jazzband:main from
sirosen:initial-llm-usage-policy

Conversation

@sirosen
Member

@sirosen sirosen commented Jan 31, 2026

Driven by our prior discussion, this lays out an initial policy which is meant to be simple to understand.

After consideration, and in particular looking at the current pip contribution policy[1], I have taken us back to the original two "columns" I suggested for our policy: "Disclosure" and "Ownership".

The policy is stated as meant for "LLM Generated Contributions". Although during earlier discussion I suggested that we avoid singling out these tools, on review (especially with some recent PRs), I am not sure that is wise. I would like it to be very clear to LLM users that we have some additional standards for them -- which I view as offsetting the ease with which they can spam projects and do harm.

The policy states that it is "to protect our maintainers as well as our contributors"; hopefully this is a clear hint that the maintainers even need some level of protection, and will help new contributors understand why we have a policy.

Echoing some prior discussion about "Don't let AI speak for you" / "Don't let AI think for you", there's a line included that draws a distinction between "typing" and "thinking".

To give us a clear out, in case we have truly problematic github users show up, the policy calls out "extreme cases" as spam/slop.

Finally, the policy itself links back to the original discussion as an open invitation for anyone who wants to advocate for us refining this policy.

Footnotes

  1. It's very short. See: https://pip.pypa.io/en/stable/development/contributing/

    While contributors may use whatever tools they like when developing a pull request, it is the contributor’s responsibility to ensure that submitted code meets the project requirements, and that they understand the submitted code well enough to respond to review comments.

    In particular, we will mark LLM-generated slop as spam without additional discussion.

Member

@webknjaz webknjaz left a comment

Have you seen https://github.com/chaoss/wg-ai-alignment/tree/main/moderation#readme? It's got the links from my gist and some more. It's a centralized effort worth watching periodically.

I've also found Sebastián's framing interesting: https://bsky.app/profile/tiangolo.com/post/3mc6mjosfa22s / https://fastapi.tiangolo.com/contributing/#automated-code-and-ai. It's a bit less formal.

This one is quirky: https://curl.se/.well-known/security.txt


Do you think we could add some informal tone to the policy?

CONTRIBUTING.md Outdated
Member

Suggested change
- 1. **Disclosure**: contributors should indicate when they have use an LLM to generate a part of their work.
+ 1. **Disclosure**: contributors should indicate when they have used an LLM to generate a part of their work.

Member

Also: should or must?

@samdoran

My thinking on LLMs as a tool is evolving. I am doing my best to mentally bucket them with other productivity enhancing tools such as an IDE or a language server. I understand LLMs are vastly different, but at the end of the day it is still a tool that requires human thought, direction, judgment, and taste to guide toward successful outcomes. It may even be a new class of tool in our career field that requires more training before use, similar to operating dangerous heavy machinery.

A secondary concern is something I am working on personally, which is that I find myself negatively biased towards code under review if it was LLM generated. That is both good and bad. Good because it causes me to be more vigilant and careful in the review. Bad in that I am looking down on the code as "lesser" and tend to be more critical.

If I use an LLM to rubber-duck some code or make a rough prototype, then hammer it into its final shape until only 10-20% of the original LLM-generated code remains, I have enough sweat equity in it to call it "my code" and would not want to taint it with the "LLM" label.

If code is majority LLM-generated but carefully reviewed and understood by the author, that seems worthy of full disclosure.

Wholly LLM-generated code that is not understood by the submitter is not worth considering.

There are degrees to this problem. At this point it is probably ok to require any LLM use to be disclosed but we may want to reevaluate that in the future.

Member Author

I 99% agree. If the submitter reworks it (line by line or even less, but in significant measure), the code is probably going to be fine. If they didn't work on it at all after it got vibe-coded for them, it's a waste product and should be discarded.

    A secondary concern is something I am working on personally, which is that I find myself negatively biased towards code under review if it was LLM generated. That is both good and bad. Good because it causes me to be more vigilant and careful in the review. Bad in that I am looking down on the code as "lesser" and tend to be more critical.

I've experienced this as well, but not only with LLM-generated code. I've also experienced it a bit with junior engineers -- or a lot when working with people who have a track record of sloppiness.

When I review code from DeveloperA, I know that she always writes solid unit tests, defines clear internal APIs, and thinks particularly hard about performance. So when I review her code, I mostly scan for anything unusual and read the unit tests to help me understand subtle behaviors.

When I review code from DeveloperB, I often see important untested cases or bugs which indicate that the code never ran. I read every line using the tiny Python interpreter in my brain (🫠) because my trust has been broken.

In this framing, what kind of developer is an LLM? Is it one that has earned your trust, and we can mostly ignore and skim the details? Or is it one that has broken your trust, that needs to always be watched for logical errors?

A lot of ink has been spilled over the ways in which LLMs trip up our heuristics for quality. But I hold that it behaves much more like an inexperienced or sloppy developer.

So yeah, I read every line of LLM-generated code much more closely. And I don't consider that something we should work on changing in ourselves. We should read it more closely. "Looking down on it" would be a problem, but not trusting the code is the correct choice.

CONTRIBUTING.md Outdated
Member

Let's make the link detached like the others.

@webknjaz
Member

webknjaz commented Feb 2, 2026

I feel like (1) we might not want to use the word "policy" but rather something along the lines of "expectations", plus maybe (2) it'd be good to have guidelines for reviewers, and to have some explanatory wording of "this is how we'll perceive submissions/interactions and here's why".

Here's some more on what I like in other policy examples:


On a related note, I like CPython's triage process explanations (https://devguide.python.org/triage/triaging/ / https://devguide.python.org/triage/triage-team/) and think that it's a good source to take into account and steer people towards in terms of showing that it's important to get familiar with the community and the code base. Something similar to what you @sirosen wanted to write last year regarding contributing to CPython.

@sirosen
Member Author

sirosen commented Feb 2, 2026

I haven't had time in the past couple of days to circle back and apply changes (and it's late in my local time today), but I wanted to drop a quick note.

I'm 👍 on all of the small changes suggested, but I want to review some of these other contrib docs before making more edits; not all of them are familiar to me. For example, I just read Sebastián's policy for the FastAPI projects, and I like it. It strikes a really good balance of brevity and explanation.

My tone was more formal, but I'll reconsider that. I'll need to try a few different versions of this out to see what works best. Possibly I'll post some samples of different possible text when I work through it, but if I really like a result I might just update the PR.

@webknjaz
Member

webknjaz commented Feb 2, 2026

Oh, and I forgot one more thing: I think we should explicitly call out that things marked as a "good first issue" are best solved w/o LLMs, explaining that they are likely to be a good learning experience and generated submissions will probably harm this process.

@sirosen sirosen modified the milestones: 7.5.3, 7.5.4 Feb 6, 2026
@webknjaz
Member

@sirosen so over on the pytest discord server, @0cjs shared this:

  1. I like the disclosure policy. In theory, it doesn't matter if the contributor used an LLM or not. In practice, if this gives them pause when they have to disclose this and check their work better, that's probably a good thing.
  2. I think perhaps telling folks you expect them to be able to justify every character changed might help; even if you don't actually ask them to, the fact that they should be able to will make them think about what they're changing.
  3. Possibly related to #2: make the commit sequence tell a story? I won't explain that in detail here unless asked; I suspect you get it.
  4. Actually, probably subsuming #2 and #3: design your commits for review.

And looking at that Claude summary, I'm now feeling that perhaps some faster way to figure out if a submission isn't living up to #4 above is a good place to start. I might, for example, start with the output of

git log --oneline --no-decorate --reverse \
    main@{u}..origin/dev/submitter/2025-02-15/some-stuff

And say, "Sorry, I'm not seeing from this the overall story of what's being changed. Rewrite this so that we get a decent overview from that, and then we can move on with more detailed review." (This is basically just the first step of the review procedure I describe here.)

@webknjaz
Member

@sirosen so this is just crazy; it was just shared in one of the Discords I'm on: matplotlib/matplotlib#31132 (comment) / https://sethmlarson.dev/automated-public-shaming-of-open-source-maintainers 🤯

h/t @savannahostrowski and @sethmlarson

@webknjaz
Member

Another opinion from private spaces:

One thing I think we as maintainers of open projects have to do is to alert our management that if we have to deal with AI slop bug reports or patches then this will make us less productive, and automating responses to that will not help. If we have to face actual abuse or attacks from AI agents in blocking or removing their AI slop PRs, then that will impact our mental health.


@samdoran samdoran left a comment

Adding some of my thoughts. I think it's great that this is being discussed.

CONTRIBUTING.md Outdated

I think this is very good guidance, almost worthy of inclusion in an issue template, because the prompt actually reveals the thinking behind the code. The code from the LLM may or may not reflect the author's original intention, because it is an imperfect tool. But having a better understanding of the thinking behind the code is always the most beneficial thing that comes out of code review.

Member

Yeah, I actually started suggesting asking for the LLM name and the prompt in issue templates in ansible and aio-libs sometime last year, but didn't get any support, as people thought this might be alienating. I'd still like that, though.

Member Author

We should definitely think about changing the PR and issue templates, and I agree that some good form for "what tools did you use?" could be the path forward.

But for now I want to focus on just the contrib doc piece, and just setting down the rules we can agree upon! 🫶

@webknjaz
Member

Looks like GH is noticing the slop storm even more now: https://github.blog/open-source/maintainers/welcome-to-the-eternal-september-of-open-source-heres-what-we-plan-to-do-for-maintainers/

Member Author

@sirosen sirosen left a comment

Thanks much for the reviews and feedback! I've just posted a new version of the document here.

I could lean in harder on making the text structured (bulleted lists, tables, etc), but I just didn't like it when I tried. So I'm sharing, as I think I secretly knew I would, only one new revision, as an update to the PR.

I added another section in this draft, for the good first issue label. Intentionally, it's not nested under the LLM guidelines, but is the next-sibling after it in the docs.

Switch to a more casual tone and expand out the disclosure and ownership
sections into paragraphs, rather than a bulleted list.

A new note is added, inspired by the FastAPI contrib doc and others, to
suggest that contributors think about what they are adding *beyond* just
prompting an LLM.

A new section is added on the "good first issue" label to explain that it
should not be fed into LLMs.

The policy discussion link is moved to be a detached link.

A new doc fragment links to the contrib one.
@sirosen sirosen force-pushed the initial-llm-usage-policy branch from fe3eeb0 to cddc504 on February 14, 2026 21:06
@0cjs

0cjs commented Feb 19, 2026

Here's the TLDR of what I want to see in a policy; justification of it is below.

  1. Any change you make includes any refactorings reasonable to decrease code size and complexity. If your change touches code that should be refactored and doesn't refactor it, we will not accept it until you add those refactorings.
  2. You understand and can justify every line of code in your change. Look at every line you add or change and, if you don't know why it's added or changed, remove it. (If your change doesn't work after that, now you can figure out why it's necessary.)
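As a concrete (purely illustrative) sketch of #2, here is one way a contributor might walk their own change before submitting, assuming a feature branch based on main:

    # Review the full change line by line against the base branch;
    # anything you can't justify should be removed or reworked.
    git diff main...HEAD

    # Check that the commit sequence reads as a story before asking for review.
    git log --oneline --reverse main..HEAD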

My current consulting job is essentially, "Help rescue a company that's written hundreds of thousands of lines of code with an LLM so that they can continue operating." Perhaps ironically, I'm mainly a Haskell and Python guy, but they're using JS, so I'm relying heavily on that LLM myself to help me with coding.

The biggest problem I've found is simply that LLMs produce exponential code increases: they produce "working well enough" code, albeit too much of it, and every fix for any problem adds more code. (That of course adds more, though often more subtle, problems, requiring more code to fix, and you can easily see where that goes.) TLDR: They never refactor. (If that did not make you sit up and scream, read on; if it did, you already know what's coming.) I am not the only one to point this out.

There are claims that just now, finally (literally days after the release of Opus 4.6), LLMs are able to write code to the level where good programmers will no longer be needed. (This is not the first time this claim has been made in the last year or two, or the last decade, or even in the last century.) But I'm not buying it, not only because of all this past experience but because I just upgraded the Claude I've been using for a while to that exact same model, with all the extras turned on, and I can, as an experienced programmer, easily point out where it's failing. Let me give an example. (This is simple code, but not simplified: it's actual production code that I am using every day and maintaining for the rest of my time at this company.)

Today (again with Opus 4.6 and "extra thinking" turned on) I asked it to stop a whole load of spew from a tofu plan command, which was producing dozens of bold (in my terminal) aws_instance.foo: Refreshing state... [id=i-123456789abcdef] lines. It suggested updating the script to add 2>&1 | grep -v ... after that command (you can fill in the general idea for the ...). This is, to an experienced programmer, clearly stupid: it introduces two new problems for an imperfect fix of the old one. A simple prompt of "isn't there a command line option to suppress those messages" immediately made it come back with adding just -concise to the command line.

(For those who do not understand the huge difference between the first and second solutions, in terms of long-term maintainability: the first combines stderr with stdout which means that callers of the script using this can no longer suppress stdout for other reasons without making error messages disappear, and the grep is a guess at what we should match that may randomly suppress or allow other messages. These may seem small and unimportant, but they are exactly the sort of subtle "oh, could that ever be a problem, really?" things that will come back to bite you in the ass in the future. Yes, only one in a hundred of them, but see below.)
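To make the contrast concrete, here is a minimal sketch of the two fixes (the exact filter pattern is an assumption for illustration; -concise is the option the model eventually suggested):

    # First suggestion: merge stderr into stdout and guess at a filter pattern.
    # Callers can no longer suppress stdout without losing error messages, and
    # the pattern may silently drop (or let through) other messages.
    tofu plan 2>&1 | grep -v 'Refreshing state'

    # Second suggestion, after one follow-up prompt: use the purpose-built option.
    tofu plan -concise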

Those of us who build large systems know well that everybody says every single week, "oh, little things like this are no big deal," and two years later you're crushed under thousands of "little things" like this that have hemmed in the directions in which you can progress (removing only 1 degree here, 0.5 degrees there) such that there's no direction in which you can move any more. This is exactly what I'm dealing with right now: for example, there are literally several thousand catch clauses that issue a 500 Internal Server Error HTTP response, which I'm sure do only about a dozen different things in total, but refactoring this to a dozen instances instead of several thousand is a lot of work. (Assuming it's a dozen; who knows. Had they clarified their intentions at the time and refactored to that, the problem wouldn't exist.)

My feeling is that it really doesn't matter much at all if LLMs are ten times better at reading and dealing with large code bases than I am: if they're expanding the code bases such that you need someone a thousand times better than I am, they'll be just as lost as me, and sink pretty much as quickly into "every time I fix a bug I add a new bug."

Dijkstra explained all this decades ago: after a certain (very early) point programming is not a problem of being able to generate code: it's a problem of being able to simplify your intentions so you can control the complexity. And no amount of "I can create an AI to handle more complex stuff" is going to compete against, "if I do random shit I can increase complexity faster than the entire universe can handle."

All of which is just long-winded justification for my suggestion above, in another form: "how did you make our program/project less complex, or no more complex, for the additional functionality?"
