Fix documentation typos and grammar errors #13801
DimitriPapadopoulos wants to merge 1 commit into pypa:main
Conversation
Force-pushed from c5b020a to ad16777.
ichard26
left a comment
Thanks!
(cc @notatallshaw, do we want to ask that the AI authorship be removed?)
notatallshaw
left a comment
Most of the changes are fine. But yeah I have two issues with the use of LLMs here:
- I don't know how much value fly-by typo and grammar fixes provide: you don't learn anything about how to contribute to pip, and if these are a real concern we should probably automate suggestions on a regular basis.
- The way the commits are structured, when we generate the authors file @DimitriPapadopoulos will not be considered an author; instead, `copilot-swe-agent[bot]` will be the "author" (see the sketch below). While I have zero problem with using LLMs to assist in software engineering, I don't understand why a PR submitter would not want to be considered the author of the commits. It also leaves a lot of other open questions, if the PR submitter is not the author of the commits, that should be discussed elsewhere.
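For context on that second point: authors-file generators of this kind typically read the author recorded on each git commit, so a commit created by an agent credits the agent. A minimal sketch of that kind of extraction (an illustration under that assumption, not pip's actual tooling):

```python
import subprocess

# List every distinct commit author recorded in git history.
# A commit created by an agent shows up as the agent itself,
# e.g. "copilot-swe-agent[bot] <...>".
log = subprocess.run(
    ["git", "log", "--format=%aN <%aE>"],
    capture_output=True,
    text=True,
    check=True,
).stdout
for author in sorted(set(log.splitlines())):
    print(author)
```

Run inside a clone of the repository, this prints one line per unique author; whatever name git stored on the commit is what ends up credited.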
So yes, at least for now, I'm going to ask PR submitters to have commits that are their own.
So please update this PR to use commits where you are the author, and make the below change.
docs/html/ux-research-design/research-results/improving-pips-documentation.md
I agree LLM fixes should be automated, but I am not sure how best to achieve that yet. Perhaps in separate PRs, after a discussion in issues?

I don't know how it works in other domains, but research ethics requires transparency about AI tool usage. See for example The ethics of using artificial intelligence in scientific research: new guidance needed for a new tool. I thought a good way to achieve that was not to endorse commits as your own when they were actually written by Copilot. Personally, I would insist that commits written by Copilot appear as such. From a practical point of view, future training of LLMs would probably benefit from such a distinction between commits written by humans and commits written by an AI.

If you insist, I could endorse the commits but clearly state that Copilot wrote them.
I am a strong -1 on having LLMs appear in the Authors file. A line in the authors file doesn’t provide any nuance on our policies. There are many people with strong opinions on the use of LLMs, and to those people, that would signal “pip accepts vibe-coded junk”. IMO, we don’t want to deal with the sort of publicity that would generate, justified or not.

Regarding the actual contribution: did you review every change yourself, and confirm that you personally agree with it? If so, then it seems to me that the changes are your work, and can be credited to you. If you disagree, or if you didn’t review everything, then I don’t think the PR meets our criteria for LLM use, and should therefore be rejected in any case.

Also, I agree with @ichard26 that I’m not sure of the value in fly-by typo fixes, particularly when they don’t act as “practice” for the contributor - which LLM-generated ones clearly don’t.

And as a final note, I’ll point out that reviewing this PR has probably taken far more time due to the LLM usage than a hand-coded contribution would have, so that usage was almost certainly a net loss in productivity for the project at this point.
pfmoore
left a comment
The commit message here is needlessly verbose, replicating the changes made in detail. This type of message adds no value and feels typical of the verbosity LLMs introduce. Please fix.
I did review the commits carefully - and in places asked Copilot to revert or fix some changes. I disagree about the net loss. At some point, we (the project, me, others) will have to address the use of LLMs, but I totally agree about the verbosity of the resulting commits... Is it OK to use a `Co-authored-by: Copilot` trailer in the commit messages?
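(For reference, and assuming the standard git/GitHub trailer convention rather than anything specific to this thread: such a trailer goes on its own line at the end of the commit message, and the address below is a placeholder, not Copilot's actual one.)

```
Fix documentation typos and grammar errors

Co-authored-by: Copilot <copilot@users.noreply.github.com>
```

GitHub parses `Co-authored-by:` trailers of this form and lists the named account as a co-author of the commit.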
Force-pushed from d254cb8 to 4afffa0.
Not for me. Be prepared to take full personal responsibility for your PR, or don't submit it. You don't say "Co-authored-by: VS Code spell checker", do you? I'll let the other pip maintainers add their own views, though. I'm just giving my personal POV.
It's not that I don't want to take full personal responsibility for this PR. It's just that in the context of scientific writing, it would be considered unethical to hide that parts of a scientific paper have been generated using AI. The reason is that AI is a game changer (more than tools like codespell 😄) and we want to avoid nonsensical (for now?) scientific papers written by (instead of with help from) AIs. But then, seeing AI as a mere tool makes sense too. I will remove the `Co-authored-by:` line.
In the same way that a compiler inserts into object files a comment identifying itself and its version, a commit could record the AI tool that helped produce it.
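(An illustration of that analogy, not from the thread: GCC records its identity in an ELF object's `.comment` section, which can be dumped with `readelf`; the version string below is only an example.)

```
$ readelf -p .comment hello.o

String dump of section '.comment':
  [     0]  GCC: (GNU) 12.2.0
```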
Understood, and I don't want to derail this discussion (as you say, it's a much wider question), but in the context of reproducibility, without the prompts (or even with them, see below) the generated code is literally all anyone has, and therefore what is relevant is:

- whether the code itself is correct, and
- whether the author can explain it and take responsibility for it.
On that second point, if there's a bug¹ it's important to be able to reach out to the original author and ask for the reasoning behind the code. Or if the code is malicious, to review and potentially remove any other code by that author. If the author can say "but that code was generated by an LLM, so I can't explain the reasoning", we have a maintainability problem.

To address your other point, I think the position around scientific writing isn't necessarily the same as the position for code contributions - especially with the prevalence in software of "vibe coding", where the submitter may well not have even read the code they are submitting as a PR. But even in science, surely reproducibility of results is crucial? How could a paper that claimed an LLM determined some result based on a set of evidence be credible, if it wasn't possible to reproduce that line of argument by re-running the LLM interaction? And as LLMs change their training data regularly, even just using the same prompts and the same LLM isn't a guarantee of the same results.

I'll try to resist the temptation to engage further on this digression. LLM use is something I have relatively strong but incomplete opinions on, and I don't want to dominate the discussions here with my views. So please don't be offended (or assume that I agree 😉) if I ignore any further responses you make.
Force-pushed from 4afffa0 to 2029ac6.
Corrects grammar errors, typos, and duplicate words in documentation files.
Requesting to skip news.
Initially created from Copilot CLI via the `copilot delegate` command.