Skip to content

agents: add code-review skills#14873

Draft
Alizter wants to merge 1 commit into
ocaml:mainfrom
Alizter:push-ryxzxyomzzrv
Draft

agents: add code-review skills#14873
Alizter wants to merge 1 commit into
ocaml:mainfrom
Alizter:push-ryxzxyomzzrv

Conversation

@Alizter
Copy link
Copy Markdown
Collaborator

@Alizter Alizter commented Jun 2, 2026

Introduction

Code review is a skill that is difficult to codify well. When we read PRs we can have many things pop into our heads to look out for. The main limiting factors are experience and time.

Often when submitting and reviewing PRs in this repository, there is a bar in quality that we would like to pass. This is important because a reviewer will invest time in understanding a PR to make sure it is doing as claimed.

However quite often it can be difficult to meet that bar without iterating further. This is expensive if iteration is done using other reviewers time and flawed if you partake in self-review due to bias.

Agentic reviews exist to fill a gap in helping the quality of a PR increase without using up maintainer time, where it can be focused on more important aspects of the PR. This is the goal of this PR, to provide the tools necessary to review your own code (or others) using LLMs.

The mode of usage of the prompts here is two-fold. You can point your favourite LLM at these prompts or even read them yourself if so inclined. When an agent such as Claude reads and follows these instructions they will point out things that may have been missed or even encourage you to think more deeply about the problem when it matters. They don't exist to replace reviewers but to compliment their jobs.

Even before submitting a PR, iterating the reviews yourself will help improve the quality of your PR, and in the worst case, nothing will need to be changed. Our goal in dune is to make our users happy with dune, and aiding this particular bottleneck has the potential to pay off.

How it works

If you are using Claude, GitHub copilot (yes as a reviewer) or another agent, it will quickly be able to orient itself to find doc/dev/code-review. This is because it is instructed to do so from the AGENTS.md file. There it will find a file called general.md with instructions on how to carry out a review on what the user has asked.

It does this by first attempting to classify the "kind" of PR it is coming across. In the vast majority of cases this is clear from the title and description. Then according to each "kind" it will attempt a general review strategy together with some "kind"-specific things to look out for.

For example, a PR labelled "refactor" is likely a refactor and therefore should not be changing any tests or functionality. A PR labelled "test" is likely adding a test case, and ideally shouldn't be changing any code. Any user facing feature needs a changelog, etc.

The agent will essentially produce a report containing all the issues that it flagged, afterwhich it will do a review of its review to make sure its pointing out the important points.

The end goal is a report by the agent which can be used by a PR author or reviewer to survey any issues that it noticed, together with a plausible explanation. At which point, it is the job of the reviewer to turn that into amicable feedback for the persons in question or iterate on their own creation.

Further improvement

The current prompts are not perfect. It doesn't necessarily capture what a perfect reviewer would have done and that's because that is not it's job. It's job is to complement a review and aid those who are working on the PR improve the quality of the submission.

Therefore it is likely that prompts can be improved, changed or removed as time goes on as we deem it necessary. And I would encourage people to experiment in this direction together with explanations on why they believe it to be the case.

The current code-review/ "skill" for example has 3 voices inspired from the style of reviewers in this repository. They range from direct and Socratic to giving the feeling of a pair programmer. The net effect of these voices however is effectively nil, in my own testing, they haven't meaningfully differed in content, but only in how the report reads, which is just as important if people are to make use of it.

Reviews by agents are also plausibly likely to be wrong! This is why its important to iterate work with LLMs. They might not give a correct answer the first time 100% of the time, but you can squeeze those numbers if you run it a few times and get it to look itself over.

Closing remarks

So please try it out and give feedback. When this PR is merged it will be easier to do so and should help improve the quality of our work faster.

Signed-off-by: Ali Caglayan <alizter@gmail.com>

## Voice

Pick one voice from `voice/` at random unless the user names a specific one in
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we want this picked at random?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you prefer?

@shonfeder
Copy link
Copy Markdown
Member

shonfeder commented Jun 2, 2026

The net effect of these voices however is effectively nil, in my own testing, they haven't meaningfully differed in content, but only in how the report reads, which is just as important if people are to make use of it.

Why not just have one consistent voice then? This gives me slot machine vibes and looks like unnecessary introduction of noise and context consumption.

This comment was marked as resolved.

@Alizter
Copy link
Copy Markdown
Collaborator Author

Alizter commented Jun 2, 2026

The net effect of these voices however is effectively nil, in my own testing, they haven't meaningfully differed in content, but only in how the report reads, which is just as important if people are to make use of it.

Why not just have one consistent voice then? This gives me slot machine vibes and looks like unnecessary introduction of noise and context consumption.

I'm in favour of doing that, I just don't know which voice people will like better yet or find most useful. I don't agree with the "slot machine vibes" as I didn't see any meaningful difference in content.

Do you think this will be a big issue with these prompts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants