File count validator #44

LonMcGregor · 2025-12-10T15:45:56Z

Adds a file validator for PR submissions.

The directory in which files should be changed is defined in a comment in the issue body, like so:

<!---
CHANGE_DIR=^Sprint-1/
--->

If it is not present, the bot skips this check.
If it is present, this regex is checked against every file changed in the PR submission.
If any file fails to match (i.e. something in the wrong directory was changed), the bot issues a warning.
If no files are changed at all, the bot issues a warning.
I propose that any metadata that needs to be checked is included in comments of this style. This metadata will need to be added for every task submission before the bot can perform checks.
Currently only a single directory is supported. This could be extended to multiple directories, or to other checks such as a maximum number of committed files or any other metadata visible to PRs.
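Here is a minimal, dependency-free sketch of the check described above. It rests on several assumptions:

- The `FileCheck` type and all helper names are hypothetical.
- Instead of the `regex` crate, it uses a simplified matcher that only understands a literal pattern, optionally anchored with `^`. That is enough for `^Sprint-1/` but not for general regexes.
- As in the description, the check is skipped entirely when the issue body has no CHANGE_DIR line.

```rust
// Hypothetical outcome type for this sketch only; the real validator has its own `ValidationResult`.
#[derive(Debug, PartialEq)]
enum FileCheck {
    NoFiles,
    WrongFiles { unexpected: Vec<String> },
}

/// Returns the value after `CHANGE_DIR=` on its own line, if the issue body has one.
/// `str::lines` also strips a trailing `\r`, so CRLF issue bodies work too.
fn change_dir(issue_body: &str) -> Option<&str> {
    issue_body
        .lines()
        .find_map(|line| line.trim().strip_prefix("CHANGE_DIR="))
        .map(str::trim)
        .filter(|pattern| !pattern.is_empty())
}

/// Simplified stand-in for a regex match: only understands a literal
/// pattern, optionally anchored to the start of the path with `^`.
fn matches_pattern(pattern: &str, path: &str) -> bool {
    match pattern.strip_prefix('^') {
        Some(prefix) => path.starts_with(prefix),
        None => path.contains(pattern),
    }
}

/// `None` means "nothing to report": either no CHANGE_DIR, or every file matched.
fn check_files(issue_body: &str, changed_files: &[&str]) -> Option<FileCheck> {
    let pattern = change_dir(issue_body)?; // no metadata: skip the check entirely
    if changed_files.is_empty() {
        return Some(FileCheck::NoFiles);
    }
    let unexpected: Vec<String> = changed_files
        .iter()
        .copied()
        .filter(|path| !matches_pattern(pattern, path))
        .map(String::from)
        .collect();
    if unexpected.is_empty() {
        None
    } else {
        Some(FileCheck::WrongFiles { unexpected })
    }
}

fn main() {
    let body = "Do the exercises.\r\n<!---\r\nCHANGE_DIR=^Sprint-1/\r\n--->\r\n";
    let files = ["Sprint-1/a.js", "Sprint-2/b.js"];
    println!("{:?}", check_files(body, &files));
}
```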

You can see an example of the output of this check here: CodeYourFuture/Module-Structuring-and-Testing-Data#873

LonMcGregor · 2025-12-10T16:06:32Z

@illicitonion Here is the validator for checking that the right files are committed. Let me know what you think, in particular whether I've made any faux pas in Rust around the matching and error handling; I'm still getting used to those.

illicitonion

Sorry for the delay here! This looks good, but I left a few comments about some general Rust things :)

illicitonion · 2025-12-16T11:07:30Z

src/bin/pr-metadata-validator.rs

    CouldNotMatch,
    BadTitleFormat { reason: String },
    UnknownRegion,
+    WrongFiles { files: String },


At the call site I wasn't sure whether this held the expected files or the incorrect ones:

Suggested change

-    WrongFiles { files: String },
+    WrongFiles { expected_files_pattern: String },
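The sketch below shows why the more descriptive field name helps at the point of use. It is a trimmed-down, hypothetical version of the enum plus a function that turns a result into a warning. The message wording is invented for this sketch.

```rust
// Hypothetical, trimmed-down version of the validator's result enum,
// using the more descriptive field name.
#[derive(Debug)]
enum ValidationResult {
    NoFiles,
    WrongFiles { expected_files_pattern: String },
}

// Hypothetical warning text; the real bot's wording may differ.
fn warning_message(result: &ValidationResult) -> String {
    match result {
        ValidationResult::NoFiles => "This PR does not change any files.".to_string(),
        // The binding name alone tells the reader this is the expected pattern.
        ValidationResult::WrongFiles { expected_files_pattern } => format!(
            "Some changed files don't match the expected location `{expected_files_pattern}`."
        ),
    }
}

fn main() {
    let result = ValidationResult::WrongFiles {
        expected_files_pattern: "^Sprint-1/".to_string(),
    };
    println!("{}", warning_message(&result));
    println!("{}", warning_message(&ValidationResult::NoFiles));
}
```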

illicitonion · 2025-12-16T13:15:32Z

src/bin/pr-metadata-validator.rs

+    let task_issue_body = match task_issue.body {
+        Some(body) => body,
+        None => return Ok(None), // Task is empty, nothing left to check
+    };


This works, but I might consider:

Suggested change

-    let task_issue_body = match task_issue.body {
-        Some(body) => body,
-        None => return Ok(None), // Task is empty, nothing left to check
-    };
+    let task_issue_body = task_issue.body.unwrap_or_default();

This gives us an empty string, which we can handle exactly as if the issue body were empty.

I'm not sure why issue bodies are Option<String> rather than just String - I can't imagine an issue without a body, but maybe some APIs don't return them or something...
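The std-only sketch below shows how `unwrap_or_default` folds the missing-body case into the normal path. A `None` body becomes `""`, which simply has no CHANGE_DIR line, so the caller ends up with `None` either way. The function name and the line-based lookup are illustrative assumptions, not the validator's regex.

```rust
// `body` mirrors the `Option<String>` type of an issue body; everything else is illustrative.
fn change_dir_from(body: Option<String>) -> Option<String> {
    // `None` becomes "", which then simply contains no CHANGE_DIR line.
    let body = body.unwrap_or_default();
    body.lines()
        .find_map(|line| line.trim().strip_prefix("CHANGE_DIR="))
        .map(|pattern| pattern.trim().to_string())
}

fn main() {
    println!("{:?}", change_dir_from(None));
    println!("{:?}", change_dir_from(Some("<!---\nCHANGE_DIR=^Sprint-1/\n--->".to_string())));
}
```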

illicitonion · 2025-12-16T13:24:11Z

src/bin/pr-metadata-validator.rs

+    // Get all of the changed files
+    let pr_files_pages = octocrab
+        .pulls(org_name, module_name)
+        .list_files(pr_number)
+        .await
+        .context("Failed to get changed files")?;
+    if pr_files_pages.items.is_empty() {
+        return Ok(Some(ValidationResult::NoFiles)); // no files committed
+    }
+    let pr_files_all = octocrab
+        .all_pages(pr_files_pages)
+        .await
+        .context("Failed to list all changed files")?;
+    let pr_files = pr_files_all.into_iter();


We have a handy util for this in trainee_tracker::octocrab::all_pages. You'll probably need to import it and change its visibility from pub(crate) to pub:

Suggested change

-    // Get all of the changed files
-    let pr_files_pages = octocrab
-        .pulls(org_name, module_name)
-        .list_files(pr_number)
-        .await
-        .context("Failed to get changed files")?;
-    if pr_files_pages.items.is_empty() {
-        return Ok(Some(ValidationResult::NoFiles)); // no files committed
-    }
-    let pr_files_all = octocrab
-        .all_pages(pr_files_pages)
-        .await
-        .context("Failed to list all changed files")?;
-    let pr_files = pr_files_all.into_iter();
+    let pr_files = all_pages("changed files", octocrab, async || {
+        octocrab
+            .pulls(org_name, module_name)
+            .list_files(pr_number)
+            .await
+    }).await?;
+    if pr_files.is_empty() {
+        return Ok(Some(ValidationResult::NoFiles)); // no files committed
+    }
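The shape of such a helper can be sketched synchronously with std only. This is not the signature of `trainee_tracker::octocrab::all_pages`, which from the call site above is async and takes an octocrab client. The `Page` type and `collect_all_pages` are hypothetical. The sketch shows the point of the suggestion: once every page has been gathered into one `Vec`, a single `is_empty()` check covers the no-files case.

```rust
/// Hypothetical page shape for this sketch: the items on one page, plus the
/// number of the next page if there is one.
struct Page<T> {
    items: Vec<T>,
    next: Option<u32>,
}

/// Follows pages from page 1 until there is no next page, returning every item.
fn collect_all_pages<T>(mut fetch_page: impl FnMut(u32) -> Page<T>) -> Vec<T> {
    let mut all = Vec::new();
    let mut next = Some(1);
    while let Some(page_number) = next {
        let page = fetch_page(page_number);
        all.extend(page.items);
        next = page.next;
    }
    all
}

fn main() {
    // Fake API: 250 changed files served 100 per page.
    let files: Vec<String> = collect_all_pages(|n| {
        let start = (n - 1) * 100;
        let end = (start + 100).min(250);
        Page {
            items: (start..end).map(|i| format!("Sprint-1/file{i}.js")).collect(),
            next: if end < 250 { Some(n + 1) } else { None },
        }
    });
    // The emptiness check runs on the complete list.
    if files.is_empty() {
        println!("no files committed");
    } else {
        println!("{} files", files.len());
    }
}
```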

illicitonion · 2025-12-16T13:26:14Z

src/bin/pr-metadata-validator.rs

+        Some(body) => body,
+        None => return Ok(None), // Task is empty, nothing left to check
+    };
+    let directory_description = Regex::new("CHANGE_DIR=(.+)\\n").unwrap();


I generally follow a convention that unwrap calls should either:

Have a comment above describing why they can't fail (e.g. // UNWRAP: Statically known good regex), or

Be replaced with an .expect("explanation") (e.g. .expect("Statically known regex failed to parse"))

The same applies to the unwrap below on line 296.
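Here is a std-only illustration of both conventions. `str::parse` stands in for `Regex::new` so the sketch needs no external crates. The function names and values are hypothetical.

```rust
// Hypothetical helpers; `str::parse` stands in for `Regex::new`.
fn default_page_size() -> u8 {
    // UNWRAP: "100" is a statically known, valid u8 literal.
    "100".parse::<u8>().unwrap()
}

fn max_changed_files() -> u32 {
    // `expect` documents the invariant in the panic message itself.
    "50"
        .parse::<u32>()
        .expect("Statically known integer literal failed to parse")
}

fn main() {
    println!("{} {}", default_page_size(), max_changed_files());
}
```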

illicitonion · 2025-12-16T13:27:26Z

src/bin/pr-metadata-validator.rs

+        None => return Ok(None), // There is no match defined for this task, don't do any more checks
+    };
+    let directory_matcher = Regex::new(directory_description_regex)
+        .context("Invalid regex for task directory match")?;


I'd add a link to the GitHub issue to this context as well, so we know where to look to fix it:

Suggested change

-        .context("Invalid regex for task directory match")?;
+        .with_context(|| format!("Invalid regex for task directory match in issue {}", task_issue.html_url))?;
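`anyhow`'s `with_context` takes a closure, so the message is only built when there is an error. By contrast, `.context(format!(...))` would format eagerly on every call.

The hypothetical `WithContext` trait below mimics that laziness using only std. Unlike anyhow, it flattens the message and the underlying error into one `String` rather than keeping an error chain. The URL is a placeholder.

```rust
use std::fmt::Display;

/// Hypothetical std-only stand-in for anyhow's `Context::with_context`:
/// the message closure only runs if the `Result` is an `Err`.
trait WithContext<T> {
    fn with_context_msg<F: FnOnce() -> String>(self, make_msg: F) -> Result<T, String>;
}

impl<T, E: Display> WithContext<T> for Result<T, E> {
    fn with_context_msg<F: FnOnce() -> String>(self, make_msg: F) -> Result<T, String> {
        // `map_err` only calls its closure (and therefore `make_msg`) on `Err`.
        self.map_err(|err| format!("{}: {err}", make_msg()))
    }
}

fn parse_issue_value(raw: &str, issue_url: &str) -> Result<u64, String> {
    raw.parse::<u64>()
        .with_context_msg(|| format!("Invalid value in issue {issue_url}"))
}

fn main() {
    let url = "https://example.com/issues/1"; // placeholder URL
    println!("{:?}", parse_issue_value("42", url));
    println!("{:?}", parse_issue_value("abc", url));
}
```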

illicitonion · 2025-12-16T13:29:16Z

src/course.rs


+// Given a vector of sprints, and a target pr number, for a given person
+// return the issue ID for the associated assignment descriptor
+pub fn get_descriptor_id_for_pr(sprints: Vec<SprintWithSubmissions>, target_pr_number: u64) -> u64 {


Rather than defaulting to 0, I'd expect this to either:

Return an Option<u64> so that we can explicitly signal "We couldn't find the descriptor ID"

Return an Option<&Submission>

illicitonion · 2025-12-16T13:36:01Z

src/course.rs


+// Given a vector of sprints, and a target pr number, for a given person
+// return the issue ID for the associated assignment descriptor
+pub fn get_descriptor_id_for_pr(sprints: Vec<SprintWithSubmissions>, target_pr_number: u64) -> u64 {


Because you're just returning a number, you don't need to take ownership of this Vec - and if you do, you shouldn't need to clone the submissions below on line 1030.

In general there are two ways of iterating over things - with .iter() (which doesn't require ownership, but means you may need to copy things if you're returning them), or with .into_iter() (which consumes the value, but avoids the need to copy things).
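The sketch below shows the two styles using hypothetical functions over nested `Vec<String>`. `String` isn't `Copy`, so the difference between cloning and moving is visible.

```rust
/// Borrowing with `.iter()`: the caller keeps ownership of `groups`,
/// but returning owned `String`s means cloning each one.
fn all_names_cloned(groups: &[Vec<String>]) -> Vec<String> {
    groups.iter().flat_map(|group| group.iter()).cloned().collect()
}

/// Consuming with `.into_iter()`: `groups` is moved in, so each `String`
/// can be moved out without a clone, but the caller can't use `groups` again.
fn all_names_moved(groups: Vec<Vec<String>>) -> Vec<String> {
    groups.into_iter().flatten().collect()
}

fn main() {
    let groups = vec![
        vec!["a".to_string(), "b".to_string()],
        vec!["c".to_string()],
    ];
    let copies = all_names_cloned(&groups); // `groups` is still usable after this
    println!("{copies:?} from {} groups", groups.len());
    let moved = all_names_moved(groups); // `groups` is moved here
    println!("{moved:?}");
}
```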

Here are two different ways to write the code you have at the moment:

Taking ownership - note the use of into_iter() and the lack of clone():

    // Given a vector of sprints, and a target pr number, for a given person
    // return the issue ID for the associated assignment descriptor
    pub fn get_descriptor_id_for_pr(sprints: Vec<SprintWithSubmissions>, target_pr_number: u64) -> u64 {
        match sprints
            .into_iter()
            .flat_map(|sprint_with_subs| sprint_with_subs.submissions)
            .filter_map(|missing_or_submission| match missing_or_submission {
                SubmissionState::Some(s) => Some(s),
                _ => None,
            })
            .find(|submission| match submission {
                Submission::PullRequest { pull_request, .. } => pull_request.number == target_pr_number,
                _ => false,
            }) {
            Some(Submission::PullRequest { assignment_descriptor, .. }) => assignment_descriptor,
            _ => 0,
        }
    }

Not taking ownership:

    // Given a vector of sprints, and a target pr number, for a given person
    // return the issue ID for the associated assignment descriptor
    pub fn get_descriptor_id_for_pr(sprints: &Vec<SprintWithSubmissions>, target_pr_number: u64) -> u64 {
        match sprints
            .iter()
            // Note: We explicitly call `.iter()` here to make sure we're borrowing when iterating
            // over this, rather than taking ownership. There are "just iterate this for me" style
            // coercions which implicitly use `.into_iter()` by default, so we need to be explicit
            // here to avoid the clones.
            .flat_map(|sprint_with_subs| sprint_with_subs.submissions.iter())
            .filter_map(|missing_or_submission| match missing_or_submission {
                SubmissionState::Some(s) => Some(s),
                _ => None,
            })
            .find(|submission| match submission {
                Submission::PullRequest { pull_request, .. } => pull_request.number == target_pr_number,
                _ => false,
            }) {
            // Note the *: this is equivalent to calling `.clone()`, but because `u64` is `Copy` we can
            // just dereference to copy it. This is cheap. Personally I don't love using
            // `*assignment_descriptor` and would prefer `assignment_descriptor.clone()`, but the
            // language community doesn't, so...
            Some(Submission::PullRequest { assignment_descriptor, .. }) => *assignment_descriptor,
            _ => 0,
        }
    }

Extending the second (borrowing) approach a little, we can write the signature as:

pub fn get_descriptor_id_for_pr(sprints: &[SprintWithSubmissions], target_pr_number: u64) -> u64 {

replacing &Vec<SprintWithSubmissions> with &[SprintWithSubmissions]. A &Vec<T> automatically derefs to a &[T], so existing callers still work. The slice type is also slightly more general, because it lets us pass more types to this function, such as arrays or sub-slices.

I'd recommend the borrowing approach, which avoids both taking ownership where it isn't needed and the extra clones :)
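The sketch below combines both suggestions: borrow a slice, and return an `Option` rather than `0`. The types are simplified, hypothetical stand-ins for the ones in src/course.rs, including the `Missing` and `Other` variants.

It also swaps the `filter_map` + `find` + `match` chain for a single `find_map`. That is equivalent here, but it is not what the PR currently does.

```rust
// Simplified, hypothetical stand-ins for the real course types.
struct PullRequest {
    number: u64,
}

enum Submission {
    PullRequest { pull_request: PullRequest, assignment_descriptor: u64 },
    Other,
}

enum SubmissionState {
    Some(Submission),
    Missing,
}

struct SprintWithSubmissions {
    submissions: Vec<SubmissionState>,
}

/// Borrows the sprints and returns `None` instead of `0` when no PR matches.
fn get_descriptor_id_for_pr(sprints: &[SprintWithSubmissions], target_pr_number: u64) -> Option<u64> {
    sprints
        .iter()
        .flat_map(|sprint| sprint.submissions.iter())
        .find_map(|state| match state {
            SubmissionState::Some(Submission::PullRequest { pull_request, assignment_descriptor })
                if pull_request.number == target_pr_number =>
            {
                Some(*assignment_descriptor) // u64 is Copy, so this is a cheap copy
            }
            _ => None,
        })
}

fn main() {
    let sprints = vec![
        SprintWithSubmissions {
            submissions: vec![SubmissionState::Missing, SubmissionState::Some(Submission::Other)],
        },
        SprintWithSubmissions {
            submissions: vec![SubmissionState::Some(Submission::PullRequest {
                pull_request: PullRequest { number: 7 },
                assignment_descriptor: 42,
            })],
        },
    ];
    // A &Vec coerces to a slice, and sub-slices work too.
    println!("{:?}", get_descriptor_id_for_pr(&sprints, 7));
    println!("{:?}", get_descriptor_id_for_pr(&sprints[..1], 7));
}
```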

illicitonion · 2025-12-16T13:39:27Z

src/bin/pr-metadata-validator.rs

+    task_issue_number: u64,
+) -> Result<Option<ValidationResult>, Error> {
+    // Get the Sprint Task's description of expected changes
+    let task_issue = match octocrab


match works here (and throughout), but FYI there's also the concept of guard clauses. Using let-else, you could also write this:

    let Ok(task_issue) = octocrab
        .issues(org_name, module_name)
        .get(task_issue_number)
        .await
    else {
        return Ok(Some(ValidationResult::CouldNotMatch)); // Failed to find the right task
    };

I don't have a strong preference either way here - each can be more clear in different contexts, but it's good to know this is possible.
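For comparison, here is a tiny std-only example of guard clauses written with `let ... else`, which has been stable since Rust 1.65. The `Lookup` type and `parse_issue_ref` are hypothetical. As in the suggestion, `let Ok(..) = ... else` discards the error value, so it suits cases where the error details aren't needed.

```rust
#[derive(Debug, PartialEq)]
enum Lookup {
    CouldNotMatch,
    Found(u64),
}

/// Hypothetical helper: parse an issue reference like "#44".
fn parse_issue_ref(reference: &str) -> Lookup {
    // Each `let ... else` is a guard clause: bind on success, return early otherwise.
    let Some(digits) = reference.strip_prefix('#') else {
        return Lookup::CouldNotMatch;
    };
    let Ok(number) = digits.parse::<u64>() else {
        return Lookup::CouldNotMatch;
    };
    Lookup::Found(number)
}

fn main() {
    println!("{:?}", parse_issue_ref("#44"));
    println!("{:?}", parse_issue_ref("44"));
}
```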

LonMcGregor added 10 commits December 1, 2025 12:31

count changed files (126792f)

basic file match check (e3da750)

get metadata from an issue (8c8bdae)

expand all pages of files (c331b6f)

add message and dynamically get issue (41f0262)

move get descriptor logic to course (f9c16c6)

Simplify get issue num logic (0a26bb6)

cleanup comment (87c7330)

revert unchanged file (246a706)

resolve linter remarks (5fa207d)

LonMcGregor marked this pull request as ready for review December 10, 2025 16:02

rustfmt (40c29bb)

illicitonion reviewed Dec 16, 2025


3 participants
