K-Means Clustering Algorithm (With Constraints) by ChloeW125 · Pull Request #60 · uwblueprint/food4kids

ChloeW125 · 2025-11-17T02:24:37Z

JIRA ticket link

https://f4kblueprint.atlassian.net/jira/software/projects/F4KRP/boards/1?selectedIssue=F4KRP-105
Clustering

Implementation description

K-means clustering algorithm implementation
Includes handling for optional max locations per cluster and max boxes per cluster constraints
Note: it was determined that timeout does not need to be enforced right now

Steps to test

I (with Chat's help for formatting) made a testing file called k_means_test.py. Lines 75-77 hold three parameters related to the algorithm that you can modify. Feel free to play with them to see how the algorithm performs in different situations! You can then run it via: docker-compose exec backend python -m app.services.implementations.k_means_test

What should reviewers focus on?

Implementation (Does it make sense? Any edge cases not yet covered?)
NOTE 1: The linter is currently mad because the Clustering Protocol signature does not match that of the implemented algorithm: the algorithm already handles max boxes per cluster, but the protocol doesn't YET (changes pending)
NOTE 2: For now, we can assume that at most one of the two constraints (max boxes per cluster or max locations per cluster) is applied per call; the algorithm should never be called with both at once
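
NOTE 2 states an assumption rather than something the thread shows being enforced. A minimal sketch of how the assumption could be made explicit at the top of the call; the helper name `check_single_constraint` and its return values are hypothetical and not taken from the PR:

```python
from typing import Optional


def check_single_constraint(
    max_locations_per_cluster: Optional[int] = None,
    max_boxes_per_cluster: Optional[int] = None,
) -> Optional[str]:
    """Return which constraint is active: 'locations', 'boxes', or None.

    Hypothetical helper: raises if both constraints are supplied, which is
    the situation NOTE 2 says should not happen.
    """
    if max_locations_per_cluster is not None and max_boxes_per_cluster is not None:
        raise ValueError(
            "Pass at most one of max_locations_per_cluster and max_boxes_per_cluster"
        )
    if max_locations_per_cluster is not None:
        return "locations"
    if max_boxes_per_cluster is not None:
        return "boxes"
    return None
```

Failing loudly here would turn a silent assumption into an error the caller sees immediately, at the cost of one extra check per call.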

Checklist

My PR name is descriptive and in imperative tense
My commit messages are descriptive and in imperative tense. My commits are atomic and trivial commits are squashed or fixup'd into non-trivial commits
I have requested a review from the PL, as well as other devs who have background knowledge on this PR or who will be building on top of this PR

Copilot

Pull Request Overview

This PR implements a K-Means clustering algorithm with support for optional constraints on maximum locations and boxes per cluster. The implementation uses scikit-learn's KMeans with a greedy assignment strategy to handle constraints when specified.

Key Changes:

Implements K-Means clustering with constraint handling for max locations/boxes per cluster
Adds numpy, scikit-learn, and scikit-learn-extra dependencies with pinned versions
Includes a test script to verify clustering functionality with real database locations
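
The "greedy assignment strategy" named in the overview is not shown in this thread. As a rough sketch of what such a step might look like, assuming the centroids have already been computed (for example by scikit-learn's KMeans) and that each point carries a weight (1 per location for a location cap, or a box count for a box cap). The function name `greedy_assign` and the closest-pair-first ordering are assumptions, not the PR's actual code:

```python
import math


def greedy_assign(points, centroids, capacity, weights=None):
    """Assign each point to the nearest centroid that still has room.

    points / centroids: lists of (x, y) tuples.
    capacity: maximum total weight allowed in any one cluster.
    weights: per-point weight; defaults to 1 each (a cap on locations).
    Returns a list of clusters, each a list of point indices.
    """
    if weights is None:
        weights = [1] * len(points)
    # Every (distance, point index, cluster index) pair, closest pairs first.
    pairs = sorted(
        (math.dist(p, c), i, k)
        for i, p in enumerate(points)
        for k, c in enumerate(centroids)
    )
    clusters = [[] for _ in centroids]
    load = [0] * len(centroids)
    assigned = set()
    for _, i, k in pairs:
        # Skip points already placed and clusters that would overflow.
        if i in assigned or load[k] + weights[i] > capacity:
            continue
        clusters[k].append(i)
        load[k] += weights[i]
        assigned.add(i)
    if len(assigned) != len(points):
        raise ValueError("could not place every point within the capacity")
    return clusters
```

With a capacity of 2, a point whose nearest cluster is already full spills over to the next-nearest one, so the result is feasible but not necessarily optimal; that trade-off is typical of greedy approaches.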

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
backend/python/requirements.txt	Pins versions for numpy (1.26.4), scikit-learn (1.3.2), and adds scikit-learn-extra (0.2.0)
backend/python/app/services/implementations/k_means_test.py	Adds test script for K-Means clustering with database location queries
backend/python/app/services/implementations/k_means_clustering_algorithm.py	Implements KMeans clustering with constraint handling via greedy assignment algorithm

ludavidca · 2025-11-21T00:42:16Z

Lint errors are kinda Ruff

ludavidca

Things ran on my machine and looked fine, great work! Only a few small nits and then I will approve the PR. I also added a method to plot the output with seaborn

ludavidca · 2025-11-21T01:15:58Z

backend/python/app/services/implementations/k_means_clustering_algorithm.py

+
+        # If no locations to cluster, return empty list
+        if not locations:
+            return [[] for _ in range(num_clusters)]


Similar to above, not sure if we want to display k means if there are no locations lol

Asked Colin and the consensus was to just leave it like this and let the route gen algo do the empty locations list error handling
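
For reference, a small sketch of what the quoted early return hands back to the caller. The wrapper function `cluster_or_empty` is hypothetical and the clustering itself is stubbed out; only the `if not locations` branch mirrors the diff above:

```python
def cluster_or_empty(locations, num_clusters):
    """Mirror the early return quoted above; real clustering is omitted."""
    if not locations:
        # One independent empty list per requested cluster.
        return [[] for _ in range(num_clusters)]
    raise NotImplementedError("clustering omitted in this sketch")


result = cluster_or_empty([], 3)
result[0].append("x")  # only the first cluster changes
```

Because the list comprehension builds a fresh list for each cluster, mutating one cluster downstream does not affect the others, which would not hold for `[[]] * num_clusters`.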

ludavidca · 2025-11-21T01:23:10Z

backend/python/app/services/implementations/k_means_test.py

+        statement = (
+            select(Location)
+            .where(Location.latitude is not None, Location.longitude is not None)
+            .limit(20)


Make sure we can edit the number 20 in the test file (i.e. make values like this capitalized parameter variables at the top of the file)

Yeah i think constants at the top might be nice
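
A sketch of what "constants at the top" could look like. All names here (`LOCATION_LIMIT`, `NUM_CLUSTERS`, `pick_test_locations`, and so on) are hypothetical, and the database query is replaced by a plain-Python filter over dicts so the example runs on its own:

```python
# Hypothetical tunables for a k_means_test.py-style script.
LOCATION_LIMIT = 20               # how many locations to pull for the test
NUM_CLUSTERS = 3                  # number of clusters to request
MAX_LOCATIONS_PER_CLUSTER = None  # optional cap; None disables it
MAX_BOXES_PER_CLUSTER = None      # optional cap; None disables it


def pick_test_locations(rows, limit=LOCATION_LIMIT):
    """Keep rows that have both coordinates set, up to `limit` rows."""
    usable = [
        r for r in rows
        if r["latitude"] is not None and r["longitude"] is not None
    ]
    return usable[:limit]
```

One thing worth double-checking in the quoted query: `Location.latitude is not None` is evaluated by Python itself rather than translated into SQL, so it likely does not filter out NULL coordinates. With SQLAlchemy-style columns, `Location.latitude.is_not(None)` is the usual way to express that condition.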

ColinToft

Good stuff, LGTM!!

ColinToft · 2025-11-21T01:44:21Z

backend/python/app/services/implementations/k_means_clustering_algorithm.py

+            max_locations_per_cluster: Maximum number of locations
+                per cluster. Validates that the clustering is
+                possible and raises an error if violated.
+            timeout_seconds: Not enforced in this


We should change this to say that we raise an error. Sorry I realize I probably wasn't super clear in the convo before, I def agree that we don't need to worry that much about timeout which is why we should just raise an error instead of trying to have a quicker fallback approach (this is pretty quick anyway)
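
A minimal sketch of the "just raise an error" behaviour, assuming the work can be broken into steps between which elapsed time is checked. The helper `run_with_deadline` and the choice of `TimeoutError` are assumptions; the PR does not show how (or whether) the timeout ended up being implemented:

```python
import time


def run_with_deadline(step, iterations, timeout_seconds=None):
    """Call step(i) up to `iterations` times, raising once the budget is spent.

    There is no fallback path: when the time budget is exceeded the caller
    gets a TimeoutError instead of a cheaper approximate result.
    """
    start = time.monotonic()
    for i in range(iterations):
        if timeout_seconds is not None and time.monotonic() - start > timeout_seconds:
            raise TimeoutError(f"clustering exceeded {timeout_seconds} seconds")
        step(i)
```

The check only runs between steps, so a single long step can still overshoot the budget; that seems acceptable here given the comment that the algorithm is pretty quick anyway.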

ColinToft · 2025-11-23T19:24:27Z

backend/python/app/services/implementations/k_means_clustering_algorithm.py

+
+            if total_locations > max_possible:
+                raise ValueError(
+                    "Max locations per cluster + number of clusters clustering parameters cannot be simultaneously satisfied"


W edge case
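
The check in the diff can be read as a simple capacity argument. A sketch, assuming `max_possible` is the number of clusters times the per-cluster cap (the diff does not show how it is computed) and using a hypothetical function name:

```python
def validate_location_capacity(total_locations, num_clusters, max_locations_per_cluster):
    """Raise if the locations cannot fit under the per-cluster cap."""
    # Assumed definition: every cluster filled to its cap.
    max_possible = num_clusters * max_locations_per_cluster
    if total_locations > max_possible:
        raise ValueError(
            f"{total_locations} locations cannot fit in {num_clusters} clusters "
            f"of at most {max_locations_per_cluster} each (capacity {max_possible})"
        )
    return max_possible
```

For example, 20 locations with 3 clusters and a cap of 6 gives a capacity of 18, so the call is rejected; raising the cap to 7 gives 21 and passes.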

ColinToft · 2025-11-24T01:18:23Z

backend/python/app/services/implementations/k_means_test.py

Yeah i think constants at the top might be nice

ludavidca

LGTM

ChloeW125 added 5 commits November 16, 2025 12:36

k means clustering implementation

35df617

k means algorithm with optional max-location-per-cluster handling

05354e9

adjustments to k means clustering algorithm to allow for max-boxes-per-cluster handling

a9e7092

k means clustering algorithm debugging

f5bb2fd

formatting and linting

e206bd5

ChloeW125 requested review from ColinToft and ludavidca November 17, 2025 02:24

ColinToft requested a review from Copilot November 17, 2025 02:47

Copilot AI reviewed Nov 17, 2025

Partial linting fix

bf54a03

Continue resolving linting errors

35cfd0b

ludavidca reviewed Nov 21, 2025

ludavidca added 2 commits November 20, 2025 20:30

Added ability to get a nice looking graphs (we love graphs)

eb9028c

fix lint

efe614e

ColinToft approved these changes Nov 24, 2025

ChloeW125 added 4 commits November 23, 2025 21:03

Implemented review comments

57c2925

Merge branch 'main' into clustering-algorithm

3a210be

fix linting errors

8928b02

linter fix (please)

c882e02

ludavidca self-requested a review November 28, 2025 00:50

ChloeW125 merged commit 542b5b1 into main Nov 28, 2025
2 checks passed

ludavidca reviewed Nov 28, 2025
