feat: implement REST API for program-model component #142

evandowning · 2025-07-08T14:55:42Z

This PR implements a comprehensive REST API for the program-model component to improve isolation between services.

Changes

Add FastAPI server with endpoints for all CodeQuery operations
Create HTTP client and compatibility layer for seamless migration
Update patcher and seed-gen to use REST API instead of direct execution
Remove codequery/cscope dependencies from Dockerfiles
Move harness discovery functions to program-model component
Add comprehensive unit tests and API documentation
Provide CLI script for starting the API server

Resolves #116

🤖 Generated with Claude Code

TODOs:

Run make test
Run libpng challenge
Confirm modified patcher and seed-gen unit tests make sense.
Merge "Build cscope image locally instead of pulling" (#148) into main and rebase

- Add FastAPI server with comprehensive endpoints for code analysis
- Create HTTP client and compatibility layer for seamless migration
- Update patcher and seed-gen to use REST API instead of direct execution
- Remove codequery/cscope dependencies from Dockerfiles
- Move harness discovery functions to program-model component
- Add comprehensive unit tests and API documentation
- Provide CLI script for starting the API server

This change improves isolation between components by replacing direct execution of codequery/cscope tools with HTTP-based communication.

Resolves #116

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Evan Downing <[email protected]>

reytchison · 2025-07-22T18:06:03Z

seed-gen/src/buttercup/seed_gen/task.py

            fallbacks.append(fallback)
        return llm.with_fallbacks(fallbacks)

    def get_harness_source(self) -> HarnessInfo | None:


seed-gen currently fails to find harnesses:

2025-07-22 17:59:05,389 - buttercup.common.utils - INFO - Getting diffs from /node_data/crs_scratch/ebd1e20c-ebe2-41c1-985d-3b2162a582d4/seedgen-jrt7y991/ebd1e20c-ebe2-41c1-985d-3b2162a582d4/ebd1e20c-ebe2-41c1-985d-3b2162a582d4-3318dfe0-f58b-4e-7jsg71jl/diff/diff
2025-07-22 17:59:05,389 - buttercup.seed_gen.vuln_base_task - INFO - Doing vuln-discovery for challenge libpng (mode: delta)
2025-07-22 17:59:05,395 - buttercup.seed_gen.vuln_base_task - ERROR - Failed vuln-discovery for challenge libpng: ('No harness found for challenge %s', 'libpng')
Traceback (most recent call last):
  File "/app/seed-gen/.venv/lib/python3.12/site-packages/buttercup/seed_gen/vuln_base_task.py", line 323, in do_task
    state = self._init_state(out_dir, current_dir)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/seed-gen/.venv/lib/python3.12/site-packages/buttercup/seed_gen/vuln_discovery_delta.py", line 96, in _init_state
    raise ValueError("No harness found for challenge %s", self.package_name)
ValueError: ('No harness found for challenge %s', 'libpng')

It appears that all seed-gen tasks are failing because of this.

I'm not sure it is worth routing harness lookup through the REST API. It is program contextualization, but seed-gen is the only component that uses it. Also, the current PR doesn't free seed-gen from its one harness-lookup dependency (ripgrep), because it still allows seed-gen to look up harnesses without the API.
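A side note on the traceback above: the error message is rendered as a tuple, ('No harness found for challenge %s', 'libpng'), because logging-style arguments were passed to ValueError, which does not interpolate them. A minimal sketch of the difference, using a stand-in `package_name` variable (an assumption for illustration) rather than the real task object:

```python
package_name = "libpng"  # stand-in for self.package_name in the traceback

# Logging-style arguments: ValueError stores both values in .args and
# renders them as a tuple, so the %s placeholder is never filled in.
unformatted = ValueError("No harness found for challenge %s", package_name)

# Formatting the message before constructing the exception gives a readable error.
formatted = ValueError(f"No harness found for challenge {package_name}")

print(str(unformatted))
print(str(formatted))
```

This does not address the missing harness itself; it only makes the failure easier to read in the logs.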

reytchison · 2025-07-22T18:19:17Z

program-model/src/buttercup/program_model/harness_models.py

+
+
+@dataclass
+class HarnessInfo:


This should be internal to seed-gen because it exists for seed-gen prompting (see the __str__ method).

reytchison · 2025-07-22T18:27:03Z

seed-gen/src/buttercup/seed_gen/task.py

 from buttercup.common.llm import ButtercupLLM, create_default_llm, get_langfuse_callbacks
 from buttercup.common.project_yaml import ProjectYaml
-from buttercup.program_model.codequery import CodeQueryPersistent
+from buttercup.program_model.rest_client import CodeQueryPersistentRest as CodeQueryPersistent


Nit: import it as CodeQueryPersistentRest instead of aliasing it to CodeQueryPersistent.

reytchison · 2025-07-22T18:54:39Z

seed-gen/src/buttercup/seed_gen/seed_gen_bot.py

            logger.info("Initializing codequery")
            try:
-                codequery = CodeQueryPersistent(challenge_task, work_dir=Path(self.wdir))
+                codequery = CodeQueryPersistentRest(challenge_task, work_dir=Path(self.wdir))


I don't think it makes sense for the client to specify the server's working directory. The work_dir is passed to the server via a TaskInitRequest, and the server then opens it. Shouldn't this be transparent to the client? It also lets the client overwrite arbitrary directories on the server.
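A minimal sketch of what the comment above suggests, assuming the server derives its working directory from the task ID alone. The root path `SERVER_WORK_ROOT` and the helper `resolve_work_dir` are hypothetical names, not part of the PR; `Path.is_relative_to` needs Python 3.9 or newer.

```python
from pathlib import Path

# Hypothetical server-side root; the real deployment path is not given here.
SERVER_WORK_ROOT = Path("/crs_scratch/program-model")


def resolve_work_dir(task_id: str, root: Path = SERVER_WORK_ROOT) -> Path:
    """Map a task ID to a directory the server owns, rejecting path traversal."""
    if not task_id or task_id in (".", "..") or "/" in task_id or "\\" in task_id:
        raise ValueError(f"invalid task id: {task_id!r}")
    resolved_root = root.resolve()
    candidate = (resolved_root / task_id).resolve()
    # Defense in depth: the resolved path must still be inside the root.
    if not candidate.is_relative_to(resolved_root):
        raise ValueError(f"task id escapes work root: {task_id!r}")
    return candidate
```

With this shape the client only ever sends a task ID, so it has no way to point the server at an arbitrary directory.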

reytchison · 2025-07-22T19:21:04Z

program-model/src/buttercup/program_model/api/server.py

+)
+
+# Global storage for CodeQuery instances
+_codequery_instances: Dict[str, CodeQueryPersistent] = {}


I don't think uvicorn shares memory across workers, so if there are multiple workers (which can be configured in the CLI), this dict will be per worker and each worker will have to maintain its own CodeQueryPersistent instances.

This also means that, with multiple workers, the behavior of the search methods will depend on whether the worker handling the request has initialized a CodeQueryPersistent for that task. If it hasn't, the search will fail.

It also will not be shared across the program-model replicas (there are 4 in the prod configuration).
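One possible way to make the per-worker state harmless is to build entries lazily, so a worker that has not seen a task opens it on first use instead of failing. The sketch below is an illustration under that assumption; `LazyRegistry` and `open_codequery` are hypothetical names, and the factory is a stand-in for opening a CodeQuery database from shared storage.

```python
import threading
from typing import Callable, Dict, Generic, List, TypeVar

T = TypeVar("T")


class LazyRegistry(Generic[T]):
    """Per-process cache that builds an entry on first use.

    Each worker process (and each replica) holds its own copy, so the
    factory must be able to rebuild or reopen state from shared storage.
    """

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._items: Dict[str, T] = {}
        self._lock = threading.Lock()

    def get(self, task_id: str) -> T:
        # Hold the lock so concurrent requests do not build the same entry twice.
        with self._lock:
            if task_id not in self._items:
                self._items[task_id] = self._factory(task_id)
            return self._items[task_id]


# Stand-in factory; a real one would open the CodeQuery database for the task.
opened: List[str] = []


def open_codequery(task_id: str) -> str:
    opened.append(task_id)
    return f"codequery:{task_id}"


registry = LazyRegistry(open_codequery)
registry.get("task-a")
registry.get("task-a")  # served from the cache; the factory is not called again
```

A single lock keeps the sketch short, but it also serializes slow builds behind each other, so a per-task lock may be a better fit if builds are expensive.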

reytchison · 2025-07-22T19:25:30Z

program-model/src/buttercup/program_model/api/server.py

+) -> TaskInitResponse:
+    """Initialize a CodeQuery instance for a task."""
+    try:
+        if task_id in _codequery_instances:


If a CodeQueryPersistent instance already exists for the task ID, this ignores the workdir and challenge_task dir in the request. I think this is another reason to pass just task_id instead of the workdir and challenge_task dir.
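If the request did keep carrying the directories, one alternative to silently ignoring them would be to reject a repeat call whose paths differ from the first. This is only a sketch of that idea; `InitParams` and `init_task` are hypothetical stand-ins, not the PR's TaskInitRequest or endpoint.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class InitParams:
    """Hypothetical stand-in for the directories carried by an init request."""

    work_dir: str
    task_dir: str


_initialized: Dict[str, InitParams] = {}


def init_task(task_id: str, params: InitParams) -> str:
    """Return 'created' or 'exists'; refuse a repeat call with different paths."""
    existing = _initialized.get(task_id)
    if existing is None:
        _initialized[task_id] = params
        return "created"
    if existing != params:
        raise ValueError(f"task {task_id} already initialized with different paths")
    return "exists"
```

Passing only the task ID removes the mismatch case entirely, which is the simpler option the comment argues for.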

reytchison · 2025-07-22T19:29:36Z

program-model/src/buttercup/program_model/api/server.py

+        # Initialize CodeQuery
+        logger.info("Creating CodeQueryPersistent...")
+        try:
+            codequery = CodeQueryPersistent(


If a codequery DB doesn't exist yet, I think the API server will create it, similar to what seed-gen and the patcher did before. If the server runs a single worker, this will block lookups for other task IDs that may already be initialized.
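A minimal sketch of one way to avoid that blocking, assuming the build is a synchronous call that can run in a worker thread. `build_database`, `init_task`, and `lookup` are stand-ins, not functions from the PR; the point is only that `asyncio.to_thread` keeps the event loop free while the slow call runs.

```python
import asyncio
import time
from typing import List


def build_database(task_id: str) -> str:
    """Stand-in for a slow, blocking CodeQuery database build."""
    time.sleep(0.2)
    return f"db:{task_id}"


async def init_task(task_id: str, events: List[str]) -> None:
    # Run the blocking build in a worker thread so the event loop stays free.
    await asyncio.to_thread(build_database, task_id)
    events.append(f"init done: {task_id}")


async def lookup(task_id: str, events: List[str]) -> None:
    # A cheap lookup for a task that is already initialized.
    events.append(f"lookup done: {task_id}")


async def main() -> List[str]:
    events: List[str] = []
    await asyncio.gather(init_task("task-a", events), lookup("task-b", events))
    return events


events = asyncio.run(main())
print(events)
```

Here the lookup for task-b completes while the build for task-a is still running. Calling `build_database` directly inside the coroutine would instead hold the event loop for the full build.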

reytchison · 2025-07-22T19:36:28Z

deployment/k8s/charts/program-model/templates/api-deployment.yaml

@@ -0,0 +1,46 @@
+{{- if .Values.api.enabled }}


The API might need to be scaled for a production deployment. The production deployment has 4 program-model replicas, and each replica has 1 API pod with 1 uvicorn worker. However, it also has 20 seed-gen workers and 28 patchers, which seems like enough to saturate the API workers.
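The ratio behind this concern can be worked out from the figures quoted in the comment. The snippet assumes every seed-gen worker and patcher acts as one concurrent client, which is a simplification.

```python
# Figures quoted in the comment above.
program_model_replicas = 4
api_pods_per_replica = 1
uvicorn_workers_per_pod = 1
seed_gen_workers = 20
patchers = 28

api_workers = program_model_replicas * api_pods_per_replica * uvicorn_workers_per_pod
clients = seed_gen_workers + patchers
clients_per_api_worker = clients / api_workers

print(
    f"{clients} clients share {api_workers} API workers "
    f"({clients_per_api_worker:.0f} clients per worker)"
)
```

That is 48 potential clients for 4 API workers, or 12 per worker, before accounting for how long each request holds a worker.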

reytchison · 2025-07-22T19:39:28Z

program-model/src/buttercup/program_model/api/server.py

+    logger.info("Docker socket available: %s", Path("/var/run/docker.sock").exists())
+
+    # Check if we can access common directories
+    common_dirs = ["/crs_scratch", "/tasks_storage", "/node_data"]


I'm not sure this check is necessary, or that these paths should be hardcoded.
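If the check is kept, one option is to read the directory list from configuration instead of hardcoding it. The sketch below assumes a colon-separated environment variable; the name `PROGRAM_MODEL_CHECK_DIRS` and the helper `dirs_to_check` are hypothetical and not defined by the project.

```python
import os
from pathlib import Path
from typing import List, Mapping

# Defaults mirror the directories listed in the diff above.
DEFAULT_DIRS = ("/crs_scratch", "/tasks_storage", "/node_data")
# Hypothetical variable name, not something the project defines.
ENV_VAR = "PROGRAM_MODEL_CHECK_DIRS"


def dirs_to_check(env: Mapping[str, str] = os.environ) -> List[Path]:
    """Read a colon-separated directory list from the environment."""
    raw = env.get(ENV_VAR)
    if raw is None:
        return [Path(d) for d in DEFAULT_DIRS]
    # An empty value yields an empty list, which disables the check.
    return [Path(part) for part in raw.split(":") if part]
```

Taking the environment as a parameter keeps the function easy to test without touching the real process environment.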

reytchison · 2025-07-22T19:43:01Z

program-model/src/buttercup/program_model/api/server.py

+
+
+@app.get("/tasks/{task_id}/container-src-dir")
+async def download_container_src_dir(


This endpoint is unused in the client/elsewhere.

evandowning self-assigned this Jul 8, 2025

evandowning added the contextualizer label Jul 8, 2025

evandowning force-pushed the claude/issue-116-20250708_133034 branch 6 times, most recently from 088ebcb to 587f243 Compare July 10, 2025 15:43

evandowning requested review from ret2libc and reytchison July 10, 2025 19:01

claude bot and others added 2 commits July 15, 2025 18:11

just reformat

7929b29

evandowning force-pushed the claude/issue-116-20250708_133034 branch 2 times, most recently from 865eab2 to 75a7f1b Compare July 15, 2025 19:15

Update patcher unit tests

e06c444

evandowning force-pushed the claude/issue-116-20250708_133034 branch from 75a7f1b to e06c444 Compare July 15, 2025 19:28

evandowning added 2 commits July 18, 2025 15:29

Update deployment

b1abd33

Update challenge.sh with new urls

79ee8b5

evandowning force-pushed the claude/issue-116-20250708_133034 branch from 1dfb5b4 to 79ee8b5 Compare July 18, 2025 15:30

evandowning added 2 commits July 18, 2025 15:42

Update unit tests

bf9d27a

Add harness REST API for seed-gen

8d14be1

evandowning marked this pull request as ready for review July 21, 2025 12:53

evandowning requested a review from michaelbrownuc July 22, 2025 14:04

reytchison reviewed Jul 22, 2025
