Skip to content

Conversation

@evandowning
Copy link
Collaborator

@evandowning evandowning commented Jul 8, 2025

This PR implements a comprehensive REST API for the program-model component to improve isolation between services.

Changes

  • Add FastAPI server with endpoints for all CodeQuery operations
  • Create HTTP client and compatibility layer for seamless migration
  • Update patcher and seed-gen to use REST API instead of direct execution
  • Remove codequery/cscope dependencies from Dockerfiles
  • Move harness discovery functions to program-model component
  • Add comprehensive unit tests and API documentation
  • Provide CLI script for starting the API server

Resolves #116

🤖 Generated with Claude Code


TODOs:

@evandowning evandowning self-assigned this Jul 8, 2025
@evandowning evandowning force-pushed the claude/issue-116-20250708_133034 branch 6 times, most recently from 088ebcb to 587f243 Compare July 10, 2025 15:43
claude bot and others added 2 commits July 15, 2025 18:11
- Add FastAPI server with comprehensive endpoints for code analysis
- Create HTTP client and compatibility layer for seamless migration
- Update patcher and seed-gen to use REST API instead of direct execution
- Remove codequery/cscope dependencies from Dockerfiles
- Move harness discovery functions to program-model component
- Add comprehensive unit tests and API documentation
- Provide CLI script for starting the API server

This change improves isolation between components by replacing direct
execution of codequery/cscope tools with HTTP-based communication.

Resolves #116

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-authored-by: Evan Downing <[email protected]>
@evandowning evandowning force-pushed the claude/issue-116-20250708_133034 branch 2 times, most recently from 865eab2 to 75a7f1b Compare July 15, 2025 19:15
@evandowning evandowning force-pushed the claude/issue-116-20250708_133034 branch from 75a7f1b to e06c444 Compare July 15, 2025 19:28
@evandowning evandowning force-pushed the claude/issue-116-20250708_133034 branch from 1dfb5b4 to 79ee8b5 Compare July 18, 2025 15:30
@evandowning evandowning marked this pull request as ready for review July 21, 2025 12:53
fallbacks.append(fallback)
return llm.with_fallbacks(fallbacks)

def get_harness_source(self) -> HarnessInfo | None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seed-gen currently fails to find harnesses:

2025-07-22 17:59:05,389 - buttercup.common.utils - INFO - Getting diffs from /node_data/crs_scratch/ebd1e20c-ebe2-41c1-985d-3b2162a582d4/seedgen-jrt7y991/ebd1e20c-ebe2-41c1-985d-3b2162a582d4/ebd1e20c-ebe2-41c1-985d-3b2162a582
d4-3318dfe0-f58b-4e-7jsg71jl/diff/diff                                                                                                                                                                                           
2025-07-22 17:59:05,389 - buttercup.seed_gen.vuln_base_task - INFO - Doing vuln-discovery for challenge libpng (mode: delta)                                                                                                     
2025-07-22 17:59:05,395 - buttercup.seed_gen.vuln_base_task - ERROR - Failed vuln-discovery for challenge libpng: ('No harness found for challenge %s', 'libpng')                                                                
Traceback (most recent call last):                                                                                                                                                                                               
  File "/app/seed-gen/.venv/lib/python3.12/site-packages/buttercup/seed_gen/vuln_base_task.py", line 323, in do_task                                                                                                             
    state = self._init_state(out_dir, current_dir)                                                                                                                                                                               
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                               
  File "/app/seed-gen/.venv/lib/python3.12/site-packages/buttercup/seed_gen/vuln_discovery_delta.py", line 96, in _init_state                                                                                                    
    raise ValueError("No harness found for challenge %s", self.package_name)                                                                                                                                                     
ValueError: ('No harness found for challenge %s', 'libpng')   

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that all seed-gen tasks are failing because of this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure it is worth making harness lookup over the REST API. It is program contextualization but seed-gen is the only component that uses it. Also, the current PR doesn't free seed-gen from the 1 harness lookup dependency (ripgrep) because it allows seed-gen to lookup harnesses without the API.



@dataclass
class HarnessInfo:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be internal to seed-gen because it's seed-gen prompting. (see the str method).

from buttercup.common.llm import ButtercupLLM, create_default_llm, get_langfuse_callbacks
from buttercup.common.project_yaml import ProjectYaml
from buttercup.program_model.codequery import CodeQueryPersistent
from buttercup.program_model.rest_client import CodeQueryPersistentRest as CodeQueryPersistent
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: import as CodeQueryPersistentRest instead of renaming to CodeQueryPersistent

logger.info("Initializing codequery")
try:
codequery = CodeQueryPersistent(challenge_task, work_dir=Path(self.wdir))
codequery = CodeQueryPersistentRest(challenge_task, work_dir=Path(self.wdir))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it makes sense for the client to specify the working directory for the server? The work_dir is passed to the server via a TaskInitRequest, which the server then opens. Shouldn't this be transparent to the client? It also lets the client overwrite arbitrary directories in the server.

)

# Global storage for CodeQuery instances
_codequery_instances: Dict[str, CodeQueryPersistent] = {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think uvicorn shares memory across workers so if there are multiple workers (which can be configured in the CLI) then this will be per worker and each worker will have to maintain its own codequerypersistent instance.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this means if there are multiple workers then the behavior of the search methods will depend on whether that worker has initialized a CodeQueryPersistent for that task. If it hasn't, then the search will fail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also this will not be shared across the program model replicas (there are 4 in the prod configuration)

) -> TaskInitResponse:
"""Initialize a CodeQuery instance for a task."""
try:
if task_id in _codequery_instances:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignores the workdir and challenge_task dir in the request if a CodeQueryPersistent instance already exists for the task ID. I think this another reason for just passing task_id instead of workdir and challenge_task dir.

# Initialize CodeQuery
logger.info("Creating CodeQueryPersistent...")
try:
codequery = CodeQueryPersistent(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a codequery db doesn't exist yet then I think the API server will create the codequery DB, similar to what seed-gen and the patcher did before. If the server is a single worker then this will block lookups for other task IDs, which may be initialized.

@@ -0,0 +1,46 @@
{{- if .Values.api.enabled }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might need to scale the API for a production deployment? The production deployment has 4 program model replicas and each replica has 1 API pod with 1 uvicorn worker. However the prod deployment has 20 seed-gen workers and 28 patchers. It seems like they could saturate the API workers

logger.info("Docker socket available: %s", Path("/var/run/docker.sock").exists())

# Check if we can access common directories
common_dirs = ["/crs_scratch", "/tasks_storage", "/node_data"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this check is necessary and these paths should be hardcoded?



@app.get("/tasks/{task_id}/container-src-dir")
async def download_container_src_dir(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This endpoint is unused in the client/elsewhere.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create REST api interface to program-model

3 participants