Conversation
Contributor
There was a problem hiding this comment.
Pull request overview
This pull request refactors the backend architecture to separate the Search service into its own microservice with direct HTTP access, replacing PyTorch and transformers with ONNX Runtime for significantly faster cold starts and reduced memory footprint. The changes streamline the architecture by giving the Search service its own ASGI endpoint, eliminating the need to route search requests through the main server gateway.
Changes:
- Separated Search service with its own FastAPI router (SearchFastAPIRouter) and ASGI endpoint, removing search from the server router
- Replaced PyTorch/transformers-based text embedding with ONNX Runtime implementation, reducing dependencies by ~2GB and eliminating 5-8s of import overhead
- Updated frontend configuration and API endpoint URLs to route search requests directly to the Search service instead of through the server
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| frontend/streamlit/config.py | Updated API endpoint URLs to separate server and search base URLs, with distinct endpoints for each service |
| frontend/streamlit/pages/search_demo.py | Updated to use new SEARCH_SEARCH_URL and SERVER_* prefixed URLs matching the microservices architecture |
| backend/tests/integration/test_api_endpoints.py | Removed search endpoint test from server router tests, updated router import from FastAPIRouter to ServerFastAPIRouter |
| backend/shared/images.py | Added ONNX model export function, refactored image definitions to install/uninstall torch strategically, and updated dev image to include ONNX support |
| backend/shared/init.py | Added proper module exports for shared utilities including config and image functions |
| backend/services/search_service.py | Added FastAPI app creation and ASGI endpoint, replaced modal.method with internal implementation callable directly by router |
| backend/services/http_server.py | Removed search_service_cls parameter from create_fastapi_app, updated to use ServerFastAPIRouter |
| backend/search/text_embedder.py | Complete rewrite to use ONNX Runtime and raw tokenizers instead of PyTorch and transformers for text embedding |
| backend/embeddings/init.py | Updated import to reference renamed video_embedder module |
| backend/api/server_fastapi_router.py | Renamed from FastAPIRouter, removed search endpoint and search_service_cls parameter |
| backend/api/search_fastapi_router.py | New router for Search service with /health and /search endpoints |
| backend/api/init.py | Updated exports to include both ServerFastAPIRouter and SearchFastAPIRouter |
| backend/apps/search_app.py | Added enable_memory_snapshot configuration, updated documentation about ONNX optimization |
| backend/apps/dev_combined.py | Updated to create separate DevSearchService ASGI app, removed search_service_cls from server creation |
| backend/apps/init.py | Updated documentation to clarify app entry points structure |
Comments suppressed due to low confidence (1)
backend/shared/images.py:207
- Missing "api" package from local Python sources. The SearchService imports SearchFastAPIRouter from api.search_fastapi_router (line 68 in services/search_service.py), but the "api" package is not included in the add_local_python_source call. This will cause an ImportError when the Search service attempts to create its FastAPI app.
Add "api" to the list of packages in add_local_python_source.
model_name = "openai/clip-vit-base-patch32"
CLIPModel.from_pretrained(model_name)
CLIPProcessor.from_pretrained(model_name, use_fast=True)
def get_processing_image() -> modal.Image:
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This pull request refactors the backend API and search infrastructure to separate the Search service into its own FastAPI router and ASGI app, optimizing for lower latency and faster cold starts. The Search service now bypasses the main server gateway, exposes its own HTTP endpoints, and uses an ONNX-based CLIP text embedder to eliminate heavy dependencies. These changes streamline the architecture, improve development and production performance, and clarify service boundaries.
API and Router Refactoring
ServerFastAPIRouterfor server endpoints andSearchFastAPIRouterfor search endpoints. The search endpoint is removed from the server router and exposed directly by the Search service for lower latency.create_fastapi_appmethod in the server no longer takes a search service class, reflecting the separation of concerns.Search Service Improvements
Development and Deployment Enhancements
dev_combined.py), the Search service exposes its own ASGI app, and memory snapshots are enabled for both the search and processing services to accelerate subsequent cold starts.__init__.pyfiles and comments.Summary of Most Important Changes
API Separation and Routing
ServerFastAPIRouterandSearchFastAPIRouteras distinct routers, removing the search endpoint from the server and exposing it directly via the Search service.Search Service Optimization
Development and Deployment Enhancements