Skip to content

CLI-70: Microservices#67

Merged
eshaan-mehta merged 7 commits intostagingfrom
microservices
Jan 26, 2026
Merged

CLI-70: Microservices#67
eshaan-mehta merged 7 commits intostagingfrom
microservices

Conversation

@eshaan-mehta
Copy link
Contributor

This pull request refactors the backend API and search infrastructure to separate the Search service into its own FastAPI router and ASGI app, optimizing for lower latency and faster cold starts. The Search service now bypasses the main server gateway, exposes its own HTTP endpoints, and uses an ONNX-based CLIP text embedder to eliminate heavy dependencies. These changes streamline the architecture, improve development and production performance, and clarify service boundaries.

API and Router Refactoring

  • The main server and search services now have separate FastAPI routers: ServerFastAPIRouter for server endpoints and SearchFastAPIRouter for search endpoints. The search endpoint is removed from the server router and exposed directly by the Search service for lower latency.
  • The create_fastapi_app method in the server no longer takes a search service class, reflecting the separation of concerns.

Search Service Improvements

  • The Search service now provides its own ASGI app and FastAPI router, enabling direct HTTP access and bypassing the server gateway for search requests.
  • The Search service uses an ONNX-based CLIP text embedder, removing the need for PyTorch and transformers, significantly reducing cold start times and memory usage. The embedder uses the raw tokenizers library and ONNX Runtime for efficient inference.

Development and Deployment Enhancements

  • In development mode (dev_combined.py), the Search service exposes its own ASGI app, and memory snapshots are enabled for both the search and processing services to accelerate subsequent cold starts.
  • Documentation and file organization are improved for clarity, including updates to package __init__.py files and comments.

Summary of Most Important Changes

  • API Separation and Routing

    • Added ServerFastAPIRouter and SearchFastAPIRouter as distinct routers, removing the search endpoint from the server and exposing it directly via the Search service.
    • Updated server and app initialization to reflect the new separation, removing references to the search service in the server router and app creation.
  • Search Service Optimization

    • Replaced the PyTorch/transformers-based text embedder with a lightweight ONNX-based implementation using raw tokenizers and ONNX Runtime, greatly reducing cold start time and memory requirements.
    • Search service now exposes its own ASGI app for direct, low-latency HTTP access, bypassing the main server gateway.
  • Development and Deployment Enhancements

    • Enabled memory snapshots for the Search and Processing services in development and production to speed up subsequent cold starts.
    • Improved documentation and clarified the purpose of each app and router in the codebase.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request refactors the backend architecture to separate the Search service into its own microservice with direct HTTP access, replacing PyTorch and transformers with ONNX Runtime for significantly faster cold starts and reduced memory footprint. The changes streamline the architecture by giving the Search service its own ASGI endpoint, eliminating the need to route search requests through the main server gateway.

Changes:

  • Separated Search service with its own FastAPI router (SearchFastAPIRouter) and ASGI endpoint, removing search from the server router
  • Replaced PyTorch/transformers-based text embedding with ONNX Runtime implementation, reducing dependencies by ~2GB and eliminating 5-8s of import overhead
  • Updated frontend configuration and API endpoint URLs to route search requests directly to the Search service instead of through the server

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
frontend/streamlit/config.py Updated API endpoint URLs to separate server and search base URLs, with distinct endpoints for each service
frontend/streamlit/pages/search_demo.py Updated to use new SEARCH_SEARCH_URL and SERVER_* prefixed URLs matching the microservices architecture
backend/tests/integration/test_api_endpoints.py Removed search endpoint test from server router tests, updated router import from FastAPIRouter to ServerFastAPIRouter
backend/shared/images.py Added ONNX model export function, refactored image definitions to install/uninstall torch strategically, and updated dev image to include ONNX support
backend/shared/init.py Added proper module exports for shared utilities including config and image functions
backend/services/search_service.py Added FastAPI app creation and ASGI endpoint, replaced modal.method with internal implementation callable directly by router
backend/services/http_server.py Removed search_service_cls parameter from create_fastapi_app, updated to use ServerFastAPIRouter
backend/search/text_embedder.py Complete rewrite to use ONNX Runtime and raw tokenizers instead of PyTorch and transformers for text embedding
backend/embeddings/init.py Updated import to reference renamed video_embedder module
backend/api/server_fastapi_router.py Renamed from FastAPIRouter, removed search endpoint and search_service_cls parameter
backend/api/search_fastapi_router.py New router for Search service with /health and /search endpoints
backend/api/init.py Updated exports to include both ServerFastAPIRouter and SearchFastAPIRouter
backend/apps/search_app.py Added enable_memory_snapshot configuration, updated documentation about ONNX optimization
backend/apps/dev_combined.py Updated to create separate DevSearchService ASGI app, removed search_service_cls from server creation
backend/apps/init.py Updated documentation to clarify app entry points structure
Comments suppressed due to low confidence (1)

backend/shared/images.py:207

  • Missing "api" package from local Python sources. The SearchService imports SearchFastAPIRouter from api.search_fastapi_router (line 68 in services/search_service.py), but the "api" package is not included in the add_local_python_source call. This will cause an ImportError when the Search service attempts to create its FastAPI app.

Add "api" to the list of packages in add_local_python_source.

    model_name = "openai/clip-vit-base-patch32"
    CLIPModel.from_pretrained(model_name)
    CLIPProcessor.from_pretrained(model_name, use_fast=True)


def get_processing_image() -> modal.Image:

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@eshaan-mehta eshaan-mehta changed the title Microservices CLI-70: Microservices Jan 26, 2026
@eshaan-mehta eshaan-mehta merged commit 4d4f5f3 into staging Jan 26, 2026
7 checks passed
@eshaan-mehta eshaan-mehta deleted the microservices branch January 26, 2026 15:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant