This repository contains a custom fork of llm-d-inference-scheduler, modified to add chat completions preprocessing.
- Original Repository: https://github.com/llm-d/llm-d-inference-scheduler.git
- Fork Location: /Users/guygirmonsky/llm-d-build/llm-d-inference-scheduler
- Custom Image: ghcr.io/guygir/llm-d-inference-scheduler:latest
- Base Commit: 82f0cf2 (Makefile fixes #322)
- Docker/Podman
- Go 1.25+
- Python 3.12 development headers
cd /Users/guygirmonsky/llm-d-build/llm-d-inference-scheduler
TARGETARCH=amd64 TARGETOS=linux make image-build
- `gateway-api-inference-extension/` - Local clone of the Gateway API Inference Extension
- `llm-d-kv-cache-manager/` - Local clone of the KV Cache Manager with preprocessing capabilities
Why Local Clones Were Necessary: The upstream repositories had structural differences, missing functionality, and version compatibility issues that prevented successful builds:
- Package Structure Mismatch: Upstream had `pkg/epp/config/loader` but the code expected `pkg/epp/common/config/loader`
- API Version Issues: The code expected `api/v1alpha2` but upstream had `apix/v1alpha2`
- Missing Preprocessing Code: Chat completions preprocessing code wasn't available in upstream
- Version Compatibility: Upstream v0.5.1 didn't have required functionality
// Added replace directives to point to local clones:
replace github.com/llm-d/llm-d-kv-cache-manager => ./llm-d-kv-cache-manager
replace sigs.k8s.io/gateway-api-inference-extension => ./gateway-api-inference-extension
- Added Python 3.12 development tools and runtime
- Integrated Python dependencies for chat completions preprocessing
- Added CGO environment variables for Python integration
- Modified build process to include Python libraries
- Installed Python 3.12 runtime in final image
- Copied Python wrapper files and dependencies
- Set proper environment variables for Python library discovery
- Fixed plugin registration to use custom plugins
- Removed call to non-existent `runner.RegisterAllPlugins()`
- Added chat completions preprocessing import
- Modified `Score` function to preprocess requests before cache lookup
- Added `preprocessRequest` function for chat completion handling
- Added same preprocessing functionality as the scorer
- Modified `Pick` function to use preprocessed prompts for calculations
- Ensures consistent preprocessing across all components
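
The changes listed above share one idea: the scorer and the picker should both work from the same preprocessed text. The sketch below illustrates that idea only. The `ChatMessage` and `Request` types, the `scoreKey` and `pickKey` helpers, and the `role: content` flattening are hypothetical stand-ins; the fork's real `preprocessRequest` goes through the Python integration and its signature is not shown in this document.

```go
package main

import (
	"fmt"
	"strings"
)

// ChatMessage is a hypothetical, minimal stand-in for one chat completions message.
type ChatMessage struct {
	Role    string
	Content string
}

// Request is a hypothetical request shape: either a plain prompt or chat messages.
type Request struct {
	Prompt   string
	Messages []ChatMessage
}

// preprocessRequest returns the text that later steps use for cache lookup.
// Assumption: a fixed "role: content" layout stands in for the real
// chat-template rendering, which this sketch does not reproduce.
func preprocessRequest(r Request) string {
	if len(r.Messages) == 0 {
		return r.Prompt // plain completion requests pass through unchanged
	}
	var b strings.Builder
	for _, m := range r.Messages {
		b.WriteString(m.Role)
		b.WriteString(": ")
		b.WriteString(m.Content)
		b.WriteString("\n")
	}
	return b.String()
}

// scoreKey and pickKey (hypothetical names) both delegate to the same helper,
// so scoring and picking always see identical text for a given request.
func scoreKey(r Request) string { return preprocessRequest(r) }
func pickKey(r Request) string  { return preprocessRequest(r) }

func main() {
	r := Request{Messages: []ChatMessage{
		{Role: "system", Content: "Be brief."},
		{Role: "user", Content: "Hello"},
	}}
	fmt.Println(scoreKey(r) == pickKey(r))
	fmt.Print(preprocessRequest(r))
}
```

Routing both components through a single helper means a change to the preprocessing logic cannot make the scorer and the picker disagree about what the prompt is.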
- Python.h not found during build
  - Ensure Python 3.12 development headers are installed
  - Check that the python3.12-devel package is available
- Import errors during Go build
  - Verify local repository clones are present
  - Check that go.mod replace directives are correct
  - Ensure all required files are copied to the build context
- Runtime Python errors
  - Verify Python 3.12 runtime is installed in the final image
  - Check that the PYTHONPATH environment variable is set correctly
  - Ensure all Python dependencies are properly installed
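
Two of the checks above (the go.mod replace directives and the PYTHONPATH variable) can be automated. The following is a minimal sketch, not part of the fork: `parseReplaces` and `missingReplaces` are hypothetical helpers that use plain string parsing. They handle only single-line `replace` directives, not parenthesised `replace ( ... )` blocks, and they assume the local clones are referenced with `./` relative paths as shown in this document.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// wantLocal lists the modules this document says are replaced with local clones.
var wantLocal = []string{
	"github.com/llm-d/llm-d-kv-cache-manager",
	"sigs.k8s.io/gateway-api-inference-extension",
}

// parseReplaces extracts single-line "replace old => new" directives
// from go.mod text and maps each old module path to its replacement.
func parseReplaces(gomod string) map[string]string {
	out := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(gomod))
	for sc.Scan() {
		f := strings.Fields(sc.Text())
		if len(f) < 4 || f[0] != "replace" {
			continue
		}
		// The arrow may be preceded by an optional version, so search for it.
		for i := 2; i < len(f)-1; i++ {
			if f[i] == "=>" {
				out[f[1]] = f[i+1]
				break
			}
		}
	}
	return out
}

// missingReplaces returns the expected modules that are not
// redirected to a "./" relative path.
func missingReplaces(gomod string) []string {
	got := parseReplaces(gomod)
	var missing []string
	for _, m := range wantLocal {
		if !strings.HasPrefix(got[m], "./") {
			missing = append(missing, m)
		}
	}
	return missing
}

func main() {
	data, err := os.ReadFile("go.mod")
	if err != nil {
		fmt.Println("cannot read go.mod:", err)
		return
	}
	for _, m := range missingReplaces(string(data)) {
		fmt.Println("no local replace directive for", m)
	}
	if os.Getenv("PYTHONPATH") == "" {
		fmt.Println("PYTHONPATH is not set")
	}
}
```

Run from the fork's root directory, this prints one line per problem it finds and nothing when both directives and PYTHONPATH are in place. It does not confirm that the clone directories exist or that the Python dependencies are installed.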