This guide provides everything you need to contribute to the MLPerf Inference Endpoint Benchmarking System.
- Python: 3.12+ (Python 3.12 is recommended for optimal performance)
- Git: Latest version
- Virtual Environment: Python venv or conda
- IDE: VS Code, PyCharm, or your preferred editor
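If you are unsure which interpreter your environment resolves to, a quick standalone check (illustrative, not part of the project) confirms it meets the 3.12 minimum:

```python
import sys


def meets_python_requirement(version_info=sys.version_info, minimum=(3, 12)):
    """Return True if the interpreter satisfies the minimum (major, minor) version."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    status = "OK" if meets_python_requirement() else "upgrade required"
    print(f"Python {sys.version.split()[0]}: {status}")
```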
```shell
# 1. Fork https://github.com/mlcommons/endpoints on GitHub, then clone your fork
git clone https://github.com/YOUR_USERNAME/endpoints.git
cd endpoints

# 2. Add the upstream repo as a remote
git remote add upstream https://github.com/mlcommons/endpoints.git

# 3. Create virtual environment (Python 3.12+ required)
python3.12 -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 4. Install development dependencies
pip install -e ".[dev,test]"

# 5. Install pre-commit hooks
pre-commit install

# 6. Verify installation
inference-endpoint --version
pytest --version
```

```text
endpoints/
├── src/inference_endpoint/   # Main package source
│   ├── main.py                # Entry point and CLI app
│   ├── exceptions.py          # Project-wide exception types
│   ├── async_utils/           # Event loop, ZMQ transport, pub/sub
│   ├── commands/              # CLI command implementations
│   ├── config/                # Configuration and schema management
│   ├── core/                  # Core types and orchestration
│   ├── dataset_manager/       # Dataset handling and loading
│   ├── endpoint_client/       # HTTP/ZMQ endpoint communication
│   ├── evaluation/            # Accuracy evaluation and scoring
│   ├── load_generator/        # Load generation and scheduling
│   ├── metrics/               # Performance measurement and reporting
│   ├── openai/                # OpenAI API compatibility
│   ├── plugins/               # Plugin system
│   ├── profiling/             # Performance profiling tools
│   ├── sglang/                # SGLang API adapter
│   ├── testing/               # Test utilities (echo server, etc.)
│   └── utils/                 # Common utilities
├── tests/                     # Test suite
│   ├── unit/                  # Unit tests
│   ├── integration/           # Integration tests
│   ├── performance/           # Performance tests
│   └── datasets/              # Test datasets
├── docs/                      # Documentation
├── examples/                  # Usage examples
└── scripts/                   # Utility scripts
```
```shell
# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Run specific test categories
pytest -m unit         # Unit tests only
pytest -m integration  # Integration tests only
pytest -m performance  # Performance tests only (no timeout)

# Run tests in parallel
pytest -n auto

# Run tests with verbose output
pytest -v

# Run specific test file
pytest tests/unit/test_core_types.py

# Run with output to file (recommended)
pytest -v 2>&1 | tee test_results.log
```

- Unit Tests (`tests/unit/`): Test individual components in isolation
- Integration Tests (`tests/integration/`): Test component interactions with real servers
- Performance Tests (`tests/performance/`): Test performance characteristics (marked with `@pytest.mark.performance`, no timeout)
- Test Datasets (`tests/datasets/`): Sample datasets for testing (`dummy_1k.jsonl`, `squad_pruned/`)
```python
import pytest

from inference_endpoint.core.types import Query


class TestQuery:
    @pytest.mark.unit
    def test_query_creation(self):
        """Test creating a basic query."""
        query = Query(data={"prompt": "Test", "model": "test-model"})
        assert query.data["prompt"] == "Test"
        assert query.data["model"] == "test-model"

    @pytest.mark.unit
    @pytest.mark.asyncio(mode="strict")
    async def test_async_operation(self):
        """Test async operations."""
        # Your async test here
        pass
```

The project uses pre-commit hooks to ensure code quality.
Hooks that run automatically on commit:
- trailing-whitespace, end-of-file-fixer, check-yaml, check-merge-conflict, debug-statements
- `ruff` (lint + autofix) and `ruff-format`
- `mypy` type checking
- `prettier` for YAML/JSON/Markdown
- License header enforcement (Apache 2.0 SPDX header required on all Python files, added by `scripts/add_license_header.py`)
Always run `pre-commit run --all-files` before committing.
```shell
# Install hooks (done during setup)
pre-commit install

# Run all hooks on staged files
pre-commit run

# Run all hooks on all files
pre-commit run --all-files
```

Configuration: ruff (line-length 88, target Python 3.12), ruff-format (double quotes, space indent).
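These settings typically live in `pyproject.toml`; a fragment consistent with the configuration described above might look like the following (check the repository's actual file for the authoritative values):

```toml
[tool.ruff]
line-length = 88
target-version = "py312"

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
```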
```shell
# Format code with ruff
ruff format src/ tests/

# Check formatting without changing files
ruff format --check src/ tests/
```

```shell
# Run ruff linter
ruff check src/ tests/

# Run mypy for type checking
mypy src/

# Run all quality checks
pre-commit run --all-files
```

```shell
# Sync your fork with upstream before starting
git fetch upstream
git checkout main
git merge upstream/main

# Create a feature branch on your fork
git checkout -b feature/your-feature-name

# Make changes and test
pytest
pre-commit run --all-files

# Commit changes
git add .
git commit -m "feat: add your feature description"

# Push to your fork and open a PR against mlcommons/endpoints
git push origin feature/your-feature-name
```

When developing a new component:
- Create the component directory in `src/inference_endpoint/`
- Add `__init__.py` with a component description
- Implement the component following the established patterns
- Add tests in the corresponding `tests/unit/` directory
- Update the main package `__init__.py` if needed
- Add dependencies to `pyproject.toml` under `[project.dependencies]` or `[project.optional-dependencies]`
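As an illustration of the layout, a hypothetical component's `__init__.py` might expose a small public surface like this (the component and class names are invented here; see the existing components under `src/inference_endpoint/` for the real patterns):

```python
"""rate_limiter: token-bucket rate limiting for outbound endpoint requests.

Hypothetical component used only to illustrate the expected layout.
"""

import time


class TokenBucket:
    """Token-bucket limiter: refills `rate` tokens per second up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self._tokens = capacity
        self._last = time.monotonic()

    def try_acquire(self, tokens: float = 1.0) -> bool:
        """Consume `tokens` if available; return False otherwise."""
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False


__all__ = ["TokenBucket"]
```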
- Unit Tests: >90% coverage required
- Integration Tests: Test component interactions
- Performance Tests: Ensure no performance regressions
- Documentation: Update docs for new features
- Code Comments: Add comments only where the why is not obvious from the code; avoid restating what the code does
- README Updates: Update README.md for user-facing changes
- Examples: Provide usage examples for new features
- Async First: Use async/await for I/O operations
- Memory Efficiency: Minimize object creation in hot paths
- Profiling: Use pytest-benchmark for performance testing
- Monitoring: Add performance metrics for critical operations
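A sketch of the async-first guideline: fan queries out concurrently rather than awaiting them one at a time, so total wall time approximates one round trip instead of N. The `send_query` coroutine below is a stand-in for the real endpoint client in `endpoint_client/`:

```python
import asyncio


async def send_query(prompt: str) -> dict:
    """Stand-in for an endpoint call; the real client does HTTP/ZMQ I/O here."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"prompt": prompt, "completion": prompt[::-1]}


async def run_batch(prompts: list[str]) -> list[dict]:
    """Issue all queries concurrently with asyncio.gather."""
    return await asyncio.gather(*(send_query(p) for p in prompts))


if __name__ == "__main__":
    print(asyncio.run(run_batch(["hello", "world"])))
```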
```shell
# Run performance tests
pytest -m performance

# Run benchmarks
pytest --benchmark-only

# Compare with previous runs
pytest --benchmark-compare
```

- Import Errors: Ensure `src/` is in the Python path
- Test Failures: Check test data and mock objects
- Performance Issues: Use profiling tools to identify bottlenecks
- Async Issues: Ensure proper event loop handling
```shell
# Run with debug logging
inference-endpoint --verbose

# Run tests with debug output
pytest -s -v

# Use Python debugger
python -m pdb -m pytest test_file.py
```

Config templates in `src/inference_endpoint/config/templates/` are auto-generated from schema defaults. When you change `config/schema.py`, regenerate them:

```shell
python scripts/regenerate_templates.py
```

The pre-commit hook auto-regenerates templates when `schema.py`, `config.py`, or `regenerate_templates.py` change. CI validates that templates are up to date via `--check` mode.
Two variants are generated per mode (offline, online, concurrency):

- `_template.yaml`: minimal, only required fields + placeholders
- `_template_full.yaml`: all fields with schema defaults + inline `# options:` comments
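A toy illustration of the minimal-vs-full split, generating each variant from schema defaults. The schema field names here are invented, not the project's actual schema:

```python
# Hypothetical schema: each field declares whether it is required and its default.
SCHEMA = {
    "endpoint_url": {"required": True, "default": None},
    "timeout_s": {"required": False, "default": 30},
    "max_retries": {"required": False, "default": 3},
}


def render_template(schema: dict, full: bool) -> str:
    """Minimal: required fields with placeholders. Full: also emit defaults."""
    lines = []
    for name, spec in schema.items():
        if spec["required"]:
            lines.append(f"{name}: <REQUIRED>")
        elif full:
            lines.append(f"{name}: {spec['default']}")
    return "\n".join(lines) + "\n"
```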
Add dependencies to `pyproject.toml` (always pin to exact versions with `==`):

- Runtime dependencies: `[project.dependencies]`
- Optional groups (dev, test, etc.): `[project.optional-dependencies]`
Install after updating:

```shell
pip install -e ".[dev,test]"
```

Pre-commit hooks failing:

```shell
# Update pre-commit
pre-commit autoupdate

# Skip hooks temporarily
git commit --no-verify
```

Tests failing:
```shell
# Clear Python cache
find . -type d -name "__pycache__" -delete
find . -type f -name "*.pyc" -delete

# Reinstall package
pip install -e .
```

Import errors:
```shell
# Check Python path
python -c "import sys; print(sys.path)"

# Ensure src is in path
export PYTHONPATH="${PYTHONPATH}:$(pwd)/src"
```

- Fork `mlcommons/endpoints` on GitHub
- Clone your fork and add `upstream` as a remote (see Development Environment Setup)
- Sync with upstream (`git fetch upstream && git merge upstream/main`) before starting work
- Create a feature branch on your fork (`git checkout -b feature/your-feature-name`)
- Make your changes following the coding standards
- Add tests for new functionality
- Update documentation as needed
- Run all checks locally: `pytest` and `pre-commit run --all-files`
- Push to your fork and open a PR against `mlcommons/endpoints:main`
- Address review comments promptly
Use conventional commit format:
```text
type(scope): description
```

Examples:

```text
feat(core): add query lifecycle management
fix(api): resolve endpoint connection issue
docs(readme): update installation instructions
test(loadgen): add performance benchmarks
```
Allowed types: feat, fix, docs, test, chore, refactor, perf, ci.
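If you want a local sanity check before pushing, a small regex sketch can validate the first line of a commit message against this format (illustrative only, not an official hook; the type list mirrors the one above):

```python
import re

ALLOWED_TYPES = "feat|fix|docs|test|chore|refactor|perf|ci"
# type, optional (scope), then ": description"
COMMIT_RE = re.compile(rf"^({ALLOWED_TYPES})(\([a-z0-9_-]+\))?: .+")


def is_conventional(message: str) -> bool:
    """Check the first line of a commit message against type(scope): description."""
    return bool(COMMIT_RE.match(message.splitlines()[0]))
```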
- Code follows style guidelines
- Tests pass and coverage is adequate
- Documentation is updated
- Performance impact is considered
- Security implications are reviewed
- Error handling is appropriate
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Check this guide and project docs
- Team: Reach out to the development team