A production-grade framework for optimizing LLM inference through intelligent test-time compute allocation.
Elamonica implements test-time compute optimization strategies from recent research, letting developers trade additional inference compute for higher-quality responses.
Community Edition (Open Source)
- Best-of-N sampling with configurable parameters
- Sequential revision strategies
- Beam search optimization
- Command-line interface
- Comprehensive benchmarking tools
Pro Edition
- Adaptive compute allocation
- Process Reward Model (PRM) integration
- Web-based dashboard
- REST API server
- License management system
Enterprise Edition
- White-label customization
- SSO integration
- Custom PRM training pipeline
- Multi-tenant architecture
- Kubernetes operator
Elamonica is built on three core optimization strategies:
- Parallel Strategies: Generate multiple candidate responses and select the best
- Sequential Strategies: Iteratively refine responses through multiple passes
- Search-Based Strategies: Use guided search with reward models
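The three strategy families above can be illustrated with a minimal, self-contained sketch. This is not Elamonica's implementation or API: every name here (`best_of_n`, `sequential_revision`, `beam_search`, the `toy_*` helpers, `TARGET`, `ALPHABET`) is hypothetical, and the "model" and "reward model" are toy stand-ins that generate random strings and score them by how many characters match a hidden target.

```python
import random

# Toy stand-ins (assumptions for illustration only): a real setup would
# call an LLM to generate text and a reward model to score it.
TARGET = "abcdabcd"
ALPHABET = "abcd"


def toy_generate(prompt, rng):
    """Sample a full candidate response at random."""
    return "".join(rng.choice(ALPHABET) for _ in range(len(TARGET)))


def toy_revise(prompt, draft, rng):
    """Propose a revision by rewriting one random position of the draft."""
    i = rng.randrange(len(draft))
    return draft[:i] + rng.choice(ALPHABET) + draft[i + 1:]


def toy_extend(prompt, partial, rng):
    """Extend a partial response by one step (one character)."""
    return partial + rng.choice(ALPHABET)


def toy_score(prompt, candidate):
    """Toy reward: number of positions that match the hidden target."""
    return sum(a == b for a, b in zip(candidate, TARGET))


def best_of_n(prompt, generate, score, n, rng):
    """Parallel: sample n candidates independently, keep the best-scoring one."""
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))


def sequential_revision(prompt, generate, revise, score, steps, rng):
    """Sequential: revise one draft repeatedly, accepting only improvements."""
    best = generate(prompt, rng)
    for _ in range(steps):
        candidate = revise(prompt, best, rng)
        if score(prompt, candidate) > score(prompt, best):
            best = candidate
    return best


def beam_search(prompt, extend, score, beam_width, depth, rng):
    """Search-based: grow partial responses step by step, keeping the
    beam_width highest-scoring partials after each step."""
    beams = [""]
    for _ in range(depth):
        expanded = [extend(prompt, b, rng) for b in beams for _ in range(beam_width)]
        expanded.sort(key=lambda c: score(prompt, c), reverse=True)
        beams = expanded[:beam_width]
    return beams[0]


if __name__ == "__main__":
    rng = random.Random(0)
    print("best-of-N:  ", best_of_n("demo", toy_generate, toy_score, 8, rng))
    print("sequential: ", sequential_revision("demo", toy_generate, toy_revise, toy_score, 30, rng))
    print("beam search:", beam_search("demo", toy_extend, toy_score, 3, len(TARGET), rng))
```

In this sketch the compute budget is the number of generator and scorer calls: `n` for best-of-N, `steps` for sequential revision, and `beam_width * beam_width * depth` (roughly) for beam search. How Elamonica itself accounts for budget is not specified here.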
# Install from PyPI
pip install elamonica

# Or install from source
git clone https://github.com/AntonioVFranco/elamonica.git
cd elamonica
pip install -e community/

from elamonica import OptimizationPipeline
# Initialize pipeline with best-of-N strategy
pipeline = OptimizationPipeline(
    model="deepseek-ai/deepseek-r1-distill-qwen-32b",
    strategy="best_of_n",
    n_samples=5,
)
# Optimize inference
result = pipeline.optimize(
    prompt="Solve the following problem: ...",
    max_compute_budget=100,
)
print(result.best_response)

- Community Edition: Apache 2.0
- Pro Edition: Commercial License
- Enterprise Edition: Commercial License
See LICENSE for details.
@software{elamonica2025,
  title={Elamonica: Test-Time Compute Optimization Framework},
  author={Antonio Silva},
  year={2025},
  url={https://github.com/AntonioVFranco/elamonica}
}

For questions or support, contact me by email: contact@antoniovfranco.com
- Documentation: docs.elamonica
- Issues: GitHub Issues
- Discussions: GitHub Discussions