
sattensil/Hyperbolic


Hyperbolic Model Comparison Tool

A command-line tool for comparing large language models (LLMs) hosted on the Hyperbolic API using MMLU benchmark questions.

Features

  • Compare two models on MMLU benchmarks with customizable test parameters
  • Measure metrics in four categories:
    • Speed: Time to first token, total latency, tokens per second
    • Accuracy: MMLU score based on correct answers
    • Quality: Consistency and text similarity scores
    • Cost: Token usage costs and cost-performance ratio
  • Test model consistency by running each prompt several times and scoring how similar the responses are
  • Generate detailed side-by-side comparison reports
  • Robust error handling with exponential backoff for API rate limits
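The speed metrics listed above can be derived from timestamps taken while a response streams in. The sketch below is an illustration, not the tool's actual code: `SpeedMetrics`, `speed_metrics`, and `simulated_stream` are hypothetical names, and it assumes tokens per second is completion tokens divided by total latency (some tools divide by generation time after the first token instead).

```python
import time
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass
class SpeedMetrics:
    time_to_first_token: float  # seconds from request start to the first chunk
    total_latency: float        # seconds from request start to the last chunk
    tokens_per_second: float    # completion tokens / total latency (assumed definition)


def speed_metrics(start: float, chunk_times: Sequence[float], completion_tokens: int) -> SpeedMetrics:
    """Compute speed metrics from the arrival time of each streamed chunk.

    `start` and `chunk_times` must come from the same monotonic clock,
    e.g. time.perf_counter().
    """
    if not chunk_times:
        raise ValueError("no chunks were received")
    ttft = chunk_times[0] - start
    total = chunk_times[-1] - start
    tps = completion_tokens / total if total > 0 else 0.0
    return SpeedMetrics(ttft, total, tps)


def simulated_stream(n_chunks: int, delay: float = 0.01) -> Iterator[str]:
    """Stand-in for a streaming API response (hypothetical; makes no network call)."""
    for i in range(n_chunks):
        time.sleep(delay)
        yield f"token{i}"


if __name__ == "__main__":
    start = time.perf_counter()
    chunk_times = []
    for _chunk in simulated_stream(5):
        chunk_times.append(time.perf_counter())  # timestamp each chunk as it arrives
    # Here each chunk counts as one token; a real run would use the API's reported usage.
    print(speed_metrics(start, chunk_times, completion_tokens=len(chunk_times)))
```

Taking timestamps on the client side means network time is included in every figure, which is usually what a user comparing hosted models cares about.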

Installation

pip install -e .

Usage

python hypercompare.py "deepseek-ai/DeepSeek-V3-0324" "Qwen/QwQ-32B"

Optional Parameters

  • -s, --subjects: Number of subjects to test (default: 2)
  • -q, --questions: Questions per subject (default: 3)
  • -p, --prompts: Number of prompts for consistency testing (default: 2)
  • -r, --runs: Number of runs per prompt for consistency (default: 3)
  • --rate-limit-delay: Delay between API calls to avoid rate limiting (default: 1.0)
  • --max-retries: Maximum number of retries for API calls (default: 3)
  • -v, --verbose: Enable verbose output
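One plausible way to wire these options together is `argparse` for the flags plus a retry helper that backs off exponentially, using `--rate-limit-delay` as the base wait and `--max-retries` as the retry cap. This is a hedged sketch: `build_parser` and `call_with_backoff` are hypothetical names, mapping `--rate-limit-delay` onto the backoff base delay is an assumption, and the real script may handle retries differently.

```python
import argparse
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags and defaults (help text paraphrased)."""
    parser = argparse.ArgumentParser(description="Compare two Hyperbolic-hosted models on MMLU.")
    parser.add_argument("model_a", help="first model ID, e.g. deepseek-ai/DeepSeek-V3-0324")
    parser.add_argument("model_b", help="second model ID, e.g. Qwen/QwQ-32B")
    parser.add_argument("-s", "--subjects", type=int, default=2, help="number of subjects to test")
    parser.add_argument("-q", "--questions", type=int, default=3, help="questions per subject")
    parser.add_argument("-p", "--prompts", type=int, default=2, help="prompts for consistency testing")
    parser.add_argument("-r", "--runs", type=int, default=3, help="runs per prompt for consistency")
    parser.add_argument("--rate-limit-delay", type=float, default=1.0, help="delay between API calls")
    parser.add_argument("--max-retries", type=int, default=3, help="maximum retries per API call")
    parser.add_argument("-v", "--verbose", action="store_true", help="enable verbose output")
    return parser


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on a matching error wait base_delay * 2**attempt (+ jitter) and retry.

    Makes at most max_retries + 1 calls in total, then re-raises the last error.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise
            # Waits grow 1x, 2x, 4x, ... of base_delay, plus jitter of up to 10% of base_delay.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1 * base_delay))
            attempt += 1


if __name__ == "__main__":
    # Same flags as the custom-parameter example below; a real CLI would call parse_args().
    args = build_parser().parse_args(
        ["deepseek-ai/DeepSeek-V3-0324", "Qwen/QwQ-32B", "-s", "1", "-q", "1",
         "-p", "1", "-r", "1", "--rate-limit-delay", "2.0"]
    )
    print(vars(args))
```

With the `openai` package (v1 or later) you might pass `retry_on=(openai.RateLimitError,)` so that only rate-limit errors trigger a retry. Taking `sleep` as a parameter keeps the helper testable without real waits.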

Example with Custom Parameters

python hypercompare.py "deepseek-ai/DeepSeek-V3-0324" "Qwen/QwQ-32B" -s 1 -q 1 -p 1 -r 1 --rate-limit-delay 2.0

Requirements

  • Python 3.10+
  • Dependencies (install via pip):
    • openai
    • python-dotenv
    • numpy
  • difflib (part of the Python standard library, so no pip install is needed)
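Because `difflib` is listed as a dependency, a consistency score could reasonably be built on `difflib.SequenceMatcher`. This sketch is an assumption about the approach, not the tool's confirmed method: it averages pairwise similarity ratios over the responses collected from the `-r` runs of a single prompt, and `pairwise_similarity` and `consistency_score` are hypothetical names.

```python
import difflib
from itertools import combinations
from statistics import mean
from typing import Sequence


def pairwise_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; 1.0 means the strings are identical."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def consistency_score(responses: Sequence[str]) -> float:
    """Mean pairwise similarity across repeated responses to the same prompt."""
    if len(responses) < 2:
        return 1.0  # a single run has nothing to disagree with
    return mean(pairwise_similarity(a, b) for a, b in combinations(responses, 2))


if __name__ == "__main__":
    runs = [
        "The capital of France is Paris.",
        "Paris is the capital of France.",
        "The capital of France is Paris.",
    ]
    print(round(consistency_score(runs), 3))
```

`SequenceMatcher.ratio()` can be slow on long responses; `quick_ratio()` returns a cheaper upper bound when speed matters more than precision.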

Setup

  1. Clone this repository
  2. Create a .env file in the project root with your Hyperbolic API key:
    HYPERBOLIC_API_KEY=your_api_key_here
    
  3. Install dependencies:
    pip install -r requirements.txt
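After setup, the script needs to read `HYPERBOLIC_API_KEY` and create an API client. A minimal sketch follows, assuming the tool talks to Hyperbolic through the `openai` package's OpenAI-compatible client; the base URL `https://api.hyperbolic.xyz/v1` is an assumption to verify against Hyperbolic's documentation, and `get_api_key` and `make_client` are hypothetical helper names.

```python
import os


def get_api_key(var_name: str = "HYPERBOLIC_API_KEY") -> str:
    """Return the API key, loading ./.env first if python-dotenv is available."""
    try:
        from dotenv import load_dotenv  # installed by the python-dotenv package

        load_dotenv(".env")  # reads .env in the current directory; existing env vars win
    except ImportError:
        pass  # fall back to variables already exported in the shell
    key = os.getenv(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set; add it to the .env file in the project root")
    return key


def make_client():
    """Build an OpenAI-compatible client for Hyperbolic (base URL is an assumption)."""
    from openai import OpenAI  # imported lazily so get_api_key works without openai installed

    return OpenAI(api_key=get_api_key(), base_url="https://api.hyperbolic.xyz/v1")
```

A client built this way would then be used through the standard `client.chat.completions.create(model=..., messages=..., stream=True)` call.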

Output

The tool provides a detailed comparison summary including:

  • Speed metrics (time to first token, total latency, tokens/sec)
  • Accuracy metrics (MMLU score, quality assessment)
  • Cost analysis (token costs, cost-performance ratio)
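The accuracy and cost figures reduce to simple arithmetic. The sketch below assumes that MMLU items are posed as four-option A-D questions, that the model's choice is the first standalone capital letter A-D in its reply (a deliberately crude heuristic), that prices are quoted per million tokens, and that the cost-performance ratio is MMLU score per dollar. All function names are hypothetical, and no real Hyperbolic prices are implied.

```python
import re
from typing import Optional, Sequence

LETTERS = "ABCD"


def format_mmlu_prompt(question: str, choices: Sequence[str]) -> str:
    """Render one MMLU item as a lettered multiple-choice prompt."""
    lines = [question]
    lines += [f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)]
    lines.append("Answer with a single letter: A, B, C, or D.")
    return "\n".join(lines)


def extract_choice(reply: str) -> Optional[str]:
    """Crude heuristic: first standalone capital A-D in the reply, else None."""
    match = re.search(r"\b([A-D])\b", reply)
    return match.group(1) if match else None


def mmlu_score(predictions: Sequence[Optional[str]], answers: Sequence[str]) -> float:
    """Fraction of questions answered correctly (unparseable replies count as wrong)."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        return 0.0
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)


def token_cost(prompt_tokens: int, completion_tokens: int,
               input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost, assuming prices are quoted per million tokens."""
    return (prompt_tokens * input_price_per_m + completion_tokens * output_price_per_m) / 1_000_000


def cost_performance(score: float, cost: float) -> float:
    """MMLU score per dollar (assumed definition); higher is better."""
    return score / cost if cost > 0 else float("inf")
```

The letter heuristic misfires on replies such as "A good choice is C", so a stricter parser (or a prompt that forces a bare letter) would give more reliable scores.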

Results are also saved to a JSON file for further analysis.
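The README does not document the JSON layout or file name, so the following only illustrates the idea: `results_to_json`, `save_results`, and the `comparison_<timestamp>.json` naming are hypothetical.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Any, Dict


def results_to_json(results: Dict[str, Any]) -> str:
    """Serialize results with stable key order so diffs between runs stay readable."""
    return json.dumps(results, indent=2, sort_keys=True)


def save_results(results: Dict[str, Any], out_dir: str = ".") -> Path:
    """Write results to comparison_<UTC timestamp>.json in out_dir and return the path."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = Path(out_dir) / f"comparison_{stamp}.json"
    path.write_text(results_to_json(results), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Placeholder structure only; a real run would fill in measured values.
    demo = {"models": ["model-a", "model-b"], "metrics": {}}
    with tempfile.TemporaryDirectory() as tmp:
        print(save_results(demo, tmp).read_text(encoding="utf-8"))
```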
