LLM Leaderboard Explorer

An interactive dashboard, built with Gradio, for exploring and visualizing merged data from LLM leaderboards. Check out our deployed Hugging Face Space: Link

📊 Overview

This application provides an interactive interface to view, filter, and compare Large Language Models (LLMs) based on aggregated data from prominent leaderboard sources:

  • LiveBench: Features performance metrics like Global Average, Reasoning, Coding, Mathematics, Data Analysis, Language, and Instruction Following scores.
  • LMSYS Chatbot Arena: Includes community-based Elo ratings (Arena Score), rankings, and voting data.

The dashboard allows users to easily navigate and compare models across various metrics and categories.

✨ Features

  • Interactive Data Tables: View LLM data organized into tabs:
    • Performance Metrics: Core benchmark scores from LiveBench.
    • Model Details: Information like Organization, License, Knowledge Cutoff, and links.
    • Community Stats: Data from the Chatbot Arena Leaderboard (Ranks, Score, Votes).
    • Model Mapping: Shows the unified model name alongside original names from LiveBench and Arena.
  • Filtering: Dynamically filter the displayed models (see the sketch after this list) by:
    • Search term (searches Model Name and Organization).
    • Organization.
    • Minimum Global Average score.
  • Detailed Model Card: Click on any row in the data tables to view a comprehensive card summarizing all metrics for that specific model.
  • Visualizations Tab:
    • Bar Chart: Compare the top 15 models based on a user-selected metric (e.g., Global Average, Arena Score, Coding Average).
    • Radar Chart: Select multiple models (up to 5) to compare their performance profile across key metrics (Reasoning, Coding, Math, Data Analysis, Language, IF Average, and scaled Arena Score).
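
For reference, the filtering described above might look roughly like the sketch below. This is a minimal illustration, not the project's actual code: the column names (Model Name, Organization, Global Average) and the helper name filter_models are assumptions, and the real logic lives in src/data_processing.py.

    import pandas as pd

    def filter_models(df: pd.DataFrame,
                      search: str = "",
                      organization: str = "All",
                      min_global_avg: float = 0.0) -> pd.DataFrame:
        """Hypothetical filter; column names are assumed, not confirmed."""
        out = df
        if search:
            # Case-insensitive substring match over model name and organization.
            mask = (out["Model Name"].str.contains(search, case=False, na=False)
                    | out["Organization"].str.contains(search, case=False, na=False))
            out = out[mask]
        if organization != "All":
            out = out[out["Organization"] == organization]
        # Keep only models at or above the requested Global Average floor.
        return out[out["Global Average"].fillna(0) >= min_global_avg]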

💾 Data

The application uses a pre-merged CSV file (data/merged_leaderboards.csv) containing data aggregated from the sources mentioned above.
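
A quick way to sanity-check the merged file is to load it with pandas. A minimal sketch; the printed column names will reflect whatever the merge actually produced:

    import pandas as pd

    df = pd.read_csv("data/merged_leaderboards.csv")
    print(df.shape)             # (number of models, number of merged columns)
    print(df.columns.tolist())  # LiveBench metrics plus Arena stats
    print(df.head(3))           # first few merged rows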

🚀 Getting Started

Prerequisites

  • Python 3.9+
  • pip (Python package installer)

Installation

  1. Clone the repository (optional):

    git clone https://github.com/git-disl/GTLLMZoo.git
    cd GTLLMZoo

    If you already have the files locally, just navigate to the project directory in your terminal.

  2. Install Dependencies: The repository includes a requirements.txt file with the following content:

    gradio==4.9.0
    pandas
    plotly
    numpy
    

    Then, install the requirements:

    pip install -r requirements.txt

Running the Application

To run the application locally:

python app.py

The application will typically be available at http://127.0.0.1:7860 in your web browser.
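
For orientation, app.py presumably loads the merged data, builds the Gradio UI, and launches the server. Below is a minimal sketch under that assumption; load_data and build_ui are hypothetical names standing in for whatever src/data_processing.py and src/ui.py actually export:

    from src.data_processing import load_data  # hypothetical loader name
    from src.ui import build_ui                # hypothetical UI builder name

    df = load_data("data/merged_leaderboards.csv")
    demo = build_ui(df)  # expected to return a gradio.Blocks app

    if __name__ == "__main__":
        # 7860 is Gradio's default port; pass share=True for a public link.
        demo.launch(server_port=7860)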

📁 Project Structure

GTLLMZoo
├─ app.py                  # Main Gradio application entry point
├─ requirements.txt        # Python dependencies
├─ data
│  └─ merged_leaderboards.csv # Merged leaderboard data
└─ src
   ├─ data_processing.py  # Data loading and filtering logic
   └─ ui.py               # Gradio UI definition and logic

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request if you have improvements or bug fixes.

📄 License

MIT License
