Thank you for your interest in contributing to LegalEvalHub!
You can contribute in three primary ways:
- Submitting a new task
- Submitting an evaluation run for an existing task
- Submitting a new leaderboard
All contributions are made via pull requests to this GitHub repository. If you would like to contribute via alternative means, please reach out to nguha@cs.stanford.edu.
To add a new evaluation task to LegalEvalHub:
- Create a JSON file at `tasks/<task_id>.json`.
- Follow the format below.
- Open a pull request with a description of the task.
The task JSON file should have the following format:
```json
{
  "task_id": "your_task_id",
  "name": "Your Task Name",
  "family": "LegalBench",
  "description": "Brief description of what the task evaluates",
  "dataset_url": "https://example.com/your_dataset.csv",
  "num_samples": 500,
  "tags": ["contract law", "interpretation"],
  "document_type": "contract clause",
  "min_input_length": 100,
  "max_input_length": 1000,
  "metrics": [
    {"name": "accuracy", "direction": "maximize"},
    {"name": "f1_macro", "direction": "maximize"},
    {"name": "balanced_accuracy", "direction": "maximize"}
  ],
  "task_type": "Binary classification",
  "legal_reasoning_type": "Interpretation",
  "contributed_by_name": "Your Name",
  "contributed_by_email": "your.name@institution.edu",
  "paper_url": "https://example.com/paper.pdf",
  "paper_title": "Paper Title",
  "paper_authors": ["Author 1", "Author 2"],
  "source": "Dataset source",
  "license": "CC BY 4.0"
}
```

- `task_id`: Unique identifier for the task (use lowercase with underscores)
- `name`: Human-readable task name
- `family`: Task family or benchmark suite (typically "LegalBench")
- `description`: Clear description of what the task evaluates
- `dataset_url`: URL where the dataset can be accessed
- `num_samples`: Total number of examples in the dataset
- `tags`: List of descriptive tags (avoid generic tags like "classification")
  - Examples: "contract law", "interpretation", "rule application", "tax law"
- `document_type`: Type of legal document analyzed
  - Examples: "contract clause", "statute", "judicial opinion", "privacy policy"
- `min_input_length` & `max_input_length`: Token counts (using the LLaMA tokenizer)
- `metrics`: List of evaluation metrics, each with a direction
  - Common metrics: accuracy, f1_macro, f1_micro, balanced_accuracy
  - Direction: "maximize" for most metrics, "minimize" for error metrics
- `task_type`: Classification type
  - Options: "Binary classification", "Multiclass classification", "Text generation", "Numeric prediction"
- `legal_reasoning_type`: Type of legal reasoning required
  - Examples: "Interpretation", "Rule application", "Issue spotting"
- `contributed_by_name` & `contributed_by_email`: Your contact information
- `paper_url`, `paper_title`, `paper_authors`: Associated research paper
- `source`: Original dataset source
- `license`: Data license (e.g., "CC BY 4.0")
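Before opening a pull request, it can help to sanity-check your task file against the required fields. The following is a minimal sketch, not part of the repository: the `REQUIRED_FIELDS` list mirrors the fields described above, and the helper name is illustrative.

```python
import json

# Fields every task JSON must define (per the format above; assumed list).
REQUIRED_FIELDS = [
    "task_id", "name", "family", "description", "dataset_url",
    "num_samples", "tags", "document_type", "min_input_length",
    "max_input_length", "metrics", "task_type", "legal_reasoning_type",
    "contributed_by_name", "contributed_by_email", "source", "license",
]

def validate_task(path):
    """Return a list of problems found in the task JSON at `path`."""
    with open(path) as f:
        task = json.load(f)
    problems = [f"missing field: {k}" for k in REQUIRED_FIELDS if k not in task]
    # task_id should be lowercase with underscores, no spaces.
    tid = task.get("task_id", "")
    if tid != tid.lower() or " " in tid:
        problems.append(f"task_id should be lowercase_with_underscores: {tid!r}")
    # Each metric needs a direction of "maximize" or "minimize".
    for m in task.get("metrics", []):
        if m.get("direction") not in ("maximize", "minimize"):
            problems.append(f"bad metric direction: {m}")
    return problems
```

Running the check before you commit catches missing fields and malformed metric entries early, rather than in pull-request review.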
To submit an evaluation run for an existing task:
- Create a JSON file at `eval_runs/<task_id>/<submission_id>.json`.
- Follow the format below.
- Open a pull request with a description of the evaluation run.
The evaluation run JSON file should have the following format:
```json
{
  "submission_id": "run_2025_07_03_123456_abc123",
  "task_id": "hearsay",
  "model_name": "gpt-4",
  "prompt_id": "base",
  "submitter": "Your Name",
  "submission_time": "2025-07-03T12:34:56Z",
  "metrics": {
    "accuracy": 0.87,
    "balanced_accuracy": 0.85,
    "f1_macro": 0.86,
    "f1_micro": 0.87,
    "valid_predictions_ratio": 1.0,
    "n_samples": 100
  },
  "predictions_url": "https://storage.googleapis.com/legal-eval-runs/predictions/hearsay/gpt-4/base/run_2025_07_03_123456_abc123_predictions.json"
}
```

- `submission_id`: Unique identifier (format: `run_YYYY_MM_DD_HHMMSS_RANDOM`)
- `task_id`: Must match an existing task ID
- `model_name`: Name and version of the model evaluated
- `prompt_id`: Identifier for the prompt template used (typically "base")
- `submitter`: Your name or organization
- `submission_time`: ISO 8601 timestamp
- `metrics`: Dictionary containing all evaluation metrics
  - Must include all metrics specified in the task definition
  - Include `n_samples` to show how many examples were evaluated
  - Include `valid_predictions_ratio` to show prediction quality
- `predictions_url`: URL to the hosted predictions file (optional but recommended)
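The `submission_id` and `submission_time` conventions above are easy to generate programmatically. A small sketch, with illustrative helper names (only the field names and formats come from the format above):

```python
import datetime
import json
import secrets

def new_submission_id(now=None):
    """Build an id in the run_YYYY_MM_DD_HHMMSS_RANDOM format."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    # token_hex(3) yields 6 hex characters, e.g. "abc123".
    return f"run_{now:%Y_%m_%d_%H%M%S}_{secrets.token_hex(3)}"

def build_run(task_id, model_name, metrics, submitter, prompt_id="base"):
    """Assemble an evaluation-run record matching the format above."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return {
        "submission_id": new_submission_id(now),
        "task_id": task_id,
        "model_name": model_name,
        "prompt_id": prompt_id,
        "submitter": submitter,
        # ISO 8601 timestamp with a trailing "Z" for UTC.
        "submission_time": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
        "metrics": metrics,
    }

run = build_run(
    "hearsay", "gpt-4",
    {"accuracy": 0.87, "valid_predictions_ratio": 1.0, "n_samples": 100},
    "Your Name",
)
print(json.dumps(run, indent=2))
```

Writing the record with a helper like this keeps the id, timestamp, and metric keys consistent across repeated submissions.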
To add a new aggregate leaderboard to LegalEvalHub, open `web/task_presets.json` and add your new leaderboard configuration:
```json
{
  "presets": {
    "your_leaderboard_id": {
      "name": "Your Leaderboard Name",
      "description": "A clear description of what this leaderboard evaluates and why these tasks are grouped together.",
      "tasks": [
        "task_id_1",
        "task_id_2",
        "task_id_3"
      ]
    }
  }
}
```
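Since every task listed in a preset must correspond to a task file, a quick check like the following can catch typos before you open the pull request. This is a sketch, assuming the repository layout described above (`web/task_presets.json` and `tasks/<task_id>.json`):

```python
import json
from pathlib import Path

def missing_preset_tasks(presets_path="web/task_presets.json", tasks_dir="tasks"):
    """Return (leaderboard_id, task_id) pairs with no matching task file."""
    with open(presets_path) as f:
        presets = json.load(f)["presets"]
    missing = []
    for board_id, board in presets.items():
        for task_id in board["tasks"]:
            # Each listed task must exist as tasks/<task_id>.json.
            if not Path(tasks_dir, f"{task_id}.json").exists():
                missing.append((board_id, task_id))
    return missing
```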