This directory contains the evaluation code to reproduce the results from the SAM-Audio paper. The evaluation framework supports multiple datasets, prompting modes (text-only, span, visual), and metrics.
Before running evaluation, ensure you have:
- Installed the SAM-Audio package and its dependencies
- Authenticated with Hugging Face to access the model checkpoints (see main README)
Run evaluation on the default setting (`instr-pro`):

```shell
python main.py
```

You can also use multiple GPUs to speed up evaluation:

```shell
torchrun --nproc_per_node=<ngpus> main.py
```

Evaluate on a specific setting:

```shell
python main.py --setting sfx
```

Evaluate on multiple settings:

```shell
python main.py --setting sfx speech music
```

Run `python main.py --help` to see all available settings.
```shell
python main.py [OPTIONS]
```

- `-s, --setting`: Which setting(s) to evaluate (default: `instr-pro`). Choices: see available settings above. Multiple settings can be specified, e.g. `--setting sfx speech music`
- `--cache-path`: Where to cache downloaded datasets (default: `~/.cache/sam_audio`)
- `-p, --checkpoint-path`: Model checkpoint to evaluate (default: `facebook/sam-audio-1b`). Can be a local path or a Hugging Face model ID
- `-b, --batch-size`: Batch size for evaluation (default: `1`)
- `-w, --num-workers`: Number of data loading workers (default: `4`)
- `-c, --candidates`: Number of reranking candidates (default: `8`)
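The options above map naturally onto Python's `argparse`. The following is a hedged sketch of how such a CLI could be declared, not the actual `main.py` implementation; the `build_parser` helper and its exact argument declarations are hypothetical:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the CLI documented above; the real
    # main.py may declare these flags differently.
    parser = argparse.ArgumentParser(description="SAM-Audio evaluation")
    parser.add_argument("-s", "--setting", nargs="+", default=["instr-pro"],
                        help="Which setting(s) to evaluate")
    parser.add_argument("--cache-path", default="~/.cache/sam_audio",
                        help="Where to cache downloaded datasets")
    parser.add_argument("-p", "--checkpoint-path", default="facebook/sam-audio-1b",
                        help="Local path or Hugging Face model ID")
    parser.add_argument("-b", "--batch-size", type=int, default=1)
    parser.add_argument("-w", "--num-workers", type=int, default=4)
    parser.add_argument("-c", "--candidates", type=int, default=8)
    return parser

# nargs="+" is what lets a single --setting flag accept several values.
args = build_parser().parse_args(["--setting", "sfx", "speech"])
print(args.setting)  # ['sfx', 'speech']
```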
The evaluation framework computes the following metrics:
- Judge - SAM Audio Judge quality assessment metric
- Aesthetic - Aesthetic quality metric
- CLAP - Audio-text alignment metric (CLAP similarity)
- ImageBind - Audio-video alignment metric (for visual settings only)
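CLAP scores audio-text alignment as the cosine similarity between the audio and text embeddings produced by its two encoders. A minimal NumPy sketch of that computation; the embeddings below are small stand-ins, not real CLAP outputs:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity: the dot product of the two vectors after
    # L2-normalizing each of them.
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    return float(np.dot(a, b))

# Stand-in embeddings; real CLAP embeddings come from its audio and
# text encoders and are much higher-dimensional.
audio_emb = np.array([0.2, 0.5, 0.1, 0.8])
text_emb = np.array([0.1, 0.6, 0.2, 0.7])
print(cosine_similarity(audio_emb, text_emb))
```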
Results are saved to the `results/` directory as JSON files, one per setting:

```
results/
├── sfx.json
├── speech.json
└── music.json
```
Each JSON file contains the averaged metric scores across all samples in that setting.
Example output:

```json
{
  "JudgeOverall": "4.386",
  "JudgeFaithfulness": "4.708",
  "JudgeRecall": "4.934",
  "JudgePrecision": "4.451",
  "ContentEnjoyment": "5.296",
  "ContentUsefulness": "6.903",
  "ProductionComplexity": "4.301",
  "ProductionQuality": "7.100",
  "CLAPSimilarity": "0.271"
}
```
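To compare settings side by side, the per-setting JSON files can be loaded and tabulated. A minimal sketch, assuming the `results/` layout and string-valued scores shown above; the helper names here are hypothetical:

```python
import json
from pathlib import Path

def load_results(results_dir: str = "results") -> dict[str, dict[str, float]]:
    # Read every per-setting JSON file (e.g. sfx.json, speech.json) and
    # coerce the string-valued scores to floats for numeric comparison.
    results = {}
    for path in Path(results_dir).glob("*.json"):
        with open(path) as f:
            scores = json.load(f)
        results[path.stem] = {k: float(v) for k, v in scores.items()}
    return results

def print_metric(results: dict[str, dict[str, float]], metric: str) -> None:
    # Print one metric (e.g. "JudgeOverall") for each evaluated setting.
    for setting, scores in sorted(results.items()):
        if metric in scores:
            print(f"{setting:10s} {metric}: {scores[metric]:.3f}")
```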