
Commit d194cbf

Add max_tokens support

1 parent b5d38c4

33 files changed: +6323 additions, −15 deletions

CLAUDE.md

Lines changed: 25 additions & 1 deletion
@@ -1,3 +1,27 @@
-Mark tests requiring GPUs with `@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available")`.
+Always test your changes by running the appropriate script or CLI command.
+
+## Project Structure and Conventions
+
+Keep __main__.py clean - it should primarily provide documentation and routing for the available CLI commands and their configs.
+
+Consider writing a new library file if you add a standalone, complex feature used in more than one place.
+
+When you write a script that launches a CLI command via a subprocess, print the CLI command so it can be easily reproduced.
+
+Use dataclasses for configs, and parse them from the CLI with simple_parsing. Never call a config class `cfg`; always use something specific like foo_cfg, e.g. run_cfg/RunConfig. Arguments should use underscores, not dashes, like `--example_arg`.
+
+Never save logs, scripts, or other development artifacts to the root of the project. Create an appropriate directory such as runs/ or scripts/ and add it to the .gitignore.
+
+# Development
+
+You can call CLI commands without prefixing `python -m`, like `bergson build`.
 
 Use `pre-commit run --all-files` if you forget to install pre-commit and it doesn't run in the hook.
+
+### Tests
+
+Mark tests requiring GPUs with `@pytest.mark.skipif(not torch.cuda.is_available(), reason="CUDA not available")`.
+
+### Environment Setup
+
+If you need to use a venv, create and/or activate it with `python3 -m venv .venv && source .venv/bin/activate && pip install pytest`.
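As an illustration of the config convention above, a minimal sketch (the `RunConfig` name and its fields are hypothetical, not part of this commit):

```python
from dataclasses import dataclass


@dataclass
class RunConfig:
    """Hypothetical config for a `run` command; field names are illustrative."""

    model: str = "pythia-14m"
    max_tokens: int = 1_000_000  # exposed as --max_tokens (underscores, not dashes)
    num_gpus: int = 1


def parse_run_cfg() -> RunConfig:
    # simple_parsing builds an argparse CLI from the dataclass fields
    # (install with `pip install simple-parsing`).
    import simple_parsing

    return simple_parsing.parse(RunConfig)  # note: run_cfg, never plain `cfg`
```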

benchmarks/README.md

Lines changed: 149 additions & 0 deletions
@@ -0,0 +1,149 @@
# Bergson Benchmarks

This directory contains benchmarking scripts for measuring Bergson's performance across different models and configurations.

## Benchmark Scripts

### Core Benchmarks

- **`benchmark_bergson.py`** - Programmatic benchmarks for Bergson
  - `run` - In-memory benchmark using `InMemoryCollector` (fast, single GPU)
  - `run-disk` - Disk-based benchmark using the real `build()`, `reduce()`, and `score_dataset()` (single GPU)
- **`benchmark_bergson_cli.py`** - CLI-based benchmark using subprocesses
  - Tests the actual CLI commands (`bergson build`, `bergson reduce`, `bergson score`)
  - Supports multi-GPU via `--num_gpus`

### Comparison Benchmarks

- **`benchmark_dattri.py`** - Dattri influence function benchmark
- **`kronfluence_benchmark.py`** - Kronfluence influence function benchmark

### Utilities

- **`benchmark_utils.py`** - Shared utilities for all benchmarks
  - Model specifications
  - Token parsing
  - Path generation
  - Timestamp utilities
  - `load_benchmark_dataset()` - Load the on-disk tokenized dataset with filtering
- **`save_to_disk.py`** - Utility for preprocessing tokenized datasets and saving them to disk

### Analysis

- **`plot_cli_benchmark.py`** - Plot benchmark results
  - Automatically separates plots by num_gpus and hardware
  - Generates `cli_benchmark_1gpu.png`, `cli_benchmark_8gpu.png`, etc.
  - Each PNG contains only results from the same GPU/hardware configuration
- **`run_full_benchmark.py`** - Orchestrates the full benchmark suite
41+
## Usage Examples

### Loading the Benchmark Dataset

All benchmarks should use the pre-tokenized on-disk dataset for consistency:

```python
from benchmarks.benchmark_utils import load_benchmark_dataset

# Load and filter to sequences >= 1024 tokens
ds = load_benchmark_dataset()
```

Or test it directly:

```bash
python -m benchmarks.test_load_dataset
```

This will:
- Load the tokenized dataset from `data/EleutherAI/SmolLM2-135M-10B-tokenized`
- Filter out sequences shorter than 1024 tokens (for even batching)
- Print statistics about total tokens available
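The length filter described above is just a predicate on token count; a minimal sketch with a plain list standing in for the dataset (the `input_ids` field name is an assumption):

```python
MIN_TOKENS = 1024


def long_enough(example: dict) -> bool:
    # Keep only sequences with at least MIN_TOKENS tokens, for even batching.
    return len(example["input_ids"]) >= MIN_TOKENS


examples = [
    {"input_ids": list(range(2048))},  # kept
    {"input_ids": list(range(100))},   # dropped
]
filtered = [ex for ex in examples if long_enough(ex)]
```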
### In-Memory Benchmark (fastest)

```bash
python -m benchmarks.benchmark_bergson run pythia-14m 1M 100K
```

### Disk-Based Benchmark (tests real code paths)

```bash
python -m benchmarks.benchmark_bergson run-disk pythia-14m 1M 100K
```

### CLI Benchmark (multi-GPU support)

Single GPU (default):

```bash
python -m benchmarks.benchmark_bergson_cli pythia-70m 10M
```

Multi-GPU (8 GPUs):

```bash
python -m benchmarks.benchmark_bergson_cli pythia-70m 10M --num_gpus 8
```

### Running Full Benchmark Suites

**Small models (1 GPU):**

```bash
./benchmarks/run_small_models_cli_benchmark.sh
```

**Small models (8 GPUs):**

```bash
./benchmarks/run_small_models_8gpu.sh
```

**Large models (1 GPU):**

```bash
./benchmarks/run_large_models_cli_benchmark.sh
```

**Large models (8 GPUs):**

```bash
./benchmarks/run_large_models_8gpu.sh
```
### Generating Plots

The plotting script automatically separates results by GPU count and hardware:

```bash
python -m benchmarks.plot_cli_benchmark
```

This will:
- Load all benchmark results from `runs/bergson_cli_benchmark/`
- Group them by (num_gpus, hardware) combination
- Generate separate plots and CSVs for each configuration:
  - `figures/cli_benchmark_1gpu.png` - Single GPU results
  - `figures/cli_benchmark_8gpu.png` - 8 GPU results
  - `runs/benchmarks/cli_benchmark_1gpu.csv` - Single GPU data
  - `runs/benchmarks/cli_benchmark_8gpu.csv` - 8 GPU data

Each plot contains only results from the same GPU/hardware configuration, making comparisons fair and meaningful.
## Benchmark Comparison

| Benchmark | Method | Multi-GPU | Disk I/O | Use Case |
|-----------|--------|-----------|----------|----------|
| `run` | In-memory collector | No (FSDP only) | None | Quick memory scaling tests |
| `run-disk` | Real build/reduce/score | No | Yes | Test production code paths |
| CLI (1 GPU) | Subprocess CLI commands | No | Yes | Single GPU baseline |
| CLI (8 GPU) | Subprocess CLI commands | Yes | Yes | Full multi-GPU distributed runs |
## Benchmark Records

All benchmark records now include:
- **num_gpus**: Number of GPUs used for the run
- **hardware**: Hardware information (node name + GPU type/count)

This allows proper comparison between single-GPU and multi-GPU runs.
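Because every record carries these two fields, downstream tooling can bucket results by configuration. A sketch of that grouping step (the flat directory of `*.json` files and any extra field names are assumptions, not the actual layout):

```python
import json
from collections import defaultdict
from pathlib import Path


def group_records(results_dir: str) -> dict:
    """Group benchmark JSON records by (num_gpus, hardware)."""
    groups: dict = defaultdict(list)
    for path in sorted(Path(results_dir).glob("*.json")):
        record = json.loads(path.read_text())
        # num_gpus and hardware are present in every record, per this README.
        key = (record["num_gpus"], record["hardware"])
        groups[key].append(record)
    return groups
```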
## Adding New Benchmarks

1. Add your benchmark script to this directory
2. Import shared functionality from `benchmarks.benchmark_utils`
3. Follow the existing pattern for saving results (JSON records)
4. Update this README with your benchmark's purpose and usage
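Following those steps, a new benchmark might look roughly like this (everything here is stubbed for illustration; a real script would pull model specs and paths from `benchmarks.benchmark_utils` and run an actual workload):

```python
import json
import time
from pathlib import Path


def run_benchmark(model: str, num_gpus: int, out_dir: str) -> Path:
    """Time a stub workload and save a JSON record in the shared format."""
    start = time.perf_counter()
    # ... the actual workload under test would run here ...
    elapsed = time.perf_counter() - start

    record = {
        "model": model,
        "num_gpus": num_gpus,     # required for per-configuration plots
        "hardware": "stub-node",  # real benchmarks record node + GPU info
        "seconds": elapsed,
    }
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / f"{model}_{num_gpus}gpu.json"
    path.write_text(json.dumps(record, indent=2))
    return path
```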

benchmarks/__init__.py

Whitespace-only changes.
