Commit a85f172

Luodian and claude committed
Add comprehensive test suite for CI/CD
Features:
- Unit tests for throughput metrics calculations
- Integration tests for chat models
- API component tests
- Test runner script for different test suites
- GitHub Actions workflow for automated testing
- pytest configuration with fixtures
- Test dependencies in pyproject.toml

Test structure:
- test/test_throughput_metrics_unit.py: TPOT/speed calculation tests
- test/test_chat_models.py: Chat model integration tests
- test/test_api_components.py: Core API component tests
- test/run_suite.py: Test suite runner
- test/conftest.py: pytest fixtures and configuration

CI/CD integration:
- .github/workflows/test.yml: Automated testing workflow
- Matrix testing across Python 3.9, 3.10, 3.11
- Separate jobs for lint, unit, integration, and coverage
- Test dependencies in pyproject.toml [test] optional group

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
1 parent 9fc62cb commit a85f172

10 files changed

Lines changed: 1038 additions & 0 deletions

.github/workflows/test.yml

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+name: Tests
+
+on:
+  push:
+    branches: [ main, dev/v0d4, feature/* ]
+  pull_request:
+    branches: [ main, dev/v0d4 ]
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: true
+        fetch-depth: 0
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.9'
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install black isort pytest
+
+    - name: Run linting
+      run: |
+        cd test
+        python run_suite.py lint
+
+  unit-tests:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ['3.9', '3.10', '3.11']
+
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: true
+        fetch-depth: 0
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install pytest pytest-cov
+        python -m pip install -e .
+
+    - name: Run unit tests
+      run: |
+        cd test
+        python run_suite.py unit
+
+  integration-tests:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: true
+        fetch-depth: 0
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.9'
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install pytest pytest-mock
+        python -m pip install -e .
+
+    - name: Run integration tests
+      run: |
+        cd test
+        python run_suite.py integration
+
+  throughput-tests:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: true
+        fetch-depth: 0
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.9'
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install pytest
+        # Install minimal dependencies for throughput testing (`time` is in the standard library)
+        python -m pip install loguru
+
+    - name: Run throughput tests
+      run: |
+        cd test
+        python run_suite.py throughput
+
+  test-coverage:
+    runs-on: ubuntu-latest
+    steps:
+    - uses: actions/checkout@v4
+      with:
+        submodules: true
+        fetch-depth: 0
+
+    - name: Set up Python
+      uses: actions/setup-python@v4
+      with:
+        python-version: '3.9'
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        python -m pip install pytest pytest-cov
+        python -m pip install -e .
+
+    - name: Run tests with coverage
+      run: |
+        cd test
+        python -m pytest --cov=../lmms_eval --cov-report=xml --cov-report=html
+
+    - name: Upload coverage reports
+      uses: codecov/codecov-action@v3
+      with:
+        file: ./test/coverage.xml
+        fail_ci_if_error: false

pyproject.toml

Lines changed: 7 additions & 0 deletions
@@ -77,6 +77,13 @@ dependencies = [
 ]
 
 [project.optional-dependencies]
+test = [
+    "pytest>=7.0.0",
+    "pytest-cov>=4.0.0",
+    "pytest-mock>=3.0.0",
+    "pytest-xdist>=3.0.0",
+    "coverage>=6.0.0",
+]
 audio = [
     "more-itertools",
     "editdistance",

test/README.md

Lines changed: 178 additions & 0 deletions
@@ -0,0 +1,178 @@
+# Testing Framework for lmms-eval
+
+This directory contains the test suite for lmms-eval, designed for CI/CD integration and comprehensive testing of the codebase.
+
+## Structure
+
+```
+test/
+├── __init__.py                      # Test package initialization
+├── conftest.py                      # pytest fixtures and configuration
+├── requirements-test.txt            # Testing dependencies
+├── run_suite.py                     # Test suite runner
+├── test_api_components.py           # Core API component tests
+├── test_chat_models.py              # Chat model integration tests
+├── test_throughput_metrics.py       # Original throughput demo script
+└── test_throughput_metrics_unit.py  # Unit tests for throughput metrics
+```
+
+## Test Categories
+
+### Unit Tests
+- **test_throughput_metrics_unit.py**: Tests for TPOT and inference speed calculations
+- **test_api_components.py**: Tests for core API components (Instance, registries, metrics)
+
+### Integration Tests
+- **test_chat_models.py**: Integration tests for chat models with throughput metrics
+
+### Throughput Tests
+- **test_throughput_metrics.py**: Demo script showing throughput calculations
+- **test_throughput_metrics_unit.py**: Comprehensive unit tests for timing logic
+
+## Running Tests
+
+### Using the Test Runner
+```bash
+# Run all tests
+python test/run_suite.py all
+
+# Run specific test suites
+python test/run_suite.py unit
+python test/run_suite.py integration
+python test/run_suite.py throughput
+python test/run_suite.py lint
+```
+
+### Using pytest Directly
+```bash
+# Install test dependencies
+pip install -r test/requirements-test.txt
+
+# Run all tests
+pytest test/
+
+# Run specific test files
+pytest test/test_throughput_metrics_unit.py -v
+
+# Run with coverage
+pytest test/ --cov=lmms_eval --cov-report=html
+```
+
+### Using unittest
+```bash
+# Run individual test files
+python test/test_throughput_metrics_unit.py
+python test/test_api_components.py
+```
+
+## CI/CD Integration
+
+### GitHub Actions
+The test suite is integrated with GitHub Actions through `.github/workflows/test.yml`:
+
+- **Lint Check**: Runs black and isort formatting checks
+- **Unit Tests**: Runs on Python 3.9, 3.10, 3.11
+- **Integration Tests**: Tests model integration with mocks
+- **Throughput Tests**: Validates throughput metric calculations
+- **Coverage**: Generates test coverage reports
+
+### Pre-commit Hooks
+Tests are automatically run through pre-commit hooks:
+```bash
+pre-commit install
+pre-commit run --all-files
+```
+
+## Test Design Principles
+
+### 1. Fast Unit Tests
+- Mock external dependencies (models, APIs)
+- Test core logic without heavy I/O
+- Focus on edge cases and error handling
+
+### 2. Comprehensive Integration Tests
+- Test real component interactions
+- Use minimal mocking for integration points
+- Validate end-to-end workflows
+
+### 3. Throughput-Specific Tests
+- Validate TPOT formula: `(e2e_latency - TTFT) / (num_output_tokens - 1)`
+- Test inference speed calculation: `1 / TPOT`
+- Verify timing measurement accuracy
+- Test batch processing scenarios
+
+### 4. Maintainable Test Code
+- Use fixtures for common test data
+- Clear test names describing what's being tested
+- Comprehensive error message assertions
+- Clean separation between test categories
+
+## Adding New Tests
+
+### For New Features
+1. Add unit tests in appropriate `test_*.py` file
+2. Add integration tests if feature involves multiple components
+3. Update `run_suite.py` if new test categories are needed
+4. Update CI workflow if special setup is required
+
+### For Throughput Metrics
+1. Add calculation tests to `test_throughput_metrics_unit.py`
+2. Add integration tests to `test_chat_models.py`
+3. Ensure timing accuracy tests cover edge cases
+
+### Test Naming Convention
+- Test files: `test_<component>.py`
+- Test classes: `Test<Component>`
+- Test methods: `test_<specific_behavior>`
+
+## Dependencies
+
+### Core Testing
+- `pytest`: Test framework
+- `pytest-cov`: Coverage reporting
+- `pytest-mock`: Mocking utilities
+
+### Code Quality
+- `black`: Code formatting
+- `isort`: Import sorting
+- `coverage`: Coverage analysis
+
+### Optional
+- `torch`: For model-related tests
+- `transformers`: For HuggingFace model tests
+- `openai`: For API model tests
+
+## Best Practices
+
+### Writing Tests
+- Keep tests focused on single behaviors
+- Use descriptive assertions with clear error messages
+- Mock external dependencies appropriately
+- Test both success and failure cases
+
+### Performance Testing
+- Use timing measurements for throughput validation
+- Allow reasonable variance in timing tests
+- Test edge cases (zero tokens, single token, large batches)
+
+### CI/CD Considerations
+- Tests should be deterministic and reliable
+- Avoid network dependencies in CI
+- Use matrix testing for multiple Python versions
+- Generate coverage reports for code quality tracking
+
+## Troubleshooting
+
+### Common Issues
+1. **Import Errors**: Ensure lmms-eval is installed with `pip install -e .`
+2. **Missing Dependencies**: Install test requirements with `pip install -r test/requirements-test.txt`
+3. **Timing Test Failures**: Check system load; timing tests may be sensitive to CPU usage
+
+### Debug Mode
+```bash
+# Run tests with detailed output
+pytest test/ -v -s
+
+# Run specific test with pdb debugging
+pytest test/test_throughput_metrics_unit.py::TestThroughputMetrics::test_tpot_calculation -v -s --pdb
+```
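
Note on the README's throughput formulas: the sketch below works through the TPOT formula `(e2e_latency - TTFT) / (num_output_tokens - 1)` and the inference speed `1 / TPOT` as plain Python. The function names `tpot` and `inference_speed` are illustrative, and returning `None` for zero or one output token is an assumption; the handling used in lmms-eval itself is not shown in this diff.

```python
from typing import Optional


def tpot(e2e_latency: float, ttft: float, num_output_tokens: int) -> Optional[float]:
    """Time per output token: (e2e_latency - TTFT) / (num_output_tokens - 1)."""
    if num_output_tokens <= 1:
        # Assumption: with zero or one token there is no decode interval to average.
        return None
    return (e2e_latency - ttft) / (num_output_tokens - 1)


def inference_speed(tpot_seconds: Optional[float]) -> Optional[float]:
    """Decode speed in tokens per second: 1 / TPOT."""
    if tpot_seconds is None or tpot_seconds <= 0:
        # Assumption: an undefined or non-positive TPOT yields no speed value.
        return None
    return 1.0 / tpot_seconds


# 11 tokens, 0.5 s to first token, 2.5 s end to end:
# the remaining 10 tokens share the 2.0 s decode window.
example_tpot = tpot(e2e_latency=2.5, ttft=0.5, num_output_tokens=11)
example_speed = inference_speed(example_tpot)
print(example_tpot, example_speed)
```

The single-token case is the edge case the README singles out: the denominator `num_output_tokens - 1` becomes zero, so a test should pin down whatever behavior the real implementation chooses there.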

test/__init__.py

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+"""
+Test suite for lmms-eval
+"""

test/conftest.py

Lines changed: 50 additions & 0 deletions
@@ -0,0 +1,50 @@
+"""
+pytest configuration and fixtures for lmms-eval tests
+"""
+import os
+import tempfile
+from unittest.mock import Mock, patch
+
+import pytest
+
+
+@pytest.fixture
+def mock_model():
+    """Mock model for testing without actual model loading"""
+    mock = Mock()
+    mock.generate.return_value = "test response"
+    mock.tokenizer = Mock()
+    mock.tokenizer.encode.return_value = [1, 2, 3, 4, 5]
+    mock.tokenizer.decode.return_value = "test response"
+    return mock
+
+
+@pytest.fixture
+def temp_cache_dir():
+    """Temporary directory for cache files"""
+    with tempfile.TemporaryDirectory() as temp_dir:
+        yield temp_dir
+
+
+@pytest.fixture
+def mock_task_dict():
+    """Mock task dictionary for testing"""
+    return {
+        "test_task": {
+            "test": [
+                {
+                    "question": "What is 2+2?",
+                    "answer": "4",
+                    "image": None,
+                    "doc_id": 0,
+                }
+            ]
+        }
+    }
+
+
+@pytest.fixture
+def mock_eval_logger():
+    """Mock evaluation logger"""
+    with patch("lmms_eval.api.model.eval_logger") as mock_logger:
+        yield mock_logger
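
Note on using the `mock_model` fixture: the sketch below rebuilds the same `Mock` outside pytest so it runs with the standard library alone, then exercises it through a small helper. `build_mock_model` and `count_output_tokens` are hypothetical names introduced only for illustration; neither appears in this commit.

```python
from unittest.mock import Mock


def build_mock_model() -> Mock:
    """Mirror of the `mock_model` fixture body, usable outside pytest."""
    mock = Mock()
    mock.generate.return_value = "test response"
    mock.tokenizer = Mock()
    mock.tokenizer.encode.return_value = [1, 2, 3, 4, 5]
    mock.tokenizer.decode.return_value = "test response"
    return mock


def count_output_tokens(model, prompt: str) -> int:
    """Hypothetical code under test: generate text, then count its tokens."""
    text = model.generate(prompt)
    return len(model.tokenizer.encode(text))


def test_count_output_tokens():
    model = build_mock_model()
    # The canned tokenizer output has five token ids.
    assert count_output_tokens(model, "What is 2+2?") == 5
    # Mocks record their calls, so the test can also check how they were used.
    model.generate.assert_called_once_with("What is 2+2?")
    model.tokenizer.encode.assert_called_once_with("test response")


test_count_output_tokens()
```

Under pytest, the same test would take the fixture as a parameter, `def test_count_output_tokens(mock_model):`, and drop the `build_mock_model()` call; pytest injects the fixture by matching the argument name.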
