# Testing Framework for lmms-eval

This directory contains the test suite for lmms-eval. The suite is designed to run in CI/CD pipelines and covers the codebase with unit, integration, and throughput tests.

## Structure

```
test/
├── __init__.py # Test package initialization
├── conftest.py # pytest fixtures and configuration
├── requirements-test.txt # Testing dependencies
├── run_suite.py # Test suite runner
├── test_api_components.py # Core API component tests
├── test_chat_models.py # Chat model integration tests
├── test_throughput_metrics.py # Original throughput demo script
└── test_throughput_metrics_unit.py # Unit tests for throughput metrics
```

## Test Categories

### Unit Tests
- **test_throughput_metrics_unit.py**: Tests for TPOT (time per output token) and inference speed calculations
- **test_api_components.py**: Tests for core API components (Instance, registries, metrics)

### Integration Tests
- **test_chat_models.py**: Integration tests for chat models with throughput metrics

### Throughput Tests
- **test_throughput_metrics.py**: Demo script showing throughput calculations
- **test_throughput_metrics_unit.py**: Detailed unit tests for the timing logic

## Running Tests

### Using the Test Runner
```bash
# Run all tests
python test/run_suite.py all

# Run specific test suites
python test/run_suite.py unit
python test/run_suite.py integration
python test/run_suite.py throughput
python test/run_suite.py lint
```

### Using pytest Directly
```bash
# Install test dependencies
pip install -r test/requirements-test.txt

# Run all tests
pytest test/

# Run a specific test file with verbose output
pytest test/test_throughput_metrics_unit.py -v

# Run with coverage (the HTML report is written to htmlcov/)
pytest test/ --cov=lmms_eval --cov-report=html
```

### Using unittest
You can run a test file directly if it ends with an `if __name__ == "__main__": unittest.main()` block:
```bash
# Run individual test files
python test/test_throughput_metrics_unit.py
python test/test_api_components.py
```

## CI/CD Integration

### GitHub Actions
The test suite runs on GitHub Actions through `.github/workflows/test.yml`, which covers:

- **Lint Check**: Runs black and isort formatting checks
- **Unit Tests**: Run on Python 3.9, 3.10, and 3.11
- **Integration Tests**: Test model integration using mocks
- **Throughput Tests**: Validate throughput metric calculations
- **Coverage**: Generates test coverage reports

### Pre-commit Hooks
Tests also run automatically through pre-commit hooks. Install the hooks once so they run on every commit. Use `pre-commit run --all-files` to run them against the entire repository:
```bash
pre-commit install
pre-commit run --all-files
```

## Test Design Principles

### 1. Fast Unit Tests
- Mock external dependencies (models, APIs)
- Test core logic without heavy I/O
- Focus on edge cases and error handling
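The first principle can be sketched with the standard library's `unittest.mock`, which stands in for the model. The `generate_answer` function and the model's `generate` method are hypothetical placeholders for code under test, not lmms-eval APIs.

```python
from unittest import mock


def generate_answer(model, prompt: str) -> str:
    """Hypothetical code under test: query a model and normalize its reply."""
    reply = model.generate(prompt)
    return reply.strip()


def test_generate_answer_strips_whitespace() -> None:
    # A Mock replaces the real model, so no weights, GPU, or network are needed
    fake_model = mock.Mock()
    fake_model.generate.return_value = "  Paris\n"

    assert generate_answer(fake_model, "Capital of France?") == "Paris"
    # Check that the code under test called the model exactly once with the prompt
    fake_model.generate.assert_called_once_with("Capital of France?")
```

pytest collects `test_*` functions like this one without any extra setup.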

### 2. Comprehensive Integration Tests
- Test real component interactions
- Use minimal mocking at integration points
- Validate end-to-end workflows

### 3. Throughput-Specific Tests
- Validate the TPOT (time per output token) formula: `(e2e_latency - TTFT) / (num_output_tokens - 1)`, where TTFT is the time to first token
- Test the inference speed calculation: `1 / TPOT` (tokens per second)
- Verify timing measurement accuracy
- Test batch processing scenarios
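The two throughput formulas can be sketched as plain functions. The names `compute_tpot` and `compute_inference_speed` are hypothetical. Returning `None` when fewer than two tokens were generated (the TPOT denominator would then be zero or negative) is an assumption of this sketch, not documented lmms-eval behavior.

```python
from typing import Optional


def compute_tpot(e2e_latency: float, ttft: float, num_output_tokens: int) -> Optional[float]:
    """TPOT in seconds per token: (e2e_latency - TTFT) / (num_output_tokens - 1).

    Assumption of this sketch: TPOT is undefined (None) for fewer than two tokens.
    """
    if num_output_tokens < 2:
        return None
    return (e2e_latency - ttft) / (num_output_tokens - 1)


def compute_inference_speed(tpot: Optional[float]) -> Optional[float]:
    """Inference speed in tokens per second: 1 / TPOT (None if TPOT is missing or non-positive)."""
    if tpot is None or tpot <= 0:
        return None
    return 1.0 / tpot


tpot = compute_tpot(e2e_latency=2.0, ttft=0.5, num_output_tokens=4)
print(tpot, compute_inference_speed(tpot))  # 0.5 2.0
```

The first token is excluded from the denominator because its latency is already captured by TTFT.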

### 4. Maintainable Test Code
- Use fixtures for common test data
- Give tests clear names that describe the behavior under test
- Include informative failure messages in assertions
- Keep test categories cleanly separated
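In `unittest` style, shared test data can live in `setUp`, and each assertion can carry a `msg` that explains a failure. The class name and timing values below are illustrative only; they are not taken from the lmms-eval suite.

```python
import unittest


class TestTpotCalculation(unittest.TestCase):
    def setUp(self) -> None:
        # Fixture: a hypothetical timing record, rebuilt fresh before every test method
        self.record = {"e2e_latency": 3.0, "ttft": 1.0, "num_output_tokens": 5}

    def _tpot(self) -> float:
        r = self.record
        return (r["e2e_latency"] - r["ttft"]) / (r["num_output_tokens"] - 1)

    def test_tpot_excludes_time_to_first_token(self) -> None:
        tpot = self._tpot()
        self.assertAlmostEqual(
            tpot, 0.5, msg=f"TPOT should exclude TTFT; got {tpot} for record {self.record}"
        )

    def test_inference_speed_is_reciprocal_of_tpot(self) -> None:
        speed = 1 / self._tpot()
        self.assertAlmostEqual(speed, 2.0, msg=f"expected 2.0 tokens/s, got {speed}")
```

The class and method names also follow the `Test<Component>` / `test_<specific_behavior>` naming convention.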

## Adding New Tests

### For New Features
1. Add unit tests to the appropriate `test_*.py` file
2. Add integration tests if the feature involves multiple components
3. Update `run_suite.py` if new test categories are needed
4. Update the CI workflow if special setup is required

### For Throughput Metrics
1. Add calculation tests to `test_throughput_metrics_unit.py`
2. Add integration tests to `test_chat_models.py`
3. Make sure timing accuracy tests cover edge cases such as zero or single output tokens

### Test Naming Convention
- Test files: `test_<component>.py`
- Test classes: `Test<Component>`
- Test methods: `test_<specific_behavior>`

## Dependencies

### Core Testing
- `pytest`: Test framework
- `pytest-cov`: Coverage reporting
- `pytest-mock`: Mocking utilities

### Code Quality
- `black`: Code formatting
- `isort`: Import sorting
- `coverage`: Coverage analysis

### Optional
- `torch`: For model-related tests
- `transformers`: For HuggingFace model tests
- `openai`: For API model tests
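Tests that need an optional dependency can skip themselves when it is missing, so the core suite still runs in a minimal environment. This sketch uses `importlib.util.find_spec` and `unittest.skipUnless` from the standard library. The test class and its assertion are illustrative, not part of the existing suite.

```python
import importlib.util
import unittest

# True when torch can be imported in the current environment
HAS_TORCH = importlib.util.find_spec("torch") is not None


@unittest.skipUnless(HAS_TORCH, "torch is not installed")
class TestTorchHelpers(unittest.TestCase):
    def test_tensor_to_list_roundtrip(self) -> None:
        import torch  # imported inside the test so the module loads without torch

        values = [1.0, 2.0, 3.0]
        self.assertEqual(torch.tensor(values).tolist(), values)
```

pytest reports these tests as skipped rather than failed, so a CI job without `torch` still passes.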

## Best Practices

### Writing Tests
- Keep each test focused on a single behavior
- Use descriptive assertions with clear error messages
- Mock external dependencies appropriately
- Test both success and failure cases
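A success case and a failure case can be paired in one test class, using `assertRaisesRegex` for the failure path. The `validate_timing` helper, and its rule that TTFT cannot exceed end-to-end latency, are hypothetical and chosen only to illustrate the pattern.

```python
import unittest


def validate_timing(e2e_latency: float, ttft: float) -> None:
    """Hypothetical guard: TTFT must be non-negative and no larger than the e2e latency."""
    if ttft < 0 or e2e_latency < ttft:
        raise ValueError(f"invalid timing: ttft={ttft}, e2e_latency={e2e_latency}")


class TestValidateTiming(unittest.TestCase):
    def test_accepts_consistent_timing(self) -> None:
        # Success case: passes as long as no exception is raised
        validate_timing(e2e_latency=1.5, ttft=0.3)

    def test_rejects_ttft_greater_than_latency(self) -> None:
        # Failure case: check both the exception type and part of its message
        with self.assertRaisesRegex(ValueError, "invalid timing"):
            validate_timing(e2e_latency=0.2, ttft=0.5)
```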

### Performance Testing
- Use timing measurements to validate throughput
- Allow a reasonable tolerance in timing assertions, since wall-clock measurements vary between runs
- Test edge cases (zero tokens, single token, large batches)
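Injecting the clock makes timing logic testable without sleeps or wide tolerance windows. The `measure_stream` helper below is hypothetical: it records TTFT and end-to-end latency for an iterable of tokens. In the example, `time.perf_counter` is replaced by a scripted sequence of timestamps.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    tokens: Iterable[str], clock: Callable[[], float] = time.perf_counter
) -> Tuple[Optional[float], float, int]:
    """Hypothetical helper: return (ttft, e2e_latency, num_tokens) for a token stream.

    ttft is None when the stream yields no tokens.
    """
    start = clock()
    ttft: Optional[float] = None
    count = 0
    for _ in tokens:
        count += 1
        if ttft is None:
            ttft = clock() - start  # time until the first token arrived
    e2e_latency = clock() - start
    return ttft, e2e_latency, count


# Scripted clock: start at 10.0, first token at 10.2, stream finished at 11.0
fake_clock = iter([10.0, 10.2, 11.0]).__next__
ttft, e2e, n = measure_stream(["a", "b", "c"], clock=fake_clock)
tpot = (e2e - ttft) / (n - 1)  # (1.0 - 0.2) / 2, about 0.4 s per token
```

For tests that must use the real clock, compare against a tolerance (for example `math.isclose(measured, expected, abs_tol=...)`) rather than checking for exact equality.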

### CI/CD Considerations
- Tests should be deterministic and reliable
- Avoid network dependencies in CI
- Use matrix testing for multiple Python versions
- Generate coverage reports for code quality tracking
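One way to enforce the "no network in CI" rule is to patch `socket.socket` so that any attempt to open a socket fails loudly. The `no_network` context manager and `NetworkAccessError` are hypothetical names. The guard only affects code that looks up `socket.socket` at call time; a module that already ran `from socket import socket` keeps its own reference.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class NetworkAccessError(RuntimeError):
    """Raised when a test tries to open a socket while the network is blocked."""


@contextmanager
def no_network():
    """Hypothetical guard: make socket creation fail inside the with-block."""

    def _blocked(*args, **kwargs):
        raise NetworkAccessError("network access is disabled during tests")

    # mock.patch swaps in a mock that raises, and restores the real class on exit
    with mock.patch("socket.socket", side_effect=_blocked):
        yield
```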

## Troubleshooting

### Common Issues
1. **Import Errors**: Make sure lmms-eval is installed in editable mode by running `pip install -e .` from the repository root
2. **Missing Dependencies**: Install the test requirements with `pip install -r test/requirements-test.txt`
3. **Timing Test Failures**: Check system load; timing tests can be sensitive to CPU usage

### Debug Mode
```bash
# Run tests with verbose output and without capturing stdout
pytest test/ -v -s

# Run a specific test and drop into pdb on failure
pytest test/test_throughput_metrics_unit.py::TestThroughputMetrics::test_tpot_calculation -v -s --pdb
```