Document version extraction requirements for new processors #47

Description

@grdumas

Summary

Document best practices and requirements for version extraction when creating new processors or wrappers.

Problem

The version conflation bug (issues #41-#45) affected 83% of processors because there was no clear guidance on:

  1. The difference between wrapper version and benchmark version
  2. Where each version should be stored
  3. How to extract benchmark versions from results

Required Documentation

1. Processor Development Guide

Add section: "Handling Versions"

## Handling Versions in Processors

### Two Types of Versions

Every test has TWO distinct versions that must be tracked separately:

1. **Wrapper Version** (`test.wrapper_version`)
   - Version of the wrapper script repository
   - Example: `streams-wrapper v2.8`
   - Extracted automatically from orchestrator's `test_info` file
   - Handled by `BaseProcessor.build_test_info()`

2. **Benchmark Version** (`test.version`)
   - Version of the actual benchmark being run
   - Example: `STREAMS 5.10`, `fio-3.36`, `CoreMark v1.01`
   - Must be extracted from benchmark results
   - Requires processor override of `build_test_info()`

### Why Both Matter

- The wrapper can be updated (v2.7 → v2.8) independently of the benchmark (still 5.10)
- A performance change may come from a wrapper change OR a benchmark upgrade
- Historical analysis requires tracking both separately

### Implementation Pattern

Every processor for a benchmark with its own version should:

1. **Extract the benchmark version during parsing**
   ```python
   # Requires: from typing import Any, Dict
   def parse_runs(self, extracted_result: Dict[str, Any]) -> Dict[str, Any]:
       # Parse results... (elided here; assumed to define `results_file` and `runs`)

       # Extract the benchmark version from the results
       version = self._extract_benchmark_version(results_file)
       self._benchmark_version = version

       return runs
   ```

2. **Override `build_test_info()`**
   ```python
   def build_test_info(self) -> TestInfo:
       base_info = super().build_test_info()

       # Falls back to the base value if parse_runs() found no version
       benchmark_version = getattr(self, '_benchmark_version', None)

       return TestInfo(
           name=self.get_test_name(),
           version=benchmark_version or base_info.version,
           wrapper_version=base_info.wrapper_version
       )
   ```

3. **Add tests**
   ```python
   def test_version_extraction():
       processor = MyProcessor(sample_dir)
       result = processor.process()

       # Benchmark version should be extracted
       assert result.test.version == "expected_benchmark_version"
       # Wrapper version should be tracked separately and differ from it
       assert result.test.wrapper_version == "v1.0"
       assert result.test.version != result.test.wrapper_version
   ```

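The three steps above can be combined into one self-contained sketch. `TestInfo`, `BaseProcessor`, and `StreamsProcessor` below are simplified stand-ins rather than the project's actual classes, the field types are assumed, and the `streams_version` key is hypothetical. The sketch uses `dataclasses.replace` instead of constructing a new `TestInfo`, which is one way to avoid silently dropping other fields such as `schema_version`.

```python
from dataclasses import dataclass, replace
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class TestInfo:
    # Simplified stand-in for the schema class; field types are assumed.
    name: str
    version: str
    wrapper_version: str
    schema_version: Optional[str] = None


class BaseProcessor:
    # Stand-in base class: pretends the orchestrator's test_info file
    # only knows the wrapper version.
    def __init__(self, wrapper_version: str) -> None:
        self._wrapper_version = wrapper_version

    def get_test_name(self) -> str:
        return "streams"

    def build_test_info(self) -> TestInfo:
        return TestInfo(
            name=self.get_test_name(),
            version=self._wrapper_version,
            wrapper_version=self._wrapper_version,
        )


class StreamsProcessor(BaseProcessor):
    def parse_runs(self, extracted_result: Dict[str, Any]) -> Dict[str, Any]:
        # Hypothetical key; real results may store the version elsewhere.
        self._benchmark_version = extracted_result.get("streams_version")
        return extracted_result.get("runs", {})

    def build_test_info(self) -> TestInfo:
        base_info = super().build_test_info()
        benchmark_version = getattr(self, "_benchmark_version", None)
        # replace() copies every other field, so nothing is dropped.
        return replace(base_info, version=benchmark_version or base_info.version)


processor = StreamsProcessor(wrapper_version="v2.8")
processor.parse_runs({"streams_version": "5.10", "runs": {}})
info = processor.build_test_info()
```

If `parse_runs()` is never called or finds no version, `build_test_info()` in this sketch falls back to the base value, which is the conflated state the tests in step 3 are meant to catch.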
### Common Version Locations

| Format | Version Location | Example |
|--------|------------------|---------|
| JSON | Top-level field | `{"fio version": "fio-3.36"}` |
| CSV | Comment header | `# streams_version_# 5.10` |
| Log | Output line | `CoreMark 1.01` |
| File | `version` or `VERSION` file | See `parse_version_file()` |
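
As a rough illustration of the four locations listed above, the helpers below show one possible extractor per format. The function names, default keys, and regular expression are assumptions made for this sketch; in particular, `version_from_file` is not the `parse_version_file()` helper mentioned above, whose signature is not shown here.

```python
import json
import re
from pathlib import Path
from typing import Optional, Sequence


def version_from_json(text: str, key: str = "fio version") -> Optional[str]:
    """Return a top-level string field from a JSON document, if present."""
    data = json.loads(text)
    value = data.get(key) if isinstance(data, dict) else None
    return value if isinstance(value, str) else None


def version_from_csv_header(text: str, marker: str = "streams_version_#") -> Optional[str]:
    """Return the value that follows `marker` on a '#' comment line."""
    for line in text.splitlines():
        if line.startswith("#") and marker in line:
            return line.split(marker, 1)[1].strip() or None
    return None


def version_from_log(text: str, pattern: str = r"CoreMark\s+v?(\d+(?:\.\d+)+)") -> Optional[str]:
    """Return the first capture group of `pattern` found in log output."""
    match = re.search(pattern, text)
    return match.group(1) if match else None


def version_from_file(results_dir: Path, names: Sequence[str] = ("version", "VERSION")) -> Optional[str]:
    """Return the stripped contents of the first version file that exists."""
    for name in names:
        candidate = results_dir / name
        if candidate.is_file():
            return candidate.read_text().strip() or None
    return None
```

Each helper returns `None` when nothing is found, so a processor can decide whether to fall back or fail loudly instead of silently reusing the wrapper version.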

### Checklist for New Processors

- [ ] Does this benchmark have its own version independent of the wrapper?
- [ ] If yes, where is the version in the results? (JSON/CSV/log/file)
- [ ] Extract the version during `parse_runs()`
- [ ] Store it in `self._benchmark_version`
- [ ] Override `build_test_info()` to use it
- [ ] Add a test verifying `test.version != test.wrapper_version`
- [ ] Update processor documentation

2. Wrapper Development Guide

Add section: "Version Reporting"

## Reporting Benchmark Version

Wrappers should make the benchmark version easy to discover:

### Option 1: Include in Results (Preferred)

Add the version to the results metadata:

```json
{"benchmark_version": "3.36", ...}
```

or

```text
# benchmark_version: 5.10
```

### Option 2: Create VERSION File

```bash
echo "${BENCHMARK_VERSION}" > "${results_dir}/BENCHMARK_VERSION"
```

### Option 3: Log to Stdout

Make sure the version appears in captured logs:

```bash
echo "Running STREAMS version ${STREAMS_VERSION}"
```

### What NOT to Do

- Don't rely on the wrapper version in `test_info` as the benchmark version
- Don't assume the orchestrator will extract the version
- Don't embed the version only in wrapper code (and not in results)
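
On the processor side, the three options above suggest a natural lookup order. The sketch below is only an assumption about how a processor might consume them: `resolve_benchmark_version`, `results.json`, and `run.log` are hypothetical names, and only the `benchmark_version` field and the `BENCHMARK_VERSION` file come from the options described above.

```python
import json
import re
from pathlib import Path
from typing import Optional


def resolve_benchmark_version(results_dir: Path) -> Optional[str]:
    """Try the three reporting options in order of preference."""
    # Option 1: a "benchmark_version" field in results metadata.
    # "results.json" is a hypothetical file name.
    results_json = results_dir / "results.json"
    if results_json.is_file():
        data = json.loads(results_json.read_text())
        if isinstance(data, dict) and data.get("benchmark_version"):
            return str(data["benchmark_version"])

    # Option 2: a dedicated BENCHMARK_VERSION file.
    version_file = results_dir / "BENCHMARK_VERSION"
    if version_file.is_file():
        text = version_file.read_text().strip()
        if text:
            return text

    # Option 3: a "... version X.Y" line in captured logs.
    # "run.log" is a hypothetical file name.
    log_file = results_dir / "run.log"
    if log_file.is_file():
        match = re.search(r"version\s+(\S+)", log_file.read_text())
        if match:
            return match.group(1)

    return None
```

Log scraping comes last because it is the most fragile of the three: any unrelated line containing the word "version" would match this simple pattern.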

3. Schema Documentation

Update `schema.py` docstrings:

```python
@dataclass
class TestInfo:
    """Test information.

    Attributes:
        name: Test name (e.g., 'streams', 'fio')
        version: Benchmark version (e.g., '5.10', 'fio-3.36').
            NOT the wrapper version. This is the version of the
            actual benchmark software being run.
        wrapper_version: Wrapper script version (e.g., 'v2.8').
            Extracted from the orchestrator's test_info file.
        schema_version: Optional result schema version
    """
```

4. Add to Troubleshooting Guide

## Wrong version in test.version field

**Symptom**: `test.version` shows wrapper version instead of benchmark version

**Cause**: Processor doesn't override `build_test_info()`

**Fix**: See "Handling Versions in Processors" in development guide
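
To spot this symptom in existing results, a small check along the following lines might help. The function name and the normalization of a leading "v" are assumptions for illustration, not part of the project's API.

```python
from typing import Optional


def is_version_conflated(version: Optional[str], wrapper_version: Optional[str]) -> bool:
    """Return True when test.version looks like a copy of the wrapper version."""
    if not version or not wrapper_version:
        return False
    # Compare without a leading "v" so "2.8" and "v2.8" count as the same.
    return version.lstrip("vV") == wrapper_version.lstrip("vV")
```

A match is a hint rather than proof: a benchmark and its wrapper can legitimately share a version number, so flagged results still need a manual look.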

Files to Update

  • docs/processor_development.md (create if needed)
  • docs/wrapper_development.md (create if needed)
  • src/chronicler/schema.py (docstrings)
  • README.md (link to guides)
  • CONTRIBUTING.md (version extraction checklist)

Dependencies

Success Criteria

  • Clear documentation preventing future version conflation bugs
  • Examples from actual fixes
  • Checklist for reviewers
  • Easy to find (linked from README)

Labels: documentation