STREAMS processor does not extract benchmark version from CSV comments #42

Description

@grdumas

Summary

StreamsProcessor does not extract the STREAMS benchmark version from CSV file comments, so test.version in OpenSearch holds the wrapper version instead of the benchmark version.

Current Behavior

The STREAMS benchmark version (5.10) is present in the CSV metadata:

# Test meta data start
# streams_version_# 5.10
# Test meta data end

But StreamsProcessor._parse_streams_csv() ignores this field. It only extracts:

  • Optimization level ✓
  • Array sizes ✓
  • Operation results (Copy, Scale, Add, Triad) ✓
  • Timestamps ✓

Does NOT extract:

  • streams_version_#

Impact

All STREAMS documents in OpenSearch have:

  • test.version = "v2.8" (wrapper version, wrong)
  • test.wrapper_version = "v2.8" (correct)
  • Benchmark version (5.10) is lost

Queries that break (this one returns 0 results):

{"query": {"term": {"test.version": "5.10"}}}

Example Data

Sample file: sample_data/rhel_9.8/rhel/aws/m7i.4xlarge_0/streams_2026.05.03-01.48.55/streams_results/results_streams_opt_O2.csv

# Test meta data start
# Optimization level: O2
# kernel_rev  	  --meta_output numa_nodes
# number_cpus 
# Core\(s\)_per_socket 
# Model_name 
# streams_version_# 5.10      <-- This is NOT extracted
# Test meta data end
Function,Best Rate MB/s,Avg time,Min time,Max time
Copy,474445.8,0.020337,0.020242,0.020405
...

Recommended Fix

Modify StreamsProcessor._parse_streams_csv() to extract benchmark version:

# Module-level imports the method relies on
import re
from pathlib import Path
from typing import Any, Dict

def _parse_streams_csv(self, csv_file: Path) -> Dict[str, Any]:
    """Parse the results_streams.csv summary file including version metadata."""
    runs = {}
    benchmark_version = None
    
    with open(csv_file, 'r') as f:
        lines = f.readlines()
    
    for line in lines:
        line_stripped = line.strip()
        
        # Metadata lives in '#' comment lines; pick out the version, skip the rest
        if line_stripped.startswith('#'):
            if 'streams_version_#' in line_stripped:
                # Capture the first whitespace-delimited token after the key, e.g. "5.10"
                match = re.search(r'streams_version_#\s+(\S+)', line_stripped)
                if match:
                    benchmark_version = match.group(1)
            continue
        
        # ... rest of parsing logic ...
    
    # Store for use by build_test_info(); stays None if the field is absent or empty
    self._benchmark_version = benchmark_version
    return runs

Then override build_test_info() so the parsed version takes precedence and the existing value is used only as a fallback:

def build_test_info(self) -> TestInfo:
    """Build test information with benchmark version from CSV."""
    base_info = super().build_test_info()
    
    benchmark_version = getattr(self, '_benchmark_version', None)
    
    return TestInfo(
        name="streams",
        version=benchmark_version or base_info.version,
        wrapper_version=base_info.wrapper_version
    )
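
A minimal sketch of the fallback behaviour in the override above, for reviewers who want to check it in isolation. StubTestInfo and build_info are hypothetical stand-ins, not the project's real TestInfo or build_test_info(); the only thing taken from the proposal is the `benchmark_version or base_info.version` expression.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubTestInfo:
    """Hypothetical stand-in for the project's TestInfo (fields assumed from the issue)."""
    name: str
    version: str
    wrapper_version: str


def build_info(benchmark_version: Optional[str], wrapper_version: str) -> StubTestInfo:
    """Prefer the parsed benchmark version; fall back to the wrapper version."""
    return StubTestInfo(
        name="streams",
        # `or` also falls back when the parsed value is an empty string
        version=benchmark_version or wrapper_version,
        wrapper_version=wrapper_version,
    )


with_version = build_info("5.10", "v2.8")
without_version = build_info(None, "v2.8")
print(with_version.version)     # 5.10
print(without_version.version)  # v2.8
```

With this shape, files that lack the streams_version_# field keep today's test.version value rather than ending up with an empty one.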

Files to Modify

  • src/chronicler/processors/streams_processor.py
  • Add tests for version extraction
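
One possible shape for the version-extraction tests, written against a standalone helper so it runs without the rest of the processor. extract_streams_version is a hypothetical function that reuses the regex from the recommended fix; the real tests would presumably call StreamsProcessor._parse_streams_csv() instead. Unlike the loop in the fix, this helper returns the first match rather than the last.

```python
import re
from typing import Iterable, Optional

# Same pattern as the proposed fix
_VERSION_RE = re.compile(r"streams_version_#\s+(\S+)")


def extract_streams_version(lines: Iterable[str]) -> Optional[str]:
    """Hypothetical helper: return the first streams_version_# value found in comment lines."""
    for line in lines:
        stripped = line.strip()
        if not stripped.startswith("#"):
            continue  # data rows never carry metadata
        match = _VERSION_RE.search(stripped)
        if match:
            return match.group(1)
    return None


# Metadata trimmed from the sample file quoted in this issue
SAMPLE = [
    "# Test meta data start\n",
    "# Optimization level: O2\n",
    "# streams_version_# 5.10\n",
    "# Test meta data end\n",
    "Function,Best Rate MB/s,Avg time,Min time,Max time\n",
    "Copy,474445.8,0.020337,0.020242,0.020405\n",
]


def test_version_is_extracted():
    assert extract_streams_version(SAMPLE) == "5.10"


def test_missing_field_returns_none():
    lines = [line for line in SAMPLE if "streams_version_#" not in line]
    assert extract_streams_version(lines) is None


def test_empty_value_returns_none():
    # Other metadata keys in the sample file are empty, so this case is worth covering
    assert extract_streams_version(["# streams_version_# \n"]) is None
```

The empty-value case matters because build_test_info() relies on a None result to fall back to the existing version.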

Labels: bug (Something isn't working)