Summary
StreamsProcessor does not extract the STREAMS benchmark version from CSV file comments, resulting in missing/incorrect test.version values in OpenSearch.
Current Behavior
The STREAMS benchmark version (5.10) is present in the CSV metadata:
# Test meta data start
# streams_version_# 5.10
# Test meta data end
But StreamsProcessor._parse_streams_csv() ignores this field. It only extracts:
- Optimization level ✓
- Array sizes ✓
- Operation results (Copy, Scale, Add, Triad) ✓
- Timestamps ✓
Does NOT extract:
- Benchmark version (streams_version_#) ✗
Impact
All STREAMS documents in OpenSearch have:
- test.version = "v2.8" (wrapper version, wrong)
- test.wrapper_version = "v2.8" (correct)
- The benchmark version (5.10) is lost
Queries that break:
{"query": {"term": {"test.version": "5.10"}}} // Returns 0 results
Example Data
Sample file: sample_data/rhel_9.8/rhel/aws/m7i.4xlarge_0/streams_2026.05.03-01.48.55/streams_results/results_streams_opt_O2.csv
# Test meta data start
# Optimization level: O2
# kernel_rev --meta_output numa_nodes
# number_cpus
# Core\(s\)_per_socket
# Model_name
# streams_version_# 5.10 <-- This is NOT extracted
# Test meta data end
Function,Best Rate MB/s,Avg time,Min time,Max time
Copy,474445.8,0.020337,0.020242,0.020405
...
Recommended Fix
Modify StreamsProcessor._parse_streams_csv() to extract the benchmark version:
# Module-level imports needed by this method
import re
from pathlib import Path
from typing import Any, Dict

def _parse_streams_csv(self, csv_file: Path) -> Dict[str, Any]:
    """Parse the results_streams.csv summary file, including version metadata."""
    runs = {}
    benchmark_version = None
    with open(csv_file, 'r') as f:
        lines = f.readlines()
    for line in lines:
        line_stripped = line.strip()
        # Extract the benchmark version from the metadata comments,
        # e.g. "# streams_version_# 5.10"
        if line_stripped.startswith('#'):
            match = re.search(r'streams_version_#\s+(\S+)', line_stripped)
            if match:
                benchmark_version = match.group(1)
                # Consume only the version line; other comment lines
                # (e.g. "# Optimization level: O2") still reach the existing logic
                continue
        # ... rest of parsing logic ...
    # Store for use by build_test_info(); parsing must happen before that call
    self._benchmark_version = benchmark_version
    return runs
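The issue also calls for tests covering version extraction. Below is a minimal unittest sketch. It relies on a hypothetical stand-in class, `_VersionOnlyParser`, that reproduces only the comment-handling branch of the proposed method, because the row-parsing logic is elided in this issue. The class and test names are illustrative and not part of the codebase. In the real suite, the tests would call `StreamsProcessor._parse_streams_csv()` directly. The header is abridged from the sample file shown above.

```python
import re
import tempfile
import unittest
from pathlib import Path
from typing import Any, Dict


class _VersionOnlyParser:
    """Hypothetical stand-in mirroring only the version branch of the proposed fix."""

    def _parse_streams_csv(self, csv_file: Path) -> Dict[str, Any]:
        runs: Dict[str, Any] = {}  # result-row parsing is omitted in this sketch
        benchmark_version = None
        with open(csv_file, "r") as f:
            for line in f:
                line_stripped = line.strip()
                if line_stripped.startswith("#"):
                    match = re.search(r"streams_version_#\s+(\S+)", line_stripped)
                    if match:
                        benchmark_version = match.group(1)
        self._benchmark_version = benchmark_version
        return runs


# Abridged header from results_streams_opt_O2.csv in the sample data
SAMPLE_CSV = """\
# Test meta data start
# Optimization level: O2
# streams_version_# 5.10
# Test meta data end
Function,Best Rate MB/s,Avg time,Min time,Max time
Copy,474445.8,0.020337,0.020242,0.020405
"""


class TestStreamsVersionExtraction(unittest.TestCase):
    def _parse(self, text: str) -> _VersionOnlyParser:
        parser = _VersionOnlyParser()
        with tempfile.TemporaryDirectory() as tmp:
            csv_file = Path(tmp) / "results_streams_opt_O2.csv"
            csv_file.write_text(text)
            parser._parse_streams_csv(csv_file)
        return parser

    def test_version_is_extracted(self):
        self.assertEqual(self._parse(SAMPLE_CSV)._benchmark_version, "5.10")

    def test_missing_version_is_none(self):
        text = SAMPLE_CSV.replace("# streams_version_# 5.10\n", "")
        self.assertIsNone(self._parse(text)._benchmark_version)
```

The second test pins down the no-metadata case, which determines whether build_test_info() falls back to another value.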
Then override build_test_info():
def build_test_info(self) -> TestInfo:
    """Build test information with the benchmark version from the CSV."""
    base_info = super().build_test_info()
    # _benchmark_version exists only after _parse_streams_csv() has run
    benchmark_version = getattr(self, '_benchmark_version', None)
    return TestInfo(
        name="streams",
        # Prefer the parsed STREAMS version; otherwise keep the base value
        version=benchmark_version or base_info.version,
        wrapper_version=base_info.wrapper_version,
    )
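The sketch below exercises the override's fallback in isolation. It uses hypothetical stand-ins for TestInfo and the base processor (`BaseProcessorStub`, `StreamsProcessorSketch`). The real classes may have more fields and a different constructor. The sketch also assumes that the base class currently fills `version` with the wrapper version, as described under Impact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TestInfo:
    """Hypothetical stand-in with only the fields used in this issue."""
    name: str
    version: str
    wrapper_version: str


class BaseProcessorStub:
    """Hypothetical base: models current behavior (version = wrapper version)."""
    wrapper_version = "v2.8"

    def build_test_info(self) -> TestInfo:
        return TestInfo(name="streams", version=self.wrapper_version,
                        wrapper_version=self.wrapper_version)


class StreamsProcessorSketch(BaseProcessorStub):
    def __init__(self, benchmark_version: Optional[str] = None):
        # In the real processor, _parse_streams_csv() sets this attribute
        if benchmark_version is not None:
            self._benchmark_version = benchmark_version

    def build_test_info(self) -> TestInfo:
        base_info = super().build_test_info()
        benchmark_version = getattr(self, "_benchmark_version", None)
        return TestInfo(
            name="streams",
            version=benchmark_version or base_info.version,
            wrapper_version=base_info.wrapper_version,
        )


print(StreamsProcessorSketch("5.10").build_test_info())
# TestInfo(name='streams', version='5.10', wrapper_version='v2.8')
print(StreamsProcessorSketch().build_test_info())
# TestInfo(name='streams', version='v2.8', wrapper_version='v2.8')
```

Look at the second case. When the CSV has no streams_version_# line, the `or` fallback reproduces the current wrong value (v2.8). One alternative is to leave test.version unset in that case, which keeps the wrapper and benchmark versions separate. The better choice depends on how downstream queries treat a missing test.version.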
Related Issues
- ROOT_CAUSE_ANALYSIS.md
- VERSION_CONFLATION_IMPACT.md
Files to Modify
- src/chronicler/processors/streams_processor.py
- Add tests for version extraction