Summary
After fixing version extraction in processors (issues #41-#45), all historical data in OpenSearch needs to be re-processed to populate correct test.version values.
Current State
~83% of documents in OpenSearch have incorrect test.version values:
- Fully affected tests: test.version = wrapper version or "unknown"
- Partially affected tests: test.version = wrapper version, benchmark version in config
Required Steps
1. Pre-Migration Validation
Before re-processing:
2. Migration Strategy Options
Option A: Full Re-ingest (Recommended)
- Re-process all raw result files
- Generate new documents with correct versions
- Replace existing documents
Pros: Clean, consistent results
Cons: Time-intensive, requires raw files
Option B: Partial Update
- Query existing documents
- Extract benchmark version from runs[].configuration (for partially affected)
- Update test.version field
Pros: Faster for partially affected tests
Cons: Cannot fix fully affected tests without raw data
Option C: Hybrid
- Re-ingest fully affected tests (no version in config)
- Update partially affected tests (version in config)
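The hybrid option amounts to routing each document by test name. A minimal sketch, assuming the test lists from the migration script in this issue are complete; the function name `choose_strategy` and the strategy labels are hypothetical:

```python
from typing import Literal

# Test names copied from the migration pseudo-code in this issue.
FULLY_AFFECTED = frozenset(
    {"coremark", "coremark_pro", "phoronix", "specjbb", "streams", "uperf"}
)
PARTIALLY_AFFECTED = frozenset({"autohpl", "fio", "passmark", "speccpu2017"})

# Hypothetical labels for the two migration paths plus "leave untouched".
Strategy = Literal["reingest", "update_from_config", "skip"]


def choose_strategy(test_name: str) -> Strategy:
    """Pick the migration path for a document based on its test.name."""
    if test_name in FULLY_AFFECTED:
        # No benchmark version in config: raw result files are required.
        return "reingest"
    if test_name in PARTIALLY_AFFECTED:
        # Benchmark version can be read from runs[].configuration.
        return "update_from_config"
    return "skip"
```

Tests that appear in neither list are skipped, so documents that are already correct are never rewritten.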
3. Migration Script
Create script to:
- Identify affected documents by test.name
- Re-process raw results OR extract from config
- Update test.version field
- Validate results
# Pseudo-code
affected_tests = {
    'fully': ['coremark', 'coremark_pro', 'phoronix', 'specjbb', 'streams', 'uperf'],
    'partial': ['autohpl', 'fio', 'passmark', 'speccpu2017']
}

for test_name in affected_tests['fully']:
    # Must re-ingest from raw files
    documents = opensearch.search(test_name=test_name)
    for doc in documents:
        raw_file = locate_raw_result(doc)
        new_doc = process_with_fixed_processor(raw_file)
        opensearch.update(doc.id, new_doc)

for test_name in affected_tests['partial']:
    # Can extract from existing config
    documents = opensearch.search(test_name=test_name)
    for doc in documents:
        version = extract_version_from_config(doc, test_name)
        # Nested form, so the partial update overwrites the existing test.version
        # instead of adding a literal "test.version" key next to it
        opensearch.update(doc.id, {'test': {'version': version}})
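The partial-update path can be sketched without an OpenSearch connection. This is an illustration only: it assumes `runs` is a list of objects whose `configuration` is a dictionary, and the key name `benchmark_version` is hypothetical, since the issue does not say where inside `runs[].configuration` the version lives. The `{"doc": ...}` wrapper is the request body shape of the OpenSearch update API.

```python
from typing import Any, Optional


def extract_version_from_config(
    doc: dict[str, Any], key: str = "benchmark_version"
) -> Optional[str]:
    """Return the benchmark version from runs[].configuration.

    `key` is a hypothetical field name. Returns None when no run carries
    the key or when runs disagree, so ambiguous documents are not updated.
    """
    runs = doc.get("runs") or []
    versions = {
        str(run["configuration"][key])
        for run in runs
        if isinstance(run.get("configuration"), dict)
        and run["configuration"].get(key)
    }
    if len(versions) == 1:
        return versions.pop()
    return None


def build_update_body(version: str) -> dict[str, Any]:
    """Build a partial-update body that overwrites only test.version."""
    # Nested object rather than a dotted "test.version" key, so the value
    # merges into the existing `test` object in _source.
    return {"doc": {"test": {"version": version}}}
```

Returning None for missing or conflicting versions lets the caller fall back to re-ingesting that document from its raw file.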
4. Validation
After migration:
- Confirm test.version != test.wrapper_version for affected tests
# Validation queries
# 1. Check version distribution
GET /chronicler-runs/_search
{
  "size": 0,
  "aggs": {
    "by_test": {
      "terms": {"field": "test.name.keyword"},
      "aggs": {
        "versions": {"terms": {"field": "test.version.keyword"}}
      }
    }
  }
}
# 2. Verify no conflation: this matches documents where version == wrapper_version,
#    so expect zero hits for affected tests
GET /chronicler-runs/_search
{
  "query": {
    "script": {
      "script": "doc['test.version.keyword'].value == doc['test.wrapper_version.keyword'].value"
    }
  }
}
5. Rollback Plan
If issues found:
- Restore from backup
- Fix processor bugs
- Re-run migration
Dependencies
Estimated Impact
Assuming documents are distributed uniformly across tests:
- Documents affected: ~83% of chronicler-runs index
- Re-processing time: Depends on data volume and strategy
- Downtime: None (can update in-place)
References
- Analysis:
VERSION_CONFLATION_IMPACT.md
- Example queries in impact document