Re-process historical data after fixing version extraction #46

## Summary

After fixing version extraction in processors (issues #41-#45), all historical data in OpenSearch needs to be re-processed to populate correct `test.version` values.

## Current State

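~83% of documents in OpenSearch have incorrect `test.version` values:
- **Fully affected tests**: `test.version` holds the wrapper version or `"unknown"`, and the benchmark version is not stored in the config
- **Partially affected tests**: `test.version` holds the wrapper version, but the benchmark version is still available in the stored config (`runs[].configuration`)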

## Required Steps

### 1. Pre-Migration Validation

Before re-processing:
- [ ] Verify all processor fixes are merged
- [ ] Test fixes with sample data
- [ ] Validate new documents have correct versions
- [ ] Create backup of OpenSearch index
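
The "validate new documents" item can be automated with a small predicate run against output from the fixed processors. A minimal sketch, assuming each document nests `version` and `wrapper_version` under a `test` object (inferred from the field names `test.version` and `test.wrapper_version`); `has_correct_version` is a hypothetical helper and the sample values are made up:

```python
def has_correct_version(doc: dict) -> bool:
    """Return True if test.version looks like a real benchmark version."""
    test = doc.get("test", {})
    version = test.get("version")
    # Missing, empty, or "unknown" versions are the fully affected case.
    if not version or version == "unknown":
        return False
    # A version equal to the wrapper version is the conflation being fixed.
    return version != test.get("wrapper_version")


# Illustrative document only; real documents carry many more fields.
sample = {"test": {"name": "fio", "version": "3.35", "wrapper_version": "1.2"}}
print(has_correct_version(sample))
```

One caveat: a benchmark whose real version happens to equal the wrapper version string would be flagged as a false positive, so flagged documents should be reviewed rather than rejected outright.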

### 2. Migration Strategy Options

#### Option A: Full Re-ingest (Recommended)
- Re-process all raw result files
- Generate new documents with correct versions
- Replace existing documents

**Pros**: Clean, consistent results  
**Cons**: Time-intensive, requires raw files

#### Option B: Partial Update
- Query existing documents
- Extract benchmark version from `runs[].configuration` (for partially affected)
- Update `test.version` field

**Pros**: Faster for partially affected tests  
**Cons**: Cannot fix fully affected tests without raw data
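
A rough sketch of the extraction step in Option B. It assumes `runs` is a list of objects, each with a `configuration` object, and that the benchmark version sits under a key named `version`. That key name is a placeholder: the real key likely differs per benchmark and needs to be checked against actual documents.

```python
from typing import Optional


def extract_version_from_config(doc: dict, key: str = "version") -> Optional[str]:
    """Return the first non-empty `key` found in runs[].configuration, else None."""
    for run in doc.get("runs", []):
        # Tolerate runs whose configuration is missing or null.
        value = (run.get("configuration") or {}).get(key)
        if value:
            return str(value)
    return None


# Illustrative document shape (assumed, not taken from the index mapping).
example_doc = {"runs": [{"configuration": None}, {"configuration": {"version": "3.35"}}]}
print(extract_version_from_config(example_doc))
```

Returning `None` instead of raising lets the caller route documents without a recoverable version to the re-ingest path.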

#### Option C: Hybrid
- Re-ingest fully affected tests (no version in config)
- Update partially affected tests (version in config)

### 3. Migration Script

Create a script to:
1. Identify affected documents by `test.name`
2. Re-process raw results OR extract from config
3. Update `test.version` field
4. Validate results

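```python
# Sketch using opensearch-py. locate_raw_result, process_with_fixed_processor,
# and extract_version_from_config are placeholders that still need to be written.
from opensearchpy import OpenSearch, helpers

INDEX = "chronicler-runs"
AFFECTED_TESTS = {
    "fully": ["coremark", "coremark_pro", "phoronix", "specjbb", "streams", "uperf"],
    "partial": ["autohpl", "fio", "passmark", "speccpu2017"],
}

client = OpenSearch(hosts=["http://localhost:9200"])  # assumed host; adjust for the cluster


def docs_for(test_name):
    """Stream every document for one test (scroll-based, not capped at 10 hits)."""
    query = {"query": {"term": {"test.name.keyword": test_name}}}
    return helpers.scan(client, index=INDEX, query=query)


for test_name in AFFECTED_TESTS["fully"]:
    # No version in the stored config: must re-ingest from raw files.
    for hit in docs_for(test_name):
        raw_file = locate_raw_result(hit["_source"])
        new_doc = process_with_fixed_processor(raw_file)
        # Replace the whole document under the same ID.
        client.index(index=INDEX, id=hit["_id"], body=new_doc)

for test_name in AFFECTED_TESTS["partial"]:
    # Benchmark version can be extracted from the existing config.
    for hit in docs_for(test_name):
        version = extract_version_from_config(hit["_source"], test_name)
        # Partial update; nest the field so it merges into the existing `test` object.
        client.update(index=INDEX, id=hit["_id"], body={"doc": {"test": {"version": version}}})
```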
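
Updating one document per request will be slow on a large index. One possible refinement is to batch the partial updates through the bulk API. The sketch below only builds the action dictionaries, so it can be dry-run without a cluster; `version_update_actions` and the `versions` mapping are hypothetical names, and the nested `test` object is assumed from the `test.version` field name.

```python
from typing import Iterable, Iterator

INDEX = "chronicler-runs"


def version_update_actions(hits: Iterable[dict], versions: dict) -> Iterator[dict]:
    """Yield one bulk partial-update action per hit that has a corrected version.

    `versions` maps document ID -> corrected benchmark version.
    """
    for hit in hits:
        new_version = versions.get(hit["_id"])
        if new_version is None:
            continue  # no recovered version: leave the document untouched
        yield {
            "_op_type": "update",
            "_index": INDEX,
            "_id": hit["_id"],
            "doc": {"test": {"version": new_version}},
        }


# Dry run with made-up IDs: inspect the actions before sending anything.
actions = list(version_update_actions([{"_id": "a"}, {"_id": "b"}], {"a": "3.35"}))
print(actions)
```

With opensearch-py, the generator can be passed to `opensearchpy.helpers.bulk(client, actions)` once the dry-run output looks right.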

### 4. Validation

After migration:
- [ ] Query counts match pre-migration
- [ ] `test.version != test.wrapper_version` for affected tests
- [ ] Spot-check documents have correct benchmark versions
- [ ] Run test queries to verify version filtering works

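```
# Validation queries (OpenSearch Dev Tools console syntax)

# 1. Check version distribution per test.
#    Terms aggregations return 10 buckets by default; raise "size" if there are more tests.
GET /chronicler-runs/_search
{
  "size": 0,
  "aggs": {
    "by_test": {
      "terms": {"field": "test.name.keyword"},
      "aggs": {
        "versions": {"terms": {"field": "test.version.keyword"}}
      }
    }
  }
}

# 2. Find remaining conflation: documents where version == wrapper_version.
#    Expect zero hits for affected tests after migration.
GET /chronicler-runs/_search
{
  "query": {
    "bool": {
      "filter": {
        "script": {
          "script": "doc['test.version.keyword'].size() > 0 && doc['test.wrapper_version.keyword'].size() > 0 && doc['test.version.keyword'].value == doc['test.wrapper_version.keyword'].value"
        }
      }
    }
  }
}
```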
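
The checklist can also be scripted so that it runs after every migration batch. A minimal sketch, assuming per-test document counts were captured before the migration and that documents nest `name`, `version`, and `wrapper_version` under `test`; `validate_migration` is a hypothetical helper and the sample data is made up:

```python
def validate_migration(pre_counts: dict, post_counts: dict, docs: list) -> list:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    # Check 1: per-test document counts must be unchanged.
    for name, before in pre_counts.items():
        after = post_counts.get(name, 0)
        if after != before:
            problems.append(f"{name}: count changed {before} -> {after}")
    # Check 2: no document may keep a missing, unknown, or conflated version.
    for doc in docs:
        test = doc.get("test", {})
        version = test.get("version")
        if version in (None, "", "unknown", test.get("wrapper_version")):
            problems.append(f"{test.get('name')}: suspicious version {version!r}")
    return problems


good = {"test": {"name": "fio", "version": "3.35", "wrapper_version": "1.2"}}
bad = {"test": {"name": "uperf", "version": "1.2", "wrapper_version": "1.2"}}
for problem in validate_migration({"fio": 2}, {"fio": 2}, [good, bad]):
    print(problem)
```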

### 5. Rollback Plan

If issues are found:
1. Restore from backup
2. Fix processor bugs
3. Re-run migration
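
How the restore works depends on how the backup was taken. If the backup is a copy of the index made with the `_reindex` API (an assumption; a snapshot would be restored differently), the same request body shape covers both directions. The backup index name below is hypothetical.

```python
def reindex_body(source_index: str, dest_index: str) -> dict:
    """Build a request body for the _reindex API copying source_index into dest_index."""
    return {"source": {"index": source_index}, "dest": {"index": dest_index}}


LIVE = "chronicler-runs"
BACKUP = "chronicler-runs-backup"  # hypothetical backup index name

backup_request = reindex_body(LIVE, BACKUP)   # before migration
restore_request = reindex_body(BACKUP, LIVE)  # rollback
print(restore_request)
```

Either body can be sent with `POST /_reindex` or `client.reindex(body=...)` in opensearch-py. Since the migration keeps document IDs, reindexing the backup over the live index overwrites the migrated documents with their original contents, but it does not delete documents that exist only in the live index.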

## Dependencies

- #41 - BaseProcessor fix
- #42 - STREAMS fix
- #43 - FIO fix
- #44 - CoreMark fix
- #45 - Remaining processors

## Estimated Impact

Assuming documents are uniformly distributed across tests:
- **Documents affected**: ~83% of chronicler-runs index
- **Re-processing time**: Depends on data volume and strategy
- **Downtime**: None (can update in-place)
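
Once the index size and a measured throughput are known, the re-processing time can be estimated with simple arithmetic. The sketch below uses the ~83% figure from this issue; the document count and throughput in the example call are placeholders, not measurements.

```python
def estimate_reprocessing(total_docs: int, affected_fraction: float = 0.83,
                          docs_per_second: float = 50.0) -> tuple:
    """Return (affected document count, estimated seconds to re-process them)."""
    affected = round(total_docs * affected_fraction)
    return affected, affected / docs_per_second


# Placeholder inputs: substitute the real index size and a benchmarked rate.
affected, seconds = estimate_reprocessing(total_docs=100_000, docs_per_second=50.0)
print(f"{affected} documents, about {seconds / 60:.0f} minutes")
```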

## References

- Analysis: `VERSION_CONFLATION_IMPACT.md`
- Example queries in impact document
