Description
The streams processor produces malformed document-level status strings containing multiple "RAN" values separated by newlines, instead of a single status value.
Affected documents: 47 documents in production OpenSearch (0.6% of total data)
Evidence from Production
A query of production OpenSearch found 47 streams documents whose status contains newlines:
- Example: status = "RAN\nRAN\nRAN\nRAN" (should be a single "RAN" or "PASS")
- Run-level status is correct ("PASS"), but document-level status is malformed
- Occurs across different RHEL versions and instance types
- Other variations: "RAN\nRAN", "RAN\nRAN\nRAN\nRAN\nRAN\nRAN\nRAN\nRAN\nRAN\nRAN\nRAN\nRAN"
Example Malformed Document
{
"test": {"name": "streams"},
"results": {
"status": "RAN\nRAN\nRAN\nRAN",
"total_runs": 1,
"runs": {
"run_0": {
"status": "PASS",
"configuration": {
"array_sizes": ["", "33792k", "67584k", "135168k", "270336k"],
"optimization_level": "O2"
},
"metrics": { ... }
}
}
}
}
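A minimal sketch of a check for this malformed shape, assuming documents follow the layout of the example above (status stored at results.status). The helper name has_multiline_status is hypothetical and is not part of the chronicler codebase.

```python
import json


def has_multiline_status(doc: dict) -> bool:
    """Return True if the document-level status contains a line break.

    Hypothetical helper; assumes the status lives at doc["results"]["status"],
    as in the example document in this ticket.
    """
    status = doc.get("results", {}).get("status", "")
    return isinstance(status, str) and ("\n" in status or "\r" in status)


# Trimmed version of the malformed document from this ticket.
sample = json.loads(
    '{"test": {"name": "streams"},'
    ' "results": {"status": "RAN\\nRAN\\nRAN\\nRAN", "total_runs": 1,'
    ' "runs": {"run_0": {"status": "PASS"}}}}'
)

clean = {"results": {"status": "PASS"}}
```

Here has_multiline_status(sample) is true and has_multiline_status(clean) is false, so the same predicate could be reused to confirm whether other processors are affected.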
Impact
- Data quality: 47 documents have invalid status values
- Query accuracy: OpenSearch queries filtering by status may miss these documents
- Aggregation pollution: Status field aggregations show multiple "RAN\nRAN" variations
- Migration blocker: Schema migration to v2 (RPOPC-1267) may reject multiline status strings
Root Cause (Suspected)
In src/chronicler/processors/streams_processor.py:
- Likely concatenating multiple "RAN" statuses instead of deduplicating
- Status parsing doesn't strip/validate for single-line values
- May be related to multiple optimization levels (O2/O3) or array sizes producing multiple status outputs
Suggested Fix
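- Validation: Split the status on newlines and strip whitespace before assignment, so the stored value is always a single line (removing the newlines alone would produce "RANRANRANRAN")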
- Deduplication: If multiple status values exist, deduplicate them
- Severity hierarchy: If multiple distinct statuses exist, choose the most severe: FAIL > UNKNOWN > RAN > PASS
- Unit test: Add test case for multiline status handling
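The suggested fix could be sketched roughly as follows. This is an illustration under assumptions, not the actual code in streams_processor.py: the function name normalize_status is hypothetical, and returning "UNKNOWN" for empty or unrecognized input is an assumption this ticket does not specify.

```python
# Severity order taken from this ticket: FAIL > UNKNOWN > RAN > PASS.
SEVERITY_ORDER = ("FAIL", "UNKNOWN", "RAN", "PASS")


def normalize_status(raw: str) -> str:
    """Collapse a possibly multiline status string into a single value.

    Hypothetical helper. Splits on line breaks, strips whitespace,
    deduplicates, then picks the most severe recognized status.
    """
    # Deduplicate by collecting the non-empty, stripped lines into a set.
    tokens = {line.strip() for line in raw.splitlines() if line.strip()}

    # Walk the hierarchy from most to least severe and return the first hit.
    for status in SEVERITY_ORDER:
        if status in tokens:
            return status

    # Assumption: empty input or only unrecognized tokens map to "UNKNOWN".
    return "UNKNOWN"


# Test cases in the spirit of the "Unit test" item above.
assert normalize_status("RAN\nRAN\nRAN\nRAN") == "RAN"
assert normalize_status("PASS\nFAIL\nRAN") == "FAIL"
assert normalize_status("PASS") == "PASS"
```

Note that this sketch ignores unrecognized tokens whenever at least one recognized status is present. Whether such input should instead be reported as an error is a decision for the processor's owners.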
Files to Check
- src/chronicler/processors/streams_processor.py (primary)
- Other processors: Verify this issue is unique to streams (production data shows only streams affected)
Context
Discovered during RPOPC-1267 schema migration validation when analyzing production OpenSearch edge cases.
Related to: RPOPC-1267 (OpenSearch schema migration)