
FIO results exceed OpenSearch 5000 field limit (91,746 fields) #19

Description

@grdumas

Problem

FIO benchmark results cannot be indexed to OpenSearch because they exceed the 5,000-field limit.

Error: Field limit of 5000 exceeded
Actual field count: 91,746 fields (18.3× over limit)
Impact: ALL FIO results fail to index - blocking critical functionality

Root Cause

The FIO processor violates the schema's "no nested arrays" design principle by storing per-job data as arrays within timeseries points.

Location: src/chronicler/processors/fio_processor.py line 759

timeseries[create_sequence_key(i)] = TimeSeriesPoint(
    timestamp=sample_time.strftime("%Y-%m-%dT%H:%M:%SZ"),
    metrics={
        'total_bandwidth_kbps': total_bw,
        'total_iops': total_iops,
        'avg_latency_ns': avg_lat,
        'avg_clat_ns': avg_clat,
        'avg_slat_ns': avg_slat,
        'jobs': jobs_data  # ❌ NESTED ARRAY - creates field explosion
    }
)

Why this breaks:

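  • Sample test: 48 runs × 120 timeseries points = 5,760 sequences, each carrying a jobs array of 8 entries
  • Each sequence sits under its own dynamic key, so every jobs array adds its own set of dynamic fields in OpenSearch
  • OpenSearch creates fields such as jobs.job_number, jobs.bandwidth_kbps, etc. under each sequence key (the elements of one object array share a single set of field paths)
  • Result: 5,760 sequences × 6 job fields = 34,560+ fields from the jobs arrays alone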
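The arithmetic above depends on how dynamic mapping counts fields. As a rough model (an approximation, not OpenSearch's implementation): every distinct dotted path gets one mapping, object paths included, and all elements of an array of objects are mapped under the same parent path. The minimal Python sketch below applies that model to synthetic documents; the helper names, the `seq_N` keys, and the 6-key job layout are hypothetical stand-ins for the real FIO document.

```python
from typing import Any


def mapped_paths(value: Any, prefix: str = "") -> set[str]:
    """Approximate the dotted field paths dynamic mapping would create.

    Each dict key adds its own path (objects count as fields too). Lists add no
    path level: every element is mapped under the parent path, so all jobs in a
    jobs array share one set of jobs.* paths.
    """
    paths: set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= mapped_paths(child, path)
    elif isinstance(value, list):
        for element in value:
            paths |= mapped_paths(element, prefix)
    return paths


def make_point(n_jobs: int) -> dict:
    """Hypothetical timeseries point: one aggregate plus a jobs array of 6-key entries."""
    jobs = [
        {"job_number": j, "bandwidth_kbps": 1, "iops": 1,
         "latency_ns": 1, "clat_ns": 1, "slat_ns": 1}
        for j in range(n_jobs)
    ]
    return {"total_iops": 1, "jobs": jobs}


# 120 sequence keys with 8 jobs each vs. 1 job each: same field count.
doc_8_jobs = {f"seq_{i}": make_point(8) for i in range(120)}
doc_1_job = {f"seq_{i}": make_point(1) for i in range(120)}
# Twice as many sequence keys: twice as many fields.
doc_240_seq = {f"seq_{i}": make_point(8) for i in range(240)}

print(len(mapped_paths(doc_8_jobs)), len(mapped_paths(doc_1_job)), len(mapped_paths(doc_240_seq)))
```

Under this model the sketch prints `1080 1080 2160`: going from 8 jobs to 1 job per point leaves the count unchanged, while doubling the sequence keys doubles it. The jobs arrays hurt because their `jobs.*` paths are repeated under every dynamic sequence key, which is where the 5,760 × 6 figure comes from. Multi-fields such as `.keyword` subfields are not modeled, so real counts can run somewhat higher.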

Schema Violation

From schema.py lines 8-11:

"""
Key Design Decisions:
- Object-based structure with dynamic keys (runs.run_1, runs.run_2)
- Timestamps as keys for time series data
- No nested arrays (avoids OpenSearch performance issues)  # ❌ FIO VIOLATES
- Fully denormalized (all SUT metadata embedded)

FIO is the ONLY benchmark that violates this principle. Other benchmarks (CoreMark, PyPerf, uperf, PassMark) follow the "no nested arrays" rule and work fine.

Evidence

Test case: cpttest_RHEL-10.3_20260510.1

  • Runs: 48 workload combinations
  • Timeseries points per run: 120
  • Jobs per run: 8 (varies by test)
  • Total fields: 91,746

Field breakdown:

  • Base document: ~50 fields
  • 48 run keys (run_0 to run_47)
  • Per run: 35 metrics + 16 config + 120 timeseries sequences
  • Per sequence: 6 aggregated metrics + jobs array (8 jobs × 6 fields each, which map to one shared set of 6 jobs.* fields)
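The breakdown can be cross-checked against the reported total with plain arithmetic. This sketch assumes each `run_N` key counts as one field and treats the ~50 base fields as exact; both are approximations.

```python
# Figures from the field breakdown above (the base document count is approximate).
TOTAL_FIELDS = 91_746
BASE_FIELDS = 50
RUNS = 48
PER_RUN_FIELDS = 1 + 35 + 16  # assumption: the run_N key itself + metrics + config
SEQUENCES_PER_RUN = 120

sequences = RUNS * SEQUENCES_PER_RUN
remaining = TOTAL_FIELDS - BASE_FIELDS - RUNS * PER_RUN_FIELDS
implied_per_sequence = remaining / sequences

# If every job contributed its own 6 fields, each sequence would carry 6 + 8 * 6 = 54.
per_element_total = sequences * (6 + 8 * 6)

print(sequences, round(implied_per_sequence, 1), per_element_total)  # 5760 15.5 311040
```

The reported 91,746 fields work out to roughly 15.5 fields per sequence, which fits the aggregated metrics, one shared set of `jobs.*` fields, and the object nodes above them. If each of the 8 jobs contributed its own 6 fields, the timeseries alone would exceed 300,000 fields.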

Proposed Solution

Option 1: Separate Per-Job Timeseries Index (RECOMMENDED)

Create a new index zathras-fio-job-timeseries for per-job granular data, similar to how zathras-timeseries exists for general timeseries data.

Structure

Summary document (zathras-results):

  • Aggregated metrics only in timeseries
  • Per-job summary at run level (kept)
  • No nested arrays

Per-job timeseries (zathras-fio-job-timeseries):

{
  "parent_document_id": "fio_abc123...",
  "run_key": "run_0",
  "sequence": 0,
  "timestamp": "2026-05-10T12:00:00Z",
  "job_number": 0,
  "device": "/dev/sda",
  "bandwidth_kbps": 10000,
  "iops": 500,
  "latency_ns": 2000,
  "clat_ns": 1800,
  "slat_ns": 200
}
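Implementation task 1 (and the routing half of task 2) could look roughly like the sketch below. It is a minimal illustration, not the processor's actual code: `split_point` and `bulk_actions` are hypothetical helpers, each entry of `jobs` is assumed to already hold the per-job keys shown in the example document (`job_number`, `device`, `bandwidth_kbps`, ...), and the summary document passed in is a stand-in.

```python
from typing import Any

# Index names come from this issue; everything else here is illustrative.
SUMMARY_INDEX = "zathras-results"
JOB_INDEX = "zathras-fio-job-timeseries"


def split_point(
    parent_id: str, run_key: str, sequence: int, timestamp: str, metrics: dict[str, Any]
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Return (metrics without the jobs array, one flat document per job)."""
    summary_metrics = {k: v for k, v in metrics.items() if k != "jobs"}
    job_docs = [
        {
            "parent_document_id": parent_id,
            "run_key": run_key,
            "sequence": sequence,
            "timestamp": timestamp,
            **job,  # assumed to carry job_number, device, bandwidth_kbps, iops, ...
        }
        for job in metrics.get("jobs", [])
    ]
    return summary_metrics, job_docs


def bulk_actions(summary_doc: dict[str, Any], job_docs: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Route the summary document and its job documents to separate indices."""
    actions = [{"_index": SUMMARY_INDEX, "_source": summary_doc}]
    actions += [{"_index": JOB_INDEX, "_source": doc} for doc in job_docs]
    return actions


# Usage with a made-up point holding two jobs.
metrics = {
    "total_iops": 1000,
    "jobs": [
        {"job_number": 0, "device": "/dev/sda", "iops": 500},
        {"job_number": 1, "device": "/dev/sdb", "iops": 500},
    ],
}
summary_metrics, job_docs = split_point("fio_example", "run_0", 0, "2026-05-10T12:00:00Z", metrics)
actions = bulk_actions({"metrics": summary_metrics}, job_docs)  # stand-in summary document
```

The `actions` list uses the `_index`/`_source` action format that `opensearch-py`'s `helpers.bulk(client, actions)` accepts, so one bulk call can write the summary document and its job-timeseries documents to their respective indices.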

Benefits

  • ✅ Stays under 5,000 field limit
  • ✅ Preserves per-job timeseries granularity (requirement confirmed)
  • ✅ Follows established pattern (separate timeseries index already exists)
  • ✅ Enables per-job analysis via queries on job-timeseries index
  • ✅ Can reprocess all existing FIO archives (no new data collection needed)
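Per-job analysis then becomes ordinary queries against the job-timeseries index. The sketch below builds two Query DSL bodies as Python dicts: one returns a single job's series in sequence order, the other averages bandwidth per job with a `terms` aggregation and an `avg` sub-aggregation. It assumes `parent_document_id` and `run_key` are mapped as `keyword` and `job_number`/`sequence` as numeric types; the function names are hypothetical.

```python
from typing import Any


def job_series_query(parent_id: str, run_key: str, job_number: int, size: int = 120) -> dict[str, Any]:
    """Ordered timeseries for one job of one run."""
    return {
        "size": size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"parent_document_id": parent_id}},
                    {"term": {"run_key": run_key}},
                    {"term": {"job_number": job_number}},
                ]
            }
        },
        "sort": [{"sequence": {"order": "asc"}}],
    }


def per_job_bandwidth_query(parent_id: str, run_key: str) -> dict[str, Any]:
    """Average bandwidth per job for one run (terms bucket + avg sub-aggregation)."""
    return {
        "size": 0,  # only the aggregation results are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"parent_document_id": parent_id}},
                    {"term": {"run_key": run_key}},
                ]
            }
        },
        "aggs": {
            "per_job": {
                "terms": {"field": "job_number", "size": 64},
                "aggs": {"avg_bandwidth_kbps": {"avg": {"field": "bandwidth_kbps"}}},
            }
        },
    }
```

With `opensearch-py`, a body would be passed as `client.search(index="zathras-fio-job-timeseries", body=job_series_query(...))`. The default `size` of 120 matches the 120 points per run in the test case; the `terms` size of 64 is an arbitrary upper bound on jobs per run.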

Implementation Tasks

  1. Modify FIO processor (fio_processor.py)

    • Remove jobs array from timeseries metrics
    • Generate separate job-timeseries documents
    • Return both summary and job-timeseries document lists
  2. Update OpenSearch exporter (opensearch_exporter.py)

    • Support exporting to multiple indices
    • Index job-timeseries documents to zathras-fio-job-timeseries
  3. Create index template

    • Define mapping for zathras-fio-job-timeseries
  4. Update documentation

    • Query patterns for per-job analysis
    • Migration guide for existing FIO results
  5. Add validation

    • Schema validation to catch nested arrays in TimeSeriesPoint.metrics
    • Prevent future violations
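For task 5, a validator can walk `TimeSeriesPoint.metrics` and reject any array, which is exactly the pattern that caused this issue. This is a sketch under assumptions: `NestedArrayError`, `find_arrays`, and `validate_metrics` are hypothetical names, and it accepts nested objects (which the schema's rule does not forbid) while rejecting lists, tuples, and sets anywhere beneath them.

```python
from collections.abc import Mapping
from typing import Any


class NestedArrayError(ValueError):
    """Hypothetical error raised when timeseries metrics contain an array."""


def find_arrays(value: Any, path: str) -> list[str]:
    """Return the dotted path of every list, tuple, or set under value."""
    if isinstance(value, (list, tuple, set)):
        return [path]
    if isinstance(value, Mapping):
        found: list[str] = []
        for key, child in value.items():
            found += find_arrays(child, f"{path}.{key}")
        return found
    return []  # scalars (numbers, strings, None) are fine


def validate_metrics(metrics: Mapping[str, Any]) -> None:
    """Raise NestedArrayError if a TimeSeriesPoint.metrics mapping holds any array."""
    offending = find_arrays(metrics, "metrics")
    if offending:
        raise NestedArrayError(
            "nested arrays are not allowed in timeseries metrics: " + ", ".join(offending)
        )


validate_metrics({"total_iops": 1000, "avg_latency_ns": 2000.0})  # passes silently
try:
    validate_metrics({"total_iops": 1000, "jobs": [{"job_number": 0}]})
except NestedArrayError as err:
    print(err)  # the message names metrics.jobs
```

Calling `validate_metrics` wherever a `TimeSeriesPoint` is built (for example in a dataclass `__post_init__`, if that is how the class is defined) would have flagged `metrics.jobs` at processing time instead of at indexing time.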

Alternative Solutions Considered

Option 2: Object-based jobs - Still 46× over limit (insufficient)
Option 3: Remove jobs from timeseries - Loses per-job granularity (rejected - granularity required)
Option 4: Sample per-job data - Still over limit unless very aggressive (insufficient)

Related Work

This issue was discovered during the schema review analysis (branch review/schema-analysis-2026-06-03). See:

  • fio_field_limit_diagnosis.md - Full diagnosis with field count analysis
  • schema_review_2026-06-03.md - Validates importance of "no nested arrays" principle

Acceptance Criteria

  • FIO results successfully index to OpenSearch without field limit errors
  • Per-job timeseries granularity is preserved
  • Existing FIO archives can be reprocessed
  • Field count stays under 5,000 for all FIO documents
  • Query patterns documented for per-job analysis
  • Schema validation prevents future nested array violations

Priority

CRITICAL - Blocking all FIO result indexing to OpenSearch

Metadata


  • Assignees: No one assigned
  • Labels: bug (Something isn't working)
  • Type: No type
  • Fields: No fields configured for issues without a type.
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
