
Build crashes on oversized transcripts exceeding model context window #80

@marklubin

Description


Bug

synix build hard-fails when an episode summary LLM call exceeds the model's context window, rather than gracefully handling the oversized input.

Observed Behavior

When a transcript exceeds the model's max context (e.g. 200K tokens for Haiku), the build crashes with:

Pipeline failed: RuntimeError at llm_client.py:148
  LLM API error processing episode ep-696eaceb-...:
  Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error',
  'message': 'prompt is too long: 200308 tokens > 200000 maximum'}}

This aborts the entire pipeline and loses all progress on subsequent layers, even though 1,718 of 1,730 episodes had already completed.
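A first step toward any of the options below is telling this error apart from other API failures. The following is a minimal sketch. It assumes the message keeps the `prompt is too long: N tokens > M maximum` wording shown in the traceback above, which is observed output and not a documented contract. `parse_context_overflow` is a hypothetical helper, not part of synix.

```python
import re
from typing import Optional, Tuple

# Matches the wording seen in the traceback, e.g.
# "prompt is too long: 200308 tokens > 200000 maximum".
# Assumption: the API keeps this phrasing; it is not a documented contract.
_CONTEXT_OVERFLOW = re.compile(r"prompt is too long: (\d+) tokens > (\d+) maximum")


def parse_context_overflow(message: str) -> Optional[Tuple[int, int]]:
    """Return (prompt_tokens, max_tokens) if the message reports a context overflow, else None."""
    match = _CONTEXT_OVERFLOW.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    msg = "prompt is too long: 200308 tokens > 200000 maximum"
    print(parse_context_overflow(msg))  # (200308, 200000)
```

The parsed counts also give the overshoot (308 tokens in this case). A truncation retry could use that figure to size its cut.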

Details

In a corpus of 1,730 ChatGPT + Claude conversations, 12 transcripts exceeded 200K tokens (ranging from 590K to 2M characters). These are long-running mega-conversations that are common in real-world exports.

Expected Behavior

The build should handle oversized transcripts gracefully. Options:

  1. Truncate to fit: Trim the transcript to fit within the model's context window minus the prompt overhead (and the tokens reserved for the response), preserving the beginning and end of the conversation
  2. Chunk and merge: Split the transcript into context-sized chunks, summarize each, then merge the summaries
  3. Skip with warning: Log a warning and continue the build, marking the artifact as skipped rather than crashing the entire pipeline
  4. Respect context_budget: The context_budget parameter already exists on CoreSynthesis; a similar mechanism could apply to EpisodeSummary to prevent oversized inputs
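Options 1, 2 and 4 all come down to computing an input budget for each call and then trimming or splitting the transcript to fit it. The sketch below works on characters rather than a real tokenizer. It uses an assumed, deliberately conservative characters-per-token ratio. `CHARS_PER_TOKEN`, `char_budget`, `truncate_middle` and `split_into_chunks` are hypothetical names for illustration, not synix APIs.

```python
from typing import List

# Assumption: ~2.5 characters per token, chosen conservatively. In the corpus
# described above, a 590K-character transcript already exceeded 200K tokens
# (fewer than ~3 characters per token). A real implementation should count
# tokens with the provider's tokenizer or token-counting endpoint instead.
CHARS_PER_TOKEN = 2.5


def char_budget(context_window: int, prompt_overhead: int, output_reserve: int) -> int:
    """Characters of transcript expected to fit once prompt and output tokens are reserved."""
    tokens = context_window - prompt_overhead - output_reserve
    if tokens <= 0:
        raise ValueError("prompt overhead and output reserve exceed the context window")
    return int(tokens * CHARS_PER_TOKEN)


def truncate_middle(text: str, budget: int,
                    marker: str = "\n[... transcript truncated ...]\n") -> str:
    """Option 1: keep the beginning and end of the transcript and drop the middle."""
    if len(text) <= budget:
        return text
    keep = budget - len(marker)
    if keep <= 0:
        # Budget too small for the marker; fall back to a plain head cut.
        return text[:budget]
    head = keep // 2
    tail = keep - head
    return text[:head] + marker + text[len(text) - tail:]


def split_into_chunks(text: str, budget: int) -> List[str]:
    """Option 2: split the transcript into budget-sized pieces to summarize separately."""
    if budget <= 0:
        raise ValueError("budget must be positive")
    return [text[i:i + budget] for i in range(0, len(text), budget)]
```

As an illustration, take a 200K-token model with 5,000 tokens of prompt overhead and 4,000 tokens reserved for output. For those figures, `char_budget(200_000, 5_000, 4_000)` gives 477,500 characters. The chunker cuts at fixed character offsets for brevity, whereas a real implementation would split on message boundaries. The merge step (a second LLM call over the per-chunk summaries) is not shown.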

At minimum, a single oversized transcript should not crash the entire pipeline.
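Option 3 amounts to isolating failures per episode. The following is a minimal sketch. It assumes the LLM client raises a dedicated exception for context overflows, whereas the current client raises a generic `RuntimeError` (per the traceback above). `ContextOverflowError`, `LayerResult` and `summarize_all` are hypothetical names.

```python
import logging
from dataclasses import dataclass, field
from typing import Callable, Dict, List

logger = logging.getLogger(__name__)


class ContextOverflowError(Exception):
    """Hypothetical exception for an input that does not fit the model's context window."""


@dataclass
class LayerResult:
    artifacts: Dict[str, str] = field(default_factory=dict)
    skipped: List[str] = field(default_factory=list)


def summarize_all(episodes: Dict[str, str],
                  summarize: Callable[[str], str]) -> LayerResult:
    """Summarize every episode, skipping (with a warning) any that overflow the context window."""
    result = LayerResult()
    for episode_id, transcript in episodes.items():
        try:
            result.artifacts[episode_id] = summarize(transcript)
        except ContextOverflowError as exc:
            # Record the skip and keep going instead of aborting the whole build.
            logger.warning("skipping episode %s: %s", episode_id, exc)
            result.skipped.append(episode_id)
    return result
```

Only the context-overflow case is downgraded to a skip. Authentication, rate-limit and other errors still propagate, so a misconfigured build still fails loudly.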

Workaround

Manually move oversized transcripts out of layer0-transcripts/ before rebuilding.
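The workaround can be scripted. The following is a minimal sketch that moves large files out of `layer0-transcripts/` into a separate directory. It assumes one transcript per file at the top level of that directory. The `oversized-transcripts/` destination and the 450,000-byte threshold are assumptions to tune. File size in bytes serves as a cheap proxy for character count. This errs toward moving too much, since UTF-8 text never has more characters than bytes.

```python
import shutil
from pathlib import Path
from typing import List

# Assumption: ~450K bytes as a safe ceiling for a 200K-token model. In the corpus
# above, transcripts of 590K characters and up overflowed the context window.
DEFAULT_MAX_BYTES = 450_000


def quarantine_oversized(src_dir: Path, dest_dir: Path,
                         max_bytes: int = DEFAULT_MAX_BYTES) -> List[Path]:
    """Move files larger than max_bytes from src_dir into dest_dir and return their new paths."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    moved: List[Path] = []
    # sorted() materializes the listing before any file is moved.
    for path in sorted(src_dir.iterdir()):
        if path.is_file() and path.stat().st_size > max_bytes:
            target = dest_dir / path.name
            shutil.move(str(path), str(target))
            moved.append(target)
    return moved


if __name__ == "__main__":
    # Hypothetical destination directory; layer0-transcripts/ is the directory named above.
    for p in quarantine_oversized(Path("layer0-transcripts"), Path("oversized-transcripts")):
        print(f"moved {p}")
```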

Environment

  • synix 0.15.0
  • claude-haiku-4-5-20251001 (200K context)
  • 1,730 transcripts from ChatGPT + Claude exports
