Measure bracket survival on real PDF→markdown before committing the hard scope filter #810

Description

@medelman17

Version: eyecite-ts@0.28.1

Summary

The scope-before-recency design treats parenthetical scope as a hard candidate filter, which assumes that the bracket structure of (quoting …) parentheticals survives extraction. If ( and ) are frequently dropped or garbled during PDF→markdown conversion, abstentions would be driven by parse failure rather than true ambiguity, and a hard filter could hide correct antecedents.

This is the single highest-priority unknown before committing the hard filter.

Measure

  1. Bracket survival — on a real corpus sample, how often does an inner (quoting …) cite retain intact, balanced delimiters after extraction? Break down by source (native-text vs OCR'd PDF).
  2. Error split — of observed Id. / supra misattributions, what fraction are scope errors (wrong candidate set) vs intra-scope salience errors (right set, wrong rank)? This gates whether a learned ranker is ever needed beyond a thin deterministic scorer.
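
The bracket-survival measurement in item 1 could be prototyped roughly as follows. This is a minimal sketch, not eyecite-ts code: `Sample`, `quotingParenIsBalanced`, and `survivalBySource` are hypothetical names, and it assumes each sample is one extracted passage that is known (from the source PDF) to contain a single (quoting …) parenthetical.

```typescript
// Hypothetical harness; none of these names come from eyecite-ts.
type Source = "native" | "ocr";

interface Sample {
  source: Source;
  text: string; // extracted markdown for a passage known to contain one (quoting …) cite
}

interface Tally {
  intact: number;
  total: number;
  rate: number;
}

// True if a "(quoting" opener is present and its parenthetical closes with
// balanced round brackets before the text ends.
function quotingParenIsBalanced(text: string): boolean {
  const start = text.search(/\(\s*quoting\b/i);
  if (start === -1) return false; // opener dropped or garbled
  let depth = 0;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return true; // the opener found above has closed
    }
  }
  return false; // text ended with the parenthetical still open
}

// Tally intact vs. total samples per source and compute the survival rate.
function survivalBySource(samples: Sample[]): Record<Source, Tally> {
  const out: Record<Source, Tally> = {
    native: { intact: 0, total: 0, rate: 0 },
    ocr: { intact: 0, total: 0, rate: 0 },
  };
  for (const s of samples) {
    const t = out[s.source];
    t.total++;
    if (quotingParenIsBalanced(s.text)) t.intact++;
  }
  for (const t of Object.values(out)) {
    t.rate = t.total > 0 ? t.intact / t.total : 0;
  }
  return out;
}
```

A depth counter like this only catches some failure modes: a dropped inner opener, for instance, lets the parenthetical appear to close early and still count as intact, so the resulting rate is better read as an upper bound on true survival.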

Outcome

Decides (a) hard vs. degrade-to-soft-on-balance-failure for the scope filter, and (b) whether to invest in stage-3 ranking at all.
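
One possible reading of option (a), sketched under assumptions: when the brackets around a parenthetical balance, scope acts as a hard filter; when they do not, every candidate is kept and out-of-scope candidates only receive a penalty. `Candidate`, `applyScope`, and the `softPenalty` default are hypothetical and illustrate the shape of the decision, not the design in the linked docs.

```typescript
// Hypothetical types; not part of eyecite-ts.
interface Candidate {
  id: string;
  inScope: boolean; // whether the antecedent falls inside the inferred scope
}

interface ScopedCandidate {
  candidate: Candidate;
  scopePenalty: number; // 0 means no penalty
}

// Hard filter when bracket parsing succeeded; otherwise degrade to a soft
// preference so a parse failure cannot hide a correct antecedent.
function applyScope(
  candidates: Candidate[],
  bracketsBalanced: boolean,
  softPenalty = 1, // assumed value, to be tuned against the measurements
): ScopedCandidate[] {
  if (bracketsBalanced) {
    return candidates
      .filter((c) => c.inScope)
      .map((c) => ({ candidate: c, scopePenalty: 0 }));
  }
  return candidates.map((c) => ({
    candidate: c,
    scopePenalty: c.inScope ? 0 : softPenalty,
  }));
}
```

If measured bracket survival turns out to be high for both native-text and OCR'd sources, the soft branch would rarely fire and the plain hard filter may be enough.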

Related

  • Design backbone: docs/research/2026-06-02-shortform-resolution-02-scope-and-binding.md, …-03-bracket-parsing-error-recovery.md.

Metadata

    Labels

    claude: Created by Claude Code
    enhancement: New feature or request
    ready-for-human: Needs human implementation (judgment/design/manual testing)
    resolution: Citation resolution issues
