Feature: Full citation span including plaintiff and parenthetical #9

@medelman17

Problem

Citation spans only cover the core volume-reporter-page portion, not the full citation including case name, pin cite, and court/year parenthetical.

"Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992)"
                         ^^^^^^^^^^^^^^
                         current span only covers this

Users who want to mask or annotate full citations (e.g., replace with "[CITATION]") need the complete span from plaintiff through closing parenthetical.

Current Behavior

span.originalStart / span.originalEnd point only to the core citation text matched by the tokenizer pattern. Plaintiff/defendant, pin cites, and court/year parentheticals are parsed in the extractor but their positions aren't tracked.
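To illustrate the limitation, here is a minimal sketch of masking with only the core span. The `Span` shape below is a simplified stand-in: only the `originalStart` / `originalEnd` field names come from this issue, and treating `originalEnd` as exclusive is an assumption. `maskSpan` is a hypothetical helper, not part of the library.

```typescript
// Simplified stand-in for the library's Span; originalEnd is assumed exclusive.
interface Span {
  originalStart: number;
  originalEnd: number;
}

// Hypothetical helper: replace the text covered by a span with a label.
function maskSpan(source: string, span: Span, label = "[CITATION]"): string {
  return source.slice(0, span.originalStart) + label + source.slice(span.originalEnd);
}

const sampleText = "See Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992).";
const coreText = "561 A.2d 1240";
const coreStart = sampleText.indexOf(coreText);

// This mimics what the current span covers: volume, reporter, page.
const coreSpan: Span = {
  originalStart: coreStart,
  originalEnd: coreStart + coreText.length,
};

// The case name and the court/year parenthetical are left behind.
const maskedCore = maskSpan(sampleText, coreSpan);
console.log(maskedCore);
```

With only the core span available, the case name and parenthetical survive the masking, which is exactly what the use cases below need to avoid.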

Expected Behavior

Add a fullSpan field (or similar) that covers the entire citation from case name through closing parenthesis:

interface FullCaseCitation {
  span: Span           // core: "561 A.2d 1240"
  fullSpan?: Span      // full: "Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992)"
}
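One possible way to derive such a field, sketched under assumptions: expand the core span backward over a "X v. Y," case name and forward over an optional pin cite plus the parenthetical. Everything here is hypothetical (`expandToFullSpan`, both regexes, the exclusive `originalEnd`), and the real extractor would more likely record positions while it parses these pieces rather than re-scan the text. The case-name regex is a rough heuristic that only follows runs of capitalized words, so it will misfire on signal words like "See" and on names containing lowercase words.

```typescript
// Simplified stand-in for the library's Span; originalEnd is assumed exclusive.
interface Span {
  originalStart: number;
  originalEnd: number;
}

// Heuristic: capitalized-word runs around "v.", ending with a comma right before the core citation.
const CASE_NAME_BEFORE =
  /[A-Z][\w.'&-]*(?:\s+[A-Z][\w.'&-]*)*\s+v\.\s+[A-Z][\w.'&-]*(?:\s+[A-Z][\w.'&-]*)*,\s*$/;

// Optional pin cite (", 1245" or ", 1245-46") followed by one parenthetical.
const TRAILER_AFTER = /^(?:,\s*\d+(?:-\d+)?)?\s*\([^)]*\)/;

// Hypothetical helper: widen a core span to cover case name through closing parenthesis.
function expandToFullSpan(source: string, core: Span): Span {
  const nameMatch = CASE_NAME_BEFORE.exec(source.slice(0, core.originalStart));
  const trailerMatch = TRAILER_AFTER.exec(source.slice(core.originalEnd));
  return {
    // Fall back to the core boundaries when nothing is found on a side.
    originalStart: nameMatch ? nameMatch.index : core.originalStart,
    originalEnd: trailerMatch
      ? core.originalEnd + trailerMatch[0].length
      : core.originalEnd,
  };
}

const opinion =
  "The court relied on Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992) in its ruling.";
const coreOffset = opinion.indexOf("561 A.2d 1240");
const fullSpan = expandToFullSpan(opinion, {
  originalStart: coreOffset,
  originalEnd: coreOffset + "561 A.2d 1240".length,
});
const fullCitationText = opinion.slice(fullSpan.originalStart, fullSpan.originalEnd);
console.log(fullCitationText);
```

Keeping `fullSpan` optional, as in the interface above, fits this fallback behavior: when no case name or parenthetical can be located, callers can still rely on `span`.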

Use Cases

  • Citation masking for ML dataset preparation
  • Full-text annotation/highlighting
  • Citation extraction for bibliography generation
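For the masking use case, a rough sketch of how full spans might be consumed once they exist. `maskAll` is a hypothetical helper and the spans are built by hand with `indexOf` purely for illustration; in practice they would come from the proposed `fullSpan` field. Replacing from the last span to the first keeps the earlier offsets valid while the string changes length.

```typescript
// Simplified stand-in for the library's Span; originalEnd is assumed exclusive.
interface Span {
  originalStart: number;
  originalEnd: number;
}

// Hypothetical helper: mask every span, assuming the spans do not overlap.
function maskAll(source: string, spans: Span[], label = "[CITATION]"): string {
  // Work right to left so replacements never shift offsets still to be used.
  const ordered = [...spans].sort((a, b) => b.originalStart - a.originalStart);
  let result = source;
  for (const s of ordered) {
    result = result.slice(0, s.originalStart) + label + result.slice(s.originalEnd);
  }
  return result;
}

// Hand-built spans standing in for citation.fullSpan values.
function spanOf(source: string, fragment: string): Span {
  const at = source.indexOf(fragment);
  return { originalStart: at, originalEnd: at + fragment.length };
}

const brief =
  "Compare Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992) with Roe v. Wade, 410 U.S. 113 (1973).";
const fullSpans = [
  spanOf(brief, "Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992)"),
  spanOf(brief, "Roe v. Wade, 410 U.S. 113 (1973)"),
];

const maskedBrief = maskAll(brief, fullSpans);
console.log(maskedBrief);
```

The same spans would serve the highlighting and bibliography use cases: slice the original text with them instead of replacing it.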

Upstream Reference

Python eyecite #135


Labels: enhancement (New feature or request)
