Problem
Citation spans only cover the core volume-reporter-page portion, not the full citation including case name, pin cite, and court/year parenthetical.
"Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992)"
                         ^^^^^^^^^^^^^
                         current span only covers this
Users who want to mask or annotate full citations (e.g., replace with "[CITATION]") need the complete span from plaintiff through closing parenthetical.
Current Behavior
span.originalStart / span.originalEnd point only to the core citation text matched by the tokenizer pattern. Plaintiff/defendant, pin cites, and court/year parentheticals are parsed in the extractor but their positions aren't tracked.
Expected Behavior
Add a fullSpan field (or similar) that covers the entire citation from case name through closing parenthesis:
interface FullCaseCitation {
  span: Span       // core: "561 A.2d 1240"
  fullSpan?: Span  // full: "Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992)"
}
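To make the relationship between the two fields concrete, here is a minimal sketch using the example citation above. It assumes that Span carries only originalStart and originalEnd (the two properties named in this issue) and that originalEnd is exclusive; spanOf and sliceSpan are hypothetical helpers for illustration, not part of the library.

```typescript
// Assumed shape: only originalStart/originalEnd are named in this issue.
// Treating originalEnd as an exclusive offset is an assumption.
interface Span {
  originalStart: number;
  originalEnd: number;
}

interface FullCaseCitation {
  span: Span;      // core volume-reporter-page
  fullSpan?: Span; // proposed: case name through closing parenthesis
}

const text = "See Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992).";
const core = "561 A.2d 1240";
const full = "Commonwealth v. Gibson, 561 A.2d 1240 (Pa.Super. 1992)";

// Hypothetical helper: locate a known substring and return its span.
function spanOf(haystack: string, needle: string): Span {
  const start = haystack.indexOf(needle);
  if (start < 0) throw new Error(`not found: ${needle}`);
  return { originalStart: start, originalEnd: start + needle.length };
}

// Hypothetical helper: recover the text a span points at.
function sliceSpan(source: string, s: Span): string {
  return source.slice(s.originalStart, s.originalEnd);
}

// What an extractor might return once fullSpan is tracked.
const citation: FullCaseCitation = {
  span: spanOf(text, core),
  fullSpan: spanOf(text, full),
};
```

With this shape, fullSpan always encloses span, so existing consumers of span would be unaffected by the new optional field.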
Use Cases
- Citation masking for ML dataset preparation
- Full-text annotation/highlighting
- Citation extraction for bibliography generation
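As a rough illustration of the masking use case, the sketch below replaces each citation with a placeholder, preferring fullSpan and falling back to span when it is absent. The maskCitations function and the MaskableCitation type are hypothetical, and end offsets are assumed to be exclusive and spans non-overlapping.

```typescript
// Assumed span shape; originalEnd is treated as exclusive.
interface Span {
  originalStart: number;
  originalEnd: number;
}

// Hypothetical minimal citation shape for this example.
interface MaskableCitation {
  span: Span;
  fullSpan?: Span;
}

// Replace every citation in `text` with `placeholder`.
// Spans are processed right to left so earlier offsets stay valid
// after each replacement. Assumes spans do not overlap.
function maskCitations(
  text: string,
  citations: MaskableCitation[],
  placeholder = "[CITATION]",
): string {
  const spans = citations
    .map((c) => c.fullSpan ?? c.span)
    .sort((a, b) => b.originalStart - a.originalStart);

  let out = text;
  for (const s of spans) {
    out = out.slice(0, s.originalStart) + placeholder + out.slice(s.originalEnd);
  }
  return out;
}
```

Without fullSpan, the same call can only mask the core portion, leaving the case name and the court/year parenthetical in the text, which is the gap this issue describes.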
Upstream Reference
Python eyecite #135