This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
pnpm install # Install dependencies (uses corepack, pinned in packageManager field)
pnpm test # Run all tests (vitest, watch mode)
pnpm exec vitest run # Run all tests once (no watch)
pnpm exec vitest run tests/extract/extractCase.test.ts # Run a single test file
pnpm exec vitest run -t "extracts volume" # Run tests matching name pattern
pnpm build # Build with tsdown (ESM + CJS + DTS)
pnpm typecheck # Type-check with tsc --noEmit
pnpm lint # Lint with Biome
pnpm format # Format with Biome (auto-fix)
pnpm size # Check bundle size limits
pnpm changeset # Create a changeset for the next release

This is a TypeScript port of Python eyecite, a legal citation extraction library with zero runtime dependencies.
Citations flow through a 4-stage pipeline: clean → tokenize → extract → (resolve)
- Clean (src/clean/): Strip HTML, normalize whitespace/Unicode, fix smart quotes. Builds a TransformationMap to track position shifts.
- Tokenize (src/tokenize/): Apply regex patterns from src/patterns/ to find citation candidates. Intentionally broad: captures potential matches without validation.
- Extract (src/extract/): Parse metadata from tokens (volume, reporter, page, court, year). Each citation type has its own extractor (extractCase.ts, extractStatute.ts, etc.). The main orchestrator is extractCitations.ts. extractCase.ts also handles case name backward search (extractCaseName), full span calculation (findParentheticalEnd), unified parenthetical parsing (parseParenthetical), and disposition extraction. dates.ts provides date parsing utilities (parseMonth, parseDate, toIsoDate) for structured date extraction from parentheticals.
- Resolve (src/resolve/): Link short-form citations (Id., supra, short-form case) to their full antecedents. DocumentResolver uses scope boundaries and Levenshtein matching.
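The Resolve stage mentions Levenshtein matching. As a rough illustration of that metric only, here is a minimal sketch of the classic dynamic-programming edit distance. The names levenshtein and similarity are hypothetical, and how DocumentResolver actually applies or thresholds the distance is not shown here.

```typescript
// Classic Levenshtein edit distance using two rolling rows.
// Hypothetical helper for illustration; not the resolver's actual code.
function levenshtein(a: string, b: string): number {
  if (a === b) return 0;
  if (a.length === 0) return b.length;
  if (b.length === 0) return a.length;

  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Assumed normalization: case-insensitive similarity in [0, 1], where 1 means identical.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  return 1 - levenshtein(a.toLowerCase(), b.toLowerCase()) / longest;
}
```

A supra reference to "Smith" could then be compared against each earlier party name, keeping the best score above some cutoff. The cutoff itself would be a tuning choice.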
Footnote detection is opt-in via extractCitations(text, { detectFootnotes: true }). It runs before cleaning, on the raw text, to preserve newline structure. Two strategies:
- HTML (src/footnotes/htmlDetector.ts): Regex-based tag scanner for <footnote>, <fn>, and elements with footnote class/id attributes. No DOM dependency.
- Plain text (src/footnotes/textDetector.ts): Finds separator lines (5+ dashes/underscores) followed by numbered markers (1., FN1., [1], n.1).
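The plain-text strategy can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in textDetector.ts: the function name detectTextFootnotes, the exact regexes, and the rule that each zone runs until the next marker (or end of text) are all assumptions.

```typescript
interface FootnoteZone {
  start: number;
  end: number; // exclusive
  footnoteNumber: number;
}

// Assumed shapes: a separator is a line of 5+ dashes/underscores;
// a marker is "1.", "FN1.", "[1]", or "n.1" at the start of a line.
const SEPARATOR = /^[-_]{5,}\s*$/;
const MARKER = /^(?:FN\s?(\d+)\.|\[(\d+)\]|n\.\s?(\d+)|(\d+)\.)\s/;

function detectTextFootnotes(text: string): FootnoteZone[] {
  const zones: FootnoteZone[] = [];
  let offset = 0;
  let afterSeparator = false;

  for (const line of text.split("\n")) {
    const lineStart = offset;
    offset += line.length + 1; // +1 for the newline removed by split

    if (!afterSeparator) {
      // Ignore everything until the first separator line is seen.
      afterSeparator = SEPARATOR.test(line);
      continue;
    }

    const m = MARKER.exec(line);
    if (m) {
      const previous = zones[zones.length - 1];
      if (previous) previous.end = lineStart; // close the previous zone here
      const footnoteNumber = Number(m[1] ?? m[2] ?? m[3] ?? m[4]);
      zones.push({ start: lineStart, end: text.length, footnoteNumber });
    }
  }
  return zones;
}
```

Lines before the first separator are never treated as footnotes, so a body sentence that happens to start with "1." does not open a zone.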
detectFootnotes(text) selects the strategy (HTML first, text fallback) and returns a FootnoteMap (array of { start, end, footnoteNumber } zones). The pipeline maps zones through TransformationMap to clean-text coordinates, then tags citations with inFootnote/footnoteNumber via binary search. The "footnote" scope strategy in the resolver enforces zone-based isolation: Id. is strict (same zone only), supra/shortFormCase can cross from footnotes to body.
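The binary-search tagging step can be illustrated like this. It is a minimal sketch that assumes zones are already in clean-text coordinates, sorted by start, and non-overlapping. findZone, tagCitations, and TaggedCitation are hypothetical names; only the { start, end, footnoteNumber } zone shape and the inFootnote/footnoteNumber fields come from the description above.

```typescript
interface FootnoteZone {
  start: number;
  end: number; // exclusive
  footnoteNumber: number;
}

// Binary search for the zone containing `pos`.
// Assumes zones are sorted by start and do not overlap.
function findZone(zones: readonly FootnoteZone[], pos: number): FootnoteZone | undefined {
  let lo = 0;
  let hi = zones.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >>> 1;
    const zone = zones[mid];
    if (pos < zone.start) hi = mid - 1;
    else if (pos >= zone.end) lo = mid + 1;
    else return zone;
  }
  return undefined;
}

// Hypothetical, pared-down citation shape for illustration.
interface TaggedCitation {
  cleanStart: number;
  inFootnote: boolean;
  footnoteNumber?: number;
}

function tagCitations(cleanStarts: number[], zones: readonly FootnoteZone[]): TaggedCitation[] {
  return cleanStarts.map((cleanStart) => {
    const zone = findZone(zones, cleanStart);
    return zone
      ? { cleanStart, inFootnote: true, footnoteNumber: zone.footnoteNumber }
      : { cleanStart, inFootnote: false };
  });
}
```

Each lookup is O(log n) in the number of zones, which keeps tagging cheap even for documents with hundreds of footnotes.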
Annotation (src/annotate/) and reporter data (src/data/) are separate entry points to enable tree-shaking.
The Span type carries dual positions: cleanStart/cleanEnd (for internal parsing) and originalStart/originalEnd (for user-facing results). TransformationMap maps between them using a lookahead algorithm (maxLookAhead=20) in cleanText.ts:rebuildPositionMaps.
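To make the dual-position idea concrete, here is a deliberately simplified sketch. It is not the lookahead algorithm in rebuildPositionMaps: it handles a single transformation (collapsing whitespace runs) and records the original index of every kept character directly. collapseWhitespace and toSpan are hypothetical names; the four Span fields are the ones named above.

```typescript
interface Span {
  cleanStart: number;
  cleanEnd: number; // exclusive
  originalStart: number;
  originalEnd: number; // exclusive
}

// Collapse each whitespace run to one space and remember, for every
// character kept in the cleaned text, where it came from in the original.
function collapseWhitespace(original: string): { cleaned: string; cleanToOriginal: number[] } {
  let cleaned = "";
  const cleanToOriginal: number[] = [];
  let inRun = false;
  for (let i = 0; i < original.length; i++) {
    const ch = original.charAt(i);
    const isSpace = /\s/.test(ch);
    if (isSpace && inRun) continue; // drop the rest of the whitespace run
    inRun = isSpace;
    cleaned += isSpace ? " " : ch;
    cleanToOriginal.push(i);
  }
  cleanToOriginal.push(original.length); // sentinel so exclusive ends map too
  return { cleaned, cleanToOriginal };
}

// Translate clean-text offsets into a Span carrying both coordinate systems.
function toSpan(cleanStart: number, cleanEnd: number, cleanToOriginal: number[]): Span {
  return {
    cleanStart,
    cleanEnd,
    originalStart: cleanToOriginal[cleanStart],
    originalEnd: cleanToOriginal[cleanEnd],
  };
}
```

Parsing works against the cleaned string, while original.slice(span.originalStart, span.originalEnd) recovers the exact user-facing text, extra spaces included.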
fullSpan (optional) extends from case name through final closing parenthetical (including chained parens and subsequent history). The core span field remains citation-core-only for backward compatibility.
Citations use a discriminated union on the type field: case | statute | journal | neutral | publicLaw | federalRegister | statutesAtLarge | id | supra | shortFormCase. All share CitationBase (text, span, confidence, matchedText, processTimeMs). Switch on citation.type for type-safe field access.
- Volume is number | string: numeric for standard volumes, string for hyphenated ones (e.g., "1984-1")
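A pared-down sketch of the discriminated-union pattern follows. Only two of the ten variants are modeled. The CitationBase field names come from the description above, but their types and the per-variant fields (reporter, page, pincite) are assumptions for illustration, as are the helper names formatCitation and parseVolume.

```typescript
interface Span {
  cleanStart: number;
  cleanEnd: number;
  originalStart: number;
  originalEnd: number;
}

// Field names from the text above; the types are assumed.
interface CitationBase {
  text: string;
  span: Span;
  confidence: number;
  matchedText: string;
  processTimeMs: number;
}

// Assumed variant fields, for illustration only.
interface CaseCitation extends CitationBase {
  type: "case";
  volume: number | string;
  reporter: string;
  page: number;
}

interface IdCitation extends CitationBase {
  type: "id";
  pincite?: number;
}

type Citation = CaseCitation | IdCitation;

// Switching on `type` narrows the union, so variant fields are type-safe.
function formatCitation(c: Citation): string {
  switch (c.type) {
    case "case":
      return `${c.volume} ${c.reporter} ${c.page}`;
    case "id":
      return c.pincite === undefined ? "Id." : `Id. at ${c.pincite}`;
    default: {
      // Compile-time exhaustiveness check: fails if a variant is unhandled.
      const unreachable: never = c;
      return unreachable;
    }
  }
}

// Hypothetical helper: numeric volumes become numbers, hyphenated ones stay strings.
function parseVolume(raw: string): number | string {
  return /^\d+$/.test(raw) ? Number(raw) : raw;
}
```

The never assignment in the default branch means that adding a new variant to Citation without handling it becomes a compile error rather than a silent fallthrough.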
Three package entry points configured in tsdown.config.ts and package.json:
- eyecite-ts → src/index.ts (core extraction + resolution)
- eyecite-ts/data → src/data/index.ts (reporter database, lazy-loaded)
- eyecite-ts/annotate → src/annotate/index.ts (text annotation)
@/* maps to src/* in both tsconfig.json and vitest.config.ts.
- Formatter/Linter: Biome 2.x with spaces, 100-char line width, double quotes, trailing commas, semicolons as needed
- noAssignInExpressions: off (regex exec loops use the assignment-in-while pattern)
- noExplicitAny: error and noImplicitAnyLet: error (strict typing enforced)
- noForEach: off (forEach is allowed)
- Patterns are defined in src/patterns/ with a Pattern interface (id, regex, description, type)
- Regex patterns must avoid nested quantifiers to prevent ReDoS
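The conventions above can be seen together in a small sketch. The Pattern field names come from the list, but their types are assumed, and the usReports pattern and findCandidates helper are hypothetical, not taken from src/patterns/ or src/tokenize/. The regex uses only bounded, non-nested quantifiers, and the loop uses the assignment-in-while form that noAssignInExpressions: off permits.

```typescript
// Field names from the list above; the types are assumed.
interface Pattern {
  id: string;
  regex: RegExp;
  description: string;
  type: string;
}

// Illustrative pattern only. Bounded quantifiers, no nesting, so matching stays linear.
const usReports: Pattern = {
  id: "us-reports",
  regex: /\b(\d{1,4}) U\.S\. (\d{1,5})\b/g,
  description: "U.S. Reports: volume U.S. page",
  type: "case",
};

interface Candidate {
  patternId: string;
  text: string;
  start: number;
  end: number; // exclusive
}

function findCandidates(text: string, pattern: Pattern): Candidate[] {
  const out: Candidate[] = [];
  // Copy the regex so lastIndex on the shared pattern object is never mutated.
  const flags = pattern.regex.flags.includes("g") ? pattern.regex.flags : `${pattern.regex.flags}g`;
  const re = new RegExp(pattern.regex.source, flags);

  let match: RegExpExecArray | null;
  while ((match = re.exec(text)) !== null) {
    if (match[0].length === 0) {
      re.lastIndex++; // guard against infinite loops on empty matches
      continue;
    }
    out.push({
      patternId: pattern.id,
      text: match[0],
      start: match.index,
      end: match.index + match[0].length,
    });
  }
  return out;
}
```

By contrast, a pattern such as (\d+)+ nests one unbounded quantifier inside another and can backtrack exponentially on adversarial input, which is the shape the ReDoS rule is meant to exclude.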
Tests mirror source in tests/ with the same directory structure. Integration tests live in tests/integration/. Vitest 4 is used — test options go as the second argument: it(name, { timeout }, fn).
- CI: GitHub Actions runs lint, typecheck, test (Node 18/20/22 matrix), build + size check
- Coverage: Vitest --coverage requires Node 20+ (node:inspector/promises). CI only runs coverage on Node 22.
- Releases: Changesets. Run pnpm changeset to add one; merging to main creates a "Version Packages" PR, and merging that PR publishes to npm with provenance.
- Package manager: pnpm 10 via corepack. Build script allowlist in pnpm-workspace.yaml.
- Each fix/feature branch needs a changeset: pnpm changeset → select patch/minor/major → write summary