AGENTS.md

This file provides guidance to Codex (Codex.ai/code) when working with code in this repository.

Commands

pnpm install                      # Install dependencies (uses corepack, pinned in packageManager field)
pnpm test                         # Run all tests (vitest, watch mode)
pnpm exec vitest run              # Run all tests once (no watch)
pnpm exec vitest run tests/extract/extractCase.test.ts  # Run a single test file
pnpm exec vitest run -t "extracts volume"               # Run tests matching name pattern
pnpm build                        # Build with tsdown (ESM + CJS + DTS)
pnpm typecheck                    # Type-check with tsc --noEmit
pnpm lint                         # Lint with Biome
pnpm format                       # Format with Biome (auto-fix)
pnpm size                         # Check bundle size limits
pnpm changeset                    # Create a changeset for the next release

Architecture

This is a TypeScript port of Python eyecite — a legal citation extraction library with zero runtime dependencies.

Pipeline

Citations flow through a 4-stage pipeline: clean → tokenize → extract → (resolve)

Clean (src/clean/): Strip HTML, normalize whitespace/Unicode, fix smart quotes. Builds a TransformationMap to track position shifts.
Tokenize (src/tokenize/): Apply regex patterns from src/patterns/ to find citation candidates. Intentionally broad — captures potential matches without validation.
Extract (src/extract/): Parse metadata from tokens (volume, reporter, page, court, year). Each citation type has its own extractor (extractCase.ts, extractStatute.ts, etc.). The main orchestrator is extractCitations.ts.
- Case extraction is split into parser/semantic modules (caseCore, caseEnvelope, casePostfix, caseParentheticals, caseNameScanner, caseNameSemantics, casePartySemantics, caseReporterSemantics, caseCitationDraft). extractCase.ts should stay an orchestrator: parse syntax, interpret semantics, apply semantics to the draft, then finalize.
- dates.ts provides date parsing utilities (parseMonth, parseDate, toIsoDate) for structured date extraction from parentheticals.
Resolve (src/resolve/): Link short-form citations (Id., supra, short-form case) to their full antecedents. DocumentResolver uses scope boundaries and Levenshtein matching.
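
The Levenshtein matching mentioned for the resolve stage can be illustrated with a minimal sketch. This is not the DocumentResolver implementation: the function names, the case-insensitive comparison, and the 0.8 threshold are all assumptions chosen for illustration.

```typescript
// Hypothetical helpers; names and threshold are illustrative assumptions,
// not the library's actual resolver API.

// Classic dynamic-programming edit distance using two rolling rows.
function levenshtein(a: string, b: string): number {
  // At the start of iteration i, prev[j] is the distance between
  // the first i-1 characters of a and the first j characters of b.
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Normalize the distance to a 0..1 similarity score (1 means identical).
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 1;
  return 1 - levenshtein(a.toLowerCase(), b.toLowerCase()) / longest;
}

// Pick the best-scoring antecedent name at or above the threshold, if any.
function findAntecedent(
  shortName: string,
  candidates: string[],
  threshold = 0.8,
): string | undefined {
  let best: string | undefined;
  let bestScore = -1;
  for (const candidate of candidates) {
    const score = similarity(shortName, candidate);
    if (score >= threshold && score > bestScore) {
      best = candidate;
      bestScore = score;
    }
  }
  return best;
}
```

In this sketch, a supra reference to "Smithe" would match an earlier "Smith" (one edit out of six characters), while an unrelated name returns undefined. Scope boundaries are not modeled here.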

Footnote Detection

Opt-in via extractCitations(text, { detectFootnotes: true }). Runs before cleaning on the raw text to preserve newline structure. Two strategies:

HTML (src/footnotes/htmlDetector.ts): Regex-based tag scanner for <footnote>, <fn>, and elements with footnote class/id attributes. No DOM dependency.
Plain text (src/footnotes/textDetector.ts): Finds separator lines (5+ dashes/underscores) followed by numbered markers (1., FN1., [1], n.1).
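
As a rough illustration of the plain-text strategy, the sketch below looks for the first separator line and then treats each numbered marker after it as the start of a zone. The function name, the exact regexes, and the rule that a zone runs until the next marker are assumptions; textDetector.ts may differ in all of these.

```typescript
// Hypothetical sketch of separator-plus-marker detection; not the real textDetector.ts.
interface TextFootnoteZone {
  start: number;
  end: number; // exclusive (an assumption of this sketch)
  footnoteNumber: number;
}

function detectTextFootnotes(text: string): TextFootnoteZone[] {
  // A separator is a whole line of 5+ dashes or underscores.
  const separator = /^[-_]{5,}[ \t]*$/m.exec(text);
  if (!separator) return [];

  // Markers at the start of a line: "FN1.", "1.", "[1]", or "n.1".
  const marker = /^[ \t]*(?:FN(\d+)\.|(\d+)\.|\[(\d+)\]|n\.(\d+))/gm;
  marker.lastIndex = separator.index + separator[0].length;

  const starts: { index: number; num: number }[] = [];
  let m: RegExpExecArray | null;
  while ((m = marker.exec(text)) !== null) {
    // Exactly one of the four capture groups holds the digits.
    const digits = m[1] || m[2] || m[3] || m[4];
    starts.push({ index: m.index, num: Number(digits) });
  }

  // Each zone runs from its marker to the next marker (or the end of the text).
  return starts.map((s, i) => ({
    start: s.index,
    end: i + 1 < starts.length ? starts[i + 1].index : text.length,
    footnoteNumber: s.num,
  }));
}
```

Only text after the separator is scanned, so a body line that happens to begin with "1." is not mistaken for a footnote in this sketch.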

detectFootnotes(text) selects the strategy (HTML first, text fallback) and returns a FootnoteMap (array of { start, end, footnoteNumber } zones). The pipeline maps zones through TransformationMap to clean-text coordinates, then tags citations with inFootnote/footnoteNumber via binary search. The "footnote" scope strategy in the resolver enforces zone-based isolation: Id. is strict (same zone only), supra/shortFormCase can cross from footnotes to body.
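
The tagging step can be pictured as a binary search over sorted, non-overlapping zones. The sketch below assumes zones are already in clean-text coordinates and that end is exclusive; findZone and tagCitation are hypothetical names, and only the field names start, end, footnoteNumber, and inFootnote come from the description above.

```typescript
// Hypothetical sketch of zone lookup; assumes zones are sorted by start,
// do not overlap, and use an exclusive end offset.
interface FootnoteZone {
  start: number;
  end: number;
  footnoteNumber: number;
}

interface TaggedCitation {
  cleanStart: number;
  inFootnote: boolean;
  footnoteNumber?: number;
}

// Binary search for the zone containing pos, or undefined if pos is in the body.
function findZone(zones: FootnoteZone[], pos: number): FootnoteZone | undefined {
  let lo = 0;
  let hi = zones.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const zone = zones[mid];
    if (pos < zone.start) hi = mid - 1;
    else if (pos >= zone.end) lo = mid + 1;
    else return zone;
  }
  return undefined;
}

// Tag a citation by the zone that contains its start offset.
function tagCitation(zones: FootnoteZone[], cleanStart: number): TaggedCitation {
  const zone = findZone(zones, cleanStart);
  return zone
    ? { cleanStart, inFootnote: true, footnoteNumber: zone.footnoteNumber }
    : { cleanStart, inFootnote: false };
}
```

Each lookup is O(log n) in the number of zones, which is why a sorted zone array is a natural fit here.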

Annotation (src/annotate/) and reporter data (src/data/) are separate entry points to enable tree-shaking.

Position Tracking

The Span type carries dual positions: cleanStart/cleanEnd (for internal parsing) and originalStart/originalEnd (for user-facing results). TransformationMap maps between them using a lookahead algorithm (maxLookAhead=20) in cleanText.ts:rebuildPositionMaps.
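
To make the dual-position idea concrete, here is a simplified sketch that collapses whitespace and records, for every clean index, the original index it came from. It deliberately swaps in a direct index array for the lookahead algorithm in rebuildPositionMaps; collapseWhitespace, toSpan, and SpanSketch are hypothetical names.

```typescript
// Simplified stand-in for position tracking; not the library's lookahead algorithm.
interface SpanSketch {
  cleanStart: number;
  cleanEnd: number; // exclusive
  originalStart: number;
  originalEnd: number; // exclusive
}

// Collapse each whitespace run to one space, recording clean -> original offsets.
function collapseWhitespace(original: string): { clean: string; cleanToOriginal: number[] } {
  let clean = "";
  const cleanToOriginal: number[] = [];
  let prevWasSpace = false;
  for (let i = 0; i < original.length; i++) {
    const isSpace = /\s/.test(original[i]);
    if (isSpace && prevWasSpace) continue; // drop repeated whitespace
    clean += isSpace ? " " : original[i];
    cleanToOriginal.push(i);
    prevWasSpace = isSpace;
  }
  // Sentinel entry so an exclusive end equal to clean.length can be mapped too.
  cleanToOriginal.push(original.length);
  return { clean, cleanToOriginal };
}

// Build a span carrying both coordinate systems.
function toSpan(cleanStart: number, cleanEnd: number, map: number[]): SpanSketch {
  return {
    cleanStart,
    cleanEnd,
    originalStart: map[cleanStart],
    originalEnd: map[cleanEnd],
  };
}
```

Parsing works on the clean offsets, while slicing the original text with originalStart/originalEnd recovers exactly what the user supplied, extra whitespace included.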

fullSpan (optional) extends from case name through final closing parenthetical (including chained parens and subsequent history). The core span field remains citation-core-only for backward compatibility.

Type System

Volume is number | string — numeric for standard volumes, string for hyphenated (e.g., "1984-1")
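
A short sketch of how a union like this might be produced and consumed. The Volume type matches the description above; parseVolume and describeVolume are hypothetical helpers, not functions confirmed to exist in the codebase.

```typescript
type Volume = number | string;

// Hypothetical helper: all-digit volumes become numbers, anything else stays a string.
function parseVolume(raw: string): Volume {
  const trimmed = raw.trim();
  return /^\d+$/.test(trimmed) ? Number(trimmed) : trimmed;
}

// Callers narrow with typeof before doing arithmetic or string operations.
function describeVolume(volume: Volume): string {
  return typeof volume === "number"
    ? `numeric volume ${volume}`
    : `hyphenated volume ${volume}`;
}
```

Because the type is a union, code that compares or sorts volumes needs to handle both branches explicitly.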

Entry Points

Three package entry points configured in tsdown.config.ts and package.json:

eyecite-ts → src/index.ts (core extraction + resolution)
eyecite-ts/data → src/data/index.ts (reporter database, lazy-loaded)
eyecite-ts/annotate → src/annotate/index.ts (text annotation)

Path Aliases

@/* maps to src/* in both tsconfig.json and vitest.config.ts.

Code Style

Formatter/Linter: Biome 2.x — spaces, 100-char line width, double quotes, trailing commas, semicolons as needed
noAssignInExpressions: off — regex exec loops use assignment-in-while pattern
noExplicitAny: error and noImplicitAnyLet: error — strict typing enforced
noForEach: off — forEach is allowed
Patterns are defined in src/patterns/ with a Pattern interface (id, regex, description, type)
Regex patterns must avoid nested quantifiers to prevent ReDoS
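
The assignment-in-while loop that noAssignInExpressions: off permits looks roughly like the sketch below. The four field names come from the Pattern interface described above, but their types, the example regex, and the findCandidates helper are assumptions; the interface is named CitationPattern here to mark it as illustrative.

```typescript
// Illustrative shape only; field types are assumed, not copied from src/patterns/.
interface CitationPattern {
  id: string;
  regex: RegExp;
  description: string;
  type: string;
}

// Example pattern with flat quantifiers only (no nested quantifiers).
const usReportsSketch: CitationPattern = {
  id: "us-reports-sketch",
  regex: /\b(\d+) U\.S\. (\d+)\b/g,
  description: "Volume, 'U.S.' reporter, page (illustrative)",
  type: "case",
};

interface Candidate {
  text: string;
  start: number;
  end: number;
}

function findCandidates(text: string, pattern: CitationPattern): Candidate[] {
  // Copy the regex so lastIndex state is never shared between calls.
  const flags = pattern.regex.flags.includes("g") ? pattern.regex.flags : `${pattern.regex.flags}g`;
  const re = new RegExp(pattern.regex.source, flags);
  const out: Candidate[] = [];
  let match: RegExpExecArray | null;
  // Assignment inside the while condition is the idiom the lint rule would otherwise flag.
  while ((match = re.exec(text)) !== null) {
    out.push({ text: match[0], start: match.index, end: match.index + match[0].length });
    if (match[0].length === 0) re.lastIndex++; // guard against zero-length matches
  }
  return out;
}
```

Copying the regex before looping avoids a subtle bug where a shared global regex keeps a stale lastIndex from a previous call.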

Test Structure

Tests mirror source in tests/ with the same directory structure. Integration tests live in tests/integration/. Vitest 4 is used — test options go as the second argument: it(name, { timeout }, fn).

CI & Releases

CI: GitHub Actions — lint, typecheck, test (Node 18/20/22 matrix), build + size check
Coverage: Vitest --coverage requires Node 20+ (node:inspector/promises). CI only runs coverage on Node 22.
Releases: Changesets. Run pnpm changeset to add one; merging to main creates a "Version Packages" PR, and merging that PR publishes to npm with provenance.
Package manager: pnpm 10 via corepack. Build script allowlist in pnpm-workspace.yaml.
Each fix/feature branch needs a changeset: pnpm changeset → select patch/minor/major → write summary
