Bug: Supra pattern fails when space precedes comma (HTML cleaning artifact) #14

@medelman17

Problem

When HTML tags are cleaned from text, a space can be introduced before the comma separating a party name from "supra". This breaks the supra pattern match.

"In Twombly, supra, at 553"    → supra citation found ✓
"In Twombly , supra, at 553"   → 0 results ✗

The space before the comma is caused by HTML cleaning:

"In <em>Twombly</em>, supra, at 553"  →  "In Twombly , supra, at 553"
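One way this artifact can arise, sketched under an assumption: the cleaner replaces each tag with a space and then collapses whitespace. `stripTagsNaive` is a hypothetical helper written for illustration, not the project's actual cleaning code.

```typescript
// Hypothetical cleaner (an assumption, not the project's real implementation):
// each tag becomes a single space, then runs of whitespace are collapsed.
function stripTagsNaive(html: string): string {
  return html
    .replace(/<[^>]+>/g, " ") // "</em>," becomes " ,"
    .replace(/\s+/g, " ")     // collapse the doubled space left by "<em>"
    .trim();
}

const cleaned: string = stripTagsNaive("In <em>Twombly</em>, supra, at 553");
console.log(cleaned); // "In Twombly , supra, at 553"
```

The closing `</em>` sits directly against the comma, so replacing it with a space leaves ` ,` in the cleaned text.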

Root Cause

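The supra regex allows an optional comma, but only immediately after the name:

/\b([A-Z][a-zA-Z]+...),?\s+supra/
//                    ^^ comma, if present, must be adjacent to the word

The pattern doesn't allow optional whitespace before the comma. In "Twombly , supra", `,?` matches nothing, `\s+` consumes the space, and the match then fails because the next character is a comma rather than "supra".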

Fix

Allow optional whitespace before the comma:

/\b([A-Z][a-zA-Z]+...)\s*,?\s+supra/
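The difference can be checked with a minimal sketch. The patterns below are simplified stand-ins: the part elided as `...` in the issue is left out, so they are not the project's real regexes, and `matchSupraName` is a hypothetical helper used only for this comparison.

```typescript
// Simplified stand-ins for the patterns above; the elided "..." portion of the
// real regexes is omitted here, so these are illustrative only.
const supraBefore: RegExp = /\b([A-Z][a-zA-Z]+),?\s+supra/;
const supraAfter: RegExp = /\b([A-Z][a-zA-Z]+)\s*,?\s+supra/;

// Hypothetical helper: returns the captured name, or null when nothing matches.
function matchSupraName(pattern: RegExp, text: string): string | null {
  const m = pattern.exec(text);
  return m ? m[1] : null;
}

const samples: string[] = [
  "In Twombly, supra, at 553",  // clean text
  "In Twombly , supra, at 553", // space introduced by HTML cleaning
];

for (const text of samples) {
  console.log(
    JSON.stringify(text),
    "before:", matchSupraName(supraBefore, text),
    "after:", matchSupraName(supraAfter, text),
  );
}
```

With these stand-ins, the original pattern returns `null` for the second sample, while the patched pattern captures `Twombly` in both.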

Upstream Reference

Python eyecite #210

Labels

bug (Something isn't working)
