WI scraper text parsing weirdness #718

@stucka

Description

Some junky HTML is coming in with a United States Cellular Corporation entry, but if I try to replace the text or split on it within _clean_text, it fails. Even if I just try to log lines containing "Cellular" or "Corporation", I don't see them. I don't know if there's a Unicode vs. ASCII thing or something else cooking here. The actual CSV output has that cell wrapped in regular quote marks, though the HTML inside is unescaped and contains several quote marks.

I tried.
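
One way to test the Unicode vs. ASCII hunch is to look at the raw characters instead of the rendered text. The sketch below is only a guess at the cause, not a confirmed diagnosis: it assumes the cell contains look-alike or invisible characters (a non-breaking space, a zero-width space, a stray newline) that keep a plain "Cellular" match from hitting. The helper names `normalize_for_matching` and `describe_non_ascii` and the sample string are hypothetical and are not part of the scraper.

```python
import re
import unicodedata


def normalize_for_matching(text):
    """Return a copy of text that is easier to match with plain ASCII strings."""
    # NFKC folds compatibility characters: a non-breaking space (U+00A0)
    # becomes a regular space, fullwidth letters become ASCII letters, etc.
    text = unicodedata.normalize("NFKC", text)
    # Drop invisible "format" characters (Unicode category Cf), such as
    # U+200B (zero width space), which can sit in the middle of a word.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
    # Collapse any run of whitespace, including newlines, to a single space.
    return re.sub(r"\s+", " ", text).strip()


def describe_non_ascii(text):
    """List (index, code point, Unicode name) for every non-ASCII character."""
    return [
        (i, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Hypothetical sample that renders like "United States Cellular Corporation".
raw = "United\u00a0States Cell\u200bular\nCorporation"

print("Cellular" in raw)                          # plain substring test on the raw text
print(repr(raw))                                  # repr() exposes the hidden characters
print(describe_non_ascii(raw))                    # where they are and what they are
print("Cellular" in normalize_for_matching(raw))  # same test after cleanup
```

If logging `repr()` of the value inside _clean_text shows only ordinary ASCII, the cause is probably elsewhere, for example the entry never passing through that function at all.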
