This repository was archived by the owner on May 22, 2024. It is now read-only.

Improve Narrative Section Extraction #69

Open
@cragwolfe

Description

Right now, get_section_narrative uses a SECSection regex to identify a section's entry in the table of contents (TOC). That generally works pretty well. The matching TOC title text is then used to look for the section in the body of the filing, but rather than reusing the original regex, for 10-Ks and 10-Qs match_10k_toc_title_to_section ultimately applies a more lenient match condition. The better approach is likely to stick with the original matching regex.
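A minimal sketch of why the two behaviors can diverge. Everything in it is hypothetical: BUSINESS_PATTERN stands in for a SECSection regex, lenient_match approximates a looser title comparison (it does not reproduce match_10k_toc_title_to_section), and find_section_start and the sample element texts are illustrative only.

```python
import re
from typing import Callable, List, Optional

# Hypothetical stand-in for a SECSection regex; the pipeline's real patterns may differ.
BUSINESS_PATTERN = re.compile(
    r"^\s*(item\s*1\.?\s*[-:.]?\s*)?business\.?\s*$", re.IGNORECASE
)

# Illustrative element texts from the body of a filing (not real EHC data).
EXAMPLE_ELEMENTS: List[str] = [
    "Table of Contents",
    "Cautionary statement: risks to our business may affect results.",
    "Item 1. Business",
    "We are a provider of services in ...",
]


def lenient_match(toc_title: str, candidate: str) -> bool:
    """Hypothetical lenient check: the TOC title appears anywhere in the candidate."""
    return toc_title.lower() in candidate.lower()


def strict_match(pattern: re.Pattern, candidate: str) -> bool:
    """Reuse the same regex that identified the TOC entry."""
    return pattern.search(candidate) is not None


def find_section_start(
    elements: List[str], matcher: Callable[[str], bool]
) -> Optional[int]:
    """Return the index of the first element the matcher accepts, or None."""
    for i, text in enumerate(elements):
        if matcher(text):
            return i
    return None


if __name__ == "__main__":
    lenient_idx = find_section_start(
        EXAMPLE_ELEMENTS, lambda t: lenient_match("Business", t)
    )
    strict_idx = find_section_start(
        EXAMPLE_ELEMENTS, lambda t: strict_match(BUSINESS_PATTERN, t)
    )
    print(lenient_idx, strict_idx)
```

Here the lenient check stops at the body paragraph that merely mentions "business" (index 1), while reusing the regex skips it and lands on the actual "Item 1. Business" heading (index 2).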

The lenient post-TOC match is why the EHC test fails for the BUSINESS section, and may be the reason for other failures as well.

Definition of Done

  • Updated section extraction logic so that fewer tests need to be marked xfail, in particular the EHC case mentioned above.

Metadata

Assignees: No one assigned
Labels: help wanted (Extra attention is needed), python (Pull requests that update Python code)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests