ENH: pdfly extract-tables using the camelot lib #185

@Lucas-C

We could provide a new extract-tables subcommand that uses the camelot library to extract tables from PDF files.

The PR implementing this should include:

  • unit tests
  • documentation: docstrings & a new page in docs/user/
  • the command output should display the .parsing_report from camelot
  • it should be possible to target specific PDF pages
  • various export options should be possible, using the corresponding camelot methods: to_csv(), to_json(), to_excel(), to_html(), to_markdown() & to_sqlite().
  • other options could be implemented immediately or in further PRs: --password for decryption, --flavor, --parallel, --split-text
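A minimal sketch of the core logic, assuming camelot's `read_pdf(filepath, pages=..., password=..., flavor=...)` entry point and the per-table `to_*()` export methods listed above. The helper names (`validate_pages`, `output_path`, `extract_tables`), the output file naming scheme and the file extensions are hypothetical choices for illustration. The pdfly CLI wiring (option parsing, printing the reports) is left out.

```python
"""Hypothetical sketch of a pdfly `extract-tables` helper built on camelot."""
import re
from pathlib import Path
from typing import List, Optional

# Map the proposed export formats to camelot's Table methods.
EXPORT_METHODS = {
    "csv": "to_csv",
    "json": "to_json",
    "excel": "to_excel",
    "html": "to_html",
    "markdown": "to_markdown",
    "sqlite": "to_sqlite",
}

# Output file extensions (an assumption of this sketch, not defined by camelot).
EXTENSIONS = {
    "csv": ".csv",
    "json": ".json",
    "excel": ".xlsx",
    "html": ".html",
    "markdown": ".md",
    "sqlite": ".db",
}

# camelot accepts "all" or comma-separated pages/ranges such as "1,3-5,7-end".
_PAGE_ITEM = re.compile(r"^\d+(-(\d+|end))?$")


def validate_pages(pages: str) -> str:
    """Normalise a page spec and reject obviously malformed input early."""
    spec = pages.replace(" ", "")
    if spec == "all":
        return spec
    if not all(_PAGE_ITEM.match(item) for item in spec.split(",")):
        raise ValueError(f"invalid page specification: {pages!r}")
    return spec


def output_path(output_dir: Path, stem: str, index: int, fmt: str) -> Path:
    """Build one output file name per extracted table, e.g. report-table-1.csv."""
    if fmt not in EXPORT_METHODS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return output_dir / f"{stem}-table-{index}{EXTENSIONS[fmt]}"


def extract_tables(pdf: Path, output_dir: Path, pages: str = "1",
                   fmt: str = "csv", password: Optional[str] = None,
                   flavor: str = "lattice") -> List[dict]:
    """Extract tables with camelot, export each one, return parsing reports."""
    import camelot  # lazy import: the helpers above work without camelot

    tables = camelot.read_pdf(str(pdf), pages=validate_pages(pages),
                              password=password, flavor=flavor)
    reports = []
    for index, table in enumerate(tables, start=1):
        path = output_path(output_dir, pdf.stem, index, fmt)
        # Call the matching camelot export method, e.g. table.to_csv(path).
        getattr(table, EXPORT_METHODS[fmt])(str(path))
        reports.append(table.parsing_report)
    return reports
```

For example, `extract_tables(Path("report.pdf"), Path("out"), pages="1,3-end", fmt="markdown")` would write one `.md` file per detected table and return the parsing reports for the command to display. Validating the page spec before calling camelot lets the CLI fail fast with a clear message on typos.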
