
@qiu-tiandev qiu-tiandev commented Nov 20, 2025

Added get_pdf_title.py, which tries to get the PDF title from the following sources, in order:

  1. Title metadata
  2. First line in the first 2 pages, if less than 30% of its content is numbers (to avoid picking up dates)
  3. First readable line in the first 2 pages
  4. First readable line in the entire doc
  5. Untitled_SHA1HASH

From issue Generate Better PDF Titles #32
@kylebd99
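
The fallback order above can be sketched roughly as follows. This is a minimal illustration, not the code in get_pdf_title.py: the names `pick_title`, `first_line`, `digit_ratio`, and `is_readable` are hypothetical, page text is assumed to be already extracted into strings, "readable" is assumed to mean "contains at least one letter", the 30% check is assumed to count digits among non-whitespace characters, and the SHA-1 is assumed to be taken over the raw file bytes.

```python
import hashlib
from typing import Callable, Optional, Sequence


def digit_ratio(line: str) -> float:
    """Fraction of non-whitespace characters in the line that are digits."""
    chars = [c for c in line if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c.isdigit() for c in chars) / len(chars)


def is_readable(line: str) -> bool:
    """Assumed definition of 'readable': at least one alphabetic character."""
    return any(c.isalpha() for c in line)


def first_line(pages: Sequence[str], accept: Callable[[str], bool]) -> Optional[str]:
    """Return the first non-empty, stripped line accepted by `accept`."""
    for text in pages:
        for raw in text.splitlines():
            line = raw.strip()
            if line and accept(line):
                return line
    return None


def pick_title(metadata_title: Optional[str], pages: Sequence[str], pdf_bytes: bytes) -> str:
    """Walk the fallback chain from the PR description (hypothetical helper)."""
    # 1. Title metadata, if present and non-blank.
    if metadata_title and metadata_title.strip():
        return metadata_title.strip()

    head = pages[:2]
    title = (
        # 2. First line in the first 2 pages with under 30% digits (skips dates).
        first_line(head, lambda l: is_readable(l) and digit_ratio(l) < 0.30)
        # 3. First readable line in the first 2 pages.
        or first_line(head, is_readable)
        # 4. First readable line in the entire document.
        or first_line(pages, is_readable)
    )
    if title:
        return title

    # 5. Placeholder name built from a hash of the file (prefix spelling assumed).
    return "Untitled_" + hashlib.sha1(pdf_bytes).hexdigest()
```

For example, `pick_title(None, ["20 Nov 2025\nDeep Learning Survey"], b"...")` skips the date line at step 2 because most of its characters are digits, and returns the next line instead.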
