

@vku-ibm commented Sep 18, 2025

Adds metadata extraction to the uspto-backend, which parses USPTO patents in XML form.
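The kind of bibliographic metadata such a backend can surface is sketched below with Python's standard xml.etree.ElementTree. This is not the PR's implementation. The helper name extract_uspto_metadata and its returned keys are hypothetical, the element paths assume the USPTO grant XML layout (us-bibliographic-data-grant, publication-reference/document-id, invention-title), and the sample document uses made-up values. Real bulk files concatenate many XML documents, so each one would need to be split out before parsing.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional

# Made-up sample in the assumed USPTO grant layout (fake values).
SAMPLE = """<us-patent-grant>
  <us-bibliographic-data-grant>
    <publication-reference>
      <document-id>
        <country>US</country>
        <doc-number>01234567</doc-number>
        <kind>B2</kind>
        <date>20250101</date>
      </document-id>
    </publication-reference>
    <invention-title>Example widget</invention-title>
  </us-bibliographic-data-grant>
</us-patent-grant>"""


def _text(node: Optional[ET.Element], path: str) -> Optional[str]:
    # Return the stripped text found at `path` under `node`, or None if absent.
    if node is None:
        return None
    found = node.find(path)
    if found is None or found.text is None:
        return None
    return found.text.strip()


def extract_uspto_metadata(xml_text: str) -> Dict[str, Optional[str]]:
    """Hypothetical helper: read a few bibliographic fields from one USPTO XML document."""
    root = ET.fromstring(xml_text)
    # Grants use <us-bibliographic-data-grant>; applications are assumed to use
    # <us-bibliographic-data-application>. Compare with None explicitly, because
    # an Element without children is falsy.
    biblio = root.find("us-bibliographic-data-grant")
    if biblio is None:
        biblio = root.find("us-bibliographic-data-application")
    if biblio is None:
        return {}
    doc_id = biblio.find("publication-reference/document-id")
    return {
        "title": _text(biblio, "invention-title"),
        "country": _text(doc_id, "country"),
        "doc_number": _text(doc_id, "doc-number"),
        "kind": _text(doc_id, "kind"),
        "date": _text(doc_id, "date"),
    }


if __name__ == "__main__":
    print(extract_uspto_metadata(SAMPLE))
```

Returning a flat dictionary keeps the sketch independent of any particular document model; a real backend would map these fields onto whatever metadata structure the conversion result exposes.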

Issue resolved by this Pull Request:
Resolves #2273

Checklist:

  • Documentation has been updated, if necessary.
  • Examples have been added, if necessary.
  • Tests have been added, if necessary.


DCO Check Passed

Thanks @vku-ibm, all your commits are properly signed off. 🎉


mergify bot commented Sep 18, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:
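The title rule can be checked locally before pushing. A minimal sketch using Python's re module, assuming mergify's `~=` operator performs a regular-expression search on the PR title; the helper name is_conventional_title is hypothetical, and the pattern is copied from the rule.

```python
import re

# Pattern copied verbatim from the merge-protection rule.
CONVENTIONAL_TITLE = re.compile(
    r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:"
)


def is_conventional_title(title: str) -> bool:
    # search() models a "regex matches somewhere" check; the leading ^ still
    # anchors the match to the start of the title.
    return CONVENTIONAL_TITLE.search(title) is not None


if __name__ == "__main__":
    for t in ["feat: add metadata extraction", "Add metadata extraction"]:
        print(t, "->", is_conventional_title(t))
```

Only the type prefix, optional scope in parentheses, optional `!`, and the colon are checked; the rest of the title is free text.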

def supported_formats(cls) -> Set["InputFormat"]:
    pass

@abstractmethod

@cau-git commented Sep 19, 2025


This @abstractmethod decorator should not be necessary, since you are providing a default implementation. Keeping it would force every subclass to override the method before it can be instantiated.
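The difference can be seen with Python's abc module. A minimal sketch; the class names and the metadata method are hypothetical stand-ins, not the PR's actual backend classes.

```python
from abc import ABC, abstractmethod
from typing import Dict


class WithAbstractDefault(ABC):
    @abstractmethod
    def metadata(self) -> Dict[str, str]:
        # A default body exists, but @abstractmethod still makes overriding mandatory.
        return {}


class WithPlainDefault(ABC):
    def metadata(self) -> Dict[str, str]:
        # Plain method: subclasses inherit this default unless they override it.
        return {}


class BackendA(WithAbstractDefault):
    pass  # does not override metadata()


class BackendB(WithPlainDefault):
    pass  # does not override metadata()


def can_instantiate(cls: type) -> bool:
    # ABCs with unimplemented abstract methods raise TypeError on instantiation.
    try:
        cls()
    except TypeError:
        return False
    return True


if __name__ == "__main__":
    print(can_instantiate(BackendA), can_instantiate(BackendB))
```

With the decorator, every backend that does not care about metadata would have to add a boilerplate override; without it, they simply inherit the default.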

@PeterStaar-IBM

@vku-ibm This is a really good addition! Could you also, for the PDF pipelines:

  1. Extract the metadata through the docling-parse metadata extraction methods, and
  2. Add the table of contents from the PDF (if it has one) via docling-parse?


Development

Successfully merging this pull request may close these issues.

Metadata in ConversionResult