feat: capture picture description API usage #3632
Open
FrigaZzz wants to merge 7 commits into
✅ DCO Check Passed: Thanks @FrigaZzz, all your commits are properly signed off. 🎉

Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit: Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Signed-off-by: frigazzz <frigato.luca97@gmail.com>
Force-pushed 11f5ae4 to afad01f
Codecov Report: ❌ Patch coverage is
Signed-off-by: frigazzz <frigato.luca97@gmail.com>
dolfim-ibm reviewed on Jun 17, 2026
Signed-off-by: frigazzz <frigato.luca97@gmail.com>
Author: There was a test conflict, so I went ahead and resolved it.
cau-git previously approved these changes on Jun 19, 2026
`picture_description_api_usage.py` required a positional PDF arg and a reachable VLM endpoint, so the light examples job exited with code 2 and failed CI whenever it was selected.

- Add `picture_description_api_usage` to EXAMPLES_UNSUPPORTED_IN_CI in `.github/workflows/checks.yml`, matching the convention used for the other API-only examples (`pictures_description_api`, `vlm_pipeline_api_model`, ...).
- Make the example safe to run standalone: the `pdf` arg is now optional and defaults to `tests/data/pdf/2206.01062.pdf`, and `main()` exits 0 with a warning when neither `PICTURE_DESCRIPTION_API_URL` nor `AZURE_API_BASE` is set.

Signed-off-by: frigazzz <frigato.luca97@gmail.com>
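A minimal sketch of the two standalone-safety changes described in this commit, assuming an argparse-based CLI. The helpers `parse_args` and `endpoint_configured` and the injectable `env` parameter are hypothetical; the real example's structure may differ.

```python
import argparse
import logging
import os
from pathlib import Path

# Default input named in the commit message (relative to the repo root).
DEFAULT_PDF = Path("tests/data/pdf/2206.01062.pdf")
# Either variable signals that an API endpoint is configured.
ENDPOINT_ENV_VARS = ("PICTURE_DESCRIPTION_API_URL", "AZURE_API_BASE")

_log = logging.getLogger("picture_description_api_usage_sketch")


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Describe pictures via an API (sketch).")
    # nargs="?" makes the positional optional, so a bare invocation no longer
    # makes argparse exit with code 2 for a missing argument.
    parser.add_argument("pdf", nargs="?", type=Path, default=DEFAULT_PDF)
    return parser.parse_args(argv)


def endpoint_configured(env=None) -> bool:
    # Hypothetical helper: any non-empty endpoint variable counts as configured.
    env = os.environ if env is None else env
    return any(env.get(name) for name in ENDPOINT_ENV_VARS)


def main(argv=None, env=None) -> int:
    args = parse_args(argv)
    if not endpoint_configured(env):
        _log.warning(
            "Neither %s is set; skipping %s.", " nor ".join(ENDPOINT_ENV_VARS), args.pdf
        )
        return 0  # a clean exit lets CI treat the skip as success
    # Placeholder: the real example would build a converter and run it here.
    _log.info("Would describe pictures in %s", args.pdf)
    return 0
```

In a script, this would be wired up as `sys.exit(main())` under an `if __name__ == "__main__":` guard. Returning 0 on a missing endpoint trades a silent skip for a green CI run, which is why the warning is logged.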
…hub.com/FrigaZzz/docling into feature/token-usage-picture-description
Author: Hi! Had to add a small CI fix: `run-examples-light` was failing on the new `picture_description_api_usage.py` example because it needs a PDF arg and an API endpoint, so it exited with code 2 in CI.
This PR introduces standardized capture and propagation of raw usage metadata, such as token counters, from OpenAI/VLM-compatible picture description backends within Docling.
Context and Motivation
References: #2271, #2402, #2403, #2445
Docling can already describe document pictures through local and remote VLM backends, but API-backed picture description calls did not expose provider usage metadata to downstream users. This made it difficult to:
Initial work was validated as a third-party plugin (#2403) to test the end-to-end flow without modifying core. Based on that validation, this PR integrates usage capture directly into Docling's image API request and picture description runtime.
The implementation intentionally preserves the raw provider payload instead of forcing it into a Docling-specific token schema, because token accounting differs across providers.
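To illustrate why keeping the raw payload helps, here is a minimal sketch of resolving a usage key or dotted path (in the spirit of the `usage_response_key` option) against a parsed JSON response and returning whatever object sits there, unmodified. This is not Docling's implementation: `extract_usage` is a hypothetical helper, and the second provider's field names are invented for illustration.

```python
import json
from collections.abc import Mapping
from typing import Any, Optional


def extract_usage(payload: Mapping[str, Any], key: str = "usage") -> Optional[Any]:
    """Follow a dotted path such as "meta.usage" and return the raw value as-is."""
    node: Any = payload
    for part in key.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None  # missing usage is not an error; the description still succeeds
        node = node[part]
    return node


# Two hypothetical provider payloads with differently shaped token accounting.
openai_style = json.loads(
    '{"choices": [{"message": {"content": "A bar chart."}, "finish_reason": "stop"}],'
    ' "usage": {"prompt_tokens": 812, "completion_tokens": 9, "total_tokens": 821}}'
)
nested_style = json.loads(
    '{"choices": [{"message": {"content": "A photo."}, "finish_reason": "stop"}],'
    ' "meta": {"usage": {"input_units": 3, "output_units": 5}}}'
)

print(extract_usage(openai_style))  # the OpenAI-style usage dict, untouched
print(extract_usage(nested_style, key="meta.usage"))  # the provider's own schema
```

Because the function never renames or sums fields, no counter from either schema is lost. A normalized token schema could still be layered on top later.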
What's Changed
1. Image API request results can carry usage metadata
- Adds `ApiImageRequestResult`, a small result object for image API calls.
- Keeps `api_image_request()` tuple-compatible for existing callers.
- Carries provider `usage` metadata alongside generated text, token count, and stop reason.
- Adds `usage` to `VlmPrediction` so VLM API runtimes can propagate provider usage as well.

2. Usage extraction from OpenAI-compatible responses
- `api_image_request()` now parses the raw JSON response before validating the OpenAI-compatible completion payload.
- By default, usage is read from the standard `usage` response field.
- `PictureDescriptionApiOptions.usage_response_key` lets users select another response key or dotted path, such as `providerUsage` or `meta.usage`.
- The `token_extract_key` alias is still supported for compatibility with existing usage-capture experiments.

3. Picture description metadata stores captured usage
- `PictureDescriptionBaseModel` now accepts either plain strings or `ApiImageRequestResult` outputs from `_annotate_images()`.
- Existing implementations that return `str` continue to work unchanged.

4. Runtime integration across API picture description paths
- `PictureDescriptionApiModel` passes `usage_response_key` through to `api_image_request()`.
- VLM API runtimes propagate provider usage through `VlmPrediction`.

5. Documentation and examples
- Adds `docs/examples/picture_description_api_usage.py`.

Backward Compatibility
This PR is designed to avoid behavior changes for existing users.
- `_annotate_images()` implementations that return `str` continue to work.
- Callers that unpack `api_image_request()` as a 3-tuple continue to work because `ApiImageRequestResult` preserves tuple-like iteration, indexing, length, and tuple equality for the historical fields.

Breaking Changes
No intentional user-facing breaking changes.
Custom picture description subclasses may optionally return `ApiImageRequestResult` when they want to provide usage metadata, but returning plain strings remains supported.

Example:
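A self-contained sketch of this pattern, not Docling's code: `ImageRequestResult`, `DescriberBase`, `LegacyDescriber`, and `UsageAwareDescriber` are hypothetical stand-ins for `ApiImageRequestResult`, `PictureDescriptionBaseModel`, and two custom subclasses, and the field names (`text`, `num_tokens`, `stop_reason`, `usage`) are assumptions. It shows a result object that still behaves like the historical 3-tuple, plus a base class that accepts either strings or result objects from `_annotate_images()`.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, List, Optional, Union


@dataclass(eq=False)
class ImageRequestResult:
    """Hypothetical stand-in for ApiImageRequestResult (field names assumed)."""

    text: str
    num_tokens: Optional[int] = None
    stop_reason: Optional[str] = None
    usage: Optional[dict] = None  # raw provider payload, kept out of the tuple view

    def _as_tuple(self) -> tuple:
        # Only the three historical fields take part in tuple-like behaviour.
        return (self.text, self.num_tokens, self.stop_reason)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._as_tuple())

    def __len__(self) -> int:
        return 3

    def __getitem__(self, index: Union[int, slice]) -> Any:
        return self._as_tuple()[index]

    def __eq__(self, other: object) -> bool:
        if isinstance(other, tuple):
            return self._as_tuple() == other
        if isinstance(other, ImageRequestResult):
            return self._as_tuple() == other._as_tuple() and self.usage == other.usage
        return NotImplemented


class DescriberBase:
    """Hypothetical base class: subclasses may yield str or ImageRequestResult."""

    def _annotate_images(self, images: Iterable[Any]) -> Iterable[Union[str, ImageRequestResult]]:
        raise NotImplementedError

    def describe(self, images: Iterable[Any]) -> List[dict]:
        out = []
        for item in self._annotate_images(images):
            if isinstance(item, ImageRequestResult):
                out.append({"text": item.text, "usage": item.usage})
            else:  # legacy subclasses returning plain strings keep working
                out.append({"text": item, "usage": None})
        return out


class LegacyDescriber(DescriberBase):
    def _annotate_images(self, images):
        for _ in images:
            yield "A chart."


class UsageAwareDescriber(DescriberBase):
    def _annotate_images(self, images):
        for _ in images:
            # Values are made up; a real subclass would take them from the API response.
            yield ImageRequestResult(
                text="A chart.",
                num_tokens=4,
                stop_reason="stop",
                usage={"prompt_tokens": 100, "completion_tokens": 4},
            )


result = ImageRequestResult("A chart.", 4, "stop", {"total_tokens": 104})
text, num_tokens, stop_reason = result  # historical 3-tuple unpacking still works
print(text, num_tokens, stop_reason)
print(LegacyDescriber().describe(["img"]))
print(UsageAwareDescriber().describe(["img"]))
```

Keeping `usage` out of the tuple view is what lets old `a, b, c = ...` call sites survive. The new field is reachable only by attribute access.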
Limitations and Next Steps
Current limitation: usage is stored as namespaced custom metadata on `PictureItem.meta.description`, because the canonical description annotation data model lives in `docling_core`.

Potential follow-ups:
- … `docling_core`.
- … `docling` and migrate away from custom metadata storage if desired.
- … `usage`.

Testing
- … `token_extract_key` compatibility alias.

Validated locally with:
Documentation
- Updates `docs/usage/enrichments.md` with usage metadata documentation.
- Adds `docs/examples/picture_description_api_usage.py` as a runnable and renderable example.

Checklist
References