fix: apply additional_context to Documents in the document path of extract() by SuperMarioYL · Pull Request #458 · google/langextract

SuperMarioYL · 2026-04-21T10:16:04Z

Description

When calling extract() with an iterable of Document objects, the top-level
additional_context parameter was silently ignored.

Root cause: extract() has two code paths:

String path (working): wraps the string in Document(additional_context=additional_context), then calls annotate_text().
Document path (broken): called annotate_documents() directly, never forwarding additional_context.
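To make the asymmetry concrete, here is a minimal, self-contained reproduction. It uses a hypothetical stand-in Document dataclass and a hypothetical route_before_fix() function, not langextract's real classes. It only models which Document objects would reach the annotator on each path before the fix.

```python
import dataclasses
from typing import Iterable, List, Optional, Union


@dataclasses.dataclass
class Document:
    """Hypothetical stand-in for langextract's Document (not the real class)."""
    text: str
    additional_context: Optional[str] = None


def route_before_fix(
    text_or_documents: Union[str, Iterable[Document]],
    additional_context: Optional[str] = None,
) -> List[Document]:
    """Mimics the pre-fix routing: returns what would reach the annotator."""
    if isinstance(text_or_documents, str):
        # String path: the global context is wrapped into the Document.
        return [Document(text_or_documents, additional_context=additional_context)]
    # Document path: documents are forwarded as-is, so the global
    # additional_context argument is silently dropped.
    return list(text_or_documents)


from_string = route_before_fix("Patient took aspirin.", additional_context="medical note")
from_docs = route_before_fix(
    [Document("Patient took aspirin.")], additional_context="medical note"
)
print(from_string[0].additional_context)  # medical note
print(from_docs[0].additional_context)    # None
```

The same input text ends up with different context depending only on whether it was passed as a string or wrapped in a Document, which is the inconsistency reported in #445.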

Fix

When a global additional_context is provided alongside a document iterable,
new Document instances are created for any document that has no per-document
context set. Per-document context always takes precedence over the global
value, and original caller objects are not mutated.
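A minimal sketch of the behaviour described above, written against a hypothetical stand-in Document dataclass (not langextract's actual class). The helper name apply_global_context() is also hypothetical, and the sketch assumes that a per-document additional_context of None means "not set".

```python
import copy
import dataclasses
from typing import Iterable, Optional


@dataclasses.dataclass
class Document:
    """Hypothetical stand-in for langextract's Document (not the real class)."""
    text: str
    additional_context: Optional[str] = None


def apply_global_context(
    documents: Iterable[Document], additional_context: Optional[str]
) -> Iterable[Document]:
    """Sketch: fill in a global context without mutating the caller's objects."""
    if additional_context is None:
        # Nothing to apply: forward the original iterable untouched so that
        # generators stay lazy.
        return documents
    result = []
    for doc in documents:
        if doc.additional_context is None:
            # Shallow-copy so the caller's object is never mutated.
            new_doc = copy.copy(doc)
            new_doc.additional_context = additional_context
            result.append(new_doc)
        else:
            # Per-document context wins over the global value.
            result.append(doc)
    return result


plain = Document("a")
own = Document("b", additional_context="per-doc")
out = apply_global_context([plain, own], "global")
print([d.additional_context for d in out])  # ['global', 'per-doc']
print(plain.additional_context)             # None (original untouched)
```

Only documents lacking their own context are copied; documents that already carry one are passed through by identity, so no unnecessary objects are created.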

Key implementation notes:

The iterable is only materialized when additional_context is not None;
when it is None, the raw iterable is forwarded unchanged to preserve
streaming behaviour for large inputs.
Copied Documents preserve _document_id and _tokenized_text from the
originals to avoid regenerating IDs or discarding cached tokenization.
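Preserving private fields matters when those fields are not constructor arguments. Assuming, for illustration only, that _document_id and _tokenized_text were declared as dataclass fields with init=False, dataclasses.replace() would silently reset them, whereas copy.copy() keeps them. The Document class below is a hypothetical stand-in and may not match how langextract actually declares these fields.

```python
import copy
import dataclasses
from typing import Optional


@dataclasses.dataclass
class Document:
    """Hypothetical stand-in; cached fields are assumed to be init=False."""
    text: str
    additional_context: Optional[str] = None
    _document_id: Optional[str] = dataclasses.field(default=None, init=False)
    _tokenized_text: Optional[list] = dataclasses.field(default=None, init=False)


original = Document("Aspirin 81 mg daily.")
original._document_id = "doc_123"  # pretend an ID was already assigned
original._tokenized_text = ["Aspirin", "81", "mg", "daily", "."]  # cached tokens

# dataclasses.replace() builds a fresh object through __init__, so init=False
# fields fall back to their defaults instead of being copied.
replaced = dataclasses.replace(original, additional_context="global")

# copy.copy() duplicates every instance attribute, including the cached ones;
# only additional_context is then overwritten on the copy.
copied = copy.copy(original)
copied.additional_context = "global"

print(replaced._document_id, replaced._tokenized_text)  # None None
print(copied._document_id, copied._tokenized_text is original._tokenized_text)  # doc_123 True
```

Because copy.copy() is shallow, the copy shares the cached token list with the original rather than duplicating it, which is cheap and safe as long as neither side mutates that list in place.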

Fixes #445

How Has This Been Tested?

Added 7 new unit tests to tests/extract_precedence_test.py covering:

Global context applied to documents with no per-document context
Per-document context takes precedence over the global value
None additional_context leaves documents and iterable unchanged
Explicit document IDs are preserved on copied documents
Generator/lazy-iterable inputs work correctly
Original caller Documents are not mutated
Empty-string additional_context is treated as a non-None value
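The empty-string test pins down a distinction that is easy to regress: the fix keys on `additional_context is not None`, so "" still counts as a provided context, whereas a truthiness check would skip it. The two predicate functions below are hypothetical and exist only to contrast the checks.

```python
from typing import Optional


def should_apply_none_check(additional_context: Optional[str]) -> bool:
    # Behaviour described in the PR: only None means "no global context".
    return additional_context is not None


def should_apply_truthiness(additional_context: Optional[str]) -> bool:
    # A plausible regression: a truthiness check silently skips "".
    return bool(additional_context)


for value in (None, "", "clinical notes"):
    print(repr(value), should_apply_none_check(value), should_apply_truthiness(value))
# None False False
# '' True False
# 'clinical notes' True True
```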

All 525 existing tests pass (excluding one pre-existing, environment-dependent plugin test).

Checklist

My code follows the style guidelines of this project
I have self-reviewed my own code
I have made corresponding changes to the documentation (n/a)
My changes generate no new warnings
I have added tests that prove my fix is effective
New and existing unit tests passed locally with my changes (pytest tests/)
I have run ./autoformat.sh on changed files

google-cla · 2026-04-21T10:16:23Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

github-actions · 2026-04-21T15:08:51Z

⚠️ Branch Update Required

Your branch is 1 commit behind main. Please update your branch to ensure CI checks run with the latest code:

git fetch origin main
git merge origin/main
git push

Note: Enable "Allow edits by maintainers" to allow automatic updates.

…tract()

When calling extract() with an iterable of Document objects, the top-level additional_context parameter was silently ignored. The document code path called annotate_documents() directly, while the string path correctly embedded the context in a Document wrapper.

The fix intercepts the document iterable when a global additional_context is provided. For each Document without its own additional_context, a new Document is created carrying the global value. Documents that already have per-document context are passed through unchanged, so per-document context always takes precedence.

Implementation notes:

- The iterable is only materialized into a list when additional_context is not None; the raw iterable is forwarded as-is when no global context is given, preserving the previous behavior for large or streaming inputs.
- New Document copies preserve _document_id and _tokenized_text from the original to avoid regenerating IDs prematurely or discarding cached tokenization.
- Original caller Document objects are not mutated.

Fixes google#445

aksg87 · 2026-04-25T05:50:12Z

Hi @SuperMarioYL, please complete the CLA at https://cla.developers.google.com/ so we can review this PR. Without cla/google green we can't proceed. If we don't see it complete in the next couple of days, we'll close this out and handle the issue separately. Thanks!

github-actions bot added the size/M label (Pull request with 150-600 lines changed) on Apr 21, 2026

SuperMarioYL force-pushed the fix/additional-context-documents-path branch from aa088da to 6317471 on April 22, 2026 21:13

SuperMarioYL wants to merge 1 commit into google:main from SuperMarioYL:fix/additional-context-documents-path
