feat: Add Grain dataset support for keras.layers.TextVectorization.adapt() #22402

Open
dataCenter430 wants to merge 2 commits into keras-team:master from dataCenter430:feat/add-grain-dataset-support-for-TextVectorization

Conversation

@dataCenter430

Summary

Adds support for Grain datasets in keras.layers.TextVectorization.adapt(), so users can adapt the layer on a grain.MapDataset, grain.IterDataset, or grain.DataLoader without first converting it to a tf.data.Dataset themselves.

Closes #22378.

Changes

  • adapt()
    • When data is a Grain dataset (grain.MapDataset, grain.IterDataset, or grain.DataLoader), it is converted to a tf.data.Dataset via GrainDatasetAdapter.get_tf_dataset() and then iterated as in the existing tf.data path.
    • steps is supported for Grain datasets (same semantics as for tf.data).
  • Batch handling
    • Introduced _extract_adapt_batch(batch) so that when a batch is (x,), (x, y), or (x, y, sample_weight), only the text input x is passed to update_state().
    • This helper is used for both tf.data and Grain branches so adapt() works with training-style batches.
  • Docs
    • Docstring of adapt() updated to mention Grain datasets and that batches may be a single tensor or a tuple (only text is used).
  • Tests
    • test_adapt_with_grain_dataset: adapts on a Grain MapDataset and asserts the resulting vocabulary.
    • test_adapt_with_grain_dataset_and_steps: adapts with steps=2 and asserts that only the first two batches are used.
    • Both tests use pytest.importorskip("grain") so they are skipped when Grain is not installed.
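
The batch handling and steps behavior described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the code in this PR: extract_adapt_batch and adapt_tokens are hypothetical stand-ins, batches are plain tuples of Python lists rather than tensors, and whitespace splitting stands in for the layer's real tokenization.

```python
from itertools import islice


def extract_adapt_batch(batch):
    """Return only the text input x from a training-style batch.

    Hypothetical stand-in for the PR's _extract_adapt_batch; the real
    helper may handle more cases (for example tensors or nested structures).
    """
    # (x,), (x, y) and (x, y, sample_weight) all carry the text in slot 0.
    if isinstance(batch, tuple) and 1 <= len(batch) <= 3:
        return batch[0]
    # Anything else is assumed to already be the text batch itself.
    return batch


def adapt_tokens(dataset, steps=None):
    """Build a sorted vocabulary from at most `steps` batches.

    steps=None consumes the whole dataset, mirroring the semantics the
    PR describes for tf.data and Grain inputs.
    """
    vocab = set()
    for batch in islice(dataset, steps):
        for text in extract_adapt_batch(batch):
            # Assumption: lowercase + whitespace split as a toy tokenizer.
            vocab.update(text.lower().split())
    return sorted(vocab)


# Training-style batches of (x, y); only x should reach the vocabulary.
batches = [
    (["the cat"], [0]),
    (["a dog"], [1]),
    (["one bird"], [0]),
]

first_two = adapt_tokens(batches, steps=2)
everything = adapt_tokens(batches)
```

With steps=2 the third batch is never read, so its words stay out of the vocabulary; this is the property the steps test above checks.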

@gemini-code-assist
Contributor

Warning

Gemini encountered an error creating the summary. You can try again by commenting /gemini summary.

@codecov-commenter

codecov-commenter commented Mar 12, 2026

Codecov Report

❌ Patch coverage is 30.76923% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.13%. Comparing base (d9966a5) to head (9492201).
⚠️ Report is 24 commits behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...ras/src/layers/preprocessing/text_vectorization.py | 30.76% | 7 Missing and 2 partials ⚠️ |

❗ There is a different number of reports uploaded between BASE (d9966a5) and HEAD (9492201).

HEAD has 2 uploads less than BASE

| Flag | BASE (d9966a5) | HEAD (9492201) |
|---|---|---|
| keras | 5 | 4 |
| keras-tensorflow | 1 | 0 |

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #22402      +/-   ##
==========================================
- Coverage   82.96%   77.13%   -5.83%     
==========================================
  Files         596      596              
  Lines       66269    66696     +427     
  Branches    10321    10386      +65     
==========================================
- Hits        54982    51449    -3533     
- Misses       8667    12575    +3908     
- Partials     2620     2672      +52     

| Flag | Coverage Δ |
|---|---|
| keras | 77.01% <30.76%> (-5.79%) ⬇️ |
| keras-jax | 60.55% <30.76%> (-0.29%) ⬇️ |
| keras-numpy | 54.78% <30.76%> (-0.26%) ⬇️ |
| keras-openvino | 49.91% <15.38%> (+0.85%) ⬆️ |
| keras-tensorflow | ? |
| keras-torch | 60.62% <30.76%> (-0.26%) ⬇️ |


dataCenter430 force-pushed the feat/add-grain-dataset-support-for-TextVectorization branch from 7dc5962 to 9492201 on March 12, 2026 06:21
@dataCenter430
Author

Hi @gbaned, I hope you're doing well.
I'm a big fan of Keras and I plan to contribute here actively.
Could you please review my first PR when you have a chance? 🙏 Each PR is so important for me to contribute to Keras.
Best regards



Development

Successfully merging this pull request may close these issues.

Feature Request: Add Grain support for keras.layers.TextVectorization

3 participants