feat: Add Grain dataset support for keras.layers.TextVectorization.adapt() #22402
Codecov Report
❌ Patch coverage is
Additional details and impacted files

Coverage Diff (master vs. #22402)

              master    #22402    +/-
Coverage      82.96%    77.13%    -5.83%
Files            596       596
Lines          66269     66696     +427
Branches       10321     10386      +65
Hits           54982     51449    -3533
Misses          8667     12575    +3908
Partials        2620      2672      +52
7dc5962 to 9492201
Hi @gbaned, I hope you're doing well.
Summary
Adds support for Grain datasets in `keras.layers.TextVectorization.adapt()`, so users can adapt the layer on `grain.MapDataset`, `grain.IterDataset`, or `grain.DataLoader` without converting to `tf.data.Dataset`.

Closes #22378.
Changes
- Grain support in `adapt()`: when `data` is a Grain dataset (`grain.MapDataset`, `grain.IterDataset`, or `grain.DataLoader`), it is converted to a `tf.data.Dataset` via `GrainDatasetAdapter.get_tf_dataset()` and then iterated as in the existing `tf.data` path.
- `steps` is supported for Grain datasets (same semantics as for `tf.data`).
- Added `_extract_adapt_batch(batch)` so that when a batch is `(x,)`, `(x, y)`, or `(x, y, sample_weight)`, only the text input `x` is passed to `update_state()`.
- The helper is used in both the `tf.data` and Grain branches so `adapt()` works with training-style batches.
- The docstring for `adapt()` is updated to mention Grain datasets and that batches may be a single tensor or a tuple (only the text is used).
- `test_adapt_with_grain_dataset`: adapts on a Grain `MapDataset` and asserts the vocabulary.
- `test_adapt_with_grain_dataset_and_steps`: adapts with `steps=2` and asserts that only the first batches are used.
- The tests use `pytest.importorskip("grain")` so they are skipped when Grain is not installed.
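
For reviewers, the tuple handling and `steps` semantics described above can be illustrated with a dependency-free sketch. This is not the PR's code: `extract_adapt_batch` and `adapt_over_batches` are hypothetical stand-ins, and the sketch assumes that batches arrive as plain tuples and that `steps` simply caps the number of batches consumed.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Optional


def extract_adapt_batch(batch: Any) -> Any:
    """Return only the text input `x` from a training-style batch.

    Hypothetical stand-in for the PR's `_extract_adapt_batch`; the
    real helper may differ. Assumes tuples of length 1 to 3 are
    (x,), (x, y), or (x, y, sample_weight).
    """
    if isinstance(batch, tuple) and 1 <= len(batch) <= 3:
        return batch[0]
    # Anything else is treated as the text input itself.
    return batch


def adapt_over_batches(
    batches: Iterable[Any],
    update_state: Callable[[Any], None],
    steps: Optional[int] = None,
) -> None:
    """Feed the text input of each batch to `update_state`.

    Assumption: `steps=None` consumes every batch, while an integer
    stops after that many batches.
    """
    iterator = batches if steps is None else islice(batches, steps)
    for batch in iterator:
        update_state(extract_adapt_batch(batch))


# Usage: three (x, y) batches, but only the first two are consumed.
seen = []
batches = [(["a b"], [0]), (["c d"], [1]), (["e f"], [0])]
adapt_over_batches(batches, seen.append, steps=2)
print(seen)  # [['a b'], ['c d']]
```

Returning the batch unchanged when it is not a tuple keeps the single-tensor case working, which matches the updated docstring's statement that batches may be a single tensor or a tuple.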