Commit 34066f5

fix t5x tests (#1208)
T5X tests are failing because the vocabulary specified in the dummy dataset does not match the vocabulary used by the model: https://github.com/google-research/t5x/blob/5f03619b0c5ebb44ae6adde1a2d8eea1a4b55fe0/t5x/examples/t5/t5_1_1/base.gin#L20-L21. This PR updates the dataset vocab to make it match the model's.

---------

Signed-off-by: ashors1 <[email protected]>
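The mismatch described above can be illustrated with a small standalone sketch. It is not part of this commit: `FakeVocabulary`, `FakeFeature`, `find_vocab_mismatches`, and the "other" model path are hypothetical stand-ins, used so the example runs without seqio or GCS access. The sketch only assumes that each feature exposes a `vocabulary` attribute that can be compared for equality.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for a seqio vocabulary and feature (assumption:
# the real objects expose `.vocabulary` and support equality comparison).
@dataclass(frozen=True)
class FakeVocabulary:
    sentencepiece_model_file: str


@dataclass(frozen=True)
class FakeFeature:
    vocabulary: FakeVocabulary
    add_eos: bool = True
    required: bool = True


def find_vocab_mismatches(output_features, model_vocabulary):
    """Return the sorted names of features whose vocabulary differs from the model's."""
    return sorted(
        name
        for name, feature in output_features.items()
        if feature.vocabulary != model_vocabulary
    )


# Vocabulary path taken from the diff in this commit.
MODEL_VOCAB = FakeVocabulary(
    "gs://t5-data/vocabs/cc_all.32000.100extra/sentencepiece.model"
)
# Hypothetical path standing in for whatever the dataset used before the fix.
OTHER_VOCAB = FakeVocabulary("some/other/sentencepiece.model")

before = dict(
    inputs=FakeFeature(OTHER_VOCAB, required=False),
    targets=FakeFeature(OTHER_VOCAB),
)
after = dict(
    inputs=FakeFeature(MODEL_VOCAB, required=False),
    targets=FakeFeature(MODEL_VOCAB),
)

print(find_vocab_mismatches(before, MODEL_VOCAB))  # ['inputs', 'targets']
print(find_vocab_mismatches(after, MODEL_VOCAB))   # []
```

A check of this kind could run before the test task is registered, so that a vocabulary drift surfaces as a clear error instead of a downstream test failure.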
1 parent 26451c0 commit 34066f5

2 files changed, 3 additions and 2 deletions

2 files changed

+3
-2
lines changed

.github/container/manifest.yaml

Lines changed: 1 addition & 0 deletions
@@ -31,6 +31,7 @@ t5x:
     mirror/patch/partial-checkpoint-restore: file://patches/t5x/mirror-patch-partial-checkpoint-restore.patch # pull/1392/head # https://github.com/google-research/t5x/pull/1392: Add support for partial checkpoint restore
     mirror/patch/dali-support: file://patches/t5x/mirror-patch-dali-support.patch # pull/1393/head # https://github.com/google-research/t5x/pull/1393: Adds DALI support to t5x
     mirror/patch/t5x_te_in_contrib_noindent: file://patches/t5x/mirror-patch-t5x_te_in_contrib_noindent.patch # pull/1391/head # https://github.com/google-research/t5x/pull/1391: Adds transformer engine support and GPU optimizations to T5x (enables H100)
+    mirror/patch/fix-default-vocab: file://patches/t5x/mirror-patch-fix-default-vocab.patch # pull/1609/head # https://github.com/google-research/t5x/pull/1609: Fixes seqio vocab mismatch
 paxml:
   url: https://github.com/google/paxml.git
   mirror_url: https://github.com/nvjax-svc-0/paxml.git

.github/container/test-t5x.sh

Lines changed: 2 additions & 2 deletions
@@ -175,10 +175,10 @@ seqio.TaskRegistry.add(
     ],
     output_features=dict(
         inputs=seqio.Feature(
-            vocabulary=t5.data.get_default_vocabulary(), add_eos=True, required=False
+            vocabulary=seqio.SentencePieceVocabulary(sentencepiece_model_file="gs://t5-data/vocabs/cc_all.32000.100extra/sentencepiece.model"), add_eos=True, required=False
         ),
         targets=seqio.Feature(
-            vocabulary=t5.data.get_default_vocabulary(), add_eos=True
+            vocabulary=seqio.SentencePieceVocabulary(sentencepiece_model_file="gs://t5-data/vocabs/cc_all.32000.100extra/sentencepiece.model"), add_eos=True
         )
     ),
     metric_fns=[]
