Description
I encountered an error while using the `translate` method.
```
D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Traceback (most recent call last):
  File "D:\Jandar\project\unstructured\unstructured\cleaners\translate.py", line 81, in _translate_text
    translated = model.generate(
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\generation\utils.py", line 1745, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\generation\utils.py", line 549, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\models\marian\modeling_marian.py", line 728, in forward
    embed_pos = self.embed_positions(input_shape)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\models\marian\modeling_marian.py", line 101, in forward
    return super().forward(positions)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\sparse.py", line 164, in forward
    return F.embedding(
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self
```
Then I found that when looping through the chunks, the argument passed to the translation method is actually `text`. Why is that? Shouldn't `chunk` be passed instead?
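A minimal sketch of why passing `text` instead of `chunk` would produce this exact error, assuming the model can only embed a fixed number of positions (512 here, an assumed value). The traceback fails inside `embed_positions`, which is consistent with the input being longer than the position table. All names below (`split_into_chunks`, `fake_translate`, `translate_buggy`, `translate_fixed`) are hypothetical stand-ins, not the actual code in `unstructured/cleaners/translate.py`, and whitespace-separated words stand in for tokenizer tokens.

```python
from typing import List

# Assumed size of the model's positional-embedding table (hypothetical value).
MAX_POSITIONS = 512


def split_into_chunks(text: str, max_tokens: int = MAX_POSITIONS) -> List[str]:
    """Split text into pieces of at most max_tokens words.

    Words stand in for tokens; a real tokenizer usually yields more tokens
    than words, so a real chunker needs a safety margin.
    """
    words = text.split()
    return [" ".join(words[i : i + max_tokens]) for i in range(0, len(words), max_tokens)]


def fake_translate(text: str) -> str:
    """Hypothetical stand-in for a single model.generate() call.

    Raises the same IndexError the embedding lookup raises when the input
    has more positions than the table holds.
    """
    if len(text.split()) > MAX_POSITIONS:
        raise IndexError("index out of range in self")
    return text.upper()  # placeholder "translation"


def translate_buggy(text: str) -> str:
    """Loops over the chunks but sends the whole text on every iteration."""
    translated = []
    for chunk in split_into_chunks(text):
        translated.append(fake_translate(text))  # bug: `chunk` is never used
    return " ".join(translated)


def translate_fixed(text: str) -> str:
    """Sends each chunk, so no call exceeds MAX_POSITIONS."""
    translated = []
    for chunk in split_into_chunks(text):
        translated.append(fake_translate(chunk))
    return " ".join(translated)


if __name__ == "__main__":
    long_text = " ".join(["hello"] * 1200)  # longer than one chunk
    try:
        translate_buggy(long_text)
    except IndexError as err:
        print("buggy:", err)
    print("fixed:", len(translate_fixed(long_text).split()), "words")
```

With a 1200-word input the buggy loop fails on its first iteration, because the full text exceeds the assumed 512-position limit, while the fixed loop translates three chunks (512, 512 and 176 words). Short inputs that fit in a single chunk would not trigger the bug, which may be why it goes unnoticed.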