translate error: IndexError: index out of range in self #3607

Open
@vkbbkvvkb

Description

I encountered an error while using the translate method.

D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
  warnings.warn(
Traceback (most recent call last):
  File "D:\Jandar\project\unstructured\unstructured\cleaners\translate.py", line 81, in _translate_text
    translated = model.generate(
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\generation\utils.py", line 1745, in generate
    model_kwargs = self._prepare_encoder_decoder_kwargs_for_generation(
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\generation\utils.py", line 549, in _prepare_encoder_decoder_kwargs_for_generation
    model_kwargs["encoder_outputs"]: ModelOutput = encoder(**encoder_kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\models\marian\modeling_marian.py", line 728, in forward
    embed_pos = self.embed_positions(input_shape)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\utils\_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\transformers\models\marian\modeling_marian.py", line 101, in forward
    return super().forward(positions)
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\modules\sparse.py", line 164, in forward
    return F.embedding(
  File "D:\soft\Anaconda3.11\envs\unstructured\lib\site-packages\torch\nn\functional.py", line 2267, in embedding
    return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
IndexError: index out of range in self

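Then I found that, when looping through the chunks, the argument passed to the translation method is actually text rather than chunk. Why is that? Shouldn't chunk be passed instead? If the full text is passed on every iteration, the input may exceed the model's maximum sequence length, which would be consistent with the IndexError raised in the position-embedding lookup above.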

translated_chunks.append(_translate_text(text, model, tokenizer))
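
A minimal sketch of the loop I would expect, passing each chunk rather than the whole text. The names `split_into_chunks`, `translate_in_chunks`, `translate_one`, and `max_words` are hypothetical and are not taken from the unstructured code base. `translate_one` stands in for a call such as `_translate_text(chunk, model, tokenizer)`. Splitting on words is only an assumption for illustration; it does not guarantee that a chunk stays under the model's token limit.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int) -> List[str]:
    """Hypothetical helper: split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def translate_in_chunks(
    text: str,
    translate_one: Callable[[str], str],
    max_words: int = 100,
) -> str:
    """Translate text chunk by chunk and join the translated pieces."""
    translated_chunks = []
    for chunk in split_into_chunks(text, max_words):
        # Pass the current chunk, not the full text, so each call
        # receives a bounded-length input.
        translated_chunks.append(translate_one(chunk))
    return " ".join(translated_chunks)
```

With a stand-in translator such as `str.upper`, `translate_in_chunks("a b c d e", str.upper, max_words=2)` calls the translator three times, once per chunk, instead of three times on the full string.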
