Can't use docker image to train #38

Issue 1: I pulled the Docker image on a Linux server (no GPU, just for testing). Running `source ~/argos-train-init` succeeds, but running `argos-train` afterwards fails with: No command .... argos-train
If I run `bin/argos-train` instead, it throws a module-not-found error: ModuleNotFoundError: No module named 'argostrain'
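
A possible explanation for the `ModuleNotFoundError`, offered as an assumption rather than a confirmed diagnosis: Python puts the directory of the script being run at the front of `sys.path`, so running `bin/argos-train` makes `bin/` importable but not the repository root, where the traceback in this issue shows the `argostrain` package living (`/home/argosopentech/argos-train/argostrain/`). The sketch below checks for that situation. The helper name `ensure_importable` and the `repo_root` argument are hypothetical and not part of argos-train.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def ensure_importable(package: str, repo_root: Path) -> bool:
    """Return True if `package` can be imported.

    If it cannot be found yet but a directory named `package` exists
    under `repo_root`, put `repo_root` on sys.path and check again.
    (Hypothetical helper, for illustration only.)
    """
    if importlib.util.find_spec(package) is not None:
        return True
    if (repo_root / package).is_dir():
        # Same effect as running the script from the repository root.
        sys.path.insert(0, str(repo_root))
        importlib.invalidate_caches()
    return importlib.util.find_spec(package) is not None
```

Calling `ensure_importable("argostrain", Path("/home/argosopentech/argos-train"))` inside the container would show whether the package becomes visible once the repository root is on the path. That would be consistent with the script working after it is copied one directory up.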

Issue 2: If I copy the bin/argos-train script to the parent directory and run `./argos-train`, it starts and prompts for the arguments, but then fails with the following errors:

```
From code (ISO 639): ja
To code (ISO 639): en
From name: Japanese
To name: English
Version: 2.2
cnkdj-jp-addr-ja_en
['\n', '\n', 'Data compiled by [Opus](https://opus.nlpl.eu/).\n', '\n', 'Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).\n', '\n', 'Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).\n', '\n', 'Credits:\n', '\n']
Done splitting data
sentencepiece_trainer.cc(78) LOG(INFO) Starts training with :
trainer_spec {
  input: run/split_data/all.txt
  input_format:
  model_prefix: run/sentencepiece
  model_type: UNIGRAM
  vocab_size: 50000
  self_test_sample_size: 0
  character_coverage: 0.9995
  input_sentence_size: 1000000
  shuffle_input_sentence: 1
  seed_sentencepiece_size: 1000000
  shrinking_factor: 0.75
  max_sentence_length: 4192
  num_threads: 16
  num_sub_iterations: 2
  max_sentencepiece_length: 16
  split_by_unicode_script: 1
  split_by_number: 1
  split_by_whitespace: 1
  split_digits: 0
  pretokenization_delimiter:
  treat_whitespace_as_suffix: 0
  allow_whitespace_only_pieces: 0
  required_chars:
  byte_fallback: 0
  vocabulary_output_piece_score: 1
  train_extremely_large_corpus: 0
  seed_sentencepieces_file:
  hard_vocab_limit: 1
  use_all_vocab: 0
  unk_id: 0
  bos_id: 1
  eos_id: 2
  pad_id: -1
  unk_piece: <unk>
  bos_piece: <s>
  eos_piece: </s>
  pad_piece: <pad>
  unk_surface:  ⁇
  enable_differential_privacy: 0
  differential_privacy_noise_level: 0
  differential_privacy_clipping_threshold: 0
}
normalizer_spec {
  name: nmt_nfkc
  add_dummy_prefix: 1
  remove_extra_whitespaces: 1
  escape_whitespaces: 1
  normalization_rule_tsv:
}
denormalizer_spec {}
trainer_interface.cc(353) LOG(INFO) SentenceIterator is not specified. Using MultiFileSentenceIterator.
trainer_interface.cc(185) LOG(INFO) Loading corpus: run/split_data/all.txt
trainer_interface.cc(409) LOG(INFO) Loaded all 873 sentences
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: <unk>
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: <s>
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: </s>
trainer_interface.cc(430) LOG(INFO) Normalizing sentences...
trainer_interface.cc(539) LOG(INFO) all chars count=28413
trainer_interface.cc(550) LOG(INFO) Done: 99.9507% characters are covered.
trainer_interface.cc(560) LOG(INFO) Alphabet size=404
trainer_interface.cc(561) LOG(INFO) Final character coverage=0.999507
trainer_interface.cc(592) LOG(INFO) Done! preprocessed 873 sentences.
unigram_model_trainer.cc(265) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(269) LOG(INFO) Extracting frequent sub strings... node_num=14236
unigram_model_trainer.cc(312) LOG(INFO) Initialized 2485 seed sentencepieces
trainer_interface.cc(598) LOG(INFO) Tokenizing input sentences with whitespace: 873
trainer_interface.cc(609) LOG(INFO) Done! 1143
unigram_model_trainer.cc(602) LOG(INFO) Using 1143 sentences for EM training
unigram_model_trainer.cc(618) LOG(INFO) EM sub_iter=0 size=1324 obj=19.6514 num_tokens=4413 num_tokens/piece=3.33308
unigram_model_trainer.cc(618) LOG(INFO) EM sub_iter=1 size=1239 obj=15.5659 num_tokens=4415 num_tokens/piece=3.56336
trainer_interface.cc(687) LOG(INFO) Saving model: run/sentencepiece.model
spm_train_main.cc(282) [_status.ok()] Internal: src/trainer_interface.cc(662) [(trainer_spec_.vocab_size()) == (model_proto->pieces_size())] Vocabulary size too high (50000). Please set it to a value <= 1309.
Program terminated with an unrecoverable error.
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:48: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, dim=0):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:68: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:13: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, target):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:37: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:397: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(self, u, x, bias, init=None, mask_h=None):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:443: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(self, grad_h, grad_last):
Corpus corpus_1's weight should be given. We default it to 1 for you.
Traceback (most recent call last):
  File "/home/argosopentech/env/bin/onmt_build_vocab", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_build_vocab')())
  File "/home/argosopentech/OpenNMT-py/onmt/bin/build_vocab.py", line 71, in main
    build_vocab_main(opts)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/build_vocab.py", line 32, in build_vocab_main
    transforms = make_transforms(opts, transforms_cls, fields)
  File "/home/argosopentech/OpenNMT-py/onmt/transforms/transform.py", line 235, in make_transforms
    transform_obj.warm_up(vocabs)
  File "/home/argosopentech/OpenNMT-py/onmt/transforms/tokenize.py", line 147, in warm_up
    load_src_model.Load(self.src_subword_model)
  File "/home/argosopentech/env/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/home/argosopentech/env/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "run/sentencepiece.model": No such file or directory Error #2
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:48: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, dim=0):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:68: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:13: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(ctx, input, target):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:37: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:397: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  def forward(self, u, x, bias, init=None, mask_h=None):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:443: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  def backward(self, grad_h, grad_last):
[2024-08-20 06:20:54,713 WARNING] Corpus corpus_1's weight should be given. We default it to 1 for you.
[2024-08-20 06:20:54,714 INFO] Parsed 2 corpora from -data.
Traceback (most recent call last):
  File "/home/argosopentech/env/bin/onmt_train", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_train')())
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 172, in main
    train(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 106, in train
    checkpoint, fields, transforms_cls = _init_train(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 58, in _init_train
    ArgumentParser.validate_prepare_opts(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 197, in validate_prepare_opts
    cls._validate_fields_opts(opt, build_vocab_only=build_vocab_only)
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 151, in _validate_fields_opts
    cls._validate_file(opt.src_vocab, info='src vocab')
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 18, in _validate_file
    raise IOError(f"Please check path of your {info} file!")
OSError: Please check path of your src vocab file!
Traceback (most recent call last):
  File "/home/argosopentech/argos-train/./argos-train", line 18, in <module>
    train.train(from_code, to_code, from_name, to_name, version, package_version, argos_version, data_exists)
  File "/home/argosopentech/argos-train/argostrain/train.py", line 163, in train
    str(opennmt_checkpoints[-2].f),
IndexError: list index out of range
```
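
Reading the log from top to bottom, the first failure appears to be the SentencePiece one: only 873 sentences were loaded, and the trainer reports that a vocabulary of 50000 is too high and must be at most 1309. After that, `run/sentencepiece.model` is reported as not found, so the later `onmt_build_vocab`, `onmt_train`, and `IndexError` failures look like consequences rather than separate problems. As a minimal sketch, the limit can be pulled out of that message with the standard library. The function name `parse_vocab_limit` is hypothetical, and how argos-train sets the vocabulary size is not shown in this issue.

```python
import re

# Matches the SentencePiece message that appears in the log above.
_VOCAB_LIMIT = re.compile(
    r"Vocabulary size too high \((\d+)\)\. Please set it to a value <= (\d+)\."
)


def parse_vocab_limit(log_text: str):
    """Return (requested, maximum) vocab sizes, or None if the message is absent."""
    match = _VOCAB_LIMIT.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log_line = (
    "spm_train_main.cc(282) [_status.ok()] Internal: src/trainer_interface.cc(662) "
    "[(trainer_spec_.vocab_size()) == (model_proto->pieces_size())] "
    "Vocabulary size too high (50000). Please set it to a value <= 1309."
)
print(parse_vocab_limit(log_line))  # prints (50000, 1309)
```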

What causes the list index out of range error? The training data I prepared has 2436 rows in total.
Please help me resolve this error.
Thanks.
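
On the `IndexError` itself: the failing expression in the traceback is `opennmt_checkpoints[-2]`, and in Python indexing a list with `-2` raises `IndexError: list index out of range` whenever the list has fewer than two elements. Since `onmt_train` stopped before training, it seems likely (an assumption, not verified against the argos-train source) that no checkpoints were written and the list was empty. The sketch below shows a guard that would report this more clearly. The name `second_newest` is hypothetical.

```python
def second_newest(checkpoints):
    """Return the second-to-last checkpoint, with a clearer error if it is missing.

    `checkpoints` is assumed to be a list ordered from oldest to newest.
    """
    if len(checkpoints) < 2:
        raise RuntimeError(
            f"Expected at least 2 checkpoints, found {len(checkpoints)}; "
            "check the training output above for an earlier failure."
        )
    return checkpoints[-2]
```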
