Issue 1: I pulled the Docker image on a Linux server (a machine without a GPU, just for testing). Running source ~/argos-train-init succeeds, but running the command argos-train prints: No command .... argos-train
If I run bin/argos-train instead, it throws a module-not-found error: ModuleNotFoundError: No module named 'argostrain'
From code (ISO 639): ja
To code (ISO 639): en
From name: Japanese
To name: English
Version: 2.2
cnkdj-jp-addr-ja_en
['\n', '\n', 'Data compiled by [Opus](https://opus.nlpl.eu/).\n', '\n', 'Dictionary data from Wiktionary using [Wiktextract](https://github.com/tatuylonen/wiktextract).\n', '\n', 'Includes pretrained models from [Stanza](https://github.com/stanfordnlp/stanza/).\n', '\n', 'Credits:\n', '\n']
Done splitting data
sentencepiece_trainer.cc(78) LOG(INFO) Starts training with :
trainer_spec {
input: run/split_data/all.txt
input_format:
model_prefix: run/sentencepiece
model_type: UNIGRAM
vocab_size: 50000
self_test_sample_size: 0
character_coverage: 0.9995
input_sentence_size: 1000000
shuffle_input_sentence: 1
seed_sentencepiece_size: 1000000
shrinking_factor: 0.75
max_sentence_length: 4192
num_threads: 16
num_sub_iterations: 2
max_sentencepiece_length: 16
split_by_unicode_script: 1
split_by_number: 1
split_by_whitespace: 1
split_digits: 0
pretokenization_delimiter:
treat_whitespace_as_suffix: 0
allow_whitespace_only_pieces: 0
required_chars:
byte_fallback: 0
vocabulary_output_piece_score: 1
train_extremely_large_corpus: 0
seed_sentencepieces_file:
hard_vocab_limit: 1
use_all_vocab: 0
unk_id: 0
bos_id: 1
eos_id: 2
pad_id: -1
unk_piece: <unk>
bos_piece: <s>
eos_piece: </s>
pad_piece: <pad>
unk_surface: ⁇
enable_differential_privacy: 0
differential_privacy_noise_level: 0
differential_privacy_clipping_threshold: 0
}
normalizer_spec {
name: nmt_nfkc
add_dummy_prefix: 1
remove_extra_whitespaces: 1
escape_whitespaces: 1
normalization_rule_tsv:
}
denormalizer_spec {}
trainer_interface.cc(353) LOG(INFO) SentenceIterator is not specified. Using MultiFileSentenceIterator.
trainer_interface.cc(185) LOG(INFO) Loading corpus: run/split_data/all.txt
trainer_interface.cc(409) LOG(INFO) Loaded all 873 sentences
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: <unk>
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: <s>
trainer_interface.cc(425) LOG(INFO) Adding meta_piece: </s>
trainer_interface.cc(430) LOG(INFO) Normalizing sentences...
trainer_interface.cc(539) LOG(INFO) all chars count=28413
trainer_interface.cc(550) LOG(INFO) Done: 99.9507% characters are covered.
trainer_interface.cc(560) LOG(INFO) Alphabet size=404
trainer_interface.cc(561) LOG(INFO) Final character coverage=0.999507
trainer_interface.cc(592) LOG(INFO) Done! preprocessed 873 sentences.
unigram_model_trainer.cc(265) LOG(INFO) Making suffix array...
unigram_model_trainer.cc(269) LOG(INFO) Extracting frequent sub strings... node_num=14236
unigram_model_trainer.cc(312) LOG(INFO) Initialized 2485 seed sentencepieces
trainer_interface.cc(598) LOG(INFO) Tokenizing input sentences with whitespace: 873
trainer_interface.cc(609) LOG(INFO) Done! 1143
unigram_model_trainer.cc(602) LOG(INFO) Using 1143 sentences for EM training
unigram_model_trainer.cc(618) LOG(INFO) EM sub_iter=0 size=1324 obj=19.6514 num_tokens=4413 num_tokens/piece=3.33308
unigram_model_trainer.cc(618) LOG(INFO) EM sub_iter=1 size=1239 obj=15.5659 num_tokens=4415 num_tokens/piece=3.56336
trainer_interface.cc(687) LOG(INFO) Saving model: run/sentencepiece.model
spm_train_main.cc(282) [_status.ok()] Internal: src/trainer_interface.cc(662) [(trainer_spec_.vocab_size()) == (model_proto->pieces_size())] Vocabulary size too high (50000). Please set it to a value <= 1309.
Program terminated with an unrecoverable error.
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:48: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, dim=0):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:68: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:13: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, target):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:37: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:397: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(self, u, x, bias, init=None, mask_h=None):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:443: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(self, grad_h, grad_last):
Corpus corpus_1's weight should be given. We default it to 1 for you.
Traceback (most recent call last):
  File "/home/argosopentech/env/bin/onmt_build_vocab", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_build_vocab')())
  File "/home/argosopentech/OpenNMT-py/onmt/bin/build_vocab.py", line 71, in main
    build_vocab_main(opts)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/build_vocab.py", line 32, in build_vocab_main
    transforms = make_transforms(opts, transforms_cls, fields)
  File "/home/argosopentech/OpenNMT-py/onmt/transforms/transform.py", line 235, in make_transforms
    transform_obj.warm_up(vocabs)
  File "/home/argosopentech/OpenNMT-py/onmt/transforms/tokenize.py", line 147, in warm_up
    load_src_model.Load(self.src_subword_model)
  File "/home/argosopentech/env/lib/python3.10/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
  File "/home/argosopentech/env/lib/python3.10/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
OSError: Not found: "run/sentencepiece.model": No such file or directory Error #2
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:48: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, dim=0):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_activations.py:68: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:13: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(ctx, input, target):
/home/argosopentech/OpenNMT-py/onmt/modules/sparse_losses.py:37: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(ctx, grad_output):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:397: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
def forward(self, u, x, bias, init=None, mask_h=None):
/home/argosopentech/OpenNMT-py/onmt/models/sru.py:443: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
def backward(self, grad_h, grad_last):
[2024-08-20 06:20:54,713 WARNING] Corpus corpus_1's weight should be given. We default it to 1 for you.
[2024-08-20 06:20:54,714 INFO] Parsed 2 corpora from -data.
Traceback (most recent call last):
  File "/home/argosopentech/env/bin/onmt_train", line 33, in <module>
    sys.exit(load_entry_point('OpenNMT-py', 'console_scripts', 'onmt_train')())
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 172, in main
    train(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 106, in train
    checkpoint, fields, transforms_cls = _init_train(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/bin/train.py", line 58, in _init_train
    ArgumentParser.validate_prepare_opts(opt)
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 197, in validate_prepare_opts
    cls._validate_fields_opts(opt, build_vocab_only=build_vocab_only)
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 151, in _validate_fields_opts
    cls._validate_file(opt.src_vocab, info='src vocab')
  File "/home/argosopentech/OpenNMT-py/onmt/utils/parse.py", line 18, in _validate_file
    raise IOError(f"Please check path of your {info} file!")
OSError: Please check path of your src vocab file!
Traceback (most recent call last):
  File "/home/argosopentech/argos-train/./argos-train", line 18, in <module>
    train.train(from_code, to_code, from_name, to_name, version, package_version, argos_version, data_exists)
  File "/home/argosopentech/argos-train/argostrain/train.py", line 163, in train
    str(opennmt_checkpoints[-2].f),
IndexError: list index out of range
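
My unconfirmed reading of the log is that the failures may be chained: SentencePiece stops with "Vocabulary size too high (50000)", so run/sentencepiece.model is not found, onmt_build_vocab and onmt_train then fail, and the checkpoint list that train.py indexes with [-2] might simply be empty. The Python sketch below only illustrates that guess. The helper names max_vocab_size and second_to_last are hypothetical and are not part of argos-train; the error string is copied from the log above.

```python
import re

# Error line copied from the SentencePiece output in the log.
SPM_ERROR = (
    "Vocabulary size too high (50000). "
    "Please set it to a value <= 1309."
)


def max_vocab_size(message):
    """Return the upper bound that SentencePiece reports, or None if absent."""
    match = re.search(r"Please set it to a value <= (\d+)", message)
    return int(match.group(1)) if match else None


def second_to_last(checkpoints):
    """Hypothetical guard around a [-2] lookup, with a clearer error message."""
    if len(checkpoints) < 2:
        raise RuntimeError(
            f"expected at least 2 checkpoints, found {len(checkpoints)}"
        )
    return checkpoints[-2]


print(max_vocab_size(SPM_ERROR))  # 1309

# Indexing an empty list with [-2] raises the same error type as the traceback.
try:
    [][-2]
except IndexError as exc:
    print(type(exc).__name__, exc)
```

If this guess is right, the IndexError would be a symptom of the earlier SentencePiece failure rather than a separate problem, but I have not verified that.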
What causes the list index out of range error? My training data has 2436 rows in total.
Please help me resolve the error.
Thanks.
Issue 2: I copied the bin/argos-train script to the parent directory and ran ./argos-train. It starts and accepts the input arguments, but then fails with the output shown above, ending in IndexError: list index out of range.