This repository was archived by the owner on Oct 31, 2023. It is now read-only.

Process gets killed #9

@zqx1609

I ran this model on the IBP dataset, and the process gets killed while loading the training data. My GPU is a 2070 Super and my environment is PyTorch 1.3 (devel). Does anybody know how to deal with this problem?
My command:
python main.py --exp_name first_train --fp16 true --amp 2 --tasks "prim_ibp" --reload_data "prim_ibp,prim_ibp.train,prim_ibp.valid,prim_ibp.test" --reload_size 40000000 --emb_dim 1024 --n_enc_layers 6 --n_dec_layers 6 --n_heads 8 --optimizer "adam,lr=0.0001" --batch_size 32 --epoch_size 300000 --validation_metrics valid_prim_fwd_acc
Output:
INFO - 07/02/20 06:56:43 - 0:00:00 - The experiment will be stored in ./dumped/first_train/dgv5zq039m

INFO - 07/02/20 06:56:43 - 0:00:00 - Running command: python main.py --exp_name first_train --fp16 true --amp 2 --tasks prim_ibp --reload_data 'prim_ibp,prim_ibp.train,prim_ibp.valid,prim_ibp.test' --reload_size '-1' --emb_dim 128 --n_enc_layers 1 --n_dec_layers 1 --n_heads 1 --optimizer 'adam,lr=0.0001' --batch_size 8 --epoch_size 300000 --validation_metrics valid_prim_fwd_acc

WARNING - 07/02/20 06:56:43 - 0:00:00 - Signal handler installed.
INFO - 07/02/20 06:56:43 - 0:00:00 - Unary operators: []
INFO - 07/02/20 06:56:43 - 0:00:00 - Binary operators: ['add', 'sub']
INFO - 07/02/20 06:56:43 - 0:00:00 - words: {'<s>': 0, '</s>': 1, '<pad>': 2, '(': 3, ')': 4, '<SPECIAL_5>': 5, '<SPECIAL_6>': 6, '<SPECIAL_7>': 7, '<SPECIAL_8>': 8, '<SPECIAL_9>': 9, 'pi': 10, 'E': 11, 'x': 12, 'y': 13, 'z': 14, 't': 15, 'a0': 16, 'a1': 17, 'a2': 18, 'a3': 19, 'a4': 20, 'a5': 21, 'a6': 22, 'a7': 23, 'a8': 24, 'a9': 25, 'abs': 26, 'acos': 27, 'acosh': 28, 'acot': 29, 'acoth': 30, 'acsc': 31, 'acsch': 32, 'add': 33, 'asec': 34, 'asech': 35, 'asin': 36, 'asinh': 37, 'atan': 38, 'atanh': 39, 'cos': 40, 'cosh': 41, 'cot': 42, 'coth': 43, 'csc': 44, 'csch': 45, 'derivative': 46, 'div': 47, 'exp': 48, 'f': 49, 'g': 50, 'h': 51, 'inv': 52, 'ln': 53, 'mul': 54, 'pow': 55, 'pow2': 56, 'pow3': 57, 'pow4': 58, 'pow5': 59, 'rac': 60, 'sec': 61, 'sech': 62, 'sign': 63, 'sin': 64, 'sinh': 65, 'sqrt': 66, 'sub': 67, 'tan': 68, 'tanh': 69, 'I': 70, 'INT+': 71, 'INT-': 72, 'INT': 73, 'FLOAT': 74, '-': 75, '.': 76, '10^': 77, 'Y': 78, "Y'": 79, "Y''": 80, '0': 81, '1': 82, '2': 83, '3': 84, '4': 85, '5': 86, '6': 87, '7': 88, '8': 89, '9': 90}
INFO - 07/02/20 06:56:43 - 0:00:00 - 20001 possible leaves.
INFO - 07/02/20 06:56:43 - 0:00:00 - Checking expressions in [0.01, 0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 2.1, 3.1, -0.01, -0.1, -0.3, -0.5, -0.7, -0.9, -1.1, -2.1, -3.1]
INFO - 07/02/20 06:56:43 - 0:00:00 - Training tasks: prim_ibp
INFO - 07/02/20 06:56:43 - 0:00:00 - Number of parameters (encoder): 734464
INFO - 07/02/20 06:56:43 - 0:00:00 - Number of parameters (decoder): 800859
INFO - 07/02/20 06:56:47 - 0:00:03 - Found 51 parameters in model.
INFO - 07/02/20 06:56:47 - 0:00:03 - Optimizers: model
Selected optimization level O2: FP16 training with FP32 batchnorm and FP32 master weights.

Defaults for this optimization level are:
enabled : True
opt_level : O2
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : True
master_weights : True
loss_scale : dynamic
Processing user overrides (additional kwargs that are not None)...
After processing overrides, optimization options are:
enabled : True
opt_level : O2
cast_model_type : torch.float16
patch_torch_functions : False
keep_batchnorm_fp32 : True
master_weights : True
loss_scale : dynamic
INFO - 07/02/20 06:56:47 - 0:00:03 - Creating train iterator for prim_ibp ...
INFO - 07/02/20 06:56:47 - 0:00:03 - Loading data from prim_ibp.train ...
Killed
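
A bare `Killed` with no Python traceback, printed right after "Loading data from prim_ibp.train ...", often means the Linux out-of-memory killer stopped the process because host RAM (not GPU memory) ran out. That is only a guess here; nothing in the log confirms it. The logged command also shows `--reload_size '-1'`, which may mean the whole file is loaded at once. Below is a minimal sketch for checking whether the data file is likely to fit in RAM. It assumes Linux (`/proc/meminfo`), treats `prim_ibp.train` as a file path in the current directory, and uses a made-up overhead factor of 3 for the in-memory size of parsed lines; the helper names are hypothetical and not part of the repository.

```python
import os


def available_ram_bytes(meminfo_text):
    """Return MemAvailable in bytes from /proc/meminfo-style text, or None."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            # The value is reported in kB, e.g. "MemAvailable:  2048 kB".
            return int(line.split()[1]) * 1024
    return None


def fits_in_ram(file_size_bytes, available_bytes, overhead=3.0):
    """Rough check: file size times an assumed overhead factor vs. free RAM."""
    return file_size_bytes * overhead <= available_bytes


if __name__ == "__main__":
    path = "prim_ibp.train"  # assumption: the training file is in the cwd
    if os.path.exists(path) and os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as f:
            avail = available_ram_bytes(f.read())
        size = os.path.getsize(path)
        if avail is not None:
            print(f"file: {size / 1e9:.2f} GB, available RAM: {avail / 1e9:.2f} GB")
            print("likely fits" if fits_in_ram(size, avail) else "likely too large")
```

If the check reports that the file is too large, running `dmesg` after the crash and looking for an "Out of memory" entry naming the Python process would confirm or rule out this explanation.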
