Running process stopped at “compiling cuda operations” #7

@sudanl

Hello! I can launch the code successfully. However, when training reaches the last step in the log below, it hangs and never continues, without printing any error. Do you have any advice about this problem?

2022-10-18 17:05:27 | INFO | fairseq.utils | ***********************CUDA enviroments for all 4 workers***********************
2022-10-18 17:05:27 | INFO | fairseq_cli.train | training on 4 devices (GPUs/TPUs)
2022-10-18 17:05:27 | INFO | fairseq_cli.train | max tokens per device = 2048 and max sentences per device = None
2022-10-18 17:05:27 | INFO | fairseq.trainer | Preparing to load checkpoint ./model/checkpoint_last.pt
2022-10-18 17:05:27 | INFO | fairseq.trainer | No existing checkpoint found ./model/checkpoint_last.pt
2022-10-18 17:05:27 | INFO | fairseq.trainer | loading train data for epoch 1
2022-10-18 17:05:28 | INFO | fairseq.data.data_utils | loaded 4,500,966 examples from: ./bin_data/WMT16/train.en-de.en
2022-10-18 17:05:28 | INFO | fairseq.data.data_utils | loaded 4,500,966 examples from: ./bin_data/WMT16/train.en-de.de
2022-10-18 17:05:28 | INFO | fairseq.tasks.translation | ./bin_data/WMT16 train en-de 4500966 examples
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:34 | WARNING | fairseq.tasks.fairseq_task | 1,391 samples have invalid sizes and will be skipped, max_positions=(128, 1024), first few sample ids=[3749843, 2629309, 3912533, 2428533, 3659653, 4231852, 3663212, 2382171, 3373663, 4175821]
2022-10-18 17:05:35 | INFO | fairseq.data.iterators | grouped total_num_itrs = 1278
2022-10-18 17:05:35 | INFO | fairseq.trainer | begin training epoch 1
2022-10-18 17:05:35 | INFO | fairseq_cli.train | Start iterating over samples
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
Start compiling cuda operations for DA-Transformer...(It usually takes a few minutes for the first time running.)
