Skip to content

Training on Wikidata (huge dataset) using OpenKE #406

@unmeshvrije

Description

@unmeshvrije

As my research project, I am trying to use OpenKE for loading Wikidata truthy NT file

However, when I reach the following step in openke/config/Trainer.py file

        if self.use_gpu:
            self.model.cuda()

I get the CUDA out of memory error.

  File "/home/myname/OpenKE/openke/config/Trainer.py", line 58, in run
    self.model.cuda()
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 749, in cuda
    return self._apply(lambda t: t.cuda(device))
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/var/scratch/anaconda3/envs/py3.7/lib/python3.7/site-packages/torch/nn/modules/module.py", line 749, in <lambda>
    return self._apply(lambda t: t.cuda(device))
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 1614.71 GiB (GPU 0; 47.54 GiB total capacity; 0 bytes already allocated; 47.17 GiB free; 0 bytes reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My question is: Has someone tried training embeddings for huge datasets like Wikidata ?
Any pointers would be appreciated

For the above dataset, there were
7794277662 Number of Triples (7 billion)
6235422129 (80% training triples = 6 billion)

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions